Article citationsMore >>

Alsentzer, E., et al. “Publicly available clinical BERT embeddings for clinical NLP.” Proceedings of the Clinical NLP Workshop, 2019.

has been cited by the following article:

Article

Towards Domain-Aware Language Models for Low-Resource Healthcare: Fine-Tuning LLMs and SLMs with Parallel English–Yoruba Maternal Health Data

Elizabeth Ogunseye¹, Ezekiel Oladejo^1,, Isaac Olaleye¹, Adesesan B. Adeyemo²

¹Department of Computer Science, University of Ibadan, Ibadan, Nigeria

²Faculty of Computing, University of Ibadan, Ibadan, Nigeria

Journal of Computer Sciences and Applications. 2025, Vol. 13 No. 2, 54-58
DOI: 10.12691/jcsa-13-2-3
Copyright © 2025 Science and Education Publishing

Cite this paper:
Elizabeth Ogunseye, Ezekiel Oladejo, Isaac Olaleye, Adesesan B. Adeyemo. Towards Domain-Aware Language Models for Low-Resource Healthcare: Fine-Tuning LLMs and SLMs with Parallel English–Yoruba Maternal Health Data. Journal of Computer Sciences and Applications. 2025; 13(2):54-58. doi: 10.12691/jcsa-13-2-3.

Correspondence to: Ezekiel Oladejo, Department of Computer Science, University of Ibadan, Ibadan, Nigeria. Email: eoladejo184@stu.ui.edu.ng

Abstract

Healthcare communication barriers significantly impact maternal health outcomes in multilingual communities, particularly in Sub-Saharan Africa where indigenous languages dominate daily communication while medical resources remain primarily in colonial languages. This study presents a systematic approach to developing domain-aware language models specifically tailored for maternal health communication in low-resource settings. We introduce a comprehensive parallel English–Yoruba maternal health dataset comprising 7,000 translated and verified sentence pairs covering prenatal care, childbirth and postnatal support. Our methodology involves fine-tuning both Large Language Models (LLMs) including GPT-3.5-turbo and LLaMA-2-7B, and Small Language Models (SLMs) such as DistilBERT, mBERT, and XLM-R across multiple evaluation dimensions including translation quality, domain-specific terminology accuracy, and clinical relevance metrics. Results demonstrate that domain-specific fine-tuning significantly improves performance over general-purpose models, with the GPT-3.5-turbo variant achieving a BLEU score of 0.78 on the held-out test set and medical terminology accuracy of 89.3%. Fine-tuned models demonstrate substantial improvements in handling culture-specific maternal health concepts and traditional medicine terminology. This work contributes to bridging the digital health divide in low-resource settings and provides a replicable framework for developing multilingual healthcare AI systems that can effectively serve diverse linguistic communities while maintaining cultural and clinical sensitivity.

Keywords

Low-resource NLP, Healthcare AI, Maternal Health, Yoruba Language, Domain Adaptation, Multilin- gual Models

Article citationsMore >>

Article

Towards Domain-Aware Language Models for Low-Resource Healthcare: Fine-Tuning LLMs and SLMs with Parallel English–Yoruba Maternal Health Data

Abstract

Keywords

Pages

Partners

Help & Contacts

Follow Us