Towards Domain-Aware Language Models for Low-Resource Healthcare: Fine-Tuning LLMs and SLMs with Parallel English–Yoruba Maternal Health Data

Elizabeth Ogunseye; Ezekiel Oladejo; Isaac Olaleye; Adesesan B. Adeyemo

Journal of Computer Sciences and Applications. 2025, 13(2), 54-58
DOI: 10.12691/jcsa-13-2-3

Open Access Article

Elizabeth Ogunseye¹, Ezekiel Oladejo¹, Isaac Olaleye¹ and Adesesan B. Adeyemo²

¹Department of Computer Science, University of Ibadan, Ibadan, Nigeria

²Faculty of Computing, University of Ibadan, Ibadan, Nigeria

Pub. Date: December 04, 2025

Cite this paper:
Elizabeth Ogunseye, Ezekiel Oladejo, Isaac Olaleye and Adesesan B. Adeyemo. Towards Domain-Aware Language Models for Low-Resource Healthcare: Fine-Tuning LLMs and SLMs with Parallel English–Yoruba Maternal Health Data. Journal of Computer Sciences and Applications. 2025; 13(2):54-58. doi: 10.12691/jcsa-13-2-3

Abstract

Healthcare communication barriers significantly affect maternal health outcomes in multilingual communities, particularly in Sub-Saharan Africa, where indigenous languages dominate daily communication while medical resources remain primarily in colonial languages. This study presents a systematic approach to developing domain-aware language models tailored for maternal health communication in low-resource settings. We introduce a parallel English–Yoruba maternal health dataset comprising 7,000 translated and verified sentence pairs covering prenatal care, childbirth, and postnatal support. We fine-tune both Large Language Models (LLMs), including GPT-3.5-turbo and LLaMA-2-7B, and Small Language Models (SLMs) such as DistilBERT, mBERT, and XLM-R, and evaluate them along multiple dimensions: translation quality, domain-specific terminology accuracy, and clinical relevance. Results show that domain-specific fine-tuning significantly improves performance over general-purpose models, with the GPT-3.5-turbo variant achieving a BLEU score of 0.78 on the held-out test set and a medical terminology accuracy of 89.3%. Fine-tuned models also handle culture-specific maternal health concepts and traditional medicine terminology substantially better. This work contributes to bridging the digital health divide in low-resource settings and provides a replicable framework for developing multilingual healthcare AI systems that serve diverse linguistic communities while maintaining cultural and clinical sensitivity.
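The abstract reports a medical terminology accuracy figure without defining the metric. As an illustration only, the sketch below shows one plausible way such a score could be computed: for each glossary term that occurs in an English source sentence, check whether its expected target-language rendering appears in the model output. The function name `terminology_accuracy`, the glossary format, and the placeholder target strings are all assumptions made for this example and are not taken from the paper.

```python
from typing import Dict, List


def terminology_accuracy(
    sources: List[str],
    hypotheses: List[str],
    glossary: Dict[str, str],
) -> float:
    """Share of glossary terms found in the sources whose expected
    target rendering also appears in the corresponding hypothesis.

    Hypothetical metric definition; the paper may use a different one.
    """
    expected = 0  # glossary terms that occur in some source sentence
    matched = 0   # of those, how many are rendered as the glossary expects
    for src, hyp in zip(sources, hypotheses):
        src_lower = src.lower()
        hyp_lower = hyp.lower()
        for en_term, target_term in glossary.items():
            if en_term.lower() in src_lower:
                expected += 1
                if target_term.lower() in hyp_lower:
                    matched += 1
    # Avoid division by zero when no glossary term occurs at all.
    return matched / expected if expected else 0.0


# Placeholder strings stand in for real Yoruba terms (assumption).
glossary = {
    "antenatal clinic": "<yo:antenatal clinic>",
    "midwife": "<yo:midwife>",
}
sources = [
    "Visit the antenatal clinic every month.",
    "A midwife will assist during labour.",
]
hypotheses = [
    "... <yo:antenatal clinic> ...",  # expected term present
    "... <yo:nurse> ...",             # expected term missing
]
score = terminology_accuracy(sources, hypotheses, glossary)
```

Simple substring matching of this kind ignores inflection and tonal-mark variation, so a real evaluation would likely need normalization or human verification on top of it.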

Keywords:
Low-resource NLP, Healthcare AI, Maternal Health, Yoruba Language, Domain Adaptation, Multilingual Models

This work is licensed under a Creative Commons Attribution 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
