Journal of Computer Sciences and Applications
ISSN (Print): 2328-7268; ISSN (Online): 2328-725X; Website: https://www.sciepub.com/journal/jcsa; Editor-in-chief: Minhua Ma, Patricia Goncalves
Open Access
Journal of Computer Sciences and Applications. 2025, 13(2), 54-58
DOI: 10.12691/jcsa-13-2-3
Open Access Article

Towards Domain-Aware Language Models for Low-Resource Healthcare: Fine-Tuning LLMs and SLMs with Parallel English–Yoruba Maternal Health Data

Elizabeth Ogunseye1, Ezekiel Oladejo1, Isaac Olaleye1 and Adesesan B. Adeyemo2

1Department of Computer Science, University of Ibadan, Ibadan, Nigeria

2Faculty of Computing, University of Ibadan, Ibadan, Nigeria

Pub. Date: December 04, 2025

Cite this paper:
Elizabeth Ogunseye, Ezekiel Oladejo, Isaac Olaleye and Adesesan B. Adeyemo. Towards Domain-Aware Language Models for Low-Resource Healthcare: Fine-Tuning LLMs and SLMs with Parallel English–Yoruba Maternal Health Data. Journal of Computer Sciences and Applications. 2025; 13(2):54-58. doi: 10.12691/jcsa-13-2-3

Abstract

Healthcare communication barriers significantly impact maternal health outcomes in multilingual communities, particularly in Sub-Saharan Africa, where indigenous languages dominate daily communication while medical resources remain primarily in colonial languages. This study presents a systematic approach to developing domain-aware language models tailored for maternal health communication in low-resource settings. We introduce a parallel English–Yoruba maternal health dataset comprising 7,000 translated and verified sentence pairs covering prenatal care, childbirth, and postnatal support. We fine-tune both Large Language Models (LLMs), including GPT-3.5-turbo and LLaMA-2-7B, and Small Language Models (SLMs), such as DistilBERT, mBERT, and XLM-R, and evaluate them along several dimensions: translation quality, domain-specific terminology accuracy, and clinical relevance metrics. Results show that domain-specific fine-tuning significantly improves performance over general-purpose models, with the GPT-3.5-turbo variant achieving a BLEU score of 0.78 on the held-out test set and a medical terminology accuracy of 89.3%. Fine-tuned models also show substantial improvements in handling culture-specific maternal health concepts and traditional medicine terminology. This work contributes to bridging the digital health divide in low-resource settings and provides a replicable framework for developing multilingual healthcare AI systems that serve diverse linguistic communities while maintaining cultural and clinical sensitivity.
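
The abstract reports a medical terminology accuracy figure but does not say how it is computed. One plausible reading, offered here only as a minimal sketch, is the share of glossary terms found in the English source whose accepted Yoruba rendering also appears in the model output. The function name `terminology_accuracy`, the glossary structure, and the placeholder strings such as `yo_pregnancy` are all assumptions made for illustration; they are not taken from the paper or its dataset.

```python
def terminology_accuracy(examples, glossary):
    """Share of expected glossary terms that are rendered in the output.

    examples: list of (english_source, model_output) string pairs.
    glossary: dict mapping an English term to a list of accepted renderings.
    Matching is plain case-insensitive substring search, which is a
    simplification (it ignores morphology and partial-word matches).
    """
    expected = 0
    matched = 0
    for source, hypothesis in examples:
        src = source.lower()
        hyp = hypothesis.lower()
        for term, renderings in glossary.items():
            if term.lower() in src:
                expected += 1
                if any(r.lower() in hyp for r in renderings):
                    matched += 1
    # Avoid division by zero when no glossary term occurs in any source.
    return matched / expected if expected else 0.0


# Hypothetical toy data: placeholder tokens stand in for real Yoruba terms.
toy_glossary = {
    "pregnancy": ["yo_pregnancy"],
    "clinic": ["yo_clinic"],
    "midwife": ["yo_midwife"],
}
toy_examples = [
    ("Pregnancy check at the clinic", "xx yo_pregnancy xx yo_clinic"),
    ("See the midwife today", "xx xx xx"),
]
score = terminology_accuracy(toy_examples, toy_glossary)
print(f"terminology accuracy: {score:.3f}")
```

In this toy run, three glossary terms occur in the sources and two are rendered in the outputs, so the score is two thirds. A real evaluation would need a verified bilingual glossary and matching that accounts for Yoruba diacritics and inflection.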

Keywords:
Low-resource NLP, Healthcare AI, Maternal Health, Yoruba Language, Domain Adaptation, Multilingual Models

This work is licensed under a Creative Commons Attribution 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
