Towards Domain-Aware Language Models for Low-Resource Healthcare: Fine-Tuning LLMs and SLMs with Parallel English–Yoruba Maternal Health Data

Journal of Computer Sciences and Applications, Vol. 13, No. 2, pp. 54–58. Science and Education Publishing, ISSN 2328-725X. Published 2025-12-04. DOI: 10.12691/jcsa-13-2-3 (article ID JCSA20251323).

Elizabeth O. Ogunseye (1), Ezekiel A. Oladejo (1) <eoladejo184@stu.ui.edu.ng>, Isaac Olaleye (1), Adesesan B. Adeyemo (2)
(1) Department of Computer Science, University of Ibadan, Ibadan, Nigeria
(2) Faculty of Computing, University of Ibadan, Ibadan, Nigeria

Healthcare communication barriers significantly affect maternal health outcomes in multilingual communities, particularly in Sub-Saharan Africa, where indigenous languages dominate daily communication while medical resources remain primarily in colonial languages. This study presents a systematic approach to developing domain-aware language models tailored to maternal health communication in low-resource settings. We introduce a parallel English–Yoruba maternal health dataset of 7,000 translated and verified sentence pairs covering prenatal care, childbirth, and postnatal support. We fine-tune both Large Language Models (LLMs), including GPT-3.5-turbo and LLaMA-2-7B, and Small Language Models (SLMs), such as DistilBERT, mBERT, and XLM-R, and evaluate them on translation quality, domain-specific terminology accuracy, and clinical relevance. Results show that domain-specific fine-tuning significantly improves performance over general-purpose models: the fine-tuned GPT-3.5-turbo variant achieves a BLEU score of 0.78 on the held-out test set and a medical terminology accuracy of 89.3%. The fine-tuned models also handle culture-specific maternal health concepts and traditional medicine terminology substantially better.
This work contributes to bridging the digital health divide in low-resource settings and provides a replicable framework for developing multilingual healthcare AI systems that serve diverse linguistic communities while maintaining cultural and clinical sensitivity.

https://pubs.sciepub.com/jcsa/13/2/3/jcsa-13-2-3.pdf

Keywords: Low-resource NLP; Healthcare AI; Maternal Health; Yoruba Language; Domain Adaptation; Multilingual Models
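The abstract reports two headline metrics: BLEU and medical terminology accuracy. The following is a minimal sketch, not the authors' evaluation code. It assumes whitespace-tokenized text, one reference per sentence, uniform n-gram weights, and no smoothing, and it computes corpus-level BLEU on the 0–1 scale on which 0.78 appears to be reported. The abstract does not define terminology accuracy, so `term_accuracy` is a hypothetical stand-in: the share of expected target-language terms that appear verbatim in a model's output. All function names are illustrative.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count every contiguous n-gram of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU in [0, 1]: one reference per sentence, uniform weights, no smoothing."""
    matches = [0] * max_n  # clipped n-gram matches, per order
    totals = [0] * max_n   # hypothesis n-grams, per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            ref_ngrams = ngram_counts(ref, n)
            # Clip: a hypothesis n-gram is credited at most as often as it occurs in the reference.
            matches[n - 1] += sum(min(c, ref_ngrams[g]) for g, c in ngram_counts(hyp, n).items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # any zero precision makes the geometric mean zero
    # Geometric mean of the modified n-gram precisions (computed in log space).
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output that is shorter overall than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_precision)


def term_accuracy(hypotheses, expected_terms):
    """HYPOTHETICAL metric: fraction of expected domain terms found as whole-token phrases."""
    hits = total = 0
    for hyp, terms in zip(hypotheses, expected_terms):
        padded = f" {' '.join(hyp)} "  # pad with spaces so matches respect token boundaries
        for term in terms:
            total += 1
            hits += int(f" {term} " in padded)
    return hits / total if total else 0.0


if __name__ == "__main__":
    # Toy English tokens stand in for real Yoruba test data.
    hyps = [["take", "folic", "acid", "every", "day"]]
    refs = [["take", "folic", "acid", "every", "day", "now"]]
    print(corpus_bleu(hyps, refs))  # ≈ 0.819: all n-grams match, only the brevity penalty applies
    print(term_accuracy(hyps, [["folic acid", "antenatal"]]))  # 0.5
```

Toolkits differ in tokenization, smoothing, and scale (sacreBLEU, for example, reports BLEU on a 0–100 scale), so BLEU scores are comparable only when computed with identical settings.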