<?xml version="1.0" encoding="UTF-8"?>
<records>
<record>
<language>eng</language>
<publisher>Science and Education Publishing</publisher>
<journalTitle>Journal of Computer Sciences and Applications</journalTitle>
<eissn>2328-725X</eissn>
<publicationDate>2025-12-04</publicationDate>
<volume>13</volume>
<issue>2</issue>
<startPage>54</startPage>
<endPage>58</endPage>
<doi>10.12691/jcsa-13-2-3</doi>
<publisherRecordId>JCSA20251323</publisherRecordId>
<documentType>article</documentType>
<title language="eng">Towards Domain-Aware Language Models for Low-Resource Healthcare: Fine-Tuning LLMs and SLMs with Parallel English–Yoruba Maternal Health Data</title>
<authors>
<author>
<name>Elizabeth O. Ogunseye</name>
<affiliationId>1</affiliationId>
</author>
<author>
<name>Ezekiel A. Oladejo</name>
<email>eoladejo184@stu.ui.edu.ng</email>
<affiliationId>1</affiliationId>
</author>
<author>
<name>Isaac Olaleye</name>
<affiliationId>1</affiliationId>
</author>
<author>
<name>Adesesan B. Adeyemo</name>
<affiliationId>2</affiliationId>
</author>

</authors>
<affiliationsList>
<affiliationName affiliationId="1">Department of Computer Science, University of Ibadan, Ibadan, Nigeria</affiliationName>


<affiliationName affiliationId="2">Faculty of Computing, University of Ibadan, Ibadan, Nigeria</affiliationName>
</affiliationsList>
<abstract language="eng">Healthcare communication barriers significantly impact maternal health outcomes in multilingual communities, particularly in Sub-Saharan Africa where indigenous languages dominate daily communication while medical resources remain primarily in colonial languages. This study presents a systematic approach to developing domain-aware language models specifically tailored for maternal health communication in low-resource settings. We introduce a comprehensive parallel English–Yoruba maternal health dataset comprising 7,000 translated and verified sentence pairs covering prenatal care, childbirth and postnatal support. Our methodology involves fine-tuning both Large Language Models (LLMs) including GPT-3.5-turbo and LLaMA-2-7B, and Small Language Models (SLMs) such as DistilBERT, mBERT, and XLM-R across multiple evaluation dimensions including translation quality, domain-specific terminology accuracy, and clinical relevance metrics. Results demonstrate that domain-specific fine-tuning significantly improves performance over general-purpose models, with the GPT-3.5-turbo variant achieving a BLEU score of 0.78 on the held-out test set and medical terminology accuracy of 89.3%. Fine-tuned models demonstrate substantial improvements in handling culture-specific maternal health concepts and traditional medicine terminology. This work contributes to bridging the digital health divide in low-resource settings and provides a replicable framework for developing multilingual healthcare AI systems that can effectively serve diverse linguistic communities while maintaining cultural and clinical sensitivity.</abstract>
<fullTextUrl format="pdf">https://pubs.sciepub.com/jcsa/13/2/3/jcsa-13-2-3.pdf</fullTextUrl>
<keywords language="eng"><keyword>Low-resource NLP</keyword>
<keyword>Healthcare AI</keyword>
<keyword>Maternal Health</keyword>
<keyword>Yoruba Language</keyword>
<keyword>Domain Adaptation</keyword>
<keyword>Multilingual Models</keyword>
</keywords>
</record>
</records>
