1 Mining Engineering Department, University Yembila-Abdoulaye-TOGUYENI, BP 54 Fada N'Gourma, Burkina Faso
2 Geosciences and Environment Laboratory (LaGE), Joseph KI-ZERBO University, BP 7021, Ouagadougou, Burkina Faso
3 Direction de la Qualité des Eaux/Ministère de l'Environnement de l'Eau et de l'Assainissement, Burkina Faso
4 Laboratoire Sciences et Technologie (LaST), Université Thomas SANKARA, 12 BP 417 Ouagadougou 12, Burkina Faso
American Journal of Water Resources. 2025, Vol. 13 No. 3, 86-96
DOI: 10.12691/ajwr-13-3-3
Copyright © 2025 Science and Education Publishing
Cite this paper: Issoufou OUEDRAOGO, W. J. P. SANDWIDI, Fatoumata KABORE, Mahamadou KONARE, Cheick Abdramane OUATTARA. Predictive Modelling of Groundwater Quality in the Nakanbé River Basin Using Machine Learning Techniques. American Journal of Water Resources. 2025; 13(3):86-96. doi: 10.12691/ajwr-13-3-3.
Correspondence to: Issoufou OUEDRAOGO, Mining Engineering Department, University Yembila-Abdoulaye-TOGUYENI, BP 54 Fada N'Gourma, Burkina Faso. Email: ouedraogo.issoufou03@gmail.com
Abstract
Continuous monitoring of groundwater quality is essential for protecting public health and the environment, particularly in vulnerable regions such as Burkina Faso's Nakanbé Basin, where groundwater is a primary source of potable water. This study aimed to develop and evaluate machine learning (ML) models to predict two key water quality parameters, Total Dissolved Solids (TDS) and Total Alkalinity (TA), using data provided by the General Directorate of Water Resources (DGRE). A total of 1,765 groundwater samples were analyzed, covering nineteen physicochemical parameters. Prior to modelling, a multicollinearity analysis was conducted to ensure the reliability of the input variables. Three regression algorithms, Random Forest Regression (RFR), Multiple Linear Regression (MLR), and Decision Tree Regression (DTR), were compared for their predictive performance. Among them, RFR was the most accurate, with the highest R² and the lowest error metrics (MAE, RMSE) on both the training and testing datasets for both TDS and TA. MLR offered consistent and interpretable results, particularly for TA, whereas DTR overfitted strongly and generalized less well to the test data. The results highlight the superiority of ensemble learning approaches, particularly RFR, in capturing complex, nonlinear relationships within groundwater quality datasets. Applying ML in this context provides a cost-effective and scalable alternative to conventional laboratory-based monitoring methods. It also enables the identification of influential water quality parameters, supports risk assessment of contamination, and contributes to evidence-based water resource management strategies. These findings demonstrate the potential of ML tools to enhance groundwater monitoring and advance sustainable water governance in arid and semi-arid regions.
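For readers unfamiliar with the three scores named in the abstract, the following minimal Python sketch computes R², MAE, and RMSE from their standard definitions. It is an illustration only, not the authors' code: the function names are hypothetical, and the observed and predicted TDS values are invented for the example rather than taken from the DGRE dataset.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error: average magnitude of the prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error: penalizes large errors more than MAE does."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 minus residual over total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def evaluate(y_true, y_pred):
    """Collect the three scores for one model on one dataset split."""
    return {
        "R2": r2(y_true, y_pred),
        "MAE": mae(y_true, y_pred),
        "RMSE": rmse(y_true, y_pred),
    }

# Hypothetical TDS values in mg/L, invented for illustration only.
observed = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]

scores = evaluate(observed, predicted)
print(scores)
```

Computing the same scores separately on the training and testing splits is one common way to spot the kind of overfitting reported for DTR: a model that scores much better on the training data than on the test data is not generalizing well.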
Keywords