¹Department of Physical Sciences, Chuka University, P.O. Box 109-60400, Chuka, Kenya
²Department of Plant Sciences, Chuka University, P.O. Box 109-60400, Chuka, Kenya
³Department of Mathematics and Statistics, United States International University Africa, P.O. Box 14634-00800, Nairobi, Kenya
American Journal of Applied Mathematics and Statistics. 2025, Vol. 13 No. 1, 1-13
DOI: 10.12691/ajams-13-1-1
Copyright © 2025 Science and Education Publishing
Cite this paper: Dominic Obare, Moses Muraya, Gladys Njoroge. Advanced Statistical Modelling of Maize Phenotypes Using Compressed Linear Mixed Models in Genome-Wide Association Studies. American Journal of Applied Mathematics and Statistics. 2025; 13(1):1-13. doi: 10.12691/ajams-13-1-1.
Correspondence to: Gladys Njoroge, Department of Mathematics and Statistics, United States International University Africa, P.O. Box 14634-00800, Nairobi, Kenya. Email: Obaredominic87@gmail.com
Abstract
Maize breeding and genetic studies depend heavily on linking genetic markers, such as single nucleotide polymorphisms (SNPs), to phenotypes of interest, and Genome-Wide Association Studies (GWAS) are a crucial tool in this process. However, traditional statistical methods for analysing these phenotypes in GWAS can be computationally intensive and struggle to handle the high dimensionality of the phenotypic data efficiently. This study proposes an advanced statistical approach using a Compressed Linear Mixed Model (CLMM) to enhance the analysis of maize phenotypes in GWAS, focusing on image-derived traits such as plant volume, plant height and surface area. The method employs data compression techniques to reduce the dimensionality of the phenotypic data and combines them with a linear mixed model framework to capture complex genetic associations more effectively. The phenotypic data were obtained from the Leibniz Institute of Plant Genetics and Crop Plant Research, Gatersleben, Germany. The modelling was done in the R statistical software following the GAPIT tool guidelines, and the models were compared using the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC). The results showed that the model based on plant volume fits the data better than the models based on plant surface area and plant height, as evidenced by its lower AIC (2314.301) and BIC (2345.720), compared with an AIC of 2372.312 and a BIC of 2399.693 for the plant surface area model and an AIC of 2404.506 and a BIC of 2430.904 for the plant height model. In the GWAS analysis, plant volume yielded the largest number of significant SNPs, with 8 SNPs identified, compared with 6 SNPs for plant surface area and 4 SNPs for plant height.
The larger number of SNPs associated with plant volume underscores the importance of selecting appropriate phenotypic traits in genetic studies. Overall, this study demonstrates the effectiveness of the CLMM for analysing phenotypic traits in GWAS and its suitability for identifying significant associations.
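As a small illustration of the model comparison summarised in the abstract, the sketch below ranks the three trait models by the AIC and BIC values reported above and computes each model's difference from the best fit (for both criteria, lower values indicate a better fit). It only reuses the published numbers; the dictionary `REPORTED` and the function `rank_models` are hypothetical names introduced for this example, and the sketch neither refits any model nor calls GAPIT.

```python
# AIC and BIC values as reported in the abstract (lower is better for both).
# The names REPORTED and rank_models are hypothetical, chosen for illustration.
REPORTED = {
    "plant volume": {"AIC": 2314.301, "BIC": 2345.720},
    "plant surface area": {"AIC": 2372.312, "BIC": 2399.693},
    "plant height": {"AIC": 2404.506, "BIC": 2430.904},
}


def rank_models(fits, criterion):
    """Sort models from best (lowest criterion value) to worst and return
    (name, value, difference from the best model) tuples."""
    ordered = sorted(fits.items(), key=lambda item: item[1][criterion])
    best_value = ordered[0][1][criterion]
    return [
        (name, values[criterion], round(values[criterion] - best_value, 3))
        for name, values in ordered
    ]


if __name__ == "__main__":
    for criterion in ("AIC", "BIC"):
        print(criterion)
        for name, value, delta in rank_models(REPORTED, criterion):
            print(f"  {name:<20} {value:>9.3f}  delta = {delta:.3f}")
```

With these values, plant volume ranks first under both criteria, followed by plant surface area and then plant height, matching the ordering stated in the abstract.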
Keywords