American Journal of Applied Mathematics and Statistics
ISSN (Print): 2328-7306 ISSN (Online): 2328-7292 Website: https://www.sciepub.com/journal/ajams Editor-in-chief: Mohamed Seddeek
Open Access
American Journal of Applied Mathematics and Statistics. 2025, 13(1), 1-13
DOI: 10.12691/ajams-13-1-1
Open Access Article

Advanced Statistical Modelling of Maize Phenotypes Using Compressed Linear Mixed Models in Genome-Wide Association Studies

Dominic Obare1, Moses Muraya2 and Gladys Njoroge3

1Department of Physical Sciences, Chuka University, P.O. Box 109-60400, Chuka, Kenya

2Department of Plant Sciences, Chuka University, P.O. Box 109-60400, Chuka, Kenya

3Department of Mathematics and Statistics, United States International University Africa, P.O. Box 14634-00800, Nairobi, Kenya

Pub. Date: January 13, 2025

Cite this paper:
Dominic Obare, Moses Muraya and Gladys Njoroge. Advanced Statistical Modelling of Maize Phenotypes Using Compressed Linear Mixed Models in Genome-Wide Association Studies. American Journal of Applied Mathematics and Statistics. 2025; 13(1):1-13. doi: 10.12691/ajams-13-1-1

Abstract

Maize breeding and genetic studies depend heavily on linking genetic markers, such as single nucleotide polymorphisms (SNPs), to phenotypes of interest, and Genome-Wide Association Studies (GWAS) are a crucial tool in this process. However, traditional statistical methods for analysing these phenotypes in GWAS can be computationally intensive and struggle to handle the high dimensionality of the phenotypic data efficiently. This study proposes an advanced statistical approach using a Compressed Linear Mixed Model (CLMM) to enhance the analysis of maize phenotypes in GWAS, with a focus on image-derived traits such as plant volume, plant height and plant surface area. The method employs data compression techniques to reduce the dimensionality of the phenotypic data, combined with a linear mixed model framework to capture complex genetic associations more effectively. The phenotypic data were obtained from the Leibniz Institute of Plant Genetics and Crop Plant Research, Gatersleben, Germany. The modelling was done in R statistical software following the GAPIT tool guidelines, and the models were compared using the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC). The results showed that the model based on plant volume fits the data better than the models based on plant surface area and plant height. This is evidenced by the lower AIC of 2314.301 and BIC of 2345.720 for the plant volume model, compared with an AIC of 2372.312 and BIC of 2399.693 for the plant surface area model and an AIC of 2404.506 and BIC of 2430.904 for the plant height model. In the GWAS analysis, plant volume also yielded the largest number of detected SNPs, with 8 identified, compared with 6 SNPs for plant surface area and 4 SNPs for plant height. These results underscore the importance of selecting appropriate phenotypic traits in genetic studies and demonstrate the suitability of the CLMM for analysing phenotypic traits in GWAS and identifying significant associations.
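The model comparison reported in the abstract can be reproduced from the stated values alone. The following is a minimal sketch in Python (not the R/GAPIT workflow used in the study); the dictionary layout and the helper name `rank_models` are illustrative assumptions, and only the AIC and BIC figures come from the abstract. It ranks the three trait models by each criterion and reports the difference from the best model.

```python
# Information-criterion values exactly as reported in the abstract.
reported = {
    "plant volume": {"AIC": 2314.301, "BIC": 2345.720},
    "plant surface area": {"AIC": 2372.312, "BIC": 2399.693},
    "plant height": {"AIC": 2404.506, "BIC": 2430.904},
}


def rank_models(scores, criterion):
    """Hypothetical helper: sort models from best (lowest value) to worst.

    Returns a list of (trait, value, delta) tuples, where delta is the
    difference between the model's value and the lowest value observed.
    """
    best = min(s[criterion] for s in scores.values())
    ranked = sorted(scores.items(), key=lambda kv: kv[1][criterion])
    return [(trait, s[criterion], round(s[criterion] - best, 3)) for trait, s in ranked]


aic_ranking = rank_models(reported, "AIC")
bic_ranking = rank_models(reported, "BIC")

for name, ranking in (("AIC", aic_ranking), ("BIC", bic_ranking)):
    print(name)
    for trait, value, delta in ranking:
        print(f"  {trait:<20} {value:10.3f}  delta = {delta:.3f}")
```

Both criteria give the same ordering (plant volume, then plant surface area, then plant height), which matches the ranking stated in the abstract.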

Keywords:
Compressed linear mixed model (CLMM), phenotypic data, genotypic data, single nucleotide polymorphisms (SNPs), R statistical software

This work is licensed under a Creative Commons Attribution 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/

