
James, G., Witten, D., Hastie, T., & Tibshirani, R. (2014). An introduction to statistical learning: with applications in R. New York: Springer.

has been cited by the following article:


Modelling a Multilevel Data Structure Using a Composite Index

Department of Statistics, University of Colombo, Colombo, Sri Lanka

American Journal of Applied Mathematics and Statistics. 2021, Vol. 9 No. 3, 75-82
DOI: 10.12691/ajams-9-3-1
Copyright © 2021 Science and Education Publishing

Cite this paper:
Prabath Badullahewage, Dilhari Attygalle. Modelling a Multilevel Data Structure Using a Composite Index. American Journal of Applied Mathematics and Statistics. 2021; 9(3):75-82. doi: 10.12691/ajams-9-3-1.

Correspondence to: Prabath Badullahewage, Department of Statistics, University of Colombo, Colombo, Sri Lanka. Email:


When modelling complex data structures related to a social phenomenon, there can be several hierarchical levels in which data units are nested within one another. Each level may also carry several variables, and those variables may not be unique to each case or record, which makes the data structure even more complex. Multilevel modelling has been used for decades to handle such data structures, but it may not always capture the structure fully, owing to the extent of the complexities in the data and the inherent issues of the procedure. On the other hand, ignoring the multilevel structure when modelling can lead to incorrect estimates, so the model may not achieve acceptable accuracy. This research explains a simple approach in which a complex multilevel structure is compressed to a single level by combining higher-level variables to form a composite index. This composite index also substantially reduces the number of variables considered in the modelling process. The process is exemplified using a primary data set on household education expenditure gathered through a systematic sampling survey. Several variables were collected on each household, along with another set of variables relating to each school-going child in the household, creating a multilevel data structure. The composite index, named the "Household Level Education Index", is developed through factor analysis, and the detailed process of its construction is explained. LASSO regression was performed to illustrate the use of the proposed composite index by predicting monthly household education expenditure through a single-level regression model. Finally, a Random Forest model was used to examine feature importance; the proposed "Household Level Education Index" was the most important feature in predicting monthly household education expenditure.
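
The idea of combining several household-level variables into one composite index can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract says the index was built through factor analysis, whereas this sketch swaps in the leading principal component of the correlation matrix (found by power iteration) as a simple stand-in for factor loadings. The variable names (parent_education, household_income, books_at_home) and the data are hypothetical and synthetic, chosen only for illustration.

```python
import math
import random


def standardize(column):
    """Return z-scores of a list of numbers (population standard deviation)."""
    mean = sum(column) / len(column)
    sd = math.sqrt(sum((x - mean) ** 2 for x in column) / len(column))
    return [(x - mean) / sd for x in column]


def correlation_matrix(z_columns):
    """Correlation matrix of columns that are already standardized."""
    n = len(z_columns[0])
    return [[sum(a * b for a, b in zip(ci, cj)) / n for cj in z_columns]
            for ci in z_columns]


def leading_weights(matrix, iterations=200):
    """Power iteration: unit-length leading eigenvector of a symmetric matrix."""
    k = len(matrix)
    v = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def composite_index(columns):
    """Collapse several variables into one score per household.

    Stand-in for factor analysis: the weights are the leading principal
    component of the correlation matrix, not estimated factor loadings.
    """
    z = [standardize(c) for c in columns]
    weights = leading_weights(correlation_matrix(z))
    n = len(columns[0])
    scores = [sum(w * zc[i] for w, zc in zip(weights, z)) for i in range(n)]
    return weights, scores


# Synthetic, hypothetical household-level data driven by one latent trait.
random.seed(0)
latent = [random.gauss(0, 1) for _ in range(200)]
parent_education = [10 + 3 * f + random.gauss(0, 1) for f in latent]
household_income = [50 + 12 * f + random.gauss(0, 5) for f in latent]
books_at_home = [20 + 8 * f + random.gauss(0, 4) for f in latent]

weights, index = composite_index(
    [parent_education, household_income, books_at_home]
)
print("weights:", [round(w, 3) for w in weights])
print("first five index values:", [round(s, 3) for s in index[:5]])
```

In a workflow like the one the abstract describes, the single index column would then replace the original household-level variables as one predictor in a single-level model such as LASSO regression. The choice of weighting scheme here is an assumption made for brevity.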