American Journal of Applied Mathematics and Statistics
ISSN (Print): 2328-7306; ISSN (Online): 2328-7292; Editor-in-chief: Mohamed Seddeek
Open Access
American Journal of Applied Mathematics and Statistics. 2021, 9(3), 75-82
DOI: 10.12691/ajams-9-3-1

Modelling a Multilevel Data Structure Using a Composite Index

Prabath Badullahewage¹ and Dilhari Attygalle¹

¹ Department of Statistics, University of Colombo, Colombo, Sri Lanka

Pub. Date: July 23, 2021

Cite this paper:
Prabath Badullahewage and Dilhari Attygalle. Modelling a Multilevel Data Structure Using a Composite Index. American Journal of Applied Mathematics and Statistics. 2021; 9(3):75-82. doi: 10.12691/ajams-9-3-1


When modelling complex data structures related to a social phenomenon, there may be several hierarchical levels in which data units are nested within one another. Each level may also carry several variables, and those variables may not be unique to each case or record, which makes the data structure even more complex. Multilevel modelling has been used for decades to handle such structures, but it may not always capture the structure fully, owing to the extent of the complexity and to inherent issues of the procedure. On the other hand, ignoring the multilevel structure when modelling can lead to incorrect estimates and, in turn, to a model that does not achieve acceptable accuracy. This research explains a simple approach in which a complex multilevel structure is compressed to a single level by combining higher-level variables to form a composite index. This composite index also substantially reduces the number of variables considered in the modelling process. The process is exemplified using a primary data set on household education expenditure gathered through a systematic sampling survey. Several variables were collected on each household, and another set of variables on each school-going child in the household, creating a multilevel data structure. The composite index, named the “Household Level Education Index”, is developed through factor analysis, and the detailed process of its construction is explained. LASSO regression was performed to illustrate the use of the proposed composite index by predicting monthly household education expenditure through a single-level regression model. Finally, a Random Forest model was used to examine feature importance, where the proposed “Household Level Education Index” was the most important feature in predicting monthly household education expenditure.

composite index, multilevel modeling, factor analysis, educational expenditure
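
The pipeline outlined in the abstract (a factor-analysis-based composite index, a single-level LASSO regression, and a Random Forest feature-importance check) can be sketched roughly as below. This is a minimal illustration on synthetic data, not the authors' procedure or data: the variable names (`income`, `parent_education`, `assets`, `grade`), the single-factor score used as the index, and the aggregation of child-level records to household level are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import FactorAnalysis
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_households = 300

# Synthetic household-level variables (hypothetical names) driven by one latent trait.
latent = rng.normal(size=n_households)
households = pd.DataFrame({
    "household_id": np.arange(n_households),
    "income": latent + rng.normal(scale=0.5, size=n_households),
    "parent_education": latent + rng.normal(scale=0.5, size=n_households),
    "assets": latent + rng.normal(scale=0.5, size=n_households),
})

# Synthetic child-level records nested within households (1 to 3 children each).
n_children = rng.integers(1, 4, size=n_households)
children = pd.DataFrame({
    "household_id": np.repeat(households["household_id"].to_numpy(), n_children),
    "grade": rng.integers(1, 14, size=n_children.sum()),
})

# Step 1 (assumed construction): the composite index is the score on a single
# factor extracted from the standardized household-level variables.
hh_vars = ["income", "parent_education", "assets"]
z = StandardScaler().fit_transform(households[hh_vars])
fa = FactorAnalysis(n_components=1, random_state=0)
scores = fa.fit_transform(z)[:, 0]
# The sign of a factor is arbitrary; orient it so higher income means a higher index.
households["education_index"] = scores * np.sign(fa.components_[0, 0])

# Step 2 (assumed aggregation): collapse child-level records to one row per household.
child_summary = children.groupby("household_id").agg(
    n_children=("grade", "count"), mean_grade=("grade", "mean")
).reset_index()
data = households[["household_id", "education_index"]].merge(
    child_summary, on="household_id"
)

# Synthetic response standing in for monthly household education expenditure.
data["expenditure"] = (
    2.0 * latent + 0.8 * data["n_children"] + rng.normal(scale=0.5, size=n_households)
)

# Step 3: single-level LASSO regression on the index plus aggregated child variables.
features = ["education_index", "n_children", "mean_grade"]
X = StandardScaler().fit_transform(data[features])
y = data["expenditure"].to_numpy()
lasso = LassoCV(cv=5, random_state=0).fit(X, y)
lasso_coefs = dict(zip(features, lasso.coef_))

# Step 4: Random Forest feature importances for the same predictors.
forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
importances = dict(zip(features, forest.feature_importances_))

print("LASSO coefficients:", lasso_coefs)
print("Random Forest importances:", importances)
```

Because the synthetic response is generated mostly from the same latent trait that drives the household variables, the index dominates the importances here by construction; with real survey data the ranking would have to be established empirically.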

This work is licensed under a Creative Commons Attribution 4.0 International License.

