American Journal of Educational Research
ISSN (Print): 2327-6126; ISSN (Online): 2327-6150; Editor-in-chief: Ratko Pavlović
American Journal of Educational Research. 2019, 7(11), 878-887
DOI: 10.12691/education-7-11-19

Effectiveness of Mantel-Haenszel And Logistic Regression Statistics in Detecting Differential Item Functioning Under Different Conditions of Sample Size, Ability Distribution and Test Length

Ferdinand Ukanda1, Lucas Othuon1, John Agak1 and Paul Oleche2

1Department of Educational Psychology, Maseno University, Private Bag, Maseno, Kenya

2Department of Pure and Applied Mathematics, Maseno University, Private Bag, Maseno, Kenya

Pub. Date: November 27, 2019

Cite this paper:
Ferdinand Ukanda, Lucas Othuon, John Agak and Paul Oleche. Effectiveness of Mantel-Haenszel And Logistic Regression Statistics in Detecting Differential Item Functioning Under Different Conditions of Sample Size, Ability Distribution and Test Length. American Journal of Educational Research. 2019; 7(11):878-887. doi: 10.12691/education-7-11-19


Differential Item Functioning (DIF) analysis is a statistical method for determining whether a test item measures ability equivalently across groups, by comparing the outcomes of two sub-populations on that item. The Mantel-Haenszel (MH) and Logistic Regression (LR) statistics provide effect size measures that quantify the magnitude of DIF. The purpose of the study was to investigate, through simulation, the effects of sample size, ability distribution and test length on the number of DIF detections using the MH and LR methods. A factorial research design was used. The population of the study consisted of 2000 examinee responses. A stratified random sampling technique was used, with the reference (r) and focal (f) groups as the stratifying criteria. Two small sample sizes (20r/20f and 60r/60f) and one large sample size (1000r/1000f) were established. WinGen3 statistical software was used to generate dichotomous item response data. Average effect sizes were obtained over 1000 replications. The number of DIF items detected was used to draw statistical graphs. The findings showed that the LR statistic detected more type A and B DIF items than MH regardless of ability distribution, sample size and test length. However, the MH statistic detected more type C DIF items than LR regardless of ability distribution, sample size and test length. The number of type C DIF items detected depended on the sample size, test length and ability distribution. Selective use of LR is therefore recommended for detecting type A and B DIF items, and of MH for detecting type C DIF items. The findings of the study are of great significance to teachers, educational policy makers, test developers and test users.
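
To make the MH effect size and the A/B/C labels concrete, the following is a minimal sketch, not the study's own procedure. It assumes a Rasch (one-parameter) model for generating dichotomous responses in place of WinGen3, matching on total test score, and the widely used ETS delta scale (MH D-DIF = -2.35 ln α_MH) with effect-size cut-offs of 1.0 and 1.5. The full ETS rules also involve significance tests, which are omitted here. All function names, item difficulties and the size of the DIF shift are hypothetical.

```python
import math
import random
from collections import defaultdict


def simulate_records(n_per_group, difficulties, dif_item, dif_shift, seed=0):
    """Generate (group, total_score, studied_item_response) tuples.

    Assumption: Rasch model, standard normal ability in both groups, and the
    studied item made harder for the focal group by `dif_shift` logits.
    """
    rng = random.Random(seed)
    records = []
    for group in ("r", "f"):  # reference and focal groups
        for _ in range(n_per_group):
            theta = rng.gauss(0.0, 1.0)
            responses = []
            for j, b in enumerate(difficulties):
                if group == "f" and j == dif_item:
                    b = b + dif_shift
                p = 1.0 / (1.0 + math.exp(-(theta - b)))
                responses.append(1 if rng.random() < p else 0)
            records.append((group, sum(responses), responses[dif_item]))
    return records


def mh_common_odds_ratio(records):
    """Mantel-Haenszel common odds ratio across score strata."""
    # Per stratum: [ref correct, ref wrong, focal correct, focal wrong]
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for group, stratum, correct in records:
        idx = (0 if group == "r" else 2) + (0 if correct else 1)
        tables[stratum][idx] += 1
    num = den = 0.0
    for a, b, c, d in tables.values():
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    if den == 0.0:
        # Can happen with very small samples such as 20r/20f.
        raise ValueError("MH odds ratio undefined: no informative strata")
    return num / den


def mh_d_dif(alpha):
    """Convert the MH odds ratio to the ETS delta scale."""
    return -2.35 * math.log(alpha)


def ets_category(delta):
    """Effect-size-only A/B/C label (significance tests omitted)."""
    size = abs(delta)
    if size < 1.0:
        return "A"
    if size < 1.5:
        return "B"
    return "C"


# Hypothetical condition: 20 items, 1000r/1000f, DIF on the first item.
difficulties = [-1.0, -0.5, 0.0, 0.5, 1.0] * 4
records = simulate_records(1000, difficulties, dif_item=0, dif_shift=1.0)
delta = mh_d_dif(mh_common_odds_ratio(records))
print(round(delta, 2), ets_category(delta))
```

Rerunning the last four lines with `n_per_group` set to 20 or 60 and a different `seed` on each replication gives a feel for how unstable the MH estimate becomes at the small sample sizes considered in the study. A negative delta indicates an item that favours the reference group.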

Differential Item Functioning (DIF), Mantel-Haenszel (MH), Logistic Regression (LR), effect size (ES), sample size, ability distribution, test length, WinGen3

This work is licensed under a Creative Commons Attribution 4.0 International License.



