
Zumbo, B.D., & Thomas, D.R. A measure of DIF effect size using logistic regression procedures. Paper presented at the National Board of Medical Examiners, Philadelphia, 1996. Retrieved on 19th September 2012 from http://www.educ.ubc.ca/faculty/zumbo/cv.htm.

has been cited by the following article:

Article

Effectiveness of Mantel-Haenszel And Logistic Regression Statistics in Detecting Differential Item Functioning Under Different Conditions of Sample Size, Ability Distribution and Test Length

1Department of Educational Psychology, Maseno University, Private Bag, Maseno, Kenya

2Department of Pure and Applied Mathematics, Maseno University, Private Bag, Maseno, Kenya


American Journal of Educational Research. 2019, Vol. 7 No. 11, 878-887
DOI: 10.12691/education-7-11-19
Copyright © 2019 Science and Education Publishing

Cite this paper:
Ferdinand Ukanda, Lucas Othuon, John Agak, Paul Oleche. Effectiveness of Mantel-Haenszel And Logistic Regression Statistics in Detecting Differential Item Functioning Under Different Conditions of Sample Size, Ability Distribution and Test Length. American Journal of Educational Research. 2019; 7(11):878-887. doi: 10.12691/education-7-11-19.

Correspondence to: Lucas Othuon, Department of Educational Psychology, Maseno University, Private Bag, Maseno, Kenya. Email: lothuonus2013@gmail.com

Abstract

Differential Item Functioning (DIF) analysis is a statistical method that determines whether a test item functions differently for two sub-populations of comparable ability by comparing their outcomes on that item. The Mantel-Haenszel (MH) and Logistic Regression (LR) statistics provide effect size measures that quantify the magnitude of DIF. The purpose of the study was to investigate, through simulation, the effects of sample size, ability distribution and test length on the number of DIF detections using the MH and LR methods. A factorial research design was used. The population of the study consisted of 2000 examinee responses. A stratified random sampling technique was used, with the reference (r) and focal (f) groups as the stratifying criteria. Two small sample sizes (20r/20f and 60r/60f) and one large sample size (1000r/1000f) were established. WinGen3 statistical software was used to generate dichotomous item response data. Average effect sizes were obtained over 1000 replications. The number of DIF items detected was used to draw statistical graphs. The findings showed that the LR statistic detected more type A and B DIF items than MH regardless of ability distribution, sample size and test length. However, the MH statistic detected more type C DIF items than LR regardless of ability distribution, sample size and test length. The number of type C DIF items detected depended on the sample size, test length and ability distribution. Selective use of the two statistics was therefore necessary: LR for detecting type A and B DIF items, and MH for detecting type C DIF items. The findings of the study are of significance to teachers, educational policy makers, test developers and test users.
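
For readers unfamiliar with how an MH effect size leads to a type A, B or C label, the following is a minimal Python sketch, not the authors' implementation. It assumes the total test score is the matching variable, uses the standard MH common odds ratio and the delta transformation (delta = -2.35 ln alpha), and applies the commonly cited cut-offs of 1.0 and 1.5 on |delta| without the accompanying significance tests. The abstract does not state which classification rules the study applied, and all function names here are hypothetical.

```python
import math
from collections import defaultdict


def build_tables(responses, groups, item):
    """Build one 2x2 table per total-score stratum for the studied item.

    responses: list of 0/1 response vectors, one per examinee
    groups:    list of "r" (reference) or "f" (focal), one per examinee
    Returns (a, b, c, d) tuples ordered by score, where a/b are reference
    correct/incorrect counts and c/d are focal correct/incorrect counts.
    Assumption: total score is used as the matching variable.
    """
    cells = defaultdict(lambda: [0, 0, 0, 0])
    for resp, group in zip(responses, groups):
        offset = 0 if group == "r" else 2
        cells[sum(resp)][offset + (0 if resp[item] == 1 else 1)] += 1
    return [tuple(cells[score]) for score in sorted(cells)]


def mh_odds_ratio(tables):
    """Mantel-Haenszel common odds ratio pooled across strata."""
    num = den = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n == 0:
            continue
        num += a * d / n
        den += b * c / n
    if num == 0 or den == 0:
        return None  # not estimable from these tables
    return num / den


def mh_delta(alpha):
    """Delta-scale effect size; negative values favour the reference group."""
    return -2.35 * math.log(alpha)


def classify(delta):
    """Simplified A/B/C labels from |delta| alone (no significance tests)."""
    size = abs(delta)
    if size < 1.0:
        return "A"
    if size < 1.5:
        return "B"
    return "C"


# Illustrative counts only, not data from the study.
example_tables = [(30, 10, 20, 20), (40, 10, 30, 20)]
alpha = mh_odds_ratio(example_tables)
delta = mh_delta(alpha)
label = classify(delta)
```

With these made-up counts the reference group has higher odds of answering correctly in both strata, so the pooled odds ratio exceeds 1 and delta is negative. With groups as small as 20r/20f, many score strata will be empty or have a zero cell, which is one reason the estimate can be unstable or not estimable at that sample size.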

Keywords