Parallelism of Test Items: Estimating the Means (µ), Variances (σ²) and Covariances (Cσ²) of Alternate Test Forms

Simon Ntumi; Sheilla Agbenyo; Tapela Bulala

International Journal of Data Envelopment Analysis and Operations Research. 2022, 3(1), 1-7
DOI: 10.12691/ijdeaor-3-1-1

Simon Ntumi¹, Sheilla Agbenyo² and Tapela Bulala³

¹Department of Educational Foundations, Faculty of Educational Studies, University of Education, Winneba (UEW), West Africa, Ghana

²Bia Lamplighter College of Education, West Africa, Ghana

³Botswana University of Agriculture and Natural Resources (BUAN), Southern Africa, Botswana

Pub. Date: February 23, 2022

Cite this paper:
Simon Ntumi, Sheilla Agbenyo and Tapela Bulala. Parallelism of Test Items: Estimating the Means (µ), Variances (σ²) and Covariances (Cσ²) of Alternate Test Forms. International Journal of Data Envelopment Analysis and Operations Research. 2022; 3(1):1-7. doi: 10.12691/ijdeaor-3-1-1

Abstract

Background: Within classical test theory (CTT), alternate test forms are needed so that a test can be administered to different groups or on different testing occasions. This CTT assumption motivated the researchers to construct alternate test forms and estimate their parameters (µ, σ² and Cσ²).

Methods: To obtain the parameter estimates (µ, σ² and Cσ²), three alternate test forms (X1, X2 and X3) were carefully constructed and administered to fifty-eight (58) business students at University Practice Senior High School in the Cape Coast metropolis, Ghana. One psychological test scale (DASS21) was also adopted as Form Y. The tests were administered under suitable and conducive examination conditions, which ensured the validity and reliability of the scores.

Findings: After the statistical estimations, the study found that the means of the four forms (X1, X2, X3 and Form Y) were unequal (µX1 ≠ µX2 ≠ µX3 ≠ µY): X1 (µ = 7.23, n = 58), X2 (µ = 7.14, n = 58), X3 (µ = 8.01, n = 58) and Form Y, DASS21 (µ = 7.92, n = 58), with p = 0.306 > 0.05 (95% CI). The variances gave a similar result, as the test forms were not equal in their variances (σ²X1X2 ≠ σ²X1X3 ≠ σ²X2X3 ≠ σ²Y): X1 (σ² = 6.120, n = 58), X2 (σ² = 9.007, n = 58), X3 (σ² = 8.040, n = 58) and Form Y, DASS21 (σ² = 8.034, n = 58), with p = 0.121 > 0.05. Finally, the covariances of the test forms were not equal (Cσ²X1Y ≠ Cσ²X2Y ≠ Cσ²X3Y): X1 (Cσ² = 5.338, n = 58, p = 0.846), X2 (Cσ² = 6.023, n = 58, p = 0.831) and X3 (Cσ² = 7.898, n = 58, p = 0.783).

Conclusions: The study concluded that the constructed alternate test forms met the conditions for congeneric parallelism. The forms were similar in content, and the estimated µ, σ² and Cσ² were similar across all the test forms (X1, X2, X3 and Form Y).
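The three parameters named in the abstract can be illustrated with a short Python sketch. It is not the authors' analysis: the scores are hypothetical values for five examinees (not the study's data), the names `covariance`, `form_parameters`, `forms` and `form_y` are invented for this illustration, and the unbiased (n − 1) estimators are assumed because the abstract does not say which estimators were used.

```python
from statistics import mean, variance


def covariance(x, y):
    """Unbiased sample covariance of two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def form_parameters(forms, criterion):
    """Return mean, variance and covariance with the criterion for each form."""
    return {
        name: {
            "mean": mean(scores),
            "variance": variance(scores),  # sample variance, n - 1 denominator
            "cov_with_y": covariance(scores, criterion),
        }
        for name, scores in forms.items()
    }


# Hypothetical scores for five examinees; NOT the study's data.
forms = {
    "X1": [6, 8, 7, 9, 5],
    "X2": [7, 8, 6, 9, 5],
    "X3": [6, 9, 7, 8, 5],
}
form_y = [5, 9, 7, 8, 6]  # stands in for the external scale (Form Y)

for name, estimates in form_parameters(forms, form_y).items():
    print(name, estimates)
```

Comparing the printed estimates across forms only describes the sample. Judging whether the forms are parallel would additionally require formal tests of equality for the means, variances and covariances, which this sketch does not attempt.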

Keywords:
parallelism, alternate test forms, means, variances and covariances

This work is licensed under a Creative Commons Attribution 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
