American Journal of Educational Research
ISSN (Print): 2327-6126 ISSN (Online): 2327-6150 Website: https://www.sciepub.com/journal/education Editor-in-chief: Ratko Pavlović
American Journal of Educational Research. 2024, 12(4), 147-158
DOI: 10.12691/education-12-4-4
Open Access Article

The Theoretical and Practical Implications of OpenAI System Rubric Assessment and Feedback on Higher Education Written Assignments

LaJuan Perronoski Fuller1 and Christa Bixby2

1College of Business and Information Technology, South University, Savannah, USA

2Dissertation Department, Westcliff University, Irvine, USA

Pub. Date: April 14, 2024

Cite this paper:
LaJuan Perronoski Fuller and Christa Bixby. The Theoretical and Practical Implications of OpenAI System Rubric Assessment and Feedback on Higher Education Written Assignments. American Journal of Educational Research. 2024; 12(4):147-158. doi: 10.12691/education-12-4-4

Abstract

Integrating artificial intelligence (AI) into teaching and assessment is becoming increasingly common. However, concerns persist about the reliability and consistency of generative AI grading and feedback in higher education. This study investigates AI chatbots, such as ChatGPT and Claude, and their ability to apply consistent grading and feedback. The research revealed the implications of applying these systems outside their intended purpose as language models for generating human-like text. The data collected reveal significant discrepancies in grading patterns, feedback rationale, and response formatting. These inconsistencies challenge some traditional theories of learning and assessment. For example, ChatGPT produced a 24-point difference between its lowest and highest scores (74%-98%) on the same assignment, and Claude's lowest and highest scores differed by 33 points. Each system provided feedback that was less likely to promote learning due to inconsistent rationale for each rubric item. Educators are encouraged to exercise caution when using these AI systems for grading and feedback, and to rely on the expertise of a subject matter expert to ensure accuracy and fairness in assessment practices. By understanding the limitations of AI-generated grading and feedback, educators can mitigate unfair and inconsistent assessments and optimize student success and learning outcomes.
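The score spread reported above can be quantified with simple descriptive statistics over repeated gradings of the same submission. The sketch below is illustrative only: the endpoint scores (74 and 98) match the ChatGPT range reported in the abstract, while the intermediate scores are hypothetical placeholders, not data from the study.

```python
# Sketch: quantifying grading consistency across repeated AI scorings
# of the same assignment. Endpoints (74, 98) are taken from the reported
# ChatGPT range; intermediate values are hypothetical.
from statistics import mean, pstdev

def score_spread(scores):
    """Return (range, mean, population std dev) for a list of repeated gradings."""
    return max(scores) - min(scores), mean(scores), pstdev(scores)

chatgpt_scores = [74, 81, 88, 93, 98]  # one hypothetical set of repeated runs
spread, avg, sd = score_spread(chatgpt_scores)
print(f"range={spread}, mean={avg:.1f}, sd={sd:.1f}")  # range=24 for this set
```

A range this wide on identical work is the kind of inconsistency the study flags: a human grader applying the same rubric would be expected to produce a far smaller spread across re-reads.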

Keywords:
OpenAI, rubric assessments, feedback, grading

This work is licensed under a Creative Commons Attribution 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
