The integration of generative AI (GenAI) into educational assessment has sparked considerable debate among educators and policymakers. As Cope et al. (2021) highlighted, AI technologies, particularly generative models like ChatGPT, have the potential to revolutionize assessment by introducing new tools and processes that challenge traditional practices. However, the introduction of AI in education has been met with mixed reactions. Some educators view this technology as a valuable resource for enhancing student learning, while others see it as a disruptive force that may compromise educational integrity (Baidoo-Anu & Owusu Ansah, 2023; Kortemeyer, 2023). This divide has led to varied institutional responses. For instance, the New York City Department of Education banned ChatGPT from school devices and networks to prevent its use by students and teachers (Elsen-Rooney, 2023). However, such restrictions may prove temporary, as future iterations of generative AI, such as GPT-5, are likely to surpass the limitations of current models, making outright bans increasingly difficult to enforce.
The ongoing discourse surrounding the role of GenAI in assessment seems largely driven by emotional reactions and speculative concerns rather than evidence-based arguments. To address this gap, this scoping review aims to provide a comprehensive analysis of existing literature on the use of ChatGPT in educational assessments. The goal is to identify the opportunities and challenges associated with this technology and offer evidence-based recommendations to inform teaching and learning practices in the context of AI integration. The review contributes to the growing body of literature on AI-supported assessment and provides guidelines for effectively using ChatGPT to design assessments that foster learning and equity. The following sections present the methodology, key findings, and policy implications based on the review.
Methods
This scoping review follows the framework outlined by Arksey and O'Malley (2005), which involves five key stages: (i) defining the research questions and core constructs, (ii) identifying relevant studies, (iii) selecting studies based on inclusion and exclusion criteria, (iv) charting the data, and (v) summarizing and reporting the results. The review specifically focuses on how ChatGPT is currently used in assessment practices and examines the challenges and opportunities it presents for educators.
Stage One: Defining Research Questions and Core Constructs
Two primary research questions guided the review:
How is ChatGPT being innovatively used to enhance assessment practices?
What are the main challenges of using ChatGPT as an assessment tool in classrooms?
Stage Two: Identifying Relevant Studies
A systematic search was conducted across multiple databases, including Education Source, ERIC, PSYCINFO, and Web of Science. Keywords such as "ChatGPT," "assessment," and "education" were used to find studies that explored the application of generative AI in educational assessment contexts.
Stage Three: Selecting Studies
To ensure relevance and quality, inclusion and exclusion criteria were established. Only studies published in English between 2022 and 2024, focusing on educational settings (PreK through higher education) and available in full text, were included. Non-peer-reviewed and unpublished studies, as well as literature that did not specifically address ChatGPT's use in assessment, were excluded.
Stage Four: Charting the Data
The selected studies were "charted," with key findings organized into thematic categories. For each study, information on the author(s), publication year, geographical location, and a summary of the findings was recorded to support synthesis of the data.
Findings and Discussion
The review's findings are categorized into four main themes: (a) creating diverse assessment tasks, (b) the quality of feedback provided by ChatGPT compared to human educators, (c) the role of student-generated questions in fostering autonomous learning, and (d) the accuracy of ChatGPT in grading student submissions.
Creating Diverse Assessment Tasks
One of the key benefits of ChatGPT highlighted by the review is its ability to generate a wide range of assessment tasks that cater to different student learning needs. For example, Indran et al. (2023) demonstrated how ChatGPT could create multiple-choice questions (MCQs) tailored to medical students in Singapore, incorporating diverse patient scenarios to reflect various clinical presentations. This aligns with the modern goal of personalized and inclusive education, as ChatGPT can quickly generate tasks for different learning levels and subjects (Nisar & Aslam, 2023; Schaper, 2024). However, the review also notes that using ChatGPT for assessment requires careful prompt formulation to ensure that the tasks align with learning objectives and cognitive demands.
Quality of Feedback: ChatGPT vs. Human Educators
A significant area of comparison between ChatGPT and human educators is the quality of feedback provided. Studies by Baidoo-Anu and Owusu Ansah (2023) and Steiss et al. (2024) show that human educators generally offer more accurate, personalized, and actionable feedback. Human feedback is often more precise in identifying areas for improvement and is delivered in a supportive tone. Nevertheless, ChatGPT has proven useful in providing formative feedback, particularly in e-learning environments, where its ability to generate personalized responses can facilitate learning. For instance, Bernal's (2024) Learnix project demonstrated how integrating GPT-4 improved the interactivity and responsiveness of online learning platforms.
Student-Generated Questions and Autonomous Learning
Research also suggests that ChatGPT can foster autonomous learning by enabling students to generate their own assessment tools, such as rubrics or questions (Herft, 2023). This approach encourages self-assessment and reflection, which are critical for deep learning. In healthcare education, ChatGPT has been used to simulate patient interactions and generate clinical case studies, helping students practice essential clinical skills (Qi et al., 2024). By engaging students in question formulation, ChatGPT can encourage critical thinking and help them align their efforts with the intended learning outcomes.
Accuracy of ChatGPT in Grading Student Submissions
Studies of ChatGPT's accuracy in grading written submissions produced mixed results. Fuller and Bixby (2024), for example, found inconsistencies in ChatGPT 3.5's grading, with notable variations in the quality and consistency of feedback. However, more recent studies using ChatGPT 4.0, such as Alnajashi (2024), reported greater accuracy and consistency in grading relative to human evaluators. These findings indicate that improvements in AI models could enhance grading reliability, though human oversight remains necessary for high-stakes assessments.