AERA Annual Meeting: Evaluating ChatGPT-4 and ChatGPT-4o: Performance Insights From NAEP Mathematics Problem Solving

Paper Summary

In Event: AERA Roundtable Session Saturday 1:30 pm Four Seasons Ballroom 2-3
In Roundtable Session: Advanced Methodologies for NAEP Operations and Analyses (Table 15)

Sat, April 26, 1:30 to 3:00pm MDT, The Colorado Convention Center, Floor: Ballroom Level, Four Seasons Ballroom 2-3

Abstract

This study assesses the capabilities of ChatGPT-4 and ChatGPT-4o in solving mathematics problems from the National Assessment of Educational Progress (NAEP) across grades 4, 8, and 12. Results indicate that ChatGPT-4o slightly outperforms ChatGPT-4 and that both models generally surpass U.S. students' performance. However, both models perform worse on geometry and measurement than on algebra, and both struggle more with high-difficulty items. This investigation highlights the strengths and limitations of AI as a supplementary educational tool, pinpointing areas for improvement in spatial intelligence and complex mathematical problem-solving. The findings suggest that while AI has the potential to support instruction in specific mathematical areas such as algebra, careful integration and teacher-mediated strategies remain necessary in areas where AI is less effective.

Author

Xin Wei, Digital Promise