Paper Summary

Evaluating ChatGPT-4 and ChatGPT-4o: Performance Insights From NAEP Mathematics Problem Solving

Sat, April 26, 1:30 to 3:00pm MDT, The Colorado Convention Center, Floor: Ballroom Level, Four Seasons Ballroom 2-3

Abstract

This study assesses the capabilities of ChatGPT-4 and ChatGPT-4o in solving mathematics problems from the National Assessment of Educational Progress (NAEP) across grades 4, 8, and 12. Results indicate that ChatGPT-4o slightly outperforms ChatGPT-4 and that both models generally surpass U.S. students' performance. However, both models perform worse on geometry and measurement than on algebra and have more difficulty with high-difficulty mathematics items. This investigation highlights the strengths and limitations of AI as a supplementary educational tool, pinpointing areas for improvement in spatial intelligence and complex mathematical problem-solving. These findings suggest that while AI has the potential to support instruction in specific mathematical areas such as algebra, there remains a need for careful integration and teacher-mediated strategies in areas where AI is less effective.

Author