This study compares the moral reasoning capabilities of several large language models (LLMs). On the DIT-2, Claude achieved the highest post-conventional score (P-score), 72, followed by Gemini Advanced (P-score 64) and Gemini (P-score 58). The remaining models scored as follows: Grok (P-score 48), ChatGPT-4o (P-score 44), ChatGPT-4 (P-score 44), and ChatGPT-3.5, which had the lowest score (P-score 18). On the Educational Leaders version of the ICM, Gemini Advanced achieved the highest total ICM score (0.90), followed by Gemini (0.86), Claude (0.86), ChatGPT-4o (0.78), ChatGPT-4 (0.78), Grok (0.61), and ChatGPT-3.5 (0.32). The results indicate that LLMs can simulate high levels of moral reasoning.