Paper Summary

Predicting Medical Exam Question Difficulty: Embedding, Machine Learning, and Feature Impact

Fri, April 10, 3:45 to 5:15pm PDT, InterContinental Los Angeles Downtown, Floor: 6th Floor, Broadway

Abstract

Background: Accurate item difficulty prediction is crucial for high-stakes medical examinations. However, prior research shows inconsistent findings regarding the most effective modeling components.
Objective: This study systematically investigates the impact of domain-specific embeddings, input content granularity, and choice of machine learning regressor on predictive performance.
Method: This research analyzed 2,815 medical multiple-choice questions, comparing various embedding models, feature combinations, and machine learning regressors (e.g., XGBoost).
Results: XGBoost outperformed the other regressors. A domain-specific embedding model consistently improved accuracy. Using only the item stem and correct answer provided the optimal balance between predictive accuracy and model parsimony.
Conclusion: These findings offer significant, actionable insights for developing data-driven measurement practices in medical education.
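
To make the notion of input content granularity concrete, the following is a minimal sketch of how the text passed to an embedding model might be assembled at three levels: stem only, stem plus correct answer, and the full item. The `Item` class, the `build_input` function, the granularity labels, the text templates, and the sample question are all hypothetical illustrations and are not taken from the study; the embedding and regression steps are not shown.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Item:
    """Hypothetical container for one multiple-choice question."""
    stem: str
    options: List[str]
    correct_index: int


def build_input(item: Item, granularity: str) -> str:
    """Assemble the text that would be handed to an embedding model.

    The three granularity labels and the text templates are assumptions
    made for illustration, not the study's actual preprocessing.
    """
    correct = item.options[item.correct_index]
    if granularity == "stem":
        return item.stem
    if granularity == "stem+answer":
        return f"{item.stem} Answer: {correct}"
    if granularity == "full":
        # Label options (A), (B), ... and append the keyed answer.
        listed = " ".join(
            f"({chr(65 + i)}) {opt}" for i, opt in enumerate(item.options)
        )
        return f"{item.stem} Options: {listed} Answer: {correct}"
    raise ValueError(f"unknown granularity: {granularity}")


# Made-up sample item, used only to exercise the function.
example = Item(
    stem="Which vitamin deficiency causes scurvy?",
    options=["Vitamin A", "Vitamin C", "Vitamin D", "Vitamin K"],
    correct_index=1,
)

for level in ("stem", "stem+answer", "full"):
    print(level, "->", build_input(example, level))
```

Each resulting string would then be embedded and the vectors used as features for a regressor such as XGBoost, so comparing granularity levels amounts to changing only the `granularity` argument while holding the rest of the pipeline fixed.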

Authors