Background: Accurate item difficulty prediction is crucial for high-stakes medical examinations. However, prior research shows inconsistent findings regarding the most effective modeling components.
Objective: This study systematically investigates how domain-specific embeddings, input content granularity, and the choice of machine learning regressor affect predictive performance.
Method: This research analyzed 2,815 medical multiple-choice questions, comparing various embedding models, feature combinations, and machine learning regressors (e.g., XGBoost).
Results: XGBoost outperformed the other regressors. A domain-specific embedding model consistently improved accuracy. Using only the item stem and correct answer as input provided the best balance between predictive accuracy and model parsimony.
Conclusion: These findings offer actionable guidance on embedding choice, input content, and regressor selection for developing data-driven measurement practices in medical education.
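
Illustration: The "input content granularity" compared in the Method can be pictured as different ways of assembling the text that is passed to an embedding model. The sketch below is only a minimal illustration under stated assumptions, not the study's pipeline: the `Item` fields, the three granularity labels, and the text templates are all hypothetical, and the embedding and regression steps are left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Item:
    """A multiple-choice question (hypothetical schema, not from the study)."""
    stem: str
    options: List[str]
    correct_index: int


def build_input(item: Item, granularity: str) -> str:
    """Assemble the text to embed at an assumed level of granularity."""
    correct = item.options[item.correct_index]
    if granularity == "stem":
        # Stem only.
        return item.stem
    if granularity == "stem+answer":
        # Stem plus the correct answer, the configuration the abstract
        # reports as the best accuracy/parsimony trade-off.
        return f"{item.stem} Answer: {correct}"
    if granularity == "stem+options":
        # Stem plus every option, followed by the correct answer.
        labeled = " ".join(
            f"({chr(ord('A') + i)}) {opt}" for i, opt in enumerate(item.options)
        )
        return f"{item.stem} Options: {labeled} Answer: {correct}"
    raise ValueError(f"unknown granularity: {granularity}")


# Made-up example item for illustration only.
example = Item(
    stem="Which vitamin deficiency causes scurvy?",
    options=["Vitamin A", "Vitamin C", "Vitamin D"],
    correct_index=1,
)
stem_and_answer = build_input(example, "stem+answer")
```

Each resulting string would then be embedded and fed to a regressor such as XGBoost to predict item difficulty; those steps depend on the specific embedding model and are not shown here.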