The objective structured clinical examination (OSCE) is a pivotal assessment tool in medical education, evaluating a wide range of clinical skills through standardized scenarios. Ensuring the fairness and validity of OSCEs is critical, particularly when the assessments rely on AI models to identify actions such as hand-washing and glove-wearing and are conducted with small samples. This study proposes an innovative approach to examining and enhancing measurement invariance in OSCEs that use AI models, focusing on hand-washing and glove-wearing action detection.
This project integrates AI-based action detection to evaluate hand-washing and glove-wearing procedures in medical students. The AI model utilizes deep learning techniques to accurately detect and assess these critical actions, ensuring compliance with clinical standards. Given the small sample size and the multidimensional nature of OSCE items, traditional psychometric methods may struggle to ensure measurement invariance across diverse demographic groups. Therefore, we employ a combination of advanced statistical and AI techniques tailored for small datasets to ensure the reliability and fairness of our assessments.
Methodology
Measurement Invariance and Fairness Analysis in AI Models
1. Differential Item Functioning (DIF) Analysis: Use logistic regression to test whether the probability of correctly identifying hand-washing and glove-wearing actions differs significantly between demographic groups (e.g., gender, ethnicity), controlling for overall skill level. This helps identify potential biases in how the AI model assesses these items (see the first sketch after this list).
2. Fairness Metrics: (1) Demographic Parity: Evaluate whether different demographic groups have similar probabilities of being correctly identified as performing the hand-washing and glove-wearing actions, assessed by comparing the proportion of correct identifications across groups. (2) Equalized Odds: Assess whether the true positive and false positive rates are similar across demographic groups for both actions (second sketch below).
3. Regularization and Resampling Techniques: (1) Regularization: Implement L1 (Lasso) and L2 (Ridge) regularization during training of the AI model to prevent overfitting and ensure more stable, generalizable predictions, especially with a small sample size. (2) Bootstrap Aggregating (Bagging): Use bootstrapping to create multiple resampled datasets and train several models; this ensemble approach improves the robustness and fairness of the final AI model (third sketch below).
4. Adversarial Training: Train the AI model jointly with an adversarial network to minimize the dependence of model predictions on demographic attributes. The adversary network attempts to predict each student's demographic group, while the main model is trained to defeat that prediction, encouraging invariant feature learning (fourth sketch below).
5. Invariant Risk Minimization (IRM): Apply IRM to train the AI model to find representations that are invariant across demographic groups, so that performance is consistent and fair regardless of group membership (fifth sketch below).
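The five sketches below are illustrative only: the abstract does not specify the study's code, so all variable names, architectures, and data are hypothetical stand-ins. First, a minimal sketch of the logistic-regression DIF test from step 1: nested models (skill only, plus a group term for uniform DIF, plus a skill-by-group interaction for non-uniform DIF) are compared with likelihood-ratio tests on simulated data.

```python
# Minimal DIF sketch for one binary OSCE item (e.g., "hand-washing performed
# correctly" as scored by the AI model). All names and data are illustrative.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

rng = np.random.default_rng(0)
n = 120  # small sample, as in the study setting
df = pd.DataFrame({
    "total_score": rng.normal(0, 1, n),   # overall skill estimate
    "group": rng.integers(0, 2, n),       # demographic group (0/1)
})
logit = 0.8 * df["total_score"] - 0.1     # toy data simulated with no true DIF
df["item_correct"] = rng.binomial(1, 1 / (1 + np.exp(-logit)))

# Nested logistic models: skill only; + group (uniform DIF); + interaction
m0 = smf.logit("item_correct ~ total_score", df).fit(disp=0)
m1 = smf.logit("item_correct ~ total_score + group", df).fit(disp=0)
m2 = smf.logit("item_correct ~ total_score * group", df).fit(disp=0)

# Likelihood-ratio tests: m0 vs m1 flags uniform DIF, m1 vs m2 non-uniform DIF
lr_uniform = 2 * (m1.llf - m0.llf)
lr_nonuniform = 2 * (m2.llf - m1.llf)
print("uniform DIF p =", stats.chi2.sf(lr_uniform, df=1))
print("non-uniform DIF p =", stats.chi2.sf(lr_nonuniform, df=1))
```

Because the toy data contain no group effect, both p-values should usually be non-significant; on real OSCE data a significant group or interaction term would flag the item for review.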
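Second, a sketch of the two fairness metrics from step 2, computed directly from the AI model's decisions. The gap functions (maximum between-group difference in rates) are one common operationalization, assumed here.

```python
# Fairness-metric sketch. y_true: expert label (action performed correctly),
# y_pred: AI decision, group: demographic attribute. All arrays are toy data.
import numpy as np

def demographic_parity_gap(y_pred, group):
    """Largest between-group difference in positive-identification rates."""
    rates = [y_pred[group == g].mean() for g in np.unique(group)]
    return max(rates) - min(rates)

def equalized_odds_gaps(y_true, y_pred, group):
    """Largest between-group gaps in TPR and FPR."""
    tprs, fprs = [], []
    for g in np.unique(group):
        yt, yp = y_true[group == g], y_pred[group == g]
        tprs.append(yp[yt == 1].mean())  # true positive rate
        fprs.append(yp[yt == 0].mean())  # false positive rate
    return max(tprs) - min(tprs), max(fprs) - min(fprs)

rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, 200)
y_pred = (y_true ^ (rng.random(200) < 0.1)).astype(int)  # noisy AI decisions
group = rng.integers(0, 2, 200)
print("demographic parity gap:", demographic_parity_gap(y_pred, group))
print("equalized odds gaps (TPR, FPR):", equalized_odds_gaps(y_true, y_pred, group))
```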
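Third, a sketch of step 3's L1/L2 regularization and bagging on a stand-in classifier head that maps extracted video features to action labels. scikit-learn is used for brevity (the estimator keyword assumes scikit-learn ≥ 1.2); in the deep model itself the same idea appears as penalty terms on the network weights.

```python
# Regularization + bagging sketch on illustrative feature data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 20))   # stand-in for extracted video features
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=100) > 0).astype(int)

# L1 (Lasso) and L2 (Ridge) penalties; C is the inverse regularization strength
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
ridge = LogisticRegression(penalty="l2", C=0.5)

# Bagging: train many classifiers on bootstrap resamples and average their votes
bagged = BaggingClassifier(estimator=ridge, n_estimators=50, random_state=0)

for name, model in [("L1", lasso), ("L2", ridge), ("bagged L2", bagged)]:
    acc = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: mean CV accuracy = {acc:.3f}")
```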
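Fourth, a sketch of the adversarial setup in step 4 using a gradient-reversal layer, one common way to train a feature extractor against an adversary. The abstract does not name a specific architecture, so the layer sizes and data are toy assumptions.

```python
# Adversarial debiasing sketch: the adversary tries to predict the demographic
# group from the shared features, while reversed gradients push the encoder
# to remove that information.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)          # identity in the forward pass
    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None   # reversed gradient to the encoder

encoder = nn.Sequential(nn.Linear(20, 32), nn.ReLU())  # feature extractor
task_head = nn.Linear(32, 2)   # hand-washing / glove-wearing correctness
adv_head = nn.Linear(32, 2)    # adversary: predicts demographic group

opt = torch.optim.Adam([*encoder.parameters(), *task_head.parameters(),
                        *adv_head.parameters()], lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

X = torch.randn(100, 20)          # toy stand-in for video features
y = torch.randint(0, 2, (100,))   # action correctness labels
g = torch.randint(0, 2, (100,))   # demographic group labels

for epoch in range(50):
    z = encoder(X)
    task_loss = loss_fn(task_head(z), y)
    # adversary improves normally; the encoder receives reversed gradients
    # and so unlearns group information
    adv_loss = loss_fn(adv_head(GradReverse.apply(z, 1.0)), g)
    loss = task_loss + adv_loss
    opt.zero_grad()
    loss.backward()
    opt.step()
```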
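Fifth, a sketch of step 5 using the IRMv1 penalty of Arjovsky et al. (2019), treating each demographic group as a training environment; the penalty weight of 10.0 is an arbitrary illustrative choice.

```python
# IRMv1 sketch: penalize the squared gradient of each environment's risk
# with respect to a fixed "dummy" scaling of the logits. Toy data throughout.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(20, 16), nn.ReLU(), nn.Linear(16, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

def irm_penalty(logits, y):
    # gradient of the risk w.r.t. a dummy multiplier of the logits
    scale = torch.ones(1, requires_grad=True)
    loss = bce(logits * scale, y)
    grad = torch.autograd.grad(loss, scale, create_graph=True)[0]
    return (grad ** 2).sum()

# One toy environment per demographic group
envs = [(torch.randn(60, 20), torch.randint(0, 2, (60, 1)).float())
        for _ in range(2)]

for step in range(100):
    risk, penalty = 0.0, 0.0
    for X, y in envs:
        logits = model(X)
        risk = risk + bce(logits, y)
        penalty = penalty + irm_penalty(logits, y)
    loss = risk + 10.0 * penalty   # penalty weight is a tunable assumption
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Driving the penalty toward zero encourages a representation whose optimal classifier is the same in every group, which is the invariance property step 5 targets.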
This study demonstrates a comprehensive framework for ensuring measurement invariance in OSCEs using AI models tailored for small sample sizes. By integrating advanced psychometric techniques and fairness-aware AI methodologies, we aim to enhance the fairness and validity of medical education assessments, ensuring that all students are evaluated equitably.