Leveraging AI to Proactively Identify Item Bias in Medical Certification Exams

Thu, April 24, 3:35 to 5:05pm MDT, The Colorado Convention Center, Floor: Ballroom Level, Four Seasons Ballroom 2-3

Abstract

Objective:
Differential Item Functioning (DIF) analysis has been a cornerstone of identifying potentially biased test items in medical certification exams for decades [1]. However, this approach is inherently reactive: potentially biased items must be administered to examinees before they can be detected and removed. In recent years, the imperative for diversity, equity, and inclusion (DEI) in medical education and certification has grown increasingly urgent [2], necessitating more proactive approaches to mitigating bias in assessment.
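As background, one widely used DIF detection method is the Mantel-Haenszel procedure, which compares the odds of answering an item correctly between a reference group and a focal group within matched ability strata. The sketch below is purely illustrative (the data are invented, and this is not the participating boards' actual analysis pipeline):

```python
def mantel_haenszel_odds_ratio(strata):
    """Common odds ratio across ability strata.

    strata: list of 2x2 tables, one per ability level, each given as
    (ref_correct, ref_incorrect, focal_correct, focal_incorrect).
    An odds ratio near 1.0 suggests little DIF for the item.
    """
    num = den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        num += a * d / n  # reference-correct * focal-incorrect
        den += b * c / n  # reference-incorrect * focal-correct
    return num / den

# Invented example: three ability strata for one item.
tables = [(40, 10, 35, 15), (30, 20, 28, 22), (15, 35, 12, 38)]
print(round(mantel_haenszel_odds_ratio(tables), 2))  # -> 1.38
```

In operational practice the odds ratio is typically accompanied by a chi-square significance test and an effect-size classification (e.g., the ETS A/B/C scheme) before an item is flagged.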
This proposal presents a novel methodology that harnesses the power of artificial intelligence to preemptively identify potentially biased language patterns in exam items before they are administered. By training an AI copilot system on a decade's worth of DIF-identified biased questions [3], we aim to create a predictive tool capable of flagging new items that exhibit similar linguistic patterns associated with bias.
Procedure:
1. Data Collection and Preprocessing: We will compile a comprehensive dataset of exam items flagged as biased through DIF analysis over the past ten years from participating medical certification boards, including each problematic question's stem, options, and the rationales provided by DIF panelists.
2. AI Model Training: We will provide this dataset to a Microsoft AI copilot (based on the GPT-4o large language model) and prompt it [4] to recognize the subtle linguistic patterns and structures associated with item bias.
3. Bias Detection Tool Development: The Microsoft AI copilot will offer an interactive window where item writers and editors can input new exam questions for real-time bias detection. The tool will assess each input item and provide feedback on potential bias indicators similar to the patterns in the training data.
4. Validation and Refinement: The copilot's predictions will be cross-validated against traditional DIF analyses on pilot items; accuracy will be assessed using the F1 score, and the prompts will be refined iteratively.
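The prompting in Step 2 could, for instance, take the form of a few-shot prompt assembled from previously DIF-flagged items. The function name, item texts, and prompt wording below are hypothetical illustrations, not the authors' actual materials:

```python
def build_bias_review_prompt(flagged_examples, new_item):
    """Assemble a few-shot prompt from DIF-flagged items.

    flagged_examples: list of (stem, rationale) pairs drawn from the
    historical DIF dataset; new_item: the item text to be reviewed.
    """
    lines = ["You review medical exam items for language patterns "
             "associated with bias. Examples of previously flagged items:"]
    for i, (stem, rationale) in enumerate(flagged_examples, 1):
        lines.append(f"{i}. Item: {stem}\n   Why flagged: {rationale}")
    lines.append("Assess the following new item and list any similar "
                 "bias indicators:")
    lines.append(new_item)
    return "\n".join(lines)

# Invented example items for illustration only.
prompt = build_bias_review_prompt(
    [("A 45-year-old businessman presents with...",
      "gendered occupational stereotype")],
    "A 60-year-old patient presents with chest pain...")
```

The assembled string would then be supplied to the copilot alongside the new item, so that its feedback is anchored to concrete, panel-reviewed exemplars rather than generic notions of bias.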
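The F1-based validation in Step 4 amounts to treating DIF flags on pilot items as ground truth and scoring the copilot's flags against them. A minimal sketch with invented labels:

```python
def f1_score(dif_labels, copilot_flags):
    """F1 of copilot flags against DIF outcomes (1 = flagged as biased)."""
    tp = sum(1 for d, c in zip(dif_labels, copilot_flags) if d and c)
    fp = sum(1 for d, c in zip(dif_labels, copilot_flags) if not d and c)
    fn = sum(1 for d, c in zip(dif_labels, copilot_flags) if d and not c)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return (2 * precision * recall / (precision + recall)
            if precision + recall else 0.0)

dif = [1, 1, 0, 0, 1, 0]  # pilot items DIF analysis flagged as biased
ai  = [1, 0, 0, 1, 1, 0]  # pilot items the copilot flagged
print(round(f1_score(dif, ai), 2))  # -> 0.67
```

Because biased items are typically rare, F1 is a more informative target than raw accuracy here, and tracking precision and recall separately would show whether prompt refinements are trading missed items for false alarms.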
Expected Results and Implications:
This proactive approach is expected to significantly reduce the number of biased items that reach the exam administration stage. By identifying potentially problematic language patterns early in the item development process, certification boards can address issues before they impact test-takers, thereby enhancing the fairness and equity of their assessments.
Implementing this AI-driven bias detection system aligns closely with the broader DEI goals of medical certification boards. By minimizing the inclusion of biased items, we can help ensure that certification exams are equitable for all candidates, regardless of their demographic background. Beyond its immediate application in bias detection, this project opens avenues for further research into the linguistics of bias in assessment. It also sets the stage for developing more sophisticated AI tools to support various exam development and administration aspects.
This innovative approach represents a significant step toward enhancing the fairness and equity of medical certification exams. Combining historical DIF data with cutting-edge AI technology can create a powerful tool for proactively identifying and mitigating bias in assessment items. This symposium will provide attendees with insights into the methodology, potential applications, and implications of this novel approach, fostering discussion on its role in advancing DEI initiatives in medical education and certification.
