This study examines how demographic inputs and rubric-based criteria affect the fairness of AI-generated writing scores. Using data from 652 adolescents in a randomized trial, we analyzed writing scored by ChatGPT-4o and DeepSeek-V2 under four input conditions: response only, response + demographics, response + rubric, and response + both. A five-criterion rubric guided scoring, and simulated socioeconomic and religious data allowed broader bias testing. Demographic cues led to higher scores for several groups, including transgender and LGBQ+ youth, suggesting potential leniency. Rubric-based scoring lowered scores, reflecting increased consistency. When both inputs were provided, these effects offset each other. ChatGPT-4o was more sensitive to input conditions overall, while DeepSeek-V2 was more reactive to demographic cues. These findings highlight the evolving nature of AI bias and underscore the need to separate scoring tools from identity-sensitive inputs for equitable assessment.