As generative AI enters assessment development, many anticipate gains in efficiency and innovation—but questions of fairness remain. This session examines WestEd's work using customized and fine-tuned large language models to create assessment passages, revealing both progress and persistent challenges. A comparative study of baseline and fine-tuned GPT models shows that light customization alone fails to meet expectations for readability, grade-level alignment, and cultural relevance. Fine-tuning, paired with human review and rubric-based evaluation, yields measurable improvement, yet limitations persist. The session probes whose voices AI preserves, how engagement is defined, and why human judgment remains essential. Attendees will gain practical insights for implementing, evaluating, and governing AI responsibly within assessment design and policy decision-making.
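To make the comparative setup concrete, here is a minimal sketch of one way such a baseline-versus-fine-tuned comparison could be automated. This is not WestEd's actual pipeline: the model IDs, prompt wording, grade-level target, and the use of textstat's Flesch-Kincaid score as a readability proxy are all illustrative assumptions. As the abstract notes, automated checks like this cover only part of the rubric; cultural relevance and engagement still require human review.

```python
# Minimal sketch (not the session's actual pipeline): compare a baseline
# and a fine-tuned GPT model on the readability of generated passages.
# Model IDs, prompt text, and the grade-level target are placeholders.
# Requires: pip install openai textstat
from openai import OpenAI
import textstat

client = OpenAI()  # reads OPENAI_API_KEY from the environment

BASELINE_MODEL = "gpt-4o-mini"                    # placeholder baseline
FINE_TUNED_MODEL = "ft:gpt-4o-mini:org::example"  # placeholder fine-tune ID

PROMPT = (
    "Write a 150-word informational reading passage about tide pools "
    "for a grade 4 reading assessment."
)

def generate_passage(model: str) -> str:
    """Ask the given model for one assessment passage."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT}],
        temperature=0.7,
    )
    return response.choices[0].message.content

def readability_gap(text: str, target_grade: float = 4.0) -> float:
    """Distance between estimated grade level and the target grade."""
    return abs(textstat.flesch_kincaid_grade(text) - target_grade)

for model in (BASELINE_MODEL, FINE_TUNED_MODEL):
    passage = generate_passage(model)
    print(f"{model}: grade-level gap = {readability_gap(passage):.1f}")
    # Automated scoring screens for grade level only; rubric-based human
    # review remains essential for cultural relevance and engagement.
```

In practice, a gap score like this would be aggregated over many generated passages per model, and would sit alongside human rubric ratings rather than replace them.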