Introduction/Background
High school transcripts are increasingly central to admissions and accountability systems, particularly amid the rise of test-optional admissions. Yet the core metric, grade point average (GPA), is vulnerable to serious selection bias, because students choose courses of varying difficulty based on their academic ability. This selection mechanism means that observed GPA conflates performance with course-taking decisions: a high GPA may reflect strong ability or strategic enrollment in easier classes. This paper reconceptualizes the transcript as an item response matrix and proposes a solution from psychometrics: using item response theory (IRT) to recover a latent measure of “true” transcript strength. In this framework, each course functions as an item and each grade as an ordinal response, enabling joint estimation of student ability (θ) and course difficulty. This approach corrects for selection bias and uncovers dimensions of student achievement that GPA obscures.
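As a minimal sketch of this reconceptualization (the column names and the 0–4 grade coding are illustrative, not the study's actual data schema), a long-format transcript extract can be pivoted into a student-by-course response matrix, with missing cells where a course was not taken:

```python
import pandas as pd

# Hypothetical long-format transcript: one row per (student, course, grade).
transcript = pd.DataFrame({
    "student_id": ["s1", "s1", "s2", "s2", "s3", "s3"],
    "course_id":  ["alg2", "eng9", "alg2", "chem", "eng9", "chem"],
    "grade":      [4, 3, 2, 3, 4, 4],   # illustrative coding: A=4 ... F=0
})

# Pivot to the item response matrix: rows = students (respondents),
# columns = courses (items), cells = ordinal grades, NaN = not taken.
response_matrix = transcript.pivot(index="student_id",
                                   columns="course_id",
                                   values="grade")
print(response_matrix)
```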
Research Questions
We address three research questions:
(1) Under what conditions is IRT robust to nonrandom course selection (i.e., when are estimates of θ unbiased)?
(2) How do estimated θ scores differ from observed GPA, and what do they reveal about subgroup inequality?
(3) Do θ scores predict college access and selectivity more effectively than GPA or SAT scores?
Methods
We apply the Partial Credit Model (PCM) to administrative transcript data from five Delaware student cohorts (N ≈ 43,000), treating courses as items and letter grades as ordinal responses. We show analytically that IRT estimation yields unbiased θ scores if two conditions are met: (1) missingness is conditionally independent of unobserved grades given θ (i.e., MAR given θ), and (2) the bipartite student–course network is connected. We test these assumptions via simulation, varying the extent of ability-based course selection to identify when GPA becomes biased and when IRT recovers true ability. Empirically, we compare θ and GPA across demographic subgroups, assess divergences between θ and SAT scores, and evaluate predictive validity for postsecondary outcomes using logistic regression and model fit comparisons.
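For reference, the Partial Credit Model (Masters, 1982) specifies the probability that student i receives grade category k in course j as (notation ours):

```latex
P(X_{ij} = k \mid \theta_i)
  = \frac{\exp\!\big(\sum_{h=1}^{k} (\theta_i - \delta_{jh})\big)}
         {\sum_{m=0}^{M_j} \exp\!\big(\sum_{h=1}^{m} (\theta_i - \delta_{jh})\big)},
  \qquad k = 0, 1, \dots, M_j,
```

with the convention that the empty sum (k = 0) equals zero, where δ_jh is the h-th step (grade-threshold) difficulty of course j and M_j is the highest grade category. The second identification condition can be checked directly on the data; the following is a minimal sketch (the long-format schema and the use of networkx are assumptions, not the study's code):

```python
import networkx as nx
import pandas as pd

# Hypothetical long-format transcript: one row per (student, course) enrollment.
records = pd.DataFrame({
    "student_id": ["s1", "s1", "s2", "s2", "s3"],
    "course_id":  ["alg2", "eng9", "alg2", "chem", "chem"],
})

# Build the bipartite student-course graph. If this graph is disconnected,
# the separate components cannot be placed on a common theta/difficulty scale;
# connectivity links every student and course through shared enrollments.
G = nx.Graph()
G.add_nodes_from(records["student_id"], bipartite=0)
G.add_nodes_from(records["course_id"], bipartite=1)
G.add_edges_from(zip(records["student_id"], records["course_id"]))

print(nx.is_connected(G))
```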
Results
RQ1: Simulations confirm that θ estimates remain unbiased under realistic patterns of course-taking, while observed GPA becomes increasingly biased as selection intensifies.
RQ2: θ scores break GPA ties, distinguishing students with similar GPAs but different levels of course rigor. They also capture discrepancies between SAT scores and transcript performance. Subgroup analyses show that racial/ethnic gaps in θ are about 30% smaller than the corresponding gaps in SAT scores, whereas gender gaps are larger in θ than in SAT scores.
RQ3: θ scores outperform GPA and SAT in predicting college selectivity, particularly for students from underrepresented groups.
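As a sketch of the model-fit comparison described in the Methods (synthetic data, hypothetical variable names, and the use of statsmodels are assumptions here, not the study's analysis code):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical analysis frame: one row per student, a binary indicator for
# enrollment in a selective college, and the three competing predictors.
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "theta": rng.normal(size=n),
    "gpa":   rng.uniform(1.0, 4.0, size=n),
    "sat":   rng.normal(1050, 180, size=n),
})
# Placeholder outcome for illustration only.
df["selective"] = rng.binomial(1, 1 / (1 + np.exp(-df["theta"])))

def fit(predictors):
    """Fit a logistic regression of the outcome on the given predictor set."""
    X = sm.add_constant(df[predictors])
    return sm.Logit(df["selective"], X).fit(disp=0)

# Compare fit across predictor sets (lower AIC indicates better fit).
for preds in (["gpa"], ["sat"], ["theta"]):
    print(preds, round(fit(preds).aic, 1))
```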
Conclusion/Implications
This study bridges psychometrics and education policy by applying IRT to transcript data to estimate a latent “true GPA.” The method offers a scalable, theoretically grounded alternative to observed GPA, improving accuracy in student comparisons and mitigating selection bias. By recovering θ despite nonrandom missingness, this approach has the potential to inform more equitable admissions, scholarship decisions, and accountability metrics. More broadly, it provides a formal framework for addressing selection-induced missing data in any context where performance is entangled with opportunity.