Paper Summary

AI Model Evaluation With the Proportional Reduction in Mean Squared Error

Sat, April 13, 1:15 to 2:45pm, Pennsylvania Convention Center, Floor: Level 200, Exhibit Hall B

Abstract

Use of artificial intelligence (AI) to score responses is growing in popularity and is likely to increase. Evidence for the validity of these scores relies on quadratic weighted kappa (QWK) to demonstrate agreement with human ratings, but QWK has several known shortcomings. The proportional reduction in mean squared error (PRMSE) instead estimates how accurately the automated scoring model predicts the human true scores. Analysis of operational test data demonstrates that QWK and PRMSE can lead to different conclusions about AI scores. Extensive simulation study results show that PRMSE is robust to many of the factors to which QWK is sensitive and should replace or complement QWK for evaluating AI scores. We also investigate sample size requirements for accurate estimation of PRMSE.
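The abstract does not give the estimators, so the following is only a minimal sketch of the two statistics it compares, not the authors' implementation. It assumes every response has exactly two human ratings, that rater errors are uncorrelated with the true score and with the machine score, and that population (divide-by-n) variances are acceptable. The function names `quadratic_weighted_kappa` and `prmse_two_raters` are hypothetical.

```python
from statistics import fmean, pvariance


def quadratic_weighted_kappa(a, b, min_score, max_score):
    """QWK between two integer score lists on the scale min_score..max_score."""
    k = max_score - min_score + 1
    n = len(a)
    # Observed joint proportions of (score in a, score in b).
    observed = [[0.0] * k for _ in range(k)]
    for x, y in zip(a, b):
        observed[x - min_score][y - min_score] += 1.0 / n
    row = [sum(r) for r in observed]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    num = den = 0.0
    for i in range(k):
        for j in range(k):
            w = (i - j) ** 2 / (k - 1) ** 2  # quadratic disagreement weight
            num += w * observed[i][j]        # weighted observed disagreement
            den += w * row[i] * col[j]       # weighted chance disagreement
    return 1.0 - num / den


def prmse_two_raters(h1, h2, m):
    """Simplified PRMSE of machine scores m against the human true score.

    Assumption (not from the abstract): two human ratings h1, h2 per response.
    """
    # Error variance of a single human rating, from rater disagreement.
    err_var = fmean([(x - y) ** 2 for x, y in zip(h1, h2)]) / 2.0
    h_bar = [(x + y) / 2.0 for x, y in zip(h1, h2)]
    # Var(mean rating) = Var(true score) + err_var / 2.
    true_var = pvariance(h_bar) - err_var / 2.0
    # MSE against the mean rating overstates MSE against the true score
    # by err_var / 2, so subtract that amount.
    mse_true = fmean([(p - h) ** 2 for p, h in zip(m, h_bar)]) - err_var / 2.0
    return 1.0 - mse_true / true_var
```

Under these assumptions, a PRMSE of 1 means the machine scores reproduce the human true scores exactly, and a PRMSE of 0 means they do no better than predicting the mean true score for every response.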

Authors