Individual Submission Summary

Using machine learning to improve education intervention targeting: Predicting non-enrollment of girls in India

Thu, April 29, 1:45 to 3:15pm PDT, Zoom Room, 125

Proposal

Machine learning algorithms have been heralded as the latest tool to help policymakers target programs and reduce poverty. In international education, one of the most promising applications of these tools is to improve the cost-effectiveness of programs that rely on scarce funding and resources. We demonstrate a high-value policy application of prediction using public data and machine learning algorithms.

In India, before the COVID-19 pandemic, over 4 million girls were out of school, and girls were more likely than boys to be non-enrolled. Given the dynamics of past health and economic crises in India and elsewhere, girls' education will likely be disproportionately affected by COVID-19, further worsening gender inequities in education outcomes. To address these inequities, Educate Girls, a large education NGO in India, works with families to overcome obstacles to girls' enrollment and provides remedial tutoring to help newly enrolled students catch up to their peers. Educate Girls seeks to reach as many non-enrolled girls as possible with its programming. However, it does not have the resources to expand to all of India’s villages.

To help Educate Girls determine where to expand in order to reach the most out-of-school girls, we built an ensemble of machine learning algorithms using Educate Girls’ existing administrative data and publicly available government census and educational data. We use these algorithms to predict the number of out-of-school girls in potential expansion villages across four states. We assess the performance of these algorithms by defining a Gini-like metric that quantifies the returns to prediction compared to naïve targeting. We also assess the portability of predictions across new regions and populations. Although the algorithms do not perfectly predict the number of non-enrolled girls in new villages, our predictions substantially improve on predictions generated from non-ML methods. Using our predictions, Educate Girls can reach 50% to 100% more non-enrolled girls at approximately the same cost as current programming. We show that various logistical constraints on expansion, such as fixed costs of expanding to new regions, impose penalties on the model but do not eliminate the benefits of prediction. We discuss practical lessons for working with implementers to operationalize and calibrate predictions as new information on the performance of the algorithm becomes available.
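The abstract does not define the Gini-like metric, so the following Python sketch shows only one plausible construction and should not be read as the authors' formula. It assumes that villages are visited in descending order of predicted out-of-school girls, that naïve targeting means visiting villages in random order (whose expected reach curve is the diagonal), and that the gain is normalized by a perfect ranking. The function names and the village counts are hypothetical and purely illustrative.

```python
def cumulative_reach(actual, scores):
    """Cumulative share of out-of-school girls reached when villages
    are visited in descending order of `scores`."""
    order = sorted(range(len(actual)), key=lambda i: scores[i], reverse=True)
    total = sum(actual)
    reached, curve = 0, [0.0]
    for i in order:
        reached += actual[i]
        curve.append(reached / total)
    return curve


def area_under(curve):
    """Trapezoid-rule area under a curve sampled at evenly spaced
    shares of villages visited, from 0 to 1."""
    n = len(curve) - 1
    return sum((curve[k] + curve[k + 1]) / 2 for k in range(n)) / n


def gini_like_gain(actual, predicted):
    """Assumed metric: about 0 when the ranking is no better than random
    targeting, 1 when it matches a perfect ranking, negative when worse
    than random."""
    naive = 0.5  # expected area under a random visiting order (the diagonal)
    oracle = area_under(cumulative_reach(actual, actual))
    if oracle == naive:
        raise ValueError("all villages have the same count; metric undefined")
    model = area_under(cumulative_reach(actual, predicted))
    return (model - naive) / (oracle - naive)


# Illustrative numbers only, not data from the study.
actual = [30, 5, 12, 0, 8]      # true out-of-school girls per village
predicted = [25, 9, 10, 2, 3]   # hypothetical model predictions
gain = gini_like_gain(actual, predicted)
```

Normalizing by the perfect ranking keeps the value comparable across regions with different concentrations of out-of-school girls, which is one reason a Gini-style ratio might be preferred over raw counts.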

While ML-based targeting has improved the cost-effectiveness of programs across a range of applications, a key concern is that these algorithms are more likely to exclude vulnerable groups due to biases in historical data. We assess the performance of our algorithm on vulnerable groups, including marginalized castes, religious groups, and students with disabilities. Although our algorithm does not systematically exclude individuals from these groups, we describe updates that can be made to ML algorithms to further increase inclusion of priority groups if desired. We conclude with best practices for other researchers who use machine learning to improve targeting in development programs.
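One simple way to check for exclusion of the kind described above is to compare a priority group's share among girls in the selected villages with its share among girls in all candidate villages. The Python sketch below is an assumption about how such a check could look, not the procedure used in the study. The field names (`predicted`, `girls`, `priority_girls`) and the numbers are hypothetical.

```python
def group_share(villages, key):
    """Share of out-of-school girls who belong to the group counted in `key`."""
    total = sum(v["girls"] for v in villages)
    return sum(v[key] for v in villages) / total


def select_top_k(villages, k):
    """The k villages with the highest predicted number of out-of-school girls."""
    return sorted(villages, key=lambda v: v["predicted"], reverse=True)[:k]


def inclusion_ratio(villages, k, key="priority_girls"):
    """Group share among girls reached in the top-k villages divided by the
    group share among all candidate villages. A value below 1 means the
    group is under-represented among the girls the targeting would reach."""
    return group_share(select_top_k(villages, k), key) / group_share(villages, key)


# Illustrative numbers only, not data from the study.
villages = [
    {"predicted": 20, "girls": 18, "priority_girls": 9},
    {"predicted": 15, "girls": 16, "priority_girls": 4},
    {"predicted": 5, "girls": 6, "priority_girls": 3},
    {"predicted": 2, "girls": 4, "priority_girls": 2},
]
ratio_top2 = inclusion_ratio(villages, k=2)
```

A ratio well below 1 for a given group would be a signal to revisit the ranking, for example by reweighting or by reserving part of the expansion budget for villages where that group is concentrated.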

Author