The Last Mile in Remote Sensing Poverty Prediction

Saturday, November 15, 10:15 to 11:45am, Property: Hyatt Regency Seattle, Floor: 7th Floor, Room: 709 - Stillaguamish

Abstract

Introduction Innovative advances in artificial intelligence (AI) and machine learning (ML) have improved the ability to predict poverty in small areas using remote sensing data. These technologies have proven critical in data-sparse settings. Despite acceptable average performance, however, these models consistently underperform in certain areas. When making high-stakes decisions, such as the subnational allocation of social transfers, near-perfect predictions across all regions are essential. We refer to this issue as the Last Mile problem in poverty prediction. To address this challenge, our study develops a novel pipeline for improving remote-sensing and AI/ML-based poverty predictions. Using the case of Africa, this research underscores the importance of incorporating non-visual features, leveraging various model architectures, and focusing on hard-to-predict village clusters.

Methods Three leading vision architectures (VGG-16, ResNet-101, and Vision Transformer-B/16) are compared within the baseline nighttime lights (NTL) transfer-learning approach. In our diagnostic stage, the key regions of satellite imagery that contribute to poverty predictions are highlighted through explainability techniques (Integrated Gradients and Guided Grad-CAM). These heatmaps are then qualitatively assessed by human reviewers for intuitiveness. We also develop multimodal models with non-visual features (social media, points of interest, network connectivity) and identify important features through SHAP values. In the training and testing stage, we develop ten model combinations from the three architectures, pre-trained weights, non-image features, and stack ensembling, which combines multiple base learners. We use out-of-sample evaluation metrics and count the number of problem clusters whose predictions are off by more than one level of the wealth index. In the final inspection stage, human coders qualitatively analyze the satellite imagery of the remaining problem clusters to identify common themes.
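The stack-ensembling step described above can be sketched in a few lines. This is a minimal illustration only, assuming scikit-learn; the feature names, base learners, and synthetic data are stand-ins and do not reflect the study's actual models or inputs.

```python
# Sketch of stack ensembling: base learners are combined by a meta-learner.
# All data here is synthetic; img_feats and tabular_feats are hypothetical
# stand-ins for the visual embeddings and non-visual features.
import numpy as np
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_clusters = 500
img_feats = rng.normal(size=(n_clusters, 32))     # e.g. CNN image embedding
tabular_feats = rng.normal(size=(n_clusters, 3))  # e.g. non-visual features
X = np.hstack([img_feats, tabular_feats])
wealth_index = X @ rng.normal(size=X.shape[1]) + rng.normal(scale=0.5, size=n_clusters)

X_tr, X_te, y_tr, y_te = train_test_split(X, wealth_index, random_state=0)

# Two base learners feed a linear meta-learner that weights their predictions.
stack = StackingRegressor(
    estimators=[
        ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
        ("ridge", Ridge(alpha=1.0)),
    ],
    final_estimator=Ridge(),
)
stack.fit(X_tr, y_tr)
print(round(r2_score(y_te, stack.predict(X_te)), 3))
```

In practice the base learners would be the trained vision models and multimodal learners named in the abstract; the meta-learner then learns which base model to trust in which regime.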

Results Our diagnostic analysis shows that vision models tend to focus on features that are brightly lit at night, such as roads and buildings, and tend to fail in areas where NTL poorly reflects wealth. We also find that non-visual features, such as Twitter activity, distance from residential roads, and internet speed, can enhance predictive performance. We therefore incorporate these features into our models, leveraging various ML algorithms. Among the ten trained models, the Stack Ensemble, which combines pre-trained weights, multimodal data, and NTL transfer learning, achieves the best performance with high accuracy (R² = 84.4%) and low error variance. Our top-performing model decreases the number of "last-mile clusters" by one-third. Nonetheless, seven villages with persistent prediction inaccuracies highlight challenges related to spatial inequality within small areas and the presence of rare features, such as irrigation circles, that the models have not learned.
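The "last-mile cluster" count used to evaluate models could be computed as below. This is an assumed formalization of the metric described in the abstract (a cluster whose predicted wealth-index level is off by more than one level); the function name and toy data are illustrative.

```python
import numpy as np

def count_last_mile_clusters(true_levels, pred_levels):
    """Count clusters whose predicted wealth-index level differs from the
    surveyed level by more than one level ('last-mile' clusters)."""
    err = np.abs(np.asarray(true_levels) - np.asarray(pred_levels))
    return int(np.sum(err > 1))

# Toy example: wealth-index quintiles (1-5) for six hypothetical clusters.
true_q = [1, 2, 3, 4, 5, 2]
pred_q = [1, 4, 3, 2, 5, 3]
print(count_last_mile_clusters(true_q, pred_q))  # → 2
```

Tracking this count alongside average metrics such as R² is what distinguishes last-mile performance from overall fit: a model can score well on average while still misplacing a handful of clusters by two or more levels.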

Conclusions By integrating complementary data sources and ensemble stacking, our pipeline achieves high prediction accuracy and improves performance in hard-to-predict areas. In doing so, our model reduces the misclassification of wealthy villages as poor, and vice versa, demonstrating the potential to improve aid allocation. Qualitative analysis of difficult-to-predict regions highlights the challenges of modeling the wealth index due to inherent aleatoric uncertainty. Future work should include a principled approach to cluster segmentation and community input to address spatial inequality and detect uncommon features.
