
This paper presents a novel approach to constructing comprehensive microdata for US tax-benefit analysis by integrating the Current Population Survey Annual Social and Economic Supplement (CPS ASEC) with the IRS Public Use File (PUF). The absence of a single dataset containing both rich demographic information and accurate administrative tax data limits the precision of policy analysis, particularly at sub-national levels. We address this gap through a two-stage framework.
First, we use Quantile Regression Forests to impute conditional income distributions from the PUF onto CPS households, preserving complex relationships between demographics and tax variables while capturing non-linear patterns. Second, we use dropout-regularized gradient descent to optimize a two-dimensional weight matrix that assigns a separate weight to each pair of household and geographic area. This approach maintains national representativeness while enabling precise local-level analysis across all 435 congressional districts.
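The second stage can be illustrated with a minimal sketch. It is not the authors' implementation: the loss (squared relative error against area-level targets), the log-weight parameterization, the toy data, and the names `calibrate` and `totals` are all assumptions made for illustration. Each household receives one weight per area; on every gradient step a random subset of weights is dropped and the survivors are rescaled, so no target can be matched by leaning on a handful of records.

```python
import math
import random


def totals(weights, features):
    """Weighted total of each metric in each area: result[j][k]."""
    n, m, k_count = len(features), len(weights[0]), len(features[0])
    return [[sum(weights[i][j] * features[i][k] for i in range(n))
             for k in range(k_count)] for j in range(m)]


def calibrate(features, targets, init_weight, steps=4000, lr=0.05,
              dropout=0.05, seed=0):
    """Fit a household-by-area weight matrix to area-level targets.

    features[i][k]: value of metric k for household i.
    targets[j][k]:  target total of metric k in area j.
    All hyperparameters are illustrative assumptions.
    """
    rng = random.Random(seed)
    n, m, k_count = len(features), len(targets), len(targets[0])
    # Optimize log-weights so that every weight stays positive.
    u = [[math.log(init_weight)] * m for _ in range(n)]
    scale = 1.0 / (1.0 - dropout)
    for _ in range(steps):
        # Dropout: zero a random subset of weights, rescale the survivors.
        w = [[math.exp(u[i][j]) * scale if rng.random() >= dropout else 0.0
              for j in range(m)] for i in range(n)]
        for j in range(m):
            grad = [0.0] * n
            for k in range(k_count):
                est = sum(w[i][j] * features[i][k] for i in range(n))
                rel = (est - targets[j][k]) / targets[j][k]
                # Gradient of rel**2 with respect to the log-weight u[i][j].
                for i in range(n):
                    grad[i] += 2.0 * rel * features[i][k] * w[i][j] / targets[j][k]
            for i in range(n):
                u[i][j] -= lr * grad[i]
    return [[math.exp(u[i][j]) for j in range(m)] for i in range(n)]


# Toy data (hypothetical): metrics are [household count, income].
features = [[1.0, 20.0], [1.0, 40.0], [1.0, 60.0],
            [1.0, 80.0], [1.0, 100.0], [1.0, 120.0]]
# Two areas with different population and total-income targets.
targets = [[300.0, 15000.0], [200.0, 18000.0]]

W = calibrate(features, targets, init_weight=40.0)
estimates = totals(W, features)
```

In this sketch, row `i` of `W` holds household `i`'s weight in each area, and `totals(W, features)` returns the calibrated area totals to compare against `targets`. Because the dropout noise acts as a regularizer, the fitted totals land close to the targets rather than exactly on them.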
Our methodology accounts for state-level policy variation by first constructing household-state files that incorporate differences in state income taxes and federal programs administered at the state level. We then prune this file for efficiency and optimize district-level weights within each state.
We demonstrate the framework's utility by analyzing provisions of H.R. 1, the reconciliation bill passed by the House of Representatives in May 2025. The resulting Enhanced CPS dataset and the accompanying PolicyEngine US platform, both open source, give researchers and policymakers a transparent basis for granular distributional analysis. This work establishes a scalable template for integrating administrative and survey data to support evidence-based policy evaluation at both national and local levels.