Individual Submission Summary

Mitigating Misinformation in Health Policy Data Extraction with Large Language Models

Thursday, November 13, 1:45 to 3:15pm, Property: Grand Hyatt Seattle, Floor: 1st Floor/Lobby Level, Room: Princess 1

Abstract

The growing adoption of Large Language Models (LLMs) for automating policy data extraction offers significant opportunities for scaling policy research and decision-making. However, existing work has overlooked the health policy domain, where LLMs are particularly prone to misinformation, such as hallucinations, misclassifications, and omissions, due to the diversity of policy types and the inconsistency and complexity of policy data formats collected from various sources. These errors can severely degrade the quality and trustworthiness of extracted information. To address these limitations, we propose a role-based LLM framework that mitigates misinformation in health policy information extraction. By assigning specialized roles (e.g., Policy Analyst) and employing role-specific prompts informed by structured domain knowledge, our framework mimics expert analysis workflows to reduce misinformation and enhance interpretability. For each health policy, the LLM is given both the policy title and an accompanying summary to assist in information extraction. The framework is designed to extract essential attributes from health policies, including the state of implementation, the year of adoption, and the policy type. We evaluate our approach using data from the Healthy Food Policy Project (HFPP) Database, which includes 608 healthy food policies formally adopted by municipal governments across 51 U.S. states and territories, covering seven policy types. As a baseline, we first apply an LLM without any role-specific prompting. Preliminary results show high extraction accuracy for factual fields such as the state of implementation (99.18%) and the year (99.34%), while more complex categorical information, namely policy type based on legal mechanism, yields lower accuracy (66.45%). When policies are recategorized into three broader types based on legal authority, the accuracy of policy type extraction improves to 84.87%. With our proposed role-based framework, extraction performance improves further: accuracy for the original seven policy types increases by 13.81% and for the three-type categorization by 4.44%, while extraction accuracy for both the state and the year remains above 99%, consistent with the baseline. These results demonstrate that structured role assignment reduces misinformation and enhances reliability in health policy information extraction. Our empirical findings show that the proposed framework achieves both high extraction accuracy and factual consistency. By enhancing the reliability of automated health policy analysis, this work advances more transparent and trustworthy applications of LLMs in the public health domain, which can ultimately inform evidence-based policymaking and strengthen democratic accountability.
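
As a concrete illustration of the role-based prompting described above, the following is a minimal Python sketch of a "Policy Analyst" extraction prompt. The prompt wording, the JSON field names, the placeholder policy-type labels, and the call_llm client are all illustrative assumptions; the authors' actual prompts and the HFPP's seven legal-mechanism categories are not specified in the abstract.

import json

# Hypothetical role-specific prompt for the "Policy Analyst" role described
# in the abstract. All wording here is an illustrative assumption, not the
# authors' actual prompt.
POLICY_ANALYST_PROMPT = """You are a Policy Analyst specializing in U.S. \
municipal health and food policy. From the policy title and summary below, \
extract:
- "state": the U.S. state or territory of implementation
- "year": the year the policy was formally adopted
- "policy_type": exactly one of {policy_types}
Respond with a JSON object containing exactly these three keys.

Title: {title}
Summary: {summary}"""

# Placeholder labels only; the HFPP database defines the actual seven
# legal-mechanism categories.
SEVEN_POLICY_TYPES = ["type_1", "type_2", "type_3", "type_4",
                      "type_5", "type_6", "type_7"]

def build_prompt(title: str, summary: str) -> str:
    """Fill the role-specific template for a single policy record."""
    return POLICY_ANALYST_PROMPT.format(
        policy_types=", ".join(SEVEN_POLICY_TYPES),
        title=title,
        summary=summary,
    )

def extract_attributes(title: str, summary: str, call_llm) -> dict:
    """Run one extraction; call_llm is any text-in/text-out chat-model client."""
    raw = call_llm(build_prompt(title, summary))
    return json.loads(raw)  # assumes the model returns strict JSON

Under these assumptions, the baseline condition in the abstract would correspond to the same template with the opening role sentence removed; comparing field-level exact-match accuracy between the two variants against the HFPP labels yields the kind of gains reported above.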

Authors