Recent advances in AI, such as large language models (LLMs), introduce new opportunities to improve decision-making processes. Before LLMs can be practically and legally integrated into child welfare systems, it is crucial to evaluate whether the content they generate about child maltreatment accurately reflects the complexities and realities of the issue. Given that child maltreatment often intersects with poverty and racial inequities, understanding and addressing the biases LLMs may introduce is essential to ensure these tools do not unintentionally exacerbate harm or perpetuate systemic injustices. Moreover, the homogeneity of LLM-generated content has become increasingly evident. We developed a set of prompts to elicit hypothetical child maltreatment cases from LLMs, including GPT-3.5, GPT-4, and Llama 3. We extracted key variables from those cases and analyzed representational biases: the variables were compared against national datasets serving as benchmarks, with statistical analyses such as chi-square tests used to quantify discrepancies. To assess the homogeneity of the outputs, we constructed a separate corpus of news articles as a proxy for human-generated content on child maltreatment and applied sentence embeddings and cosine similarity metrics to compare the diversity of LLM-generated content with that of human-generated content. The results show that LLMs over-represent certain demographics and forms of maltreatment. Notably, while most child maltreatment cases involve neglect, LLMs often depict more severe forms such as physical and psychological abuse, potentially skewing public and professional perceptions of child maltreatment. LLM-generated narratives are also highly homogeneous, even when prompts are varied; LLMs thus present a biased yet homogeneous narrative of child maltreatment. Furthermore, LLMs have learned the association between poverty and child maltreatment and consistently predict higher risk scores for children from low-income families, reinforcing that association. These findings underscore the importance of addressing both representational and allocative biases in AI applications for public-sector domains such as child welfare.
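
As a minimal sketch of the benchmark comparison described above, the snippet below runs a chi-square goodness-of-fit test on demographic counts extracted from LLM-generated cases against national benchmark proportions. The category labels, counts, and proportions here are invented for illustration and are not the study's data.

```python
# Hypothetical example: test whether the racial distribution of children in
# LLM-generated cases departs from a national benchmark distribution.
import numpy as np
from scipy.stats import chisquare

categories = ["White", "Black", "Hispanic", "Other"]          # illustrative labels
llm_counts = np.array([412, 310, 190, 88])                    # counts from LLM cases (made up)
benchmark_props = np.array([0.44, 0.21, 0.24, 0.11])          # national proportions (made up)

# Expected counts under the benchmark, scaled to the same total as the observed counts.
expected = benchmark_props * llm_counts.sum()

stat, p_value = chisquare(f_obs=llm_counts, f_exp=expected)
print(f"chi2 = {stat:.2f}, p = {p_value:.4f}")
```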
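
The homogeneity comparison could be sketched in a similar spirit: embed each narrative and use the mean pairwise cosine similarity within a corpus as a simple homogeneity score, where higher values indicate less diverse content. The embedding model name and the placeholder texts below are assumptions; the abstract does not specify which sentence-embedding model was used.

```python
# Hypothetical sketch: compare homogeneity of LLM-generated case narratives
# with that of a news-article corpus via mean pairwise cosine similarity.
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

def mean_pairwise_similarity(texts, model):
    """Average cosine similarity over all distinct pairs of texts."""
    embeddings = model.encode(texts)
    sims = cosine_similarity(embeddings)
    upper = np.triu_indices(len(texts), k=1)   # upper triangle, excluding self-similarity
    return sims[upper].mean()

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
llm_cases = ["<generated case narrative 1>", "<generated case narrative 2>"]      # placeholders
news_articles = ["<news article 1>", "<news article 2>"]                          # placeholders

print("LLM corpus homogeneity:", mean_pairwise_similarity(llm_cases, model))
print("News corpus homogeneity:", mean_pairwise_similarity(news_articles, model))
```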