Individual Submission Summary

Does Language Matter? AI Responses to Ageism Scales in English and Chinese

Mon, August 10, 4:00 to 5:30pm, TBA

Abstract

This study examines whether generative artificial intelligence (AI) models, trained on textual data that are inherently cultural, respond systematically differently to established measures of ageism depending on the language of the prompt. Building on research showing that generative AI systems can reproduce social biases and exhibit language-linked cultural tendencies, we compare the outputs of GPT and DeepSeek to English- and Chinese-language prompts across multiple validated ageism scales measuring ambivalent ageism, work-related age-based stereotypes, positive and negative attitudes toward old people, and filial responsibility expectations. For each item-language-model combination, we conducted 100 independent API calls. Preliminary results indicate that merely switching the prompt language, without adding cultural cues, produces systematic differences. Under Chinese prompts, GPT-5-mini generated higher benevolent and hostile ageism scores on the ambivalent ageism scale than under English prompts. On work-related stereotypes, the main language difference emerged for competence: Chinese prompts yielded substantially lower competence evaluations of older workers, while differences in adaptability and warmth were small. Chinese prompts also produced slightly higher positive and slightly lower negative attitudes toward old people, and higher filial responsibility expectations. DeepSeek shows similar patterns, though effect sizes vary. Ongoing work will expand the analysis to additional large language models (LLMs) and apply multivariate modeling to estimate language effects more rigorously.

Authors