Paper Summary

High School Students as Designers of Generative Language Models

Wed, April 8, 3:45 to 5:15pm PDT, JW Marriott Los Angeles L.A. LIVE, Floor: 3rd Floor, Atrium II

Abstract

Objectives:
As generative language models (GLMs) have gained popularity, high school students are increasingly encountering them in their everyday lives. While most research has focused on examining youth as productive users of GLM-powered systems, far fewer efforts have focused on how to engage high school students as designers of these models to foster better understandings of how these systems work. We explore the data practices and ethical considerations that teens engaged with when designing very small-scale GLMs, which we call babyGPTs.
Theoretical framework:
Efforts on AI/ML literacies vary in how they define what being AI/ML literate means. Some emphasize the importance of preparing young people to use AI/ML systems responsibly (Kalantzis & Cope, 2025; Mills et al., 2024). Others highlight that young people must understand how AI/ML systems work by learning key ideas of AI/ML so that, beyond being knowledgeable users, they can envision themselves as contributors to the development of AI/ML (Grover, 2024; Touretzky et al., 2023). We argue that we must think about AI/ML literacies as part of computational literacies (Kafai & Proctor, 2022) and provide learners with opportunities to participate in the design of AI/ML models and “develop the skills, insights and reflexivity needed to understand digital technology and its effect on their lives and society at large, and their capacity to engage critically and curiously with the construction and deconstruction of technology” (Dindler et al., 2020).
Methods:
We conducted a five-day school workshop with 35 high school students (ages 14-15) at a school in the Northeastern US, where they designed very small GLMs (babyGPTs) using the nanoGPT framework (Karpathy, 2024). Building on research on AI/ML data practices (Olari & Romeike, 2024) and ethics (Veldhuis, 2025), we present a case study (Yin, 2018) of one group of three students, examining their construction process, discussions, artifacts, and interviews to address the following question: What data practices and ethical considerations did participants engage with in the construction process of their own GLMs?
Results:
The team developed functional understandings related to how the quality and quantity of the training data and the training process influenced the quality of the outputs while considering ethical issues. When developing their model, the team iteratively engaged with data practices such as defining success criteria, exploring datasets, examining data quality, preparing data, implementing a solution, and evaluating performance. At the same time, they considered ethical issues related to copyright, authorship, trustworthiness of outputs, environmental impact, algorithmic bias, and reliance on AI.
Significance:
This work contributes a case study showing how students engaged with ethical considerations in the construction of generative language models and outlines directions for future research to support young people’s engagement with data practices, ethics, and societal implications in model design activities.

Authors