Individual Submission Summary

Self-directed (or Gamified) Assessment using Speech Processing Systems in South Africa

Mon, March 24, 9:45 to 11:00am, Palmer House, Floor: 3rd Floor, Salon 6

Proposal

Most South African children struggle to learn to read. According to the latest PIRLS (2021), 81% of the country’s Grade 4 learners are unable to read for meaning in any language, with learning outcomes trending down in recent years. These struggles are felt most acutely by learners tested in African languages and those from poorer schools.

Like many countries, South Africa relies mostly on observational assessment in its early grades. However, such assessments often fail to discriminate accurately between children within a classroom, or between skills within a child, limiting teachers' ability to target their support effectively. Our solution is to introduce child self-directed (or gamified) assessments that are both more accurate and more equitable.
To this end, we recently collaborated with education researchers from Stellenbosch University, New York University, and Wordworks to develop a new assessment game that measures kindergartners' oral narrative skills (ONS). Research in the U.S. has shown ONS to be a strong predictor of later literacy in communities with strong oral narrative traditions, such as African-American communities. Given the strong storytelling traditions among South Africa's African communities, we anticipated that ONS would be important to assess locally as well.

The first step of our game development process was to identify the specific characteristics of ONS in South Africa. We then designed a tablet-based game centered on a series of short animated stories accompanied by comprehension and retelling tasks that captured children’s spoken responses. These tasks examined whether children’s narratives included elements such as temporal coherence, causality, and the presence of ‘goal, attempt, and outcome’ sequences.
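To make the retelling analysis concrete, the sketch below flags the narrative elements named above in a transcribed retelling. The connective word lists, the `score_narrative` helper, and the example sentence are our own illustrative assumptions, not the project's actual scoring rubric.

```python
# Illustrative sketch: flag coarse narrative elements in a transcript.
# The keyword lists below are hypothetical stand-ins for a real rubric.

TEMPORAL = {"then", "after", "before", "next", "finally"}
CAUSAL = {"because", "so", "since", "therefore"}
GOAL = {"wanted", "tried", "decided"}

def score_narrative(transcript: str) -> dict:
    """Return presence flags for three coarse ONS markers."""
    cleaned = transcript.lower().replace(",", " ").replace(".", " ")
    words = set(cleaned.split())
    return {
        "temporal_coherence": bool(words & TEMPORAL),
        "causality": bool(words & CAUSAL),
        "goal_attempt_outcome": bool(words & GOAL),
    }

retelling = "The dog wanted the bone, so he dug a hole, then he hid it."
flags = score_narrative(retelling)
print(flags)  # all three markers present in this example
```

A production system would of course use richer linguistic analysis than word lists, but the same presence/absence signals are what the downstream scoring consumes.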

To analyze our captured child speech data, we partnered with machine learning experts from Stellenbosch University who specialize in developing speech processing systems for low-resourced languages.

Their first step was to develop an automatic speech recognition (ASR) system for our target languages by adapting the Whisper foundation model. Specifically, they fine-tuned the model in sequence: first on general adult speech, then on adult speech resembling oral narratives, and finally on child speech consisting of actual oral narratives. Notably, they found that with as little as five minutes of labelled child ONS data, this sequence produced a coarse but usable ASR system.
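The three-stage curriculum can be sketched as follows. Here `fine_tune` is a placeholder standing in for a real Whisper fine-tuning step (e.g. via Hugging Face Transformers); the stage names and dataset descriptions are illustrative, and the point of the sketch is only the ordering of the stages.

```python
# Sketch of the staged adaptation curriculum described above.
# `fine_tune` is a hypothetical placeholder: a real implementation
# would update Whisper's weights on each dataset in turn.

STAGES = [
    ("general_adult_speech", "broad adult corpora in the target language"),
    ("narrative_adult_speech", "adult speech resembling oral narratives"),
    ("child_oral_narratives", "as little as 5 minutes of labelled child ONS data"),
]

def fine_tune(model_state: list, dataset_name: str) -> list:
    """Placeholder: records which dataset was applied, in order."""
    return model_state + [dataset_name]

model = []  # stands in for a pretrained Whisper checkpoint
for name, _description in STAGES:
    model = fine_tune(model, name)

print(model)
```

The curriculum moves from plentiful, easy-to-obtain data toward scarce, in-domain data, which is why so little labelled child speech suffices at the final stage.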

The Stellenbosch team then sent the outputs of their adapted ASR system in two directions. First, they used the ASR-generated transcripts to train simple linear models to identify specific keywords indicative of the ONS metrics described above. Second, they fed the ASR-generated transcripts into a large language model using an in-context-learning strategy whereby the language model was shown previous stories with their associated ONS scores and then asked to predict the scores for an unseen story.
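Both directions can be illustrated with a minimal sketch. The keyword list, the weights, the 0-to-5 scale, and the prompt template below are all our own illustrative assumptions, not the team's actual models or prompts.

```python
# Sketch of the two analysis directions described above.
# All keywords, weights, scores, and prompt wording are hypothetical.

# Direction 1: keyword-indicator features feeding a simple linear model.
KEYWORDS = ["then", "because", "wanted", "tried", "finally"]

def keyword_features(transcript: str) -> list:
    """1/0 indicator for each ONS-related keyword."""
    words = transcript.lower().split()
    return [1 if kw in words else 0 for kw in KEYWORDS]

def linear_score(features, weights, bias=0.0):
    """A linear model: weighted sum of keyword indicators."""
    return bias + sum(w * f for w, f in zip(weights, features))

feats = keyword_features("the dog wanted the bone then he hid it")
score = linear_score(feats, weights=[1.0, 2.0, 1.5, 1.5, 0.5])

# Direction 2: a few-shot (in-context learning) prompt in which the
# language model sees previously scored stories before an unseen one.
def build_prompt(scored_examples, new_transcript):
    parts = ["Rate each story's oral narrative skills from 0 to 5.\n"]
    for text, s in scored_examples:
        parts.append(f"Story: {text}\nScore: {s}\n")
    parts.append(f"Story: {new_transcript}\nScore:")
    return "\n".join(parts)

prompt = build_prompt(
    [("A boy lost his ball, so he looked everywhere and found it.", 4)],
    "the dog wanted the bone then he hid it",
)
print(prompt)
```

In the in-context setup the prompt ends at "Score:", so the language model's completion is itself the predicted ONS score for the unseen story; no additional training of the language model is required.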

In this panel session, we will discuss how, despite the promising capabilities of these speech processing and analysis strategies, models such as Whisper have had relatively limited exposure to speech data from South Africa's indigenous languages. We will highlight the need for targeted collection of children's voice data, and optimal ways of doing so, to better support these languages.
