CIES 2025 “Envisioning Education in a Digital Society”: Automating Oral Reading Fluency Assessments using Artificial Intelligence (AI) for Indic Languages

Automating Oral Reading Fluency Assessments using Artificial Intelligence (AI) for Indic Languages

In Event: Leveraging Voice AI (Artificial Intelligence) to assess early grade reading skills and support learning

Mon, March 24, 9:45 to 11:00am, Palmer House, Floor: 3rd Floor, Salon 6

Proposal

Although India has a 98.4% enrolment rate for 6-14 year olds, large-scale assessment studies indicate a wide gap in the achievement of minimum proficiency in foundational literacy and numeracy skills. The Annual Status of Education Report (ASER) 2022 reveals that 57.2% of Grade 5 children cannot read a Standard II text in their local language, up from 49.7% in 2018.

Recognizing that children must first 'learn to read' with understanding in order to 'read to learn' any subject, India's Ministry of Education (MoE) launched the NIPUN Bharat mission in 2021 with the aim of making Grade 3 children fluent in reading their local language by 2026-27. The mission explicitly recommends that fluency be taught in the local language (India has more than 20) in the foundational years. We are a nonprofit institute supporting this mission by building teacher tools that automate reading assessments and accurately diagnose each child's reading level in multiple Indic languages.

Accurate reading fluency assessments enable teachers to: (1) pinpoint proficiency levels, which helps in grouping students for targeted interventions; (2) inform instructional decisions, such as selecting appropriate reading materials, decodable texts and leveled practice strategies; and (3) monitor each student's progress over time, allowing for timely adjustments in teaching strategies and interventions.

With support from the state government of Gujarat and others, we have pioneered an Oral Reading Fluency assessment tool for Gujarati that enables accurate evaluation of students' reading abilities and can scale across diverse educational settings. Our model accurately transcribes the voice recording of a child and identifies not just correctly read words but also incorrectly read words, extra words and missed words. These granular metrics are aggregated to calculate the correct words per minute (CWPM) and to diagnose the reading level of each student. We are now using a similar approach to build models for Hindi, Marathi, Tamil and Punjabi.
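As an illustration of how such word-level metrics could be aggregated, here is a minimal sketch in Python. It is not the tool's actual scoring method: it assumes the transcript is available as plain whitespace-separated text, uses the standard library's `difflib.SequenceMatcher` as a stand-in alignment step, and the function name `score_reading` is hypothetical.

```python
from difflib import SequenceMatcher


def score_reading(reference_text, transcript_text, duration_seconds):
    """Align a transcript to the reference passage and count word-level outcomes.

    Assumption: both inputs are whitespace-separated words; a real system
    would likely normalise punctuation and spelling variants first.
    """
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")

    ref = reference_text.split()
    hyp = transcript_text.split()
    counts = {"correct": 0, "incorrect": 0, "extra": 0, "missed": 0}

    matcher = SequenceMatcher(a=ref, b=hyp, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        n_ref, n_hyp = i2 - i1, j2 - j1
        if tag == "equal":            # words read as written
            counts["correct"] += n_ref
        elif tag == "replace":        # words read as something else
            paired = min(n_ref, n_hyp)
            counts["incorrect"] += paired
            counts["missed"] += n_ref - paired
            counts["extra"] += n_hyp - paired
        elif tag == "delete":         # reference words never spoken
            counts["missed"] += n_ref
        elif tag == "insert":         # spoken words not in the reference
            counts["extra"] += n_hyp

    # Correct words per minute: correct words scaled to a 60-second rate.
    counts["cwpm"] = counts["correct"] * 60.0 / duration_seconds
    return counts


result = score_reading("the cat sat on the mat", "the cat sit on on mat", 30)
print(result)
```

In this example the child reads four of six words correctly in 30 seconds, so the sketch reports 8 CWPM with two incorrectly read words. Because the alignment works on whole tokens, the same logic applies to any language whose words are separated by spaces.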

We will showcase how we harness open-source Automatic Speech Recognition (ASR) corpora to create base models for various Indic languages, then fine-tune each model using annotated student voice data collected from public schools or low-income community settings through a network of partners. Through the annotation and fine-tuning process we ensure that our model has higher precision than off-the-shelf models, because we want to identify reading mistakes as accurately as possible.

Additionally, we will share the challenges presented by low-resource languages and our approach of using pseudo-labelled data and Low-Rank Adaptation (LoRA) to fine-tune ASR models for these languages. Lastly, we will discuss some ongoing experiments to capture phoneme-level accuracy.
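For readers unfamiliar with Low-Rank Adaptation, the general idea can be sketched in a few lines of Python. This illustrates the standard LoRA formulation, not the authors' training code: the pretrained weight matrix W stays frozen and only two small matrices A and B of rank r are trained, so the adapted layer computes h = W x + (alpha / r) B (A x). The helper names and the example layer size are assumptions chosen for illustration.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha):
    """Adapted layer output: h = W x + (alpha / r) * B (A x).

    W is d_out x d_in (frozen), A is r x d_in and B is d_out x r (trainable).
    """
    r = len(A)                         # rank of the low-rank update
    base = matvec(W, x)                # frozen pretrained path
    update = matvec(B, matvec(A, x))   # low-rank trainable path
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


def trainable_params(d_out, d_in, r):
    """Compare parameter counts for full fine-tuning versus LoRA on one layer."""
    return {"full": d_out * d_in, "lora": r * (d_in + d_out)}


W = [[1, 0], [0, 1]]   # frozen weights (toy 2 x 2 example)
A = [[1, 1]]           # rank r = 1
B = [[0], [0]]         # B starts at zero, so the adapted layer matches the base model
print(lora_forward(W, A, B, [2, 3], alpha=2))
print(trainable_params(1024, 1024, 8))
```

For a hypothetical 1024 x 1024 layer with r = 8, the low-rank update has 16,384 trainable parameters against 1,048,576 for full fine-tuning. This reduction is what makes the technique attractive when little annotated speech is available.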

By combining this phoneme-level output with pause times, we can provide accurate diagnostics for students at various reading levels, including lexical, sub-lexical, and fluent levels. This comprehensive strategy will support teachers in tailoring their instruction to meet the diverse needs of their students, ultimately fostering improved literacy outcomes across different contexts.

Author

Manjari Sheel, Wadhwani Institute of Artificial Intelligence