A Machine Learning and Life-Course Approach to Protecting Educational Trajectories

Sun, March 23, 8:00 to 9:15am, Virtual Rooms, Virtual Room #106

Proposal

Turning points in students' educational trajectories are crucial moments that can significantly affect their academic and life paths. Various psychological, social, and educational factors shape these transitions, and research shows that students' aspirations and engagement are vital during them (Nagin, 2003). Among these turning points are chronic absenteeism, grade retention, and school dropout, which significantly affect students' academic and socio-emotional outcomes (Wood et al., 2011). These issues are interconnected: chronic absenteeism often leads to school dropout and grade retention, creating a cascade of adverse effects that can persist into adulthood (Allison & Attisha, 2019).
Chronic absenteeism disrupts individual student learning and the learning environment for peers, leading to an overall decline in academic achievement (Gottfried, 2015). Studies have also shown that absenteeism in the early grades predicts later academic difficulties, including lower math and reading scores (MacNaughton et al., 2017).
Grade retention requires students to repeat a grade when they do not meet minimum academic or psychosocial performance standards. Retained students often experience stigma and decreased self-esteem, which can further disengage them from school (Goos et al., 2021) and increase the likelihood of suicide (Castellví et al., 2020) and antisocial behavior (Valenzuela et al., 2019). Students often consider retention one of the most stressful events of their schooling (Hu & Hannum, 2020). The literature identifies grade retention as the most significant factor in school dropout (Valbuena et al., 2021).
School dropout is a multidimensional phenomenon (Fortin et al., 2012), commonly associated with school-related, environmental, and individual factors, among which grade repetition plays a significant role (López & Álvarez, 2020). The evidence on the consequences of dropout is consistently negative, highlighting its detrimental effects on economic stability and health (Rodríguez et al., 2023) and, in its most extreme form, its association with the school-to-prison pipeline (Hemez et al., 2019).
In the long term, the effects of chronic absenteeism, grade retention, and school dropout extend beyond academics (Islam & Shapla, 2021). They affect both individual life outcomes and broader societal health and economic stability (Allison & Attisha, 2019), and they cause irreparable damage to students through longer school trajectories (Cockx et al., 2019), fewer years available for higher education, and delayed entry into the labor market (Goos et al., 2021).
A life-course model is longitudinal, multilevel, and focused on developmental outcomes that may be essential precursors to a specific event (Audas & Williams, 2001). Life-course research aims to broaden inquiry into the effects of life's turning points by analyzing transitions in the context of an individual's developmental course (Nagin et al., 2003). Based on this analytical framework, our main objective is to estimate the probability that a student will experience a turning point, given his or her previous trajectory. Estimating this risk for school-aged students enables early intervention and helps identify territories that require additional attention and resources.

Because our goal is to identify at-risk students, we frame the task as a prediction policy problem (Kleinberg et al., 2015) and use big data and machine learning techniques rather than a traditional causal inference analysis. This approach provides an effective tool for predicting turning points that can shape young adults' lives.

Data and Methods
This study uses data from the Student General Information System. These datasets contain the academic records (1st to 12th grade) and demographic information (e.g., gender, special needs, Indigenous origin) of all students in Chilean schools from 2004 to the present.
From this data source, and following the methodology proposed by Rodríguez et al. (2023), we generated a dataset summarizing the educational trajectory of each student. Each trajectory summarizes all events recorded in the administrative data during the student's school career, such as grades, grade repetitions, absences, school transfers, and dropouts. We considered all students enrolled in child and youth education in 2023 (N = 2,927,653) and described their trajectories with a set of 73 variables.
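As an illustration only, the sketch below shows how yearly administrative records might be collapsed into one trajectory summary per student. The record layout (`YearRecord`), the handful of feature names, and the 90% attendance cutoff for chronic absence are hypothetical assumptions; the actual methodology of Rodríguez et al. (2023) and its 73 variables are not reproduced here.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean

# Assumption: attending less than 90% of school days counts as chronic absence.
CHRONIC_ABSENCE_THRESHOLD = 0.90


@dataclass(frozen=True)
class YearRecord:
    """One hypothetical row of yearly administrative data for a student."""
    student_id: str
    year: int
    grade: int          # grade level, 1 to 12
    attendance: float   # share of school days attended, 0.0 to 1.0
    school_id: str


def summarize_trajectories(records, threshold=CHRONIC_ABSENCE_THRESHOLD):
    """Collapse yearly records into one dict of illustrative features per student."""
    by_student = defaultdict(list)
    for record in records:
        by_student[record.student_id].append(record)

    summaries = {}
    for student_id, rows in by_student.items():
        rows = sorted(rows, key=lambda r: r.year)
        pairs = list(zip(rows, rows[1:]))  # consecutive observed years
        summaries[student_id] = {
            "years_observed": len(rows),
            # Same grade in two consecutive records: a repetition.
            "repetitions": sum(1 for a, b in pairs if b.grade == a.grade),
            # School changed between consecutive records: a transfer.
            "transfers": sum(1 for a, b in pairs if b.school_id != a.school_id),
            # Missing year(s) between records: a temporary interruption.
            "interruptions": sum(1 for a, b in pairs if b.year - a.year > 1),
            "chronic_absence_years": sum(1 for r in rows if r.attendance < threshold),
            "mean_attendance": mean(r.attendance for r in rows),
            "last_grade": rows[-1].grade,
        }
    return summaries
```

Each summary dictionary would become one row of the modeling dataset; a real pipeline would add many more variables (grades, demographics, school characteristics).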
Any event that threatens on-time completion or leads to dropping out of school is considered a trajectory disruption event. In particular, we are interested in estimating the probability that a student will be chronically absent, repeat a grade, or drop out of school, in order to construct a Risk of Trajectory Disruption Index (RTDI). To this end, we used gradient-boosted decision tree algorithms (Chen & Guestrin, 2016; Ke et al., 2017; Prokhorenkova et al., 2019), specifically the Light Gradient Boosting Machine (LightGBM), to train a model that predicts the probability (i.e., the RTDI) that a student will experience any of the three events.
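The composite outcome behind the RTDI can be sketched as follows. This is a minimal illustration that assumes the outcome is observed in the year after the features are measured and that chronic absenteeism means attending less than 90% of school days; both are assumptions, and `Outcome`, `disruption_label`, and `build_training_set` are hypothetical names.

```python
from dataclasses import dataclass

# Assumption: attending less than 90% of school days counts as chronic absenteeism.
CHRONIC_ABSENCE_THRESHOLD = 0.90


@dataclass(frozen=True)
class Outcome:
    """Hypothetical outcome for one student in the year being predicted."""
    attendance: float   # share of school days attended, 0.0 to 1.0
    repeated: bool      # not promoted at the end of the year
    dropped_out: bool   # left the school system without completing it


def disruption_label(outcome, threshold=CHRONIC_ABSENCE_THRESHOLD):
    """Return 1 if ANY of the three disruption events occurs, otherwise 0."""
    chronic_absence = outcome.attendance < threshold
    return int(chronic_absence or outcome.repeated or outcome.dropped_out)


def build_training_set(features, outcomes, feature_names):
    """Align per-student features (earlier years) with outcomes (later year).

    Students missing either part are skipped; rows are ordered by student id
    so that X, y, and ids stay aligned.
    """
    X, y, ids = [], [], []
    for student_id in sorted(features.keys() & outcomes.keys()):
        X.append([features[student_id][name] for name in feature_names])
        y.append(disruption_label(outcomes[student_id]))
        ids.append(student_id)
    return X, y, ids
```

With LightGBM installed, `X` and `y` could be passed to `lightgbm.LGBMClassifier().fit(X, y)`, and `predict_proba(X_new)[:, 1]` would then give raw, not yet calibrated, risk scores. Training a single classifier on the "any event" label, rather than three separate models, matches the text's description of one probability covering all three events.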
We also applied probability calibration so that the RTDI is comparable across students and easier to interpret and act on. Calibration ensures that predicted values can be read as probabilities (for example, among students assigned an RTDI of about 0.3, roughly 30% should experience a disruption event), which maximizes the comparative capacity of the RTDI.
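The text does not state which calibration method was used. One common option is isotonic regression, available in scikit-learn as `IsotonicRegression` or through `CalibratedClassifierCV(method="isotonic")`. The stdlib sketch below implements the pool-adjacent-violators algorithm behind isotonic calibration, mapping raw model scores to observed event rates with a non-decreasing step function. It is an assumption-based illustration with hypothetical names, and it should be fitted on held-out data rather than on the training set.

```python
from bisect import bisect_left


def fit_isotonic(scores, labels):
    """Fit a non-decreasing map from raw scores to event rates (pool adjacent violators).

    Returns (upper_bounds, values): block i covers scores up to upper_bounds[i]
    and maps them to the observed event rate values[i].
    """
    # Pool tied scores first so that each distinct score gets a single value.
    tallies = {}
    for score, label in zip(scores, labels):
        events, count = tallies.get(score, (0.0, 0))
        tallies[score] = (events + label, count + 1)

    blocks = []  # each block: [number_of_events, number_of_students, largest_score]
    for score in sorted(tallies):
        events, count = tallies[score]
        blocks.append([events, count, score])
        # Merge backwards while the previous block's rate exceeds the last block's rate.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            events, count, upper = blocks.pop()
            blocks[-1][0] += events
            blocks[-1][1] += count
            blocks[-1][2] = upper
    return [b[2] for b in blocks], [b[0] / b[1] for b in blocks]


def calibrate(score, upper_bounds, values):
    """Map one raw score to the calibrated probability of its block."""
    i = bisect_left(upper_bounds, score)
    return values[min(i, len(values) - 1)]
```

scikit-learn's `IsotonicRegression` interpolates linearly between fitted points; the step function here is a simpler variant that keeps the same monotonicity guarantee.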
Finally, we geo-referenced students' addresses to create maps showing where the most at-risk students are located.
The most direct application of the RTDI is to identify territories where students face a higher risk of disrupted trajectories, whether defined by the schools they attend or by the areas where they live.
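Assigning geo-referenced addresses to territories comes down to a point-in-polygon test. The sketch below uses ray casting on planar coordinates and is an assumption about how such an assignment could work, not the authors' pipeline; production work would typically rely on GIS tooling (e.g., shapely or PostGIS) with proper coordinate reference systems. Territory names and polygons are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: count crossings of a rightward ray from (x, y) with polygon edges."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the polygon
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge crosses the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_territory(lon, lat, territories):
    """Return the name of the first territory whose polygon contains the point, else None."""
    for name, polygon in territories.items():
        if point_in_polygon(lon, lat, polygon):
            return name
    return None
```

Points lying exactly on a shared boundary may be assigned to either neighbor, which is one reason dedicated GIS libraries are preferable at national scale.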

Results
The RTDI shows that trajectory risk differs between national and foreign students (Figure 1). Looking at the percentage of students with an RTDI above 50% (high risk), we find significant differences across regions (Figure 2). The lowest percentage of high-risk students is found in the Ñuble region (8%) and the highest in Atacama (23.4%).

[FIGURE 1: RTDI index decomposition by nationality]

These differences also appear at the district and educational levels. For instance, districts differ even within Ñuble, the region with the lowest proportion of high-risk students (Figure 3).

[FIGURE 2: Prevalence of high-risk students per region]

[FIGURE 3: Difference in prevalence of high-risk students between districts]
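The regional and district prevalences described above can be computed with a simple grouped count. In this sketch the record fields (`rtdi`, `region`, `district`) and the helper `district_gap` are hypothetical; the 50% cutoff follows the text's definition of high risk, and the grouping key can be switched to compare schools instead.

```python
from collections import defaultdict

# High risk as defined in the text: an RTDI above 50%.
HIGH_RISK_CUTOFF = 0.5


def high_risk_prevalence(students, key="region", cutoff=HIGH_RISK_CUTOFF):
    """Percentage of students with RTDI > cutoff within each territorial unit."""
    totals = defaultdict(int)
    high_risk = defaultdict(int)
    for student in students:
        unit = student[key]
        totals[unit] += 1
        if student["rtdi"] > cutoff:
            high_risk[unit] += 1
    return {unit: round(100 * high_risk[unit] / totals[unit], 1) for unit in totals}


def district_gap(students, region):
    """Spread (max minus min) of district-level prevalence inside one region.

    Raises ValueError if the region has no students.
    """
    in_region = [s for s in students if s["region"] == region]
    by_district = high_risk_prevalence(in_region, key="district")
    return round(max(by_district.values()) - min(by_district.values()), 1)
```

A gap well above zero within a low-prevalence region is the kind of within-region heterogeneity the district-level comparison is meant to surface.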

Finally, all these reports and data were consolidated into a dashboard (Figure 4) that provides a comprehensive picture of how risk varies across territories and of the challenges authorities face in protecting students from trajectory disruption.

[FIGURE 4: Integrated dashboard showing RTDI per territory]

Authors