Profiling Behavioral Patterns in Interactive Computational Thinking Tasks Across Countries: Sequence Mining with Process Data of ICILS2018

In Event: Highlighted Session: Methodological Challenges with latent constructs

Mon, March 11, 8:00 to 9:30am, Hyatt Regency Miami, Floor: Terrace Level, Tuttle South

Proposal

Computational thinking and adaptive problem-solving skills in digital competence tasks comprise individuals' abilities to use computers to collect, manage, produce, and exchange information, and to formulate solutions to problems. These skills have been recognized across countries as among the most important skills of the 21st century. The 2018 cycle of the International Computer and Information Literacy Study (ICILS), organized by the International Association for the Evaluation of Educational Achievement (IEA), extends the evaluation of students' computer and information literacy (CIL) skills and introduces a novel assessment of students' computational thinking (CT) skills, defined as the ability to recognize, analyze, and describe real-world problems so that their solutions can be operationalized within programming tasks (Fraillon et al., 2019).
Process data recorded during students' programming and problem-solving processes could provide a new angle for understanding how students learn, interact, and adapt their computational thinking skills to solve digital tasks in a programming environment.
Given the limited availability of fine-grained process data, we work at the testlet level, extracting behavioral patterns from the whole CT unit. The aim of this study is two-fold: first, to cluster the behavioral patterns into meaningful groups and thereby extract representative time-allocation patterns across the nine items within the unit; and second, to evaluate the CT skills of each latent group in order to identify the optimal pattern across countries.
To achieve these aims, we conducted two sub-studies. (1) We conducted a cluster analysis on timing and process-related variables to group students with homogeneous patterns, and mapped the groups onto students' CT and CIL latent scores and background variables across countries. (2) We focused on command efficiency and time-allocation patterns in the process data to calculate sequence distances and extract representative patterns of how students solved the coding tasks throughout the whole unit. This second sub-study aimed to better understand how students with missing responses allocated their time and to pinpoint potential reasons for their missing responses on certain items.
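The abstract does not state which sequence distance was used. As an illustration only, the Python sketch below uses optimal matching (an edit distance common in sequence analysis), assuming each student's path through the nine items is coded as a string of hypothetical states ('R' = item responded, 'M' = item missing), a unit insertion/deletion cost, and a constant substitution cost of 2. The representative pattern of a group is taken here as its medoid, the member sequence with the smallest total distance to all others. The function names and toy sequences are hypothetical; dedicated tools (for example, the R package TraMineR) offer fuller implementations.

```python
def om_distance(a, b, indel=1.0, sub=2.0):
    """Optimal-matching distance between two state sequences (dynamic programming).

    indel: cost of inserting or deleting one state (assumed value).
    sub:   cost of substituting one state for another (assumed value).
    """
    n, m = len(a), len(b)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel
    for j in range(1, m + 1):
        d[0][j] = j * indel
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0.0 if a[i - 1] == b[j - 1] else sub
            d[i][j] = min(d[i - 1][j] + indel,       # delete a[i-1]
                          d[i][j - 1] + indel,       # insert b[j-1]
                          d[i - 1][j - 1] + cost)    # match or substitute
    return d[n][m]


def medoid(sequences):
    """Representative sequence: the member with the smallest total distance to the rest."""
    totals = [sum(om_distance(s, t) for t in sequences) for s in sequences]
    return sequences[totals.index(min(totals))]


# Hypothetical nonresponse sequences over nine items ('R' responded, 'M' missing)
group = ["MRRRRRRRM", "MRRRRRRMM", "MMRRRRRRM", "RRRRRRRRM"]
print(medoid(group))  # → MRRRRRRRM
```

With a substitution cost equal to two indels, substitution never beats a deletion plus an insertion, so the distance reduces to the number of positions outside the longest common subsequence; other cost schemes would weight timing or ordering differences differently.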
The current study focuses on a total of 31,344 students who were assigned to the computational thinking module “Farm Drone” in ICILS 2018. The students came from nine countries and regions: Denmark (DNK), Finland (FIN), France (FRA), Germany (DEU), Korea (KOR), Luxembourg (LUX), North Rhine-Westphalia (Germany) (DNW), Portugal (PRT), and the United States (USA). The sample consisted of 48.8% girls and 51.2% boys.
The first sub-study focuses on the 11,468 students who responded to all nine tasks in the Farm Drone module; that is, only students without any missing responses were included in this cluster analysis. We employed the k-prototypes clustering method on 39 aggregated process variables to group students with homogeneous behavioral patterns. Note that the 39 process variables include both continuous variables (i.e., response time, remaining time, reset clicks, and length of commands) and categorical variables (i.e., algorithms and irrelevant targets). A four-cluster solution was derived as optimal.
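K-prototypes extends k-means to data that mix continuous and categorical variables, which is why it suits the process variables described above. The Python below is a minimal sketch of the standard formulation, not the study's implementation: the dissimilarity is the squared Euclidean distance on numeric features plus a weight gamma times the number of categorical mismatches, and prototypes are updated with means (numeric) and modes (categorical). The feature names, toy values, gamma, and function names are hypothetical; in practice numeric variables would be standardized first, and packaged implementations exist (e.g., the `KPrototypes` class in the Python `kmodes` package).

```python
import random


def mixed_distance(x_num, x_cat, c_num, c_cat, gamma):
    """Squared Euclidean distance on numeric parts plus gamma * categorical mismatches."""
    num = sum((a - b) ** 2 for a, b in zip(x_num, c_num))
    cat = sum(1 for a, b in zip(x_cat, c_cat) if a != b)
    return num + gamma * cat


def mode(values):
    """Most frequent category; ties go to the category seen first."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return max(counts, key=counts.get)


def k_prototypes(num, cat, k, gamma=1.0, n_iter=20, seed=0):
    """Cluster rows given as parallel lists of numeric and categorical features.

    Returns (labels, numeric_prototypes, categorical_prototypes).
    """
    rng = random.Random(seed)
    idx = rng.sample(range(len(num)), k)            # initial prototypes: k random rows
    c_num = [list(num[i]) for i in idx]
    c_cat = [list(cat[i]) for i in idx]
    labels = [None] * len(num)
    for _ in range(n_iter):
        # Assignment step: each student joins the nearest prototype
        new_labels = [
            min(range(k), key=lambda j: mixed_distance(num[i], cat[i], c_num[j], c_cat[j], gamma))
            for i in range(len(num))
        ]
        if new_labels == labels:                    # converged: assignments unchanged
            break
        labels = new_labels
        # Update step: means for numeric features, modes for categorical ones
        for j in range(k):
            members = [i for i in range(len(num)) if labels[i] == j]
            if not members:
                continue                            # keep the old prototype if a cluster empties
            c_num[j] = [sum(num[i][d] for i in members) / len(members) for d in range(len(num[0]))]
            c_cat[j] = [mode([cat[i][d] for i in members]) for d in range(len(cat[0]))]
    return labels, c_num, c_cat


# Hypothetical toy data: [response time in minutes, reset clicks] and [algorithm type]
num = [[2.0, 0], [2.5, 1], [9.0, 6], [8.5, 5]]
cat = [["loop"], ["loop"], ["sequence"], ["sequence"]]
labels, protos_num, protos_cat = k_prototypes(num, cat, k=2, gamma=0.5)
print(labels)  # students 0-1 and 2-3 fall into separate clusters
```

The weight gamma controls how much a categorical mismatch counts relative to numeric differences; choosing the number of clusters (four in this study) would be a separate step, for example comparing a cost criterion across candidate values of k.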
The results suggested that command sequences that are too long (Cluster 1) or too short (Cluster 4) may not help to raise proficiency scores. Use of the reset function is a good sign of students' engagement in the task. However, more frequent use of the reset function (i.e., Cluster 3) does not necessarily lead to higher CT or CIL scores, although it can be a helpful strategy for exploring difficult and complex items. For example, Cluster 2 showed, on average, high reset frequencies on the last two items. Time allocation also appeared to be an important factor affecting CT and CIL proficiency. Solving the CT tasks consistently fast (e.g., Cluster 4) might not help students achieve a high score, because some items require carefully checking the debugging procedure and newly created code. Allocating time well and reserving time for the items at the end (as Cluster 2 did) provides more flexibility in handling complex items. Spending too much time on items in the middle of the module (e.g., Cluster 1 on item 5 and Cluster 3 on item 7) might leave students rushed toward the end. Although these students might not skip the final items, they probably lack sufficient time to go over the details carefully.
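To make the time-allocation argument concrete, the Python below sketches one way a student's allocation profile could be summarized from per-item response times: each item's share of the total time, the time remaining after each item, and the item where the most time was spent. It assumes remaining time is simply the module time limit minus cumulative response time; the function name, the toy response times, and the 25-minute limit are hypothetical and are not the ICILS values or the study's variable definitions.

```python
def time_profile(response_times, time_limit):
    """Return per-item time shares and the time remaining after each item.

    response_times: minutes spent on items 1..n, in module order.
    time_limit: total minutes available for the module (hypothetical value).
    """
    total = sum(response_times)
    shares = [t / total for t in response_times]   # fraction of used time per item
    remaining = []
    left = time_limit
    for t in response_times:
        left -= t                                  # time left once this item is finished
        remaining.append(left)
    return shares, remaining


# Hypothetical student: a long stay on item 5 leaves little time for items 7-9
times = [2, 2, 2, 3, 8, 2, 2, 1, 1]
shares, remaining = time_profile(times, time_limit=25)  # 25 is an assumed limit
peak_item = shares.index(max(shares)) + 1               # 1-based item number
print(peak_item, remaining[-3:])  # → 5 [4, 3, 2]
```

A profile like this, computed per student and per item, is the kind of aggregated timing input a clustering step could then group into patterns such as "peak in the middle" versus "time reserved for the end".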
The second sub-study focuses on the 19,876 students who had at least one missing response in the Farm Drone module, with the aim of extracting nonresponse sequential patterns and pinpointing potential reasons for the missing responses. Two research questions were investigated in this sub-study: (1) whether any nonresponse patterns could be extracted from the CT tasks, and (2) how students allocated their time within the different nonresponse patterns.
A six-group solution was derived as optimal in the sequence cluster analysis. The results suggested that some students, even though they did not respond to all the items, still showed a very high ability level based on the responses they did give, as these students persistently kept trying until no time remained. Students with missing responses at both the beginning and the end of the module (i.e., Cluster 3) showed medium levels of CT and CIL. These students may not yet have become familiar with the environment at the beginning of the test and may have allocated too much time to the complex items. Students in Clusters 1 and 4 had, on average, low CT and CIL proficiency scores. This might be caused by their many missing responses in the module, which could be a sign of low engagement, unfocused effort, a vague understanding of programming code, or fatigue.
In summary, this study provides a new angle on measuring students' CT and CIL skills using process data. Advanced process data analysis, whether through variable-based approaches, machine learning techniques, or a combination of the two, is a growing trend that warrants extensive and intensive research in the near future, especially in cross-country comparisons.

Author

Qiwei He, Georgetown University