Innovative and resource-demanding data collection is crucial to the advancement of social science research, including the field of education. Many efforts have been made to assemble original datasets and make them publicly available to the wider research community. Unfortunately, researchers often use existing datasets without paying sufficient attention to how they were constructed. To shed light on the advantages, limitations, and implications of different data collection methodologies, and to assess how (often seemingly trivial) differences in assumptions or practices influence scores, we take advantage of a unique opportunity to compare three new historical datasets that measure various education policies and practices around the world and over time. The three datasets have important similarities that facilitate comparison: they measure similar aspects of (especially political) education. However, they were created using different methods, and even seemingly similar measures rest on slightly different assumptions. The EPSM dataset (Education Policies and Systems across Modern History) contains information on the content of de jure school curricula, teacher training, and other education policies, and is based on hand-coding a combination of primary and secondary sources. The HEQ initiative (Historical Education Quality Database) gathers information on similar issues but relies entirely on primary sources such as education laws, regulations, and national curriculum plans. The V-Indoc dataset (Varieties of Indoctrination) relies on country-expert assessments of school curricula, teacher policies, and the presence and nature of political indoctrination. We introduce each dataset and characterize the degree of convergence or divergence between comparable concepts and variables along several relevant dimensions, including coding assumptions, threshold definitions, and differences between de jure policies and de facto practices. The paper further illustrates these points by zooming in on a series of case studies. Finally, we discuss the broader implications of our analysis for the construction and use of datasets: we clarify the advantages and limitations of each data collection strategy and develop guidelines both for dataset-makers (pertaining, e.g., to documentation practices and to clarifying the assumptions underlying the coding) and for data users (pertaining, e.g., to tailoring the choice of dataset to the substantive need and to being cognizant of, and transparently communicating, the assumptions underlying the data used).