Individual Submission Summary

Exploring heterogeneity in meta-analysis, using insights from machine learning

Fri, October 5, 10:45am to 12:15pm, Doubletree Hilton, Room: Coronado

Abstract

Meta-analysis is a common example of secondary data analysis: the data have often been collected by somebody other than the researcher, and for a different purpose. Oftentimes, similar research questions are studied in different labs, sampling from different populations, using idiosyncratic methods and instrumentation (Higgins et al., 2009). Such between-study differences introduce heterogeneity in the observed effect sizes (Cesario, 2014). Three approaches have been proposed to deal with between-study heterogeneity (Higgins et al., 2009): first, if studies are assumed to be fundamentally different, they should not be meta-analyzed at all; second, if studies are assumed to be similar, a random-effects model can estimate the distribution of true effect sizes; third, heterogeneity caused by known differences between studies can be accounted for using meta-regression.
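To make approaches two and three concrete, the sketch below fits a random-effects model and a meta-regression with the metafor package (Viechtbauer, 2010); the simulated dataset and the moderator names (mean_age, n_sessions) are hypothetical illustrations, not data from any of the studies discussed here.

    # Hypothetical illustration of approaches two and three in R, using
    # the metafor package. Effect sizes, sampling variances, and
    # moderators are simulated for demonstration only.
    library(metafor)

    set.seed(1)
    k <- 20                                         # number of studies
    dat <- data.frame(
      mean_age   = rnorm(k, 25, 5),                 # hypothetical moderator
      n_sessions = sample(1:10, k, replace = TRUE)  # hypothetical moderator
    )
    dat$vi <- runif(k, 0.01, 0.1)                   # sampling variances
    dat$yi <- 0.3 + 0.02 * dat$mean_age +           # observed effect sizes,
      rnorm(k, 0, sqrt(dat$vi))                     # moderated by mean_age

    # Approach two: random-effects model, estimating the mean and the
    # variance (tau^2) of the distribution of true effect sizes
    rma(yi, vi, data = dat, method = "REML")

    # Approach three: meta-regression, modeling heterogeneity with moderators
    rma(yi, vi, mods = ~ mean_age + n_sessions, data = dat, method = "REML")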
A problem arises, however, when the number of studies on a topic is small, whereas the number of moderators to be considered is relatively large. Such situations do not fit comfortably into the classic meta-analysis paradigm, which, like any regression-based approach, requires many cases per parameter. This may partly explain why, although software for meta-analysis with multiple moderators is readily available (Viechtbauer, 2010), most published meta-analyses account for only a few moderators, if any: in many cases, the number of included studies is simply too small to provide the power required to reliably examine heterogeneity (Riley, Higgins, & Deeks, 2011).
What is currently lacking is a “fourth approach” for cases where heterogeneity is suspected and moderators have been coded, but there is a lack of theory to whittle the list of potential moderators down to a manageable number (Thompson & Higgins, 2002). This calls for an exploratory technique that can perform variable selection: identifying which moderators most strongly influence the observed effect size. MetaForest aims to address this need. It uses random-effects weighted random forests to explore heterogeneity in meta-analysis. Random forests are a powerful machine learning algorithm, flexible yet relatively robust to overfitting. I will present simulation studies which show that, even in datasets as small as 20 cases, MetaForest has excellent predictive performance, as indicated by its cross-validated R², and sufficient power to distinguish relevant from irrelevant moderators using variable importance measures. Furthermore, I will provide a short tutorial on how to integrate MetaForest into meta-analysis projects, to ensure that important moderators have not been overlooked. MetaForest is available as an R package (“metaforest”) and has a web interface for those unfamiliar with R (developmentaldatascience.org/metaforest). Further details are provided in the preprint: https://osf.io/khjgb/
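For readers who want to try this, a minimal sketch of the MetaForest workflow is given below, reusing the hypothetical simulated dataset from the earlier example; the argument names (vi, whichweights, num.trees) follow the metaforest package documentation at the time of writing and may differ across package versions.

    # Minimal sketch: exploring candidate moderators with MetaForest.
    # `dat` is the hypothetical data frame simulated above, with effect
    # sizes yi, sampling variances vi, and two candidate moderators.
    library(metaforest)

    set.seed(42)
    mf <- MetaForest(yi ~ mean_age + n_sessions,
                     data         = dat,
                     vi           = "vi",       # column with sampling variances
                     whichweights = "random",   # random-effects case weights
                     num.trees    = 5000)

    mf              # printed output includes out-of-bag predictive R^2
    VarImpPlot(mf)  # permutation variable importance of the moderators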
