Speaker: Anna Smith, PhD, Department of Statistics, University of Kentucky

Title: "Prediction scoring for measuring the replicability of data-driven discoveries"

Data-driven research aims to uncover knowledge and insights regarding the underlying true data generating mechanism (DGM). Results from different studies on a complex DGM (e.g., human behaviors), derived from different data sets using complicated models and algorithms, are hard to quantitatively compare due to random noise and statistical uncertainty in model results. This has been one of the main challenges that contributed to the replication crisis. To address this, we examine the role of predictive models in quantitatively assessing agreement between two datasets that are assumed to come from two distinct DGMs. We formalize a distance between the DGMs that is estimated using cross-validation. We argue that the resulting prediction scores depend on the predictive models created by cross-validation. In this sense, the prediction scores measure the distance between DGMs, along the dimension of the particular predictive model. Using human behavior data, we demonstrate that prediction scores can evaluate preregistered hypotheses and provide insights that compare data from different populations and settings. We examine the asymptotic behavior of the prediction scores with simulated experimental data and demonstrate that leveraging competing predictive models can reveal important differences between the underlying DGMs. Our proposed cross-validated prediction scores are capable of quantifying differences between unobserved data generating mechanisms and allow for the validation and assessment of results from complex models.
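As a rough illustration of the idea in the abstract, the sketch below compares two datasets by fitting a predictive model on cross-validation folds of one dataset and measuring how much worse it predicts the other dataset than its own held-out fold. This is not the speaker's definition of the prediction score: the linear model, the squared-error loss, the "across minus within" gap, and every name here (`prediction_score`, `simulate`, and so on) are assumptions chosen only to make the concept concrete.

```python
import numpy as np

def fit_linear(X, y):
    # Ordinary least squares with an intercept column (illustrative model choice).
    Xb = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return coef

def mse(coef, X, y):
    # Mean squared prediction error of the fitted linear model.
    Xb = np.column_stack([np.ones(len(X)), X])
    return float(np.mean((Xb @ coef - y) ** 2))

def prediction_score(X_a, y_a, X_b, y_b, k=5, seed=0):
    # Hypothetical score: average over k folds of
    # (error on dataset B) - (error on the held-out fold of dataset A).
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X_a))
    folds = np.array_split(idx, k)
    gaps = []
    for held_out in folds:
        train = np.setdiff1d(idx, held_out)
        coef = fit_linear(X_a[train], y_a[train])
        within = mse(coef, X_a[held_out], y_a[held_out])
        across = mse(coef, X_b, y_b)
        gaps.append(across - within)
    return float(np.mean(gaps))

def simulate(slope, n, rng):
    # Toy data generating mechanism: y = slope * x + Gaussian noise.
    X = rng.normal(size=(n, 1))
    y = slope * X[:, 0] + rng.normal(scale=0.5, size=n)
    return X, y

rng = np.random.default_rng(42)
X_a, y_a = simulate(1.0, 400, rng)
X_same, y_same = simulate(1.0, 400, rng)  # same mechanism as dataset A
X_diff, y_diff = simulate(2.0, 400, rng)  # different slope, so a different mechanism

score_same = prediction_score(X_a, y_a, X_same, y_same)
score_diff = prediction_score(X_a, y_a, X_diff, y_diff)
print(score_same, score_diff)
```

Under these assumptions the gap should sit near zero when both datasets share a mechanism and grow when they differ. Swapping in a different predictive model would measure the difference along a different dimension, which is the dependence on the model that the abstract describes.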
