Review

Choosing Prediction Over Explanation in Psychology: Lessons From Machine Learning

Tal Yarkoni et al. Perspect Psychol Sci. 2017 Nov;12(6):1100-1122.
doi: 10.1177/1745691617693393. Epub 2017 Aug 25.

Free PMC article

Abstract

Psychology has historically been concerned, first and foremost, with explaining the causal mechanisms that give rise to behavior. Randomized, tightly controlled experiments are enshrined as the gold standard of psychological research, and there are endless investigations of the mediating and moderating variables that govern various behaviors. We argue that psychology's near-total focus on explaining the causes of behavior has led much of the field to be populated by research programs that provide intricate theories of psychological mechanism but that have little (or unknown) ability to predict future behaviors with any appreciable accuracy. We propose that principles and techniques from the field of machine learning can help psychology become a more predictive science. We review some of the fundamental concepts and tools of machine learning and point out examples where these concepts have been used to conduct interesting and important psychological research that focuses on predictive research questions. We suggest that an increased focus on prediction, rather than explanation, can ultimately lead us to greater understanding of behavior.

Keywords: explanation; machine learning; prediction.

Figures

Figure 1.
Training and test error produced by fitting either a linear regression (left) or a 10th-order polynomial regression (right) when the true relationship in the population (red line) is linear. In both cases, the test data (green) deviate more from the model’s predictions (blue line) than the training data (blue). However, the flexibility of the 10th-order polynomial model facilitates much greater overfitting, resulting in lower training error, but much higher test error, than the linear model. MSE = mean squared error.
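To make the figure's logic concrete, the following Python sketch (not the authors' simulation code; the sample sizes, noise level, and slope of the true linear relationship are illustrative assumptions) fits a linear and a 10th-order polynomial regression to data drawn from a truly linear population and compares training and test mean squared error.

# Minimal sketch of the Figure 1 comparison, under assumed simulation settings.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)

def simulate(n, noise_sd=1.0):
    """Draw n cases from a truly linear population, y = 2x + noise."""
    x = rng.uniform(-3, 3, size=(n, 1))
    return x, 2 * x.ravel() + rng.normal(0, noise_sd, size=n)

x_train, y_train = simulate(20)      # small training sample
x_test, y_test = simulate(1000)      # large held-out test sample

for degree in (1, 10):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(x_train))
    test_mse = mean_squared_error(y_test, model.predict(x_test))
    print(f"degree {degree:2d}: train MSE = {train_mse:.2f}, test MSE = {test_mse:.2f}")
# The 10th-order fit typically shows lower training error but higher test
# error than the linear fit, which is the signature of overfitting.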
Figure 2.
An estimator’s predictions can deviate from the desired outcome (or true scores) in two ways. First, the predictions may display a systematic tendency (or bias) to deviate from the central tendency of the true scores (compare right panels with left panels). Second, the predictions may show a high degree of variance, or imprecision (compare bottom panels with top panels).
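In standard estimation notation (the symbols are not taken from the article), these two kinds of deviation correspond to the bias and variance of an estimator \hat{\theta} of a true quantity \theta:

\operatorname{Bias}(\hat{\theta}) = \mathbb{E}[\hat{\theta}] - \theta,
\qquad
\operatorname{Var}(\hat{\theta}) = \mathbb{E}\!\left[\left(\hat{\theta} - \mathbb{E}[\hat{\theta}]\right)^{2}\right].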
Figure 3.
Schematic illustration of the bias-variance decomposition. Left: under the classical error model, prediction error is defined as the sum of squared differences between true scores and observed scores (black lines). Right: the bias-variance decomposition partitions the total sum of squared errors into two separate components: a bias term that captures a model’s systematic tendency to deviate from the true scores in a predictable way (black line), and a variance term that represents the deviations of the individual observations from the model’s expected prediction (gray lines).
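Written out for a model \hat{f} trained on repeated samples and evaluated at a point x with outcome y = f(x) + \varepsilon (standard form, not quoted from the article), the partition in the right panel is

\mathbb{E}\big[(y - \hat{f}(x))^{2}\big]
= \big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^{2}
+ \mathbb{E}\big[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^{2}\big]
+ \sigma_{\varepsilon}^{2},

that is, squared bias plus variance plus irreducible noise.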
Figure 4.
Large samples guard against overfitting. See text for explanation.
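A short, self-contained sketch of the same point (the sample sizes, noise level, and the choice of a 10th-order polynomial are assumptions, echoing the Figure 1 setup): even a very flexible model overfits less as the training sample grows.

# Minimal sketch: the train/test gap of a flexible model shrinks with n.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)

def linear_population(n, noise_sd=1.0):
    """Draw n cases from a truly linear population, y = 2x + noise."""
    x = rng.uniform(-3, 3, size=(n, 1))
    return x, 2 * x.ravel() + rng.normal(0, noise_sd, size=n)

x_test, y_test = linear_population(5000)
for n in (20, 100, 1000, 10000):
    x_train, y_train = linear_population(n)
    model = make_pipeline(PolynomialFeatures(10), LinearRegression())
    model.fit(x_train, y_train)
    gap = (mean_squared_error(y_test, model.predict(x_test))
           - mean_squared_error(y_train, model.predict(x_train)))
    print(f"n = {n:6d}: test MSE minus train MSE = {gap:.3f}")
# The gap, a direct measure of overfitting, shrinks toward zero as n grows,
# even for the flexible 10th-order polynomial.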
Figure 5.
Regularization via the lasso. Training/test performance of OLS and lasso regression in two sample datasets that illustrate some of the conditions under which the lasso will tend to outperform OLS. (A) In the “dense” dataset with a low n-to-p ratio, the sample size is small (n = 100) and there are many predictors (p = 50), each making a small individual contribution to the outcome. (B) In the “sparse” dataset with a high n-to-p ratio, the sample is large (n = 1000), the number of predictors is small (p = 20), and only a few (5) variables make non-zero (and large) contributions. The top panels display the coefficient paths for the lasso as the penalty parameter (x-axis) increases (separately for each simulated dataset). Observe how predictors gradually drop out of the model (i.e., their coefficients are eventually reduced to 0) as the penalty rises and the lasso model increasingly values the sparsity of the solution over the minimization of prediction error. The bottom panels display the total prediction error (measured with mean squared error) in the training (dashed lines) and test (solid lines) samples for both OLS (yellow) and lasso (blue) regression. Observe that, in the small, dense dataset, where the number of predictors is high relative to the sample size, OLS grossly overfits the data (the gap between the solid and dashed yellow lines is very large) and is outperformed by the lasso in the test data for a wide range of penalty settings (the solid blue line is below the solid yellow line for the entire x-axis range). By contrast, when the sample size is large relative to the number of predictors, the performance gap is typically small, and the lasso outperforms OLS only for narrowly tuned ranges of the penalty parameter, if at all.
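As a rough illustration of the dense, low n-to-p scenario in panel A, the sketch below (again not the article's simulation code; the coefficient distribution, noise level, and the single penalty value alpha = 0.05 are assumptions, and in practice the penalty would be tuned, e.g., by cross-validation) compares OLS and lasso regression using scikit-learn.

# Minimal sketch: OLS vs. lasso when p is large relative to n and every
# predictor contributes weakly (illustrative settings, not the article's).
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
n_train, n_test, p = 100, 1000, 50

# "Dense" scenario: every predictor makes a small contribution.
beta = rng.normal(0, 0.2, size=p)
X_train = rng.normal(size=(n_train, p))
X_test = rng.normal(size=(n_test, p))
y_train = X_train @ beta + rng.normal(size=n_train)
y_test = X_test @ beta + rng.normal(size=n_test)

for name, model in [("OLS", LinearRegression()),
                    ("lasso (alpha=0.05)", Lasso(alpha=0.05))]:
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"{name:>18}: train MSE = {train_mse:.2f}, test MSE = {test_mse:.2f}")
# With p = 50 predictors and only n = 100 training cases, OLS typically shows
# a large train/test gap (overfitting); for a reasonable penalty setting, the
# lasso trades some training fit for better test-set prediction.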
