Cognition. 2013 Sep;128(3):302-19.
doi: 10.1016/j.cognition.2013.02.013. Epub 2013 Jun 6.

The effect of word predictability on reading time is logarithmic

Nathaniel J Smith et al. Cognition. 2013 Sep.

Abstract

It is well known that real-time human language processing is highly incremental and context-driven, and that the strength of a comprehender's expectation for each word encountered is a key determinant of the difficulty of integrating that word into the preceding context. In reading, this differential difficulty is largely manifested in the amount of time taken to read each word. While numerous studies over the past thirty years have shown expectation-based effects on reading times driven by lexical, syntactic, semantic, pragmatic, and other information sources, there has been little progress in establishing the quantitative relationship between expectation (or prediction) and reading times. Here, by combining a state-of-the-art computational language model, two large behavioral datasets, and non-parametric statistical techniques, we establish for the first time the quantitative form of this relationship, finding that it is logarithmic over six orders of magnitude in estimated predictability. This result is problematic for a number of established models of eye movement control in reading, but lends partial support to an optimal perceptual discrimination account of word recognition. We also present a novel model in which language processing is highly incremental well below the level of the individual word, and show that it predicts both the shape and time-course of this effect. At a more general level, this result poses challenges for both anticipatory processing and semantic integration accounts of lexical predictability effects. Finally, this result provides evidence that comprehenders are highly sensitive to relative differences in predictability, even for differences between highly unpredictable words, and thus helps bring theoretical unity to our understanding of the role of prediction at multiple levels of linguistic structure in real-time language comprehension.

Keywords: Expectation; Information theory; Probabilistic models of cognition; Psycholinguistics; Reading.

Figures

Figure 1
Several hypothesized forms for the predictability effect, plotted in log space. (Of course many other forms are also possible a priori; here we show only those previously mentioned in the literature.) A linear effect is predicted by a simple guessing model. A logarithmic effect is predicted by both an optimal visual discrimination account (Norris, 2006) and an incremental processing account (see text). A super-logarithmic effect is predicted by the audience design theory of uniform information density effects (Levy & Jaeger, 2007).
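To make the competing forms in Fig. 1 concrete, here is a minimal Python sketch that expresses each one as a slowdown relative to a perfectly predictable word (p = 1). The function names and unit slopes are hypothetical. The source gives no specific equation for the super-logarithmic form, so `superlog_effect` uses an arbitrary exponent purely for illustration.

```python
import math

# Each function returns the extra reading time ("slowdown") for a word of
# predictability p, relative to a perfectly predictable word (p = 1).

def linear_effect(p, slope=1.0):
    """Slowdown linear in probability, as a simple guessing model predicts."""
    return slope * (1.0 - p)

def log_effect(p, slope=1.0):
    """Slowdown linear in surprisal, -log p (the logarithmic form)."""
    return slope * -math.log(p)

def superlog_effect(p, slope=1.0, exponent=2.0):
    """Illustrative super-logarithmic form: surprisal raised to a power > 1.
    The exponent is an arbitrary assumption, not a value from the paper."""
    return slope * (-math.log(p)) ** exponent

if __name__ == "__main__":
    # Six orders of magnitude, in equal steps of log p.
    for p in [1e-6, 1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1.0]:
        print(f"p={p:g}  linear={linear_effect(p):.4f}  "
              f"log={log_effect(p):.2f}  super-log={superlog_effect(p):.1f}")
```

Because `p` steps through equal intervals of log p, `log_effect` rises by the same amount at every step (a straight line in the log space of Fig. 1). By contrast, `linear_effect` changes by less than 0.01 between p = 1e-6 and p = 1e-2, so a linear effect would look nearly flat across most of the range.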
Figure 2
These graphs show the whole-word processing times resulting from different variants of the incremental processing model. We consider a linear effect at the fragment level (a, $f(x) = -x$) versus a reciprocal effect (b, $f(x) = 1/x$), for different values of $k$. For $k > 1$, we also consider two possibilities for how probability is distributed among the fragments: either uniformly ($p_i = p_{\text{word}}^{1/k}$, solid lines) or with later fragments more predictable than earlier fragments ($p_i = p_k^{(k+1-i)^2}$, with $p_k$ chosen so that $p_1 \times \cdots \times p_k = p_{\text{word}}$, dashed lines). In all cases, more highly incremental processing (larger $k$) produces a logarithmic effect at the word level: $f(p_1) + \cdots + f(p_k)$ is approximately a linear function of $\log p_{\text{word}}$.
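The fragment-level computation behind Fig. 2 can be sketched in a few lines of Python. This is an illustrative reimplementation under the caption's definitions, not the authors' code, and the helper names are hypothetical. For the uniform split with $f(x) = -x$, the word-level time is $-k\,p_{\text{word}}^{1/k} = -k\,e^{(\log p_{\text{word}})/k} \approx -k - \log p_{\text{word}}$ when $k$ is large, so the word-level cost becomes affine in $\log p_{\text{word}}$. The same first-order expansion gives $k - \log p_{\text{word}}$ for $f(x) = 1/x$.

```python
import math

def fragment_probs_uniform(p_word, k):
    """Uniform split: every fragment gets p_i = p_word ** (1/k)."""
    return [p_word ** (1.0 / k)] * k

def fragment_probs_quadratic(p_word, k):
    """Later fragments more predictable: p_i = p_k ** ((k + 1 - i) ** 2),
    with p_k chosen so that p_1 * ... * p_k == p_word."""
    exponents = [(k + 1 - i) ** 2 for i in range(1, k + 1)]
    p_k = p_word ** (1.0 / sum(exponents))
    return [p_k ** e for e in exponents]

def word_time(p_word, k, f, split=fragment_probs_uniform):
    """Whole-word processing time: the sum of f(p_i) over the k fragments."""
    return sum(f(p_i) for p_i in split(p_word, k))

def linear_f(x):
    return -x  # panel (a): f(x) = -x

def reciprocal_f(x):
    return 1.0 / x  # panel (b): f(x) = 1/x

if __name__ == "__main__":
    for k in (1, 10, 1000):
        # Time difference between p = 1e-6 and p = 1e-3; a purely logarithmic
        # effect predicts log(1000), about 6.91 in natural-log units.
        diff = word_time(1e-6, k, linear_f) - word_time(1e-3, k, linear_f)
        print(f"k={k:5d}  linear f: time(1e-6) - time(1e-3) = {diff:.3f}")
```

As `k` grows, the printed difference approaches log(1000), the value a logarithmic effect would give. With `k = 1` the linear fragment cost reduces to the simple guessing model and the difference is tiny.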
Figure 3
The effect of the probability of word n on reading time measured at word n and on successive words (the spill-over region). Curves are penalized splines with point-wise 95% confidence intervals. To correct for inter-subject variability, we measure the effect of probability against the notional baseline of a perfectly predictable word; zero on this graph does not indicate an instantaneous overall reading time. Confidence intervals do not include the uncertainty induced by measurement error in probability estimation. Lower panels show the proportion of data available at each level of probability. (a) First-pass gaze durations. (b) Self-paced reading times.
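The curves in Fig. 3 are penalized splines. The Fig. 5 caption indicates that the authors used R's mgcv with GCV-based penalty selection. As a simpler stand-in, the Python sketch below fits a penalized regression spline with a hinge (truncated-linear) basis and a fixed ridge penalty on the hinge coefficients. This is a swapped-in technique: it does not reproduce mgcv's basis, its automatic smoothness selection, or the point-wise confidence intervals. The knot placement, penalty values, and simulated data are all assumptions.

```python
import numpy as np

def hinge_basis(x, knots):
    """Design matrix: intercept, x, and one hinge (x - knot)_+ per knot."""
    x = np.asarray(x, dtype=float)
    hinges = np.maximum(0.0, x[:, None] - np.asarray(knots, dtype=float)[None, :])
    return np.column_stack([np.ones_like(x), x, hinges])

def fit_penalized_spline(x, y, knots, lam):
    """Penalized least squares: minimize ||y - X b||^2 + lam * ||b_hinge||^2.
    Only hinge coefficients are penalized, so a very large lam shrinks the fit
    toward a straight line and lam = 0 gives an unpenalized spline."""
    X = hinge_basis(x, knots)
    D = np.diag([0.0, 0.0] + [1.0] * len(knots))
    return np.linalg.solve(X.T @ X + lam * D, X.T @ np.asarray(y, dtype=float))

def predict(beta, x, knots):
    return hinge_basis(x, knots) @ beta

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Simulated (not real) data: slowdown linear in log10 p plus noise.
    log_p = rng.uniform(-6.0, 0.0, size=500)
    rt = 250.0 - 15.0 * log_p + rng.normal(0.0, 20.0, size=500)
    knots = np.linspace(-5.5, -0.5, 10)
    for lam in (0.0, 1e2, 1e6):
        beta = fit_penalized_spline(log_p, rt, knots, lam)
        # Effect relative to a perfectly predictable word (log10 p = 0).
        slowdown = predict(beta, [-6.0], knots)[0] - predict(beta, [0.0], knots)[0]
        print(f"lam={lam:g}  slowdown at log10 p = -6: {slowdown:.1f}")
```

Setting `lam = 0` lets the hinges track noise, much like the unpenalized fit in Fig. C1c. A large `lam` pulls the fit toward a single straight line in log p.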
Figure 4
By summing the curves in Fig. 3, we can estimate the total slowdown caused by an unpredictable word, regardless of where in the spillover region this slowdown occurs. (a) First-pass gaze durations. (b) Self-paced reading times. Lower panels show the proportion of data available at each level of probability.
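Summing the per-position curves is straightforward once every curve is evaluated on a shared grid of log p(word n). The Python sketch below assumes each curve is already available as an array on that grid; the array layout and function names are hypothetical. Each curve is first re-expressed relative to a perfectly predictable word, matching the baseline described for Fig. 3.

```python
import numpy as np

def relative_to_predictable(curve, log_p_grid):
    """Shift a fitted curve so it is 0 at log p = 0 (a perfectly predictable word)."""
    baseline = curve[np.argmin(np.abs(log_p_grid))]
    return curve - baseline

def summed_slowdown(curves, log_p_grid):
    """Total slowdown caused by word n: add the effect curves measured at
    word n, n+1, n+2, ..., all evaluated on the same grid of log p(word n)."""
    total = np.zeros_like(log_p_grid, dtype=float)
    for curve in curves:
        total += relative_to_predictable(np.asarray(curve, dtype=float), log_p_grid)
    return total
```

For example, three straight-line curves with slopes of -3, -1, and -0.5 per unit of log p sum to a single line with slope -4.5 that passes through zero at log p = 0.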
Figure 5
To visualize inter-individual variation, we break down the Dundee corpus summed slowdown data (Fig. 4a), analyzing each participant separately. Participant codes from the corpus are shown in the upper right of each panel. Dashed lines represent bootstrapped point-wise 95% confidence intervals. The variation in ‘wiggliness’ of the main curves results in part from noise and numerical instability in mgcv’s GCV-based penalization selection (Wood, 2004) allowing over/under-fitting in some cases. Even so, 9 out of 10 participants show effects of log-probability with an overall linear trend, while no effect was found for participant ‘sg’.
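Point-wise bootstrap intervals like those in Fig. 5 come from refitting the model to resampled data and reading off percentiles of the fitted values at each grid point. To keep the sketch short, it refits an ordinary least-squares line in log p with `numpy.polyfit` rather than a penalized spline, and it resamples individual observations. Both choices are simplifications, not the authors' procedure.

```python
import numpy as np

def bootstrap_pointwise_ci(x, y, grid, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap for a fitted curve, evaluated point by point on grid.
    The model refit on each resample is a straight line (a simplification)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = len(x)
    preds = np.empty((n_boot, len(grid)))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample cases with replacement
        slope, intercept = np.polyfit(x[idx], y[idx], deg=1)
        preds[b] = intercept + slope * grid
    alpha = (1.0 - level) / 2.0
    lower = np.quantile(preds, alpha, axis=0)
    upper = np.quantile(preds, 1.0 - alpha, axis=0)
    return lower, upper
```

Running this separately on each participant's data gives one band per panel, which is how per-participant variation like that in Fig. 5 can be inspected.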
Figure 6
The same curves shown in Fig. 4, but here plotted against raw predictability to better show the severity of the non-linearity. (a) First-pass gaze durations. (b) Self-paced reading times. Lower panels show the proportion of data available at each level of probability. (While these accurately indicate that the majority of our data is concentrated in the <0.1 range, the scale here is somewhat misleading; both analyses contain >10 000 data points with conditional probability >0.1.)
Figure B1
“By item” analysis of per-token mean reading times aggregated across participants, showing the effect of the predictability of word n on word n and succeeding words. 95% confidence intervals calculated by bootstrapping over cases. (a) Eye-tracking. (b) Self-paced reading. Lower panels show the proportion of data available at each level of probability.
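The by-item analysis first collapses the data to one mean reading time per token, averaging over participants. Below is a minimal Python sketch of that aggregation step. It assumes a hypothetical flat record layout of (participant, token_id, log_prob, reading_time) tuples.

```python
from collections import defaultdict
from statistics import mean

def by_item_means(records):
    """Average reading times over participants for each token.

    records: iterable of (participant, token_id, log_prob, reading_time).
    Returns {token_id: (log_prob, mean_reading_time)}.
    """
    times = defaultdict(list)
    log_probs = {}
    for _participant, token_id, log_prob, reading_time in records:
        times[token_id].append(reading_time)
        # A token's predictability comes from the language model, so it is
        # the same for every participant who read that token.
        log_probs[token_id] = log_prob
    return {tok: (log_probs[tok], mean(rts)) for tok, rts in times.items()}

if __name__ == "__main__":
    records = [
        ("p1", "tok1", -2.0, 200.0),
        ("p2", "tok1", -2.0, 220.0),
        ("p1", "tok2", -5.0, 300.0),
    ]
    print(by_item_means(records))
```

Each resulting (log_prob, mean reading time) pair then serves as one case in the by-item analysis.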
Figure B2
“By item” analysis of per-token mean reading times aggregated across participants, showing the total reading time slowdown attributable to word predictability (the sum of the curves in Fig. B1). 95% confidence intervals calculated by bootstrapping over cases. (a) Eye-tracking. (b) Self-paced reading. Lower panels show the proportion of data available at each level of probability.
Figure C1
The effect of penalization in controlling over-fitting. (a) Our original, penalized model (a repeat of Fig. 4). (b) The same model as in a, but fit with raw probability entered instead of log probability, then plotted in log-space. (c) The same model as in a, but fit without penalization. Upper panels show first-pass gaze durations; lower panels show self-paced reading times. That the lower panels show more wiggliness than the upper ones is presumably due to the relative sizes of the two data sets; in the absence of penalization, the smaller data set allows more overfitting than the larger. Dashed lines denote point-wise 95% confidence intervals. Lower panels show the proportion of data available at each level of probability.

References

    1. Adelman JS, Brown GDA, Quesada JF. Contextual diversity, not word frequency, determines word-naming and lexical decision times. Psychological Science. 2006;17(9):814–823. doi: 10.1111/j.1467-9280.2006.01787.x.
    2. Altmann GTM, Kamide Y. Incremental interpretation at verbs: restricting the domain of subsequent reference. Cognition. 1999;73(3):247–264. doi: 10.1016/S0010-0277(99)00059-1.
    3. Atkinson K. The VARCON database, version 4.1. 2004. Retrieved from http://wordlist.sourceforge.net/
    4. Aylett M, Turk A. The smooth signal redundancy hypothesis: A functional explanation for relationships between redundancy, prosodic prominence, and duration in spontaneous speech. Language & Speech. 2004;47(1):31–56.
    5. Baayen RH. Demythologizing the word frequency effect: A discriminative learning perspective. The Mental Lexicon. 2010a;5(3):436–461. doi: 10.1075/ml.5.3.10baa.