Proc Natl Acad Sci U S A. 2018 Sep 4;115(36):E8538-E8546.
doi: 10.1073/pnas.1713020115. Epub 2018 Aug 20.

Chance, long tails, and inference in a non-Gaussian, Bayesian theory of vocal learning in songbirds

Baohua Zhou et al. Proc Natl Acad Sci U S A. 2018.

Abstract

Traditional theories of sensorimotor learning posit that animals use sensory error signals to find the optimal motor command in the face of Gaussian sensory and motor noise. However, most such theories cannot explain common behavioral observations, for example, that smaller sensory errors are more readily corrected than larger errors and large abrupt (but not gradually introduced) errors lead to weak learning. Here, we propose a theory of sensorimotor learning that explains these observations. The theory posits that the animal controls an entire probability distribution of motor commands rather than trying to produce a single optimal command and that learning arises via Bayesian inference when new sensory information becomes available. We test this theory using data from a songbird, the Bengalese finch, that is adapting the pitch (fundamental frequency) of its song following perturbations of auditory feedback using miniature headphones. We observe the distribution of the sung pitches to have long, non-Gaussian tails, which, within our theory, explains the observed dynamics of learning. Further, the theory makes surprising predictions about the dynamics of the shape of the pitch distribution, which we confirm experimentally.

Keywords: dynamical Bayesian inference; power-law tails; sensorimotor learning; vocal control.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.
The dynamical Bayesian model (Bayesian filter). (A) A Bayesian filter consists of the recursive application of two general steps (35): (i) an observation update, which corresponds to novel sensory input and updates the underlying probability distribution of plausible motor commands using Bayes’ formula, and (ii) a time evolution update, which denotes the temporal propagation and corresponds to uncertainty increasing with time (main text); here the probability distribution is updated by convolution with a propagator. These two steps are repeated for each new piece of sensory data in a recursive loop. (B) Example distributions for the entire procedure in two scenarios: Gaussian (Top) and heavy-tailed (Bottom) distributions. The x axis, $\phi_t$, represents the motor command that results in a specific pitch sung by the bird. The outcome of this motor command is then measured by two different sensory modalities, represented by $\{s_t^{(i)}\}_{i=1,2}$, with corresponding likelihood functions $L_1(\phi_t;\Delta)$ and $L_2(\phi_t;0)$, respectively. The $\Delta$ shift for modality 1 is induced by the experimentalist, which results in the animal compensating its pitch toward $+\Delta$. Dashed brown lines represent the individual likelihood functions from the individual modalities, and the solid lines represent their product, which signals how likely it is that the correct motor command corresponds to $\phi_t$. Heavy-tailed distributions can produce a bimodal likelihood, which, multiplied by the prior, suppresses large-error signals. In contrast, Gaussian likelihoods are unimodal and result in greater compensatory changes in behavior.
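The two-step recursion described in this caption can be illustrated with a minimal grid-based sketch. It is not the authors' implementation: the Cauchy density stands in for the heavy-tailed stable family, and the grid range, the widths, the kernel scale, the number of steps, and all function names (`bayes_filter_step`, `run_filter`, `mean_command`) are assumptions chosen only for illustration.

```python
import numpy as np

def gauss(x, loc, scale):
    """Gaussian density (light tails)."""
    return np.exp(-0.5 * ((x - loc) / scale) ** 2) / (scale * np.sqrt(2 * np.pi))

def cauchy(x, loc, scale):
    """Cauchy density: a heavy-tailed stand-in, assumed here for illustration."""
    return scale / (np.pi * ((x - loc) ** 2 + scale ** 2))

def bayes_filter_step(prior, likelihood, kernel, dx):
    """One observation update followed by one time-evolution update."""
    # (i) Observation update: Bayes' formula evaluated on the grid.
    posterior = prior * likelihood
    posterior = posterior / (posterior.sum() * dx)
    # (ii) Time evolution: convolve with the propagator, then renormalize.
    evolved = np.convolve(posterior, kernel, mode="same") * dx
    return evolved / (evolved.sum() * dx)

def run_filter(density, delta, n_steps=20, width=0.3, kernel_scale=0.05):
    """Iterate the filter for a pitch shift `delta` (all values hypothetical)."""
    phi = np.linspace(-10.0, 10.0, 2001)  # motor-command grid, in semitones
    dx = phi[1] - phi[0]
    prior = density(phi, 0.0, width)
    # Product of the shifted (modality 1) and unshifted (modality 2) likelihoods.
    likelihood = density(phi, delta, width) * density(phi, 0.0, width)
    # Propagator centered on the grid midpoint, as mode="same" requires.
    kernel = density(phi, 0.0, kernel_scale)
    for _ in range(n_steps):
        prior = bayes_filter_step(prior, likelihood, kernel, dx)
    return phi, prior

def mean_command(phi, p):
    """Mean of a normalized density sampled on a uniform grid."""
    return float(np.sum(phi * p) * (phi[1] - phi[0]))

if __name__ == "__main__":
    for name, density in [("Gaussian", gauss), ("Cauchy", cauchy)]:
        phi, p = run_filter(density, delta=3.0)
        print(name, "mean compensation:", round(mean_command(phi, p), 3))
```

Under these equal-width assumptions the product of the two Gaussian likelihoods peaks midway between 0 and the shift, so the Gaussian run moves well away from baseline, while the heavy-tailed run should stay much closer to baseline because the prior suppresses the distant likelihood peak.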
Fig. 2.
Experimental data and model fitting. The same six parameters of the model are used to simultaneously fit all data. (A) The symbols with error bars are four groups of experimental data, with different colors and symbols indicating different shift sizes (red-brown circle, 0.5-semitone shift; blue square, 1-semitone shift; green diamond, 1.5-semitone shift; cyan upper triangle, 3-semitone shift). The error bars indicate the SE of the group mean, accounting for variances across individual birds and within one bird (Materials and Methods). For each group, the data are combined from three to eight different birds, and the sign of the experimental perturbation (lowering or raising pitch) is always defined so that adaptive (i.e., error-correcting) vocal changes are positive. Data points without error bars come from a single bird; they are shown as open symbols and are not used for the fitting. The mean pitch sung on day 0 by each bird is defined as the zero-semitone compensation ($\phi=0$). The solid lines with 1-SD bands (Materials and Methods) are results of the model fits, with the same color convention as in experimental data. Inset shows learning curves in absolute units, without rescaling by the shift size and without model error bands. (B) The lower triangles with error bars show the data from a staircase-shift experiment, with the same plotting conventions as in A. The data are combined from three birds. During the experiment, every 6 d, the shift size is increased by 0.35 semitone, as shown by the dotted horizontal line segments. On the last day of the experiment, the experienced pitch shift is 2.8 semitones. The magenta solid line with 1-SD band is the model fit. The combined quality of fit for the five curves collectively (four step perturbations and a staircase perturbation) is a $\chi^2/\mathrm{df}$ of 1.47 (compared with a $\chi^2/\mathrm{df}$ of 28.4 for the null model of the nonadaptive, zero-pitch compensation line). Note, however, that such Gaussian statistics of the fit quality should be taken with a grain of salt for nonnormally distributed data. (C) Dots represent the distribution of pitch on day 0, before the pitch shift perturbation (the baseline distribution), where the data are from 23 different experiments (all pitch shifts combined). The gray parabola is a Gaussian fit to the data within the ±1 semitone range. The empirical distribution has long, nonexponential tails. The brown solid line with 1-SD band is the model fit. Deviance of the model fit relative to the perfect fit [the latter estimated as the Nemenman–Shafee–Bialek entropy of the data (37, 38)] is 0.057 per sample point.
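The fit-quality statistic quoted in this caption is a chi-square per degree of freedom. As a rough sketch of how such a number is formed, assuming independent Gaussian errors (which the caption itself cautions against taking too literally), the function below divides the summed squared, SE-normalized residuals by the number of data points minus the number of fitted parameters. The function name and the example numbers are hypothetical and are not the paper's data.

```python
import numpy as np

def reduced_chi2(observed, predicted, sem, n_params):
    """Chi-square per degree of freedom for a fit with `n_params` free parameters."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    sem = np.asarray(sem, dtype=float)
    dof = observed.size - n_params
    if dof <= 0:
        raise ValueError("need more data points than fitted parameters")
    # Residuals measured in units of the standard error of each point.
    residuals = (observed - predicted) / sem
    return float(np.sum(residuals ** 2) / dof)

if __name__ == "__main__":
    # Hypothetical compensation values (semitones), model curve, and SEs.
    data = [0.05, 0.12, 0.20, 0.26]
    model = [0.06, 0.11, 0.18, 0.27]
    errors = [0.02, 0.02, 0.03, 0.03]
    print("model fit:", reduced_chi2(data, model, errors, n_params=1))
    # A null model of zero compensation has no free parameters.
    print("null model:", reduced_chi2(data, np.zeros(4), errors, n_params=0))
```

A value near 1 indicates residuals comparable in size to the error bars, while a much larger value, as for the zero-compensation null model, indicates a poor description of the data.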
Fig. 3.
Predictions of our model using the parameter values obtained from fitting the data shown in Fig. 2. The dots with error bars (A–E) and the histogram (F) represent experimental data with colors, symbols, error bars (from bootstrapping), and other plotting conventions as in Fig. 2. The dotted lines with 1-SD bands represent model predictions. Our model correctly predicts the behaviors of the SDs of the pitch distributions. Specifically, the best-fit model lines predict increases in the SD in B, C, and E, which correspond to 1 semitone, 1.5 semitones, and the staircase shift, respectively. At the same time, the data show that the SD increases for B, C, and E (P value for a positive dependence of the SD when regressed on time is $4\times10^{-4}$, $5\times10^{-5}$, and $<10^{-6}$ for B, C, and E, respectively). (F) Our model predicts that, at the end of the staircase experiment (mean and SD shown in Figs. 2B and 3E, respectively), the pitch distribution should be bimodal, while it is unimodal initially (compare Fig. 2C). This is also supported by the data. Specifically, a fit with a mixture of two Gaussian peaks has an Akaike’s information criterion score higher than a fit to a single Gaussian by 50 (in decimal log units), which is highly statistically significant (the data here are from day 47 from the single bird that was exposed to the staircase shift for the longest time, and the amount of data is insufficient to fit more complex distributions). Further, the two peaks are centered far from each other (0.59±0.04 of a semitone and 2.17±0.62 semitones, with error bars obtained by bootstrapping), illustrating the true bimodality. Neither the data nor the models show unambiguous bimodality in other learning cases.
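The caption states that error bars on the SDs were obtained by bootstrapping. A minimal sketch of one common way to do this follows; it resamples individual pitch values with replacement and is only an assumption about the general technique, not the paper's exact resampling scheme (which may, for example, resample at the level of birds). The function name, the number of resamples, and the synthetic data are hypothetical.

```python
import numpy as np

def bootstrap_sd_error(pitches, n_boot=2000, seed=0):
    """Return the sample SD and a bootstrap estimate of its standard error."""
    rng = np.random.default_rng(seed)
    pitches = np.asarray(pitches, dtype=float)
    # Draw n_boot resamples (with replacement), each the size of the data set.
    idx = rng.integers(0, pitches.size, size=(n_boot, pitches.size))
    resampled_sds = pitches[idx].std(axis=1, ddof=1)
    # The spread of the resampled SDs estimates the uncertainty of the SD.
    return float(pitches.std(ddof=1)), float(resampled_sds.std(ddof=1))

if __name__ == "__main__":
    # Synthetic stand-in for one day of sung pitches (semitones from baseline).
    rng = np.random.default_rng(1)
    day_pitches = rng.normal(loc=0.5, scale=0.4, size=300)
    sd, sd_err = bootstrap_sd_error(day_pitches)
    print(f"SD = {sd:.3f} +/- {sd_err:.3f} semitones")
```

Repeating this for each day gives an SD time series with error bars that could then be regressed on time, as the caption describes.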
Fig. 4.
(A–C) Objective function as a function of the two parameters (stability and scale) for (A) the first (shifted) likelihood, (B) the second (unshifted) likelihood, and (C) the propagation kernel, while the respective other four parameters are held fixed. The gray shades represent the decimal logarithm of the objective function (effectively, logarithms of the negative log-likelihood), and lighter shades mean a better fit. Because of the logarithmic scaling, small changes in the shading represent large changes in the quality of the fit. The black crosses show the parameter values for the deepest local minimum in this range of parameters. Note that, even though the minimum in C is close to the Gaussian kernel ($\alpha_k=2$), a Gaussian kernel cannot fit the data well. Specifically, it cannot reproduce a non-Gaussian distribution of the baseline pitch, instead essentially matching the parabola in Fig. 2C.
Fig. 5.
Fits and predictions with the power-law family of heavy-tailed distributions instead of the stable distribution family. (A–C) Equivalent to the panels in Fig. 2 A–C. (D–F) Equivalent to the panels in Fig. 3 C, E, and F. The shaded areas around the theoretical curves represent confidence intervals for 1 SD. The combined quality of fit for all five fitted mean compensation curves is a $\chi^2/\mathrm{df}$ of 1.56, so the truncated stable distributions used in the main text provide (slightly) better fits. At the same time, the deviance of the fitted baseline distribution in C relative to the perfect fit, estimated as the NSB entropy of the data (37, 38), is 0.022 per sample point, slightly better than for the truncated stable distribution model (Fig. 2).

References

    1. Shadmehr R, Smith MA, Krakauer JW. Error correction, sensory prediction, and adaptation in motor control. Annu Rev Neurosci. 2010;33:89–108.
    2. McDonnell MD, Ward LM. The benefits of noise in neural systems: Bridging theory and experiment. Nat Rev Neurosci. 2011;12:415–426.
    3. Neuringer A. Operant variability: Evidence, functions, and theory. Psychon Bull Rev. 2002;9:672–705.
    4. Kao MH, Doupe AJ, Brainard MS. Contributions of an avian basal ganglia–forebrain circuit to real-time modulation of song. Nature. 2005;433:638–643.
    5. Linkenhoker BA, Knudsen EI. Incremental training increases the plasticity of the auditory space map in adult barn owls. Nature. 2002;419:293–296.
