The Control of Vocal Pitch in Human Laryngeal Motor Cortex

Benjamin K Dichter et al. Cell. 2018 Jun 28;174(1):21-31.e9. doi: 10.1016/j.cell.2018.05.016.

Abstract

In speech, the highly flexible modulation of vocal pitch creates intonation patterns that speakers use to convey linguistic meaning. This human ability is unique among primates. Here, we used high-density cortical recordings directly from the human brain to determine the encoding of vocal pitch during natural speech. We found neural populations in bilateral dorsal laryngeal motor cortex (dLMC) that selectively encoded produced pitch but not non-laryngeal articulatory movements. This neural population controlled short pitch accents to express prosodic emphasis on a word in a sentence. Other larynx cortical representations controlling voicing and longer pitch phrase contours were found at separate sites. dLMC sites also encoded vocal pitch during a non-speech singing task. Finally, direct focal stimulation of dLMC evoked laryngeal movements and involuntary vocalization, confirming its causal role in feedforward control. Together, these results reveal the neural basis for the voluntary control of vocal pitch in human speech. VIDEO ABSTRACT.

Keywords: cortical stimulation; human cortex; larynx; motor cortex; pitch; singing; speech; vocalization; voicing.


Figures

Figure 1. Human cortical encoding of produced pitch in dLMC during a word emphasis task.
Participants were instructed to emphasize specific words in a sentence. (A) Laryngeal anatomy. The vocal folds are stretched by the cricothyroid muscle, and increased tension in the vocal folds results in a higher produced pitch. (B) Pitch-correlated neural activity at an example electrode. The speech waveform for one example sentence (emphasis on "I") is shown at the top. Pitch contours (green lines) and single-trial high gamma activation for the example electrode (black rasters) are shown for every sentence spoken by a single participant. Trials are grouped by the word of emphasis and co-aligned to the beginning of the emphasized word. At the single-trial level, transient increases in neural activity are associated with pitch change. (C) Spatial localization of electrodes with a significant correlation with pitch, after controlling for supralaryngeal articulators. Electrodes cluster on the anterior aspect of the precentral gyrus in the dorsal laryngeal motor cortex (dLMC, located lateral to the hand and medial to the lip cortical representations). The right hemisphere is shown, and the arrow indicates the example electrode in (B). We also observed feedback responses in parabelt auditory cortex on the superior temporal gyrus (STG). (D) Relationship between pitch and high gamma (HG) cortical activity across all significant electrodes in dLMC (mean and s.d. in gray, example electrode in black) over the normalized pitch range. Activation increases monotonically with pitch (middle 90th percentile range plotted). (E) Correlation values for significant electrodes in the dLMC and auditory STG regions. Electrodes in dLMC were all positively correlated with the produced pitch of the emphasized word, whereas STG electrodes were both positively and negatively correlated with pitch. (F-H) dLMC activity shows both motor and auditory response properties. Temporal analyses show that dLMC activity during speaking precedes activity during playback (listening) of the same utterances. (F) Pearson cross-correlation for the example electrode in (B) for speaking (green) and playback (purple). (G) Neural activation aligned to sentence onset for speaking (green) and playback (purple) conditions for the example electrode (mean ± SEM). (H) Average temporal offset of neural activation for each electrode in the dLMC with respect to sentence onset. See also Figure S1.
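For readers following the encoding analysis in this legend, the sketch below shows one way to correlate an electrode's high gamma activity with produced pitch while controlling for supralaryngeal articulator features, using a partial correlation on regression residuals. It is a minimal illustration under stated assumptions, not the authors' code: the array names (high_gamma, pitch, articulators) are hypothetical, and the paper's actual regression model and significance testing may differ.

```
# Minimal sketch: partial correlation between one electrode's high gamma
# activity and produced pitch, controlling for articulator features.
# All inputs are hypothetical 1-D/2-D NumPy arrays sampled on the same frames.
import numpy as np
from scipy import stats

def partial_pitch_correlation(high_gamma, pitch, articulators):
    """high_gamma:   (n_samples,) neural activity for one electrode
    pitch:        (n_samples,) produced pitch (e.g., log f0), voiced frames only
    articulators: (n_samples, n_features) supralaryngeal articulator features
    Returns Pearson (r, p) between the parts of high_gamma and pitch that are
    not explained by the articulator features."""
    X = np.column_stack([articulators, np.ones(len(pitch))])  # add intercept
    beta_hg, *_ = np.linalg.lstsq(X, high_gamma, rcond=None)
    beta_pitch, *_ = np.linalg.lstsq(X, pitch, rcond=None)
    resid_hg = high_gamma - X @ beta_hg        # activity unexplained by articulators
    resid_pitch = pitch - X @ beta_pitch       # pitch unexplained by articulators
    return stats.pearsonr(resid_hg, resid_pitch)
```

Significance of such a correlation could then be assessed with a trial-wise shuffle test, as described in the legends below.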
Figure 2. Cortical representation of pitch contour components in speech: accent, phrase, and voicing.
(A) The Fujisaki model decomposes the pitch contour in natural speech into accent, phrase, and voicing components. Inference of the Fujisaki model is shown for an example sentence. From top to bottom: acoustic waveform of the produced sentence; pitch contour extracted from the sentence; phrase (green), accent (purple), and voicing (brown) components extracted from the pitch contour; original pitch contour (green) and Fujisaki reconstruction of the pitch contour (black). (B) Single-trial high gamma raster for an electrode controlling the phrase component of the pitch contour. Green curves show the phrase component of the Fujisaki model for each trial, and the gray rasters show the activation of an example "phrase" electrode (r = 0.45). This electrode responded similarly to sentences with different accents (top and bottom). (C) Single-trial high gamma raster for an electrode controlling pitch accents. Purple lines show the accent component for an example participant separated by sentence style, and the gray raster shows the activation of an example "accent" electrode (r = 0.17). (D) Single-trial high gamma raster for an electrode controlling voicing. Brown lines show the proportion of sentences that are voiced for each style at each timepoint. This electrode has higher activation when the participant is voicing (r = 0.2). (E) Correlation coefficients between electrode activation and the accent, phrase, and voicing components of the pitch model, for each electrode over the sensorimotor cortex. Filled dots are from inside and open dots from outside the dLMC. Example electrodes in (B)-(D) are marked in their respective colors. Electrodes lie predominantly along the axes. (F) Venn diagram showing numbers of electrodes with dissociable and joint encoding. (G) Bilateral spatial location of electrodes on the vSMC across all participants. Accent and voicing electrodes were selected using a trial-wise shuffle test (p < 0.001). Phrase electrodes were selected using a trial-wise shuffle test and a cutoff of r > 0.2. Each brain reconstruction shows the kernel density estimate illustrating the spatial organization of electrodes on a common brain. Accent electrodes were strongly localized to the dLMC, while voicing and phrase electrodes were found in both the dLMC and the vLMC. See also Figure S2.
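The Fujisaki command-response model referenced in this legend is well documented in the speech literature; the sketch below synthesizes a log-F0 contour from phrase and accent commands using its standard published form. This is forward synthesis only: the paper instead fits these components (plus a binary voicing component) to produced pitch contours, a fitting step not shown here, and all commands and constants below are illustrative.

```
# Minimal sketch of the standard Fujisaki command-response model:
# ln F0(t) = ln Fb + sum_i Ap_i*Gp(t-T0_i) + sum_j Aa_j*[Ga(t-T1_j) - Ga(t-T2_j)]
import numpy as np

ALPHA, BETA, GAMMA = 3.0, 20.0, 0.9  # commonly used Fujisaki constants

def phrase_response(t, alpha=ALPHA):
    """Impulse response of the phrase-control mechanism, Gp(t)."""
    return np.where(t >= 0, alpha**2 * t * np.exp(-alpha * t), 0.0)

def accent_response(t, beta=BETA, gamma=GAMMA):
    """Step response of the accent-control mechanism, Ga(t), ceiling-limited."""
    g = np.where(t >= 0, 1.0 - (1.0 + beta * t) * np.exp(-beta * t), 0.0)
    return np.minimum(g, gamma)

def fujisaki_logf0(t, fb, phrase_cmds, accent_cmds):
    """Reconstruct ln F0(t) from a base frequency fb (Hz), phrase impulse
    commands [(onset, amplitude), ...], and accent pedestal commands
    [(onset, offset, amplitude), ...]."""
    logf0 = np.full_like(t, np.log(fb))
    for t0, ap in phrase_cmds:
        logf0 += ap * phrase_response(t - t0)
    for t1, t2, aa in accent_cmds:
        logf0 += aa * (accent_response(t - t1) - accent_response(t - t2))
    return logf0

# Illustrative example: one phrase command at sentence onset and one accent
# command spanning a hypothetical emphasized word.
t = np.linspace(0.0, 2.0, 400)
lnf0 = fujisaki_logf0(t, fb=120.0, phrase_cmds=[(0.0, 0.5)],
                      accent_cmds=[(0.6, 0.9, 0.4)])
```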
Figure 3. Pitch encoding during singing.
(A) Singing task with two simple melodies. Notes are colored by low, middle, and high target tone. The sound waveforms are shown above, with the produced pitch for each note below. (B) High gamma responses for two example electrodes in dLMC of the example participant for high (green) and low (purple) notes. Time = 0 is the acoustic onset of the note. The yellow and blue segments mark the time windows used to compute the correlations in (C). Error bars are SEM across trials. (C) Pearson correlation between cortical activation and vocal pitch for low and high notes using the 50 ms before acoustic onset (left) and 100–300 ms after acoustic onset (middle). Right: Pearson correlation between pitch and high gamma activation for the contrastive emphasis task for this participant. Arrows mark the electrodes from (B), and the solid black line marks the central sulcus. (D) Comparison between pitch encoding in dLMC electrodes during singing and during speaking for all participants. Pitch encoding was strongly correlated across electrodes in the two behavioral conditions (Pearson r = 0.33, p < 0.01). See also Figure S3.
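As a rough illustration of the windowed analysis in (B)-(C), the sketch below computes per-electrode Pearson correlations between sung-note pitch and mean high gamma in a window around acoustic onset; the array names, shapes, and windows are assumptions for illustration and do not reflect the authors' actual pipeline.

```
# Minimal sketch: windowed pitch-encoding correlation for the singing task.
# high_gamma, note_pitch, and times are hypothetical arrays aligned to note onset.
import numpy as np
from scipy import stats

def windowed_pitch_encoding(high_gamma, note_pitch, times, window):
    """high_gamma: (n_notes, n_electrodes, n_timepoints) aligned to note onset
    note_pitch: (n_notes,) produced pitch of each sung note
    times:      (n_timepoints,) time in s relative to acoustic onset
    window:     (t_start, t_end) in s, e.g. (-0.05, 0.0) or (0.1, 0.3)
    Returns per-electrode Pearson r between windowed activity and pitch."""
    mask = (times >= window[0]) & (times < window[1])
    win_activity = high_gamma[:, :, mask].mean(axis=2)   # (n_notes, n_electrodes)
    return np.array([stats.pearsonr(win_activity[:, e], note_pitch)[0]
                     for e in range(win_activity.shape[1])])

# For a comparison like panel (D), the per-electrode r values from singing
# could then be correlated with those from the speaking (emphasis) task:
# r_across, p_across = stats.pearsonr(r_singing, r_speaking)
```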
Figure 4. Electrical stimulation of dLMC.
(A) Cortical stimulation mapping of larynx responses in the primary sensory and motor cortices for 18 participants. The larynx was monitored using electromyography (EMG) electrodes on a customized endotracheal tube. The gray shading indicates the relative density of positive laryngeal response sites. Other evoked movements are not shown. The red star marks the example site shown in more detail in (B) and (C). (B) Laryngeal EMG responses to stimulation ranging from 0 to 100 V. Stimulation was delivered 11 times at 60 V and once at each of the other magnitudes for this patient. (C) Three other patients also received graded stimulation. Peak-to-trough response amplitude was determined for each stimulation and is shown for each patient, normalized to the maximum and minimum response for each larynx side of each participant. Responses to 60 V are shown as a box plot (the borders are the range and the box edges are the quartiles). Laryngeal responses for the example stimulation site of (A) and (B) are shown in red. Stimulation responses at 60 V are greater than at 0 V (p < 1e-6, one-sided t-test) and less than at 100 V (p < 1e-3, one-sided t-test). Responses are therefore not all-or-none but graded: stronger stimulation yields a greater laryngeal response. Stimulation magnitude is strongly correlated with laryngeal response across participants. The gray shading shows the standard error of the slope determined by bootstrapping (n = 1,000). (D) Across participants, sites that evoked arm movement were dorsal to the larynx sites, and sites that evoked mouth movement were ventral to the larynx sites. (E) Sites that evoked a spontaneous, involuntary voiced vocalization during awake stimulation mapping. The vocalization evoked at the red location is shown in (F). (F) Spectrogram and pitch contour of an example evoked vocalization. Noise from the stimulator created a 3.5 kHz band in the spectrogram. (G) Delay times between the start of stimulation and the beginning of the response for anesthetized (black) and awake (gray) stimulation. All response times for laryngeal responses were shorter than those for vocalization responses (the borders of the box plots mark the ranges and the box edges mark the quartiles). See also Figure S4.
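The quantification in (C) can be pictured with the sketch below: peak-to-trough laryngeal EMG amplitude per stimulation, min-max normalization within each larynx side of each participant, and a bootstrapped standard error for the voltage-response slope. The function names and inputs are illustrative assumptions, not the authors' implementation.

```
# Minimal sketch of the graded-stimulation quantification in panel (C).
# emg_trials, voltages, and responses are hypothetical NumPy arrays.
import numpy as np
from scipy import stats

def peak_to_trough(emg_trials):
    """emg_trials: (n_stims, n_samples) EMG traces -> amplitude per stimulation."""
    return emg_trials.max(axis=1) - emg_trials.min(axis=1)

def normalize_per_side(amplitudes):
    """Min-max normalize amplitudes within one larynx side of one participant."""
    lo, hi = amplitudes.min(), amplitudes.max()
    return (amplitudes - lo) / (hi - lo)

def bootstrap_slope_se(voltages, responses, n_boot=1000, seed=0):
    """Standard error of the voltage-response slope via bootstrap resampling."""
    rng = np.random.default_rng(seed)
    slopes = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(voltages), len(voltages))
        slopes.append(stats.linregress(voltages[idx], responses[idx]).slope)
    return np.std(slopes)

# One-sided comparisons as in the legend, e.g. responses at 60 V vs. 0 V:
# t, p = stats.ttest_ind(resp_60v, resp_0v, alternative='greater')
```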

