Cereb Cortex. 2018 Dec 1;28(12):4222-4233. doi: 10.1093/cercor/bhx277.

Neural Encoding of Auditory Features during Music Perception and Imagery

Stephanie Martin et al. Cereb Cortex.

Abstract

Despite many behavioral and neuroimaging investigations, it remains unclear how the human cortex represents spectrotemporal sound features during auditory imagery, and how this representation compares to auditory perception. To assess this, we recorded electrocorticographic signals from an epileptic patient with proficient music ability in 2 conditions. First, the participant played 2 piano pieces on an electronic piano with the sound volume of the digital keyboard on. Second, the participant replayed the same piano pieces, but without auditory feedback, and the participant was asked to imagine hearing the music in his mind. In both conditions, the sound output of the keyboard was recorded, thus allowing precise time-locking between the neural activity and the spectrotemporal content of the music imagery. This novel task design provided a unique opportunity to apply receptive field modeling techniques to quantitatively study neural encoding during auditory mental imagery. In both conditions, we built encoding models to predict high gamma neural activity (70-150 Hz) from the spectrogram representation of the recorded sound. We found robust spectrotemporal receptive fields during auditory imagery with substantial, but not complete overlap in frequency tuning and cortical location compared to receptive fields measured during auditory perception.
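The encoding analysis summarized above — predicting an electrode's high gamma power from the spectrogram via a lagged linear model (an STRF) — can be sketched as a ridge regression over time-lagged spectrogram features. The lag window, penalty, and data shapes below are illustrative assumptions, not the study's actual settings:

```python
import numpy as np

def build_lagged_features(spec, n_lags):
    """Stack time-lagged copies of a (time x freq) spectrogram so that
    each row holds the recent stimulus history at that time point."""
    n_t, n_f = spec.shape
    X = np.zeros((n_t, n_lags * n_f))
    for lag in range(n_lags):
        X[lag:, lag * n_f:(lag + 1) * n_f] = spec[:n_t - lag]
    return X

def fit_strf(spec, hg, n_lags=10, alpha=1.0):
    """Ridge-regression STRF mapping the lagged spectrogram to one
    electrode's high gamma power. alpha is an assumed penalty value."""
    X = build_lagged_features(spec, n_lags)
    XtX = X.T @ X + alpha * np.eye(X.shape[1])
    w = np.linalg.solve(XtX, X.T @ hg)
    return w.reshape(n_lags, spec.shape[1])  # (time lags x frequencies)

# Toy usage: a random spectrogram and a synthetic zero-lag neural response.
rng = np.random.default_rng(0)
spec = rng.standard_normal((500, 32))          # 500 samples x 32 freq bins
true_w = rng.standard_normal(32)
hg = spec @ true_w + 0.1 * rng.standard_normal(500)
strf = fit_strf(spec, hg)
print(strf.shape)  # (10, 32)
```

Prediction accuracy in the paper is then the correlation between the model's predicted and the observed high gamma time course on held-out data.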

Figures

Figure 1.
Experimental task design. (A) The participant played an electronic piano with the sound of the digital keyboard turned on (perception condition). (B) In the second condition, the participant played the piano with the sound turned off and instead imagined the corresponding music in his mind (imagery condition). In both conditions, the sound output of the keyboard was recorded in synchrony with the neural signals (even though the participant did not hear any sound in the imagery condition). The models take as input a spectrogram consisting of time-varying spectral power across a range of acoustic frequencies (200–7000 Hz, bottom left) and output time-varying neural signals. To assess prediction accuracy, the predicted neural signal (light lines) is compared to the original neural signal (dark lines).
Figure 2.
Prediction accuracy. (A) Electrode location overlaid on the cortical surface reconstruction of the participant’s cerebrum. (B) Overlay of the spectrogram contours for the perception (blue) and imagery (orange) conditions (10% of maximum energy from the spectrograms) corresponding to a segment of Chopin’s prelude. (C) Actual and predicted high gamma band power (70–150 Hz) induced by the music perception and imagery segment in (B). Electrode 67 has very similar predictive power across conditions, whereas electrode 179 has significantly better predictive power for perception compared to imagery. Recordings are from 2 different temporal lobe sites highlighted in pink in (A). (D) Prediction accuracy plotted on the cortical surface reconstruction of the participant’s cerebrum (map thresholded at P < 0.05; FDR correction). (E) Prediction accuracy of significant electrodes of the perception model as a function of the imagery model. Electrode-specific prediction accuracy is correlated between perception and imagery models (r = 0.65; P < 10⁻⁴; randomization test). (F) Prediction accuracy as a function of anatomic location: pre-central gyrus (pre-CG), post-central gyrus (post-CG), supramarginal gyrus (SMG), middle temporal gyrus (MTG), and superior temporal gyrus (STG).
Figure 3.
Spectrotemporal receptive fields. (A) Examples of standard STRFs for the perception (left panel) and imagery (right panel) models (warm colors indicate where the neuronal ensemble is excited, cold colors indicate where the neuronal ensemble is inhibited). Electrodes whose STRFs are shown are outlined in black in (B). Gray electrodes were removed from the analysis due to excessive noise (see Materials and Methods). (B) The correlation coefficients between the vectorized STRFs in the perception and imagery condition are plotted on the surface reconstruction of the participant’s brain for electrodes that had significant prediction accuracy in at least one of the perception and imagery conditions.
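The electrode-wise comparison in this figure amounts to correlating the two fitted weight matrices after flattening them. A minimal sketch, with illustrative STRF shapes:

```python
import numpy as np

def strf_similarity(strf_a, strf_b):
    """Pearson correlation between two vectorized (lags x freqs) STRFs,
    here standing in for the perception-vs-imagery comparison."""
    return np.corrcoef(strf_a.ravel(), strf_b.ravel())[0, 1]

# Toy check: a STRF is perfectly correlated with itself and
# perfectly anti-correlated with its negation.
rng = np.random.default_rng(2)
a = rng.standard_normal((10, 32))
print(round(strf_similarity(a, a), 6))   # 1.0
print(round(strf_similarity(a, -a), 6))  # -1.0
```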
Figure 4.
Auditory tuning. (A) Peak latency estimated from STRFs was significantly correlated between perception and imagery conditions (r = 0.43; P < 0.005; randomization test). (B) Examples of frequency tuning curves (right) for perception and imagery encoding models (averaged over the time lag dimension of the STRF). Black outline in the surface reconstruction of the participant’s brain (left) indicates electrode location. Gray electrodes were removed from the analysis due to excessive noise. Correlation coefficients between the perception and imagery frequency tuning curves are plotted for significant electrodes on the cortical surface reconstruction (left). The bottom panel plots the histogram of electrode correlation coefficients between perception and imagery frequency tuning. (C) Proportion of predictive electrode sites (N = 41) with peak tuning at each frequency. Tuning peaks were identified as significant parameters in the acoustic frequency tuning curves (z > 3.1; P < 0.001) and separated by more than one-third of an octave.
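The tuning-curve procedure in panel (C) — collapse the STRF over time lags, then keep z-scored peaks separated by more than a third of an octave — can be sketched as follows. The thresholds mirror the caption, but the z-scoring scheme and frequency grid are assumptions:

```python
import numpy as np

def frequency_tuning(strf, freqs, z_thresh=3.1, min_sep_octaves=1 / 3):
    """Average a (lags x freqs) STRF over lags to get a frequency tuning
    curve, then pick peaks above z_thresh that are separated by more than
    min_sep_octaves from any stronger peak."""
    curve = strf.mean(axis=0)
    z = (curve - curve.mean()) / curve.std()
    peaks = []
    for i in np.argsort(z)[::-1]:            # strongest candidates first
        if z[i] < z_thresh:
            break
        if all(abs(np.log2(freqs[i] / freqs[j])) > min_sep_octaves
               for j in peaks):
            peaks.append(i)
    return curve, [freqs[i] for i in peaks]

# Toy usage: a STRF with a single bump at one acoustic frequency.
freqs = np.geomspace(200, 7000, 32)          # assumed log-spaced bins
strf = np.zeros((10, 32))
strf[:, 12] = 1.0
curve, peak_freqs = frequency_tuning(strf, freqs)
print(len(peak_freqs))  # 1
```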
Figure 5.
Reconstruction accuracy. (A) Left panel, overall reconstruction accuracy of the spectrogram representation for perception (blue) and imagery (orange) conditions. Error bars denote SEM. Right panel, reconstruction accuracy as a function of acoustic frequency. Shaded region denotes SEM. (B) Examples of original and reconstructed segments for the perception (left) and the imagery (right) model. (C) Left panel, distribution of identification rank for all reconstructed spectrograms (N = 140 for perception and N = 135 for imagery). Median identification rank is 0.65 and 0.63 for the perception and imagery decoding models, respectively, both significantly higher than the 0.50 chance level (P < 0.001; randomization test). Right panel, receiver operating characteristic (ROC) plot of identification performance for the perception (blue curve) and imagery (orange curve) model. Diagonal black line indicates no predictive power.
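Identification rank measures, for each reconstructed spectrogram, how its similarity to the true segment ranks against a candidate set (1.0 = always ranked first, 0.5 = chance). A minimal sketch, assuming correlation as the similarity metric:

```python
import numpy as np

def identification_rank(recon, candidates, true_idx):
    """Fraction of competitor segments whose correlation with the
    reconstruction is beaten by the true segment (1.0 best, 0.5 chance)."""
    sims = [np.corrcoef(recon.ravel(), c.ravel())[0, 1] for c in candidates]
    beats = sum(sims[true_idx] > s
                for i, s in enumerate(sims) if i != true_idx)
    return beats / (len(candidates) - 1)

# Toy usage: the reconstruction is a noisy copy of candidate 2, so the
# true segment should beat every competitor.
rng = np.random.default_rng(1)
candidates = [rng.standard_normal((20, 8)) for _ in range(5)]
recon = candidates[2] + 0.1 * rng.standard_normal((20, 8))
print(identification_rank(recon, candidates, 2))  # 1.0
```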
Figure 6.
Cross-condition analysis. Reconstruction accuracy when the decoding model was trained on the perception condition and applied to the imagery neural data, and vice versa. Decoding performance improved by roughly 50% when the model was trained and tested on the imagery condition (r = 0.28; P < 0.001; randomization test), compared to when the perception model was applied to imagery data (r = 0.19; P < 0.001; randomization test).
Figure 7.
Control analysis for motor confound. (A) STRFs for 2 neighboring electrodes for perception (left) and imagery (right) encoding models. For electrode 67, the STRF is strongly correlated between perception and imagery conditions (r = 0.76), while there is a nonsignificant correlation (r = 0.04) in the adjacent electrode 68. (B) Prediction accuracy plotted on the cortical surface reconstruction of the participant’s brain (map thresholded at P < 0.05; FDR correction) for the passive listening data sets (speech and music). Black dots represent electrodes that had significant prediction accuracy in at least one of the perception and imagery conditions. (C) Overall reconstruction accuracy (upper panel) and median identification rank (lower panel) when using all electrodes, only temporal electrodes, or only auditory-responsive electrodes (see Materials and Methods for details).
