Robust decoding of selective auditory attention from MEG in a competing-speaker environment via state-space modeling

Sahar Akram et al. Neuroimage. 2016 Jan 1;124(Pt A):906-917. doi: 10.1016/j.neuroimage.2015.09.048. Epub 2015 Oct 4.
Abstract

The underlying mechanism of how the human brain solves the cocktail party problem is largely unknown. Recent neuroimaging studies, however, suggest salient temporal correlations between the auditory neural response and the attended auditory object. Using magnetoencephalography (MEG) recordings of the neural responses of human subjects, we propose a decoding approach for tracking the attentional state while subjects are selectively listening to one of two speech streams embedded in a competing-speaker environment. We develop a biophysically inspired state-space model to account for the modulation of the neural response with respect to the attentional state of the listener. The constructed decoder is based on a maximum a posteriori (MAP) estimate of the state parameters via the Expectation Maximization (EM) algorithm. Using only the envelopes of the two speech streams as covariates, the proposed decoder enables us to track the attentional state of the listener with a temporal resolution on the order of seconds, together with statistical confidence intervals. We evaluate the performance of the proposed model using numerical simulations and experimentally measured evoked MEG responses from the human brain. Our analysis reveals considerable performance gains provided by the state-space model in terms of temporal resolution, computational complexity and decoding accuracy.
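To make the decoding pipeline concrete, the sketch below (Python/NumPy, with illustrative names of our own) predicts the neural response to each speaker by convolving that speaker's envelope with a temporal response function (TRF), then scores each analysis window by its correlation with the measured response. The correlation score is only a stand-in for the paper's von Mises–Fisher likelihood; the MAP/EM state-space stage is sketched after Algorithm 1 below.

```python
import numpy as np

def predict_response(envelope, trf):
    """Predicted neural response: the speech envelope convolved with a
    (previously estimated) temporal response function."""
    return np.convolve(envelope, trf, mode="full")[: len(envelope)]

def windowed_evidence(meg, pred1, pred2, win):
    """Correlation of the measured response with each speaker's
    predicted response, per analysis window. This is a simple stand-in
    for the paper's von Mises-Fisher likelihood over normalized
    window vectors."""
    n_win = len(meg) // win
    evidence = np.zeros((n_win, 2))
    for k in range(n_win):
        sl = slice(k * win, (k + 1) * win)
        for j, pred in enumerate((pred1, pred2)):
            evidence[k, j] = np.corrcoef(meg[sl], pred[sl])[0, 1]
    return evidence  # window k favors speaker 1 when evidence[k, 0] > evidence[k, 1]
```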

Keywords: Attention; MEG; Nonlinear filtering; Speech segregation; State-space models.


Figures

Algorithm 1. Estimation of the state-space parameters
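Algorithm 1 itself appears only as a figure, so the following is a minimal sketch of an EM scheme of the same general type: a logistic random-walk state, a Gaussian-approximate forward filter, a fixed-interval (RTS) smoother, and a closed-form update of the state-noise variance, in the style of Smith and Brown's binary state-space smoother. It assumes per-window binary decisions b_k (1 when window k favors speaker 1) rather than the paper's von Mises–Fisher observations, so it approximates rather than transcribes Algorithm 1.

```python
import numpy as np

def state_space_em(b, sigma2_eps=0.05, n_iter=50, tol=1e-6):
    """EM for a random-walk logistic state-space model:
        z_k = z_{k-1} + w_k,   w_k ~ N(0, sigma2_eps)
        b_k ~ Bernoulli(p_k),  p_k = 1 / (1 + exp(-z_k))
    b[k] = 1 when window k favors speaker 1. Returns the smoothed
    probabilities p_k and the posterior variances of z_k."""
    b = np.asarray(b, dtype=float)
    K = len(b)
    for _ in range(n_iter):
        # E-step, forward pass: Gaussian approximation at the posterior mode.
        z_p, v_p = np.zeros(K), np.zeros(K)   # one-step predictions
        z_f, v_f = np.zeros(K), np.zeros(K)   # filtered estimates
        z_prev, v_prev = 0.0, sigma2_eps
        for k in range(K):
            z_p[k], v_p[k] = z_prev, v_prev + sigma2_eps
            z = z_p[k]
            for _ in range(10):               # Newton steps for the mode
                p = 1.0 / (1.0 + np.exp(-z))
                grad = (z - z_p[k]) / v_p[k] - (b[k] - p)
                hess = 1.0 / v_p[k] + p * (1.0 - p)
                z -= grad / hess
            p = 1.0 / (1.0 + np.exp(-z))
            z_f[k] = z
            v_f[k] = 1.0 / (1.0 / v_p[k] + p * (1.0 - p))
            z_prev, v_prev = z_f[k], v_f[k]
        # E-step, backward pass: fixed-interval (RTS) smoother.
        z_s, v_s = z_f.copy(), v_f.copy()
        A = np.zeros(K)
        for k in range(K - 2, -1, -1):
            A[k] = v_f[k] / v_p[k + 1]
            z_s[k] = z_f[k] + A[k] * (z_s[k + 1] - z_p[k + 1])
            v_s[k] = v_f[k] + A[k] ** 2 * (v_s[k + 1] - v_p[k + 1])
        # M-step: closed-form variance update, using
        # cov(z_{k+1}, z_k | data) = A_k * v_s[k+1].
        dz2 = (np.diff(z_s) ** 2 + v_s[1:] + v_s[:-1]
               - 2.0 * A[:-1] * v_s[1:])
        new_sigma2 = float(np.mean(dz2))
        converged = abs(new_sigma2 - sigma2_eps) < tol
        sigma2_eps = new_sigma2
        if converged:
            break
    return 1.0 / (1.0 + np.exp(-z_s)), v_s
```

Approximate confidence intervals for p_k, like the error hulls in the figures below, follow by mapping z_s ± 1.96·sqrt(v_s) through the logistic function.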
Fig. 1
Schematic depiction of auditory object encoding in the auditory cortex. Here, the auditory scene consists of a mixture of two concurrent speech streams. Recent studies show that cortical activity (black traces) is selectively phase-locked to the temporal envelope of the attended speaker, as opposed to the unattended speaker's envelope.
Fig. 2
A) von Mises–Fisher probability density for different κ parameters. B) Schematic view of von Mises–Fisher statistics on a three-dimensional sphere: normalized neural response data points are shown as black dots on the unit sphere. Red and green arrows indicate the vectors of the predicted neural response based on attending to speaker 1 or speaker 2, respectively. The angles between the neural response at window k and each of the predictors are shown as θ1,k and θ2,k, for the case of attending to speaker 1 (left plot) and speaker 2 (right plot), respectively. The point cloud formed by the neural response is aligned with the direction of the predictor vector corresponding to the attended speaker.
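For reference, a minimal sketch of the quantities in this figure (illustrative function names, p = 3 as in the schematic): the von Mises–Fisher log-density with normalizing constant κ/(4π sinh κ), and the angle θ between a normalized response window and a predicted-response direction.

```python
import numpy as np

def vmf_logpdf(x, mu, kappa):
    """Log-density of the von Mises-Fisher distribution on the unit
    sphere S^2: f(x) = C(kappa) * exp(kappa * mu.x) with
    C(kappa) = kappa / (4 * pi * sinh(kappa)) for p = 3.
    (For large kappa, replace log(sinh(kappa)) with
    kappa + log1p(-exp(-2 * kappa)) - log(2) to avoid overflow.)"""
    log_c = np.log(kappa) - np.log(4.0 * np.pi) - np.log(np.sinh(kappa))
    return log_c + kappa * np.dot(mu, x)

def angle_to_predictor(response, predicted):
    """Angle theta between a response window and a predicted-response
    direction, after mapping both to unit vectors."""
    u = response / np.linalg.norm(response)
    v = predicted / np.linalg.norm(predicted)
    return np.arccos(np.clip(np.dot(u, v), -1.0, 1.0))
```

Larger κ concentrates the point cloud around the predictor direction, which is what separates the attended from the unattended speaker in the figure.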
Fig. 3
A) The MEG magnetic field distribution for the first DSS component of a sample subject shows a stereotypical pattern of neural activity, originating separately in the left and right auditory cortices. Red and green contours represent the magnetic field strength. Blue arrows schematically represent the locations of the dipole currents generating the measured magnetic field. B) The estimated TRF for the sample subject has significant components analogous to the well-known M50 and M100 auditory responses, as well as later responses.
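The caption does not specify how the TRF is estimated; a minimal sketch, assuming a time-lagged ridge regression of the DSS component onto the speech envelope (boosting is another common estimator in this literature):

```python
import numpy as np

def estimate_trf(envelope, meg, n_lags, lam=1.0):
    """Estimate a temporal response function by ridge regression of the
    MEG (DSS-component) response onto time-lagged copies of the speech
    envelope. n_lags sets the TRF length in samples (e.g. ~0-500 ms);
    lam is the ridge penalty."""
    T = len(envelope)
    X = np.zeros((T, n_lags))
    for j in range(n_lags):
        X[j:, j] = envelope[: T - j]   # column j = envelope delayed by j
    # Regularized normal equations: (X'X + lam * I) w = X'y
    return np.linalg.solve(X.T @ X + lam * np.eye(n_lags), X.T @ meg)
```

The M50- and M100-like peaks in the figure would appear as TRF components near 50 ms and 100 ms lag.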
Fig. 4
Simulated neural response (black traces) and model prediction (red traces) for A) speaker one and B) speaker two at SNR = 10 dB. Black arrows indicate the instructed attentional state of the subjects. The MEG units are in pT/m. C) Estimated values of {pk} with 95% confidence intervals. D) Estimated values of {pk} from the simulated neural response at SNR = 0, −10, and −15 dB. Error hulls indicate 95% confidence intervals. E) Behavioral results for the simulated neural response at SNR values ranging from −20 to 10 dB. The time fraction for which the estimated attentional state follows the target speaker (the opposite speaker) as a function of SNR is shown in the left panel (right panel).
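A minimal sketch of the SNR manipulation in these simulations, assuming the clean response is the attended envelope convolved with the TRF plus white Gaussian noise scaled to the stated SNR (the paper's simulation setup is richer than this):

```python
import numpy as np

def simulate_response(envelope, trf, snr_db, rng=None):
    """Simulated evoked response at a target SNR (in dB): clean
    TRF-filtered envelope plus white Gaussian noise scaled so that
    10*log10(P_signal / P_noise) equals snr_db."""
    rng = rng if rng is not None else np.random.default_rng()
    clean = np.convolve(envelope, trf, mode="full")[: len(envelope)]
    noise = rng.standard_normal(len(clean))
    scale = np.sqrt(np.mean(clean ** 2)
                    / (np.mean(noise ** 2) * 10.0 ** (snr_db / 10.0)))
    return clean + scale * noise
```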
Fig. 5
Decoding auditory attentional modulation in experimental MEG data. In each subplot, the neural response (black traces) and the model prediction (red traces) for attending to speaker one and speaker two are shown in the first and second panels, respectively, for one sample subject. The third panel shows the estimated values of {pk} and the corresponding confidence intervals using multi-trial analysis for three sample subjects. The fourth panel shows the estimated {pk} values for single trials. A) Condition 1: attending to speaker one through the entire trial. B) Condition 2: attending to speaker two through the entire trial. C) Condition 3: attending to speaker one until t = 28 s and switching attention to speaker two after the 2 s pause. D) Condition 4: attending to speaker two until t = 28 s and switching attention to speaker one after the 2 s pause. Dashed lines in subplots C and D indicate the start of the 2 s silence cue for the attentional switch. Error hulls indicate 90% confidence intervals. The MEG units are in pT/m.
Fig. 6
Schematic illustration of attentional states and behavioral analysis. A) The estimated attentional condition at each time point can take one of the following states: Target Attended (TA), Alternative Target Attended (Alt-TA), and Unfollowed (UF). Examples of the attentional states are depicted in panel A for a sample subject and a sample trial from condition 3. B1, C1) Target-attended time fractions are plotted against Alt-target-attended time fractions for individual subjects in the constant-attention and attention-switch experiments, respectively. B2, C2) Target- and Alt-target-attended time fractions are computed via multi-trial analysis. Box plots indicate the median and quartile percentages of subjects' behavioral performance in attending to the target and non-target speakers (first and second boxes in each plot, respectively). Individual subject performances, shown as blue markers, are plotted on top of each box plot.
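A minimal sketch of one plausible labeling rule, assuming (as the confidence hulls in Figs. 4 and 5 suggest) that a window counts as attended only when the confidence interval around p_k clears 0.5; the paper's exact criterion may differ.

```python
def classify_states(ci_lo, ci_hi, target=1):
    """Label each window Target Attended (TA), Alternative Target
    Attended (Alt-TA), or Unfollowed (UF). ci_lo/ci_hi bound the
    confidence interval for p_k = P(attending speaker 1); target names
    the instructed speaker (1 or 2)."""
    labels = []
    for lo, hi in zip(ci_lo, ci_hi):
        if lo > 0.5:                       # confidently on speaker 1
            labels.append("TA" if target == 1 else "Alt-TA")
        elif hi < 0.5:                     # confidently on speaker 2
            labels.append("Alt-TA" if target == 1 else "TA")
        else:                              # interval straddles 0.5
            labels.append("UF")
    return labels
```

The TA and Alt-TA time fractions in panels B and C are then just the proportions of each label across a trial.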
Fig. 7
A step-wise illustration of the EM convergence. A) The output of the state-space decoder is plotted after each EM iteration for sample trials of attending to speaker 1 (green curves), and attending to speaker 2 (orange curves), in the Constant-Attention experiment. B) EM iterations are plotted for sample trials of the Attention-Switch experiment and for attention switches from speaker 1 to speaker 2 (green curves), and from speaker 2 to speaker 1 (orange curves).
