Objective: We consider the problem of Auditory Attention Detection (AAD), where the goal is to detect which speaker a person is attending to in a multi-speaker environment, based on neural activity. This work aims to analyze the influence of head-related filtering and ear-specific decoding on the performance of an AAD algorithm.
Approach: We recorded high-density EEG of 16 normal-hearing subjects as they listened to two speech streams, with the task of attending to the speaker in either their left or right ear. The attended ear was switched between trials. The speech stimuli were presented either dichotically or after filtering with Head-Related Transfer Functions (HRTFs). A spatio-temporal decoder was trained and used to reconstruct the attended stimulus envelope, and the correlations between the reconstructed and the original stimulus envelopes were used to perform AAD, yielding a percentage-correct score over all trials.
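The decoding pipeline described above can be illustrated with a minimal sketch: a linear spatio-temporal (backward) decoder is fit by regularized least squares on lagged EEG to reconstruct the attended envelope, and the AAD decision is made by comparing correlations with the two candidate envelopes. All data below are synthetic and the dimensions, lag count, and regularization value are illustrative assumptions, not the paper's actual parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Synthetic data (illustration only; not the recorded EEG/stimuli) ---
n_samples, n_channels, n_lags = 2000, 8, 5
env_att = rng.standard_normal(n_samples)    # attended-speech envelope
env_unatt = rng.standard_normal(n_samples)  # unattended-speech envelope

# EEG modeled as noisy, channel-specific delayed copies of the attended envelope
eeg = np.stack([np.roll(env_att, ch) for ch in range(n_channels)], axis=1)
eeg += 0.5 * rng.standard_normal((n_samples, n_channels))

# Spatio-temporal design matrix: stack time-lagged copies of all channels
def lagged(x, n_lags):
    return np.hstack([np.roll(x, lag, axis=0) for lag in range(n_lags)])

X = lagged(eeg, n_lags)

# Fit a linear decoder with ridge regularization (lambda is an assumption)
lam = 1e-2
w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ env_att)
recon = X @ w  # reconstructed attended envelope

# AAD decision: pick the envelope with the higher Pearson correlation
c_att = np.corrcoef(recon, env_att)[0, 1]
c_unatt = np.corrcoef(recon, env_unatt)[0, 1]
decision = "attended" if c_att > c_unatt else "unattended"
```

In practice the decoder would be trained and evaluated with cross-validation over trials, and the per-trial decisions aggregated into the percentage-correct score; this sketch shows only a single decision.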
Main results: We found that the HRTF condition resulted in significantly higher AAD performance than the dichotic condition. However, speech intelligibility, measured under the same set of conditions, was lower for the HRTF-filtered stimuli. We also found that decoders trained and tested for a specific attended ear performed better than decoders trained and tested on both left- and right-attended trials simultaneously. In the context of decoders supporting hearing prostheses, however, the former (ear-specific) approach is less realistic, and studies in which each subject always attended to the same ear may report over-optimistic results.
Significance: This work shows the importance of using realistic binaural listening conditions and of training on a balanced set of experimental conditions to obtain results that are more representative of the true AAD performance in practical applications.