pDenoiser: A Personalized Speech Enhancement Neural Network for Pre-hospital Emergency Medical Services

IEEE J Biomed Health Inform. 2024 May 1:PP. doi: 10.1109/JBHI.2024.3395832. Online ahead of print.

Abstract

Pre-hospital emergency medical service (EMS) tasks are often performed amid complex and diverse noise interference, which complicates the deployment of automatic speech recognition (ASR)-based medical technologies and hinders efficient, accurate telephone communication. Among the different types of noise distortion, interfering speech is particularly disruptive. To address these issues, we aim to develop a technology capable of extracting the intended speech of the target physician from noisy, mixed audio during EMS tasks. In this work, we propose a monaural personalized speech enhancement (PSE) method called pDenoiser, a real-time neural network that operates in the time domain. By leveraging prior vocalization cues of emergency physicians, pDenoiser selectively enhances target speech components while suppressing noise and nontarget speech components, thereby improving speech quality and speech recognition accuracy under noisy conditions. We demonstrate the potential value of our approach through evaluations on both public general-domain test sets and our self-collected real-world EMS test sets. The experimental results are promising: our model improves both speech quality and ASR performance under various conditions and outperforms related methods across multiple evaluation metrics. We hope that our methodology will improve EMS efficiency and strengthen protection against nontarget speech during EMS tasks.