Deep perceptual embeddings for unlabelled animal sound events

J Acoust Soc Am. 2021 Jul;150(1):2. doi: 10.1121/10.0005475.

Abstract

Evaluating sound similarity is a fundamental building block in acoustic perception and computational analysis. Traditional data-driven analyses of perceptual similarity rely on heuristics or simplified linear models and are thus limited. Deep learning embeddings, often trained with triplet networks, have proven useful in many fields; however, such networks are usually trained on large class-labelled datasets, and such labels are not always feasible to acquire. We explore data-driven neural embeddings for sound event representation when class labels are absent, instead utilising proxies of perceptual similarity judgements. Ultimately, our target is a perceptual embedding space that reflects animals' perception of sound. We create deep perceptual embeddings for bird sounds using triplet models. To cope with the difficulty of triplet loss training in the absence of class-labelled data, we utilise multidimensional scaling (MDS) pretraining, attention pooling, and a triplet mining scheme. We also evaluate the advantage of triplet learning over a neural embedding learned from MDS alone. Using computational proxies of similarity judgements, we demonstrate that the method can develop perceptual models from behavioural judgements for a wide range of data, helping us understand how animals perceive sounds.
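The abstract names three ingredients for label-free triplet training: MDS pretraining, attention pooling, and a triplet mining scheme driven by similarity proxies rather than class labels. The paper's implementation is not reproduced here, so the PyTorch sketch below is illustrative only: the GRU encoder, embedding dimensions, margin value, the `mine_triplets` heuristic, and the MDS regression head are all assumptions, not the authors' method.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class AttentionPooling(nn.Module):
        """Collapse per-frame embeddings into one clip embedding with
        learned attention weights (one common formulation)."""
        def __init__(self, dim):
            super().__init__()
            self.score = nn.Linear(dim, 1)

        def forward(self, frames):                  # frames: (batch, time, dim)
            w = torch.softmax(self.score(frames), dim=1)
            return (w * frames).sum(dim=1)          # (batch, dim)

    class TripletEmbedder(nn.Module):
        """Toy encoder: GRU over log-mel frames, attention pooling,
        L2-normalised output embedding (architecture is an assumption)."""
        def __init__(self, n_mels=64, dim=128):
            super().__init__()
            self.rnn = nn.GRU(n_mels, dim, batch_first=True)
            self.pool = AttentionPooling(dim)

        def forward(self, spec):                    # spec: (batch, time, n_mels)
            frames, _ = self.rnn(spec)
            return F.normalize(self.pool(frames), dim=-1)

    def mds_pretrain_loss(emb, mds_coords, head):
        """Pretraining: a linear head regresses embeddings onto precomputed
        MDS coordinates of the similarity judgements (one plausible reading
        of 'MDS pretraining', not necessarily the paper's)."""
        return F.mse_loss(head(emb), mds_coords)

    def triplet_loss(anchor, positive, negative, margin=0.2):
        """Standard margin-based triplet loss on squared distances."""
        d_pos = (anchor - positive).pow(2).sum(-1)
        d_neg = (anchor - negative).pow(2).sum(-1)
        return F.relu(d_pos - d_neg + margin).mean()

    def mine_triplets(sim):
        """Label-free mining: for each anchor, take its most similar clip
        as the positive and a mid-ranked clip as the negative, using a
        pairwise perceptual-similarity matrix `sim` (n x n, n >= 3).
        A simple illustrative scheme, not the paper's exact one."""
        n = sim.shape[0]
        triplets = []
        for a in range(n):
            order = torch.argsort(sim[a], descending=True)
            order = order[order != a]               # drop the anchor itself
            triplets.append((a, order[0].item(), order[len(order) // 2].item()))
        return triplets

Under these assumptions, a training loop would first minimise mds_pretrain_loss to warm-start the encoder on the MDS geometry, then switch to triplet_loss on triplets mined from the similarity proxy.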

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Humans
  • Sound*