2014 Mar 21;9(3):e92584. doi: 10.1371/journal.pone.0092584. eCollection 2014.

Semi-automatic classification of birdsong elements using a linear support vector machine


Ryosuke O Tachibana et al. PLoS One.

Abstract

Birdsong provides a unique model for understanding the behavioral and neural bases of complex sequential behaviors. However, birdsong analysis requires laborious effort to make the data quantitatively analyzable. Previous attempts have succeeded in reducing some of the human effort involved in classifying birdsong segments. The present study aimed to reduce human effort further while increasing classification performance. In the proposed method, a linear-kernel support vector machine was employed to minimize the number of human-generated label samples needed for reliable element classification in birdsong, and to enable the classifier to handle high-dimensional acoustic features while avoiding over-fitting. Songs of the Bengalese finch, in which distinct elements (i.e., syllables) are aligned in complex sequential patterns, were used as a representative test case from the neuroscientific research field. Three evaluations were performed to test (1) algorithm validity and accuracy while exploring appropriate classifier settings, (2) the ability to maintain accuracy with a reduced instruction dataset, and (3) the ability to classify a large dataset with minimal manual labeling. The results of evaluation (1) showed that the algorithm classified song syllables with 99.5% accuracy. This accuracy was maintained in evaluation (2), even when the human-classified instruction data were reduced to a one-minute excerpt (corresponding to 300-400 syllables) for classifying a two-minute excerpt. The reliability remained comparable, at 98.7% accuracy, when a large target dataset of whole-day recordings (∼30,000 syllables) was used. The linear-kernel support vector machine thus achieved sufficient accuracy with minimal manually generated instruction data in birdsong element classification.
The proposed methodology would help reduce laborious processes in birdsong analysis without sacrificing reliability, and can therefore help accelerate behavioral and neuroscience studies using songbirds.
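The core idea of the abstract can be sketched with scikit-learn's linear-kernel SVM. This is an illustrative sketch only, not the authors' code: the synthetic feature vectors, the class count L = 7, and N = 20 samples per label are assumptions mirroring the paper's setup, standing in for real acoustic features extracted from labeled syllables.

```python
# Illustrative sketch: linear-kernel SVM classifying syllables from
# high-dimensional "acoustic features" (synthetic stand-ins here).
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Hypothetical instruction set: 7 syllable labels, 20 hand-labeled
# samples per label, each a 500-dimensional feature vector.
n_labels, n_per_label, n_features = 7, 20, 500
X_train = rng.normal(size=(n_labels * n_per_label, n_features))
y_train = np.repeat(np.arange(n_labels), n_per_label)
X_train += y_train[:, None] * 2.0  # shift classes apart so they are separable

# A linear kernel keeps the decision function simple, which helps avoid
# over-fitting even when features outnumber the labeled samples.
clf = LinearSVC(C=1.0)
clf.fit(X_train, y_train)

# Classify new, unlabeled syllables (drawn near the class-3 mean here).
X_new = rng.normal(size=(10, n_features)) + 3 * 2.0
pred = clf.predict(X_new)
```

The key property exploited by the paper is that a linear decision boundary remains well behaved in this "more features than labeled samples" regime, so only a small instruction set is needed.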


Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Example of syllable labeling and schematic drawing of the semi-automated labeling procedure.
(A) Waveform and spectrogram of a typical Bengalese finch song. Label examples are shown above the waveform panel. Boundaries of syllables and gaps are indicated by dotted vertical lines. (B) Proposed procedure for semi-automated labeling. Stages in gray boxes are processed automatically; the white box indicates the manual processing stage.
Figure 2. Cross-validation performance of classification on an ideal dataset (Evaluation 1).
Correct rates (%) were derived while systematically varying the number of labels (L), the number of samples per label (N), the feature conditions, and the optimization algorithms. (A) Correct rates for each label-number condition (differentiated by line color) as a function of sample-number condition, for the ALL-feature, 2R–2L-optimization condition. (B) Correct rates at the L = 7, N = 20 point (black arrow in the leftmost panel) as a function of feature condition, using 2R–2L optimization. (C) Correct rates at the L = 7, N = 20 point for the different optimization functions. Error bars indicate standard error (n = 8 birds). *p<0.05 (Tukey-Kramer HSD).
Figure 3. Cross-validation performance on the two-minute dataset (Evaluation 2).
(A) Correct rate (%) for each instruction-data-size condition. (B) Cohen's kappa, representing the degree of agreement with chance levels taken into account, for each instruction-data-size condition. Error bars indicate standard error (n = 13 birds). *p<0.05 (Tukey-Kramer HSD).
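Cohen's kappa, used in panel (B), corrects raw agreement for the agreement expected by chance given each rater's label frequencies. A minimal sketch (the label sequences are made-up stand-ins for human vs. classifier syllable labels):

```python
# Cohen's kappa: chance-corrected agreement between two labelings.
from sklearn.metrics import cohen_kappa_score

human      = ['a', 'a', 'b', 'b', 'c', 'c', 'a', 'b']
classifier = ['a', 'a', 'b', 'b', 'c', 'a', 'a', 'b']

# Raw agreement is 7/8 = 0.875; kappa discounts the portion of that
# agreement expected by chance from the label frequencies.
kappa = cohen_kappa_score(human, classifier)  # ≈ 0.805
```

Kappa of 1 means perfect agreement; 0 means agreement no better than chance, which is why it is a stricter criterion than the raw correct rate when some labels are very frequent.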
Figure 4. Estimated classification performance for a large dataset (Evaluation 3).
(A) Occurrence frequency of evaluation scores (black dots) from each bird, and a fitted probability density curve (red line), for one-day recording data. The light blue line shows the distribution of evaluation scores for the training data (one-minute recording). The evaluation score indexes the likelihood that a sample belongs to a given label class, normalized so that the classification margins lie at ±1 (arrowheads above the panel). (B) Distribution of correct rates along the evaluation-score axis. Correct rates from all birds (black dots) were pooled, averaged (open squares), and fitted with a logistic function (red line). (C) Probability densities of correct (blue zone) and incorrect (red zone) classifications, calculated by multiplying the occurrence probability (red line in A; red broken line in C) by the correct and incorrect rates, respectively. (D) Accumulated probabilities of correct (blue area) and incorrect (red area) classifications. The accumulation (integral) was performed from plus to minus (left to right in the figure) along the evaluation-score axis. The green line indicates the estimated correct-rate curve, which converges to 98.7% (arrowhead at the right side of the panel).
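The margin-normalized evaluation score described in panel (A) behaves like an SVM decision-function value: for a linear SVM, the decision value w·x + b is exactly ±1 on the classification margins. A sketch under that reading, with synthetic two-class data (the thresholding and names are illustrative, not the authors' exact scoring):

```python
# Sketch: SVM decision values as margin-normalized "evaluation scores".
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(-2, 1, size=(100, 2)),
               rng.normal(+2, 1, size=(100, 2))])
y = np.array([0] * 100 + [1] * 100)

clf = SVC(kernel='linear', C=1.0).fit(X, y)
scores = clf.decision_function(X)  # w.x + b; equals ±1 on the margins

# Samples outside the ±1 margins are confidently classified; scores near
# zero mark ambiguous samples (cf. the negative-score occurrences at
# bout edges in Figure 5).
frac_confident = np.mean(np.abs(scores) > 1)
```

Integrating the score distribution against a score-dependent correct rate, as in panels (C) and (D), then yields an overall accuracy estimate without hand-labeling the whole dataset.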
Figure 5. Instability at the beginning and end of song bouts and the corresponding evaluation scores.
(A) Three spectrograms of beginning (left panels) and ending parts (right panels) of bouts. Several introductory syllables (labeled ‘i’) at the beginning of a bout are often weak and unstable (arrowheads). The last syllables at the end of a bout are sometimes shortened and unclear (arrowheads). (B) Percent occurrence of negative evaluation scores at various syllable positions in the beginning (left) and ending parts (right). N indicates the position of the terminating syllable. Error bars show standard error (n = 13).




Grants and funding

This study was supported by Adolescent Mind & Self-Regulation, Grant-in-Aid for Scientific Research on Innovative Areas, MEXT, JAPAN (Grant Number: 23118003). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
