Neuropsychopharmacology. 2020 Oct;45(11):1942-1952.
doi: 10.1038/s41386-020-0776-y. Epub 2020 Jul 25.

Deep learning-based behavioral analysis reaches human accuracy and is capable of outperforming commercial solutions

Oliver Sturman et al. Neuropsychopharmacology. 2020 Oct.

Abstract

To study brain function, preclinical research relies heavily on animal monitoring and the subsequent analysis of behavior. Commercial platforms have enabled semi-high-throughput behavioral analyses by automating animal tracking, yet they recognize ethologically relevant behaviors poorly and lack the flexibility to be employed in variable testing environments. Critical advances based on deep learning and machine vision over the last couple of years now enable markerless tracking of individual body parts of freely moving rodents with high precision. Here, we compare the performance of commercially available platforms (EthoVision XT14, Noldus; TSE Multi-Conditioning System, TSE Systems) to cross-verified human annotation. We provide a set of videos, carefully annotated by several human raters, of three widely used behavioral tests (open field test, elevated plus maze, forced swim test). Using these data, we then deployed the pose estimation software DeepLabCut to extract skeletal mouse representations. Using simple post-analyses, we were able to track animals based on their skeletal representation in a range of classic behavioral tests at similar or greater accuracy than commercial behavioral tracking systems. We then developed supervised machine learning classifiers that integrate the skeletal representation with the manual annotations. This new combined approach allows us to score ethologically relevant behaviors with similar accuracy to humans, the current gold standard, while outperforming commercial solutions. Finally, we show that the resulting machine learning approach eliminates variation both within and between human annotators. In summary, our approach helps to improve the quality and accuracy of behavioral data, while outperforming commercial systems at a fraction of the cost.
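The abstract does not spell out its "simple post-analyses", so the following is only a minimal sketch of what such a step might look like. It assumes that per-frame (x, y) positions of a single tracked body point have already been exported from a pose estimation tool. The function names, the rectangular zone, the frame rate, and the sample track are all hypothetical and are not taken from the paper.

```python
import math


def distance_travelled(points):
    """Sum of Euclidean step lengths between consecutive (x, y) positions."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def time_in_zone(points, zone, fps):
    """Seconds spent inside an axis-aligned rectangle.

    zone is (x_min, y_min, x_max, y_max); fps is the video frame rate.
    Both are assumptions made for this illustration.
    """
    x_min, y_min, x_max, y_max = zone
    frames_inside = sum(
        1 for x, y in points if x_min <= x <= x_max and y_min <= y <= y_max
    )
    return frames_inside / fps


# Hypothetical body-centre track (arbitrary units), one entry per frame.
track = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0), (6.0, 8.0)]
# Hypothetical "center" zone of an open field arena.
center_zone = (2.0, 2.0, 7.0, 9.0)

total_distance = distance_travelled(track)
seconds_in_center = time_in_zone(track, center_zone, fps=2)
print(total_distance, seconds_in_center)
```

In practice the coordinates would also need filtering for low-confidence detections and conversion from pixels to physical units, neither of which is shown here.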

Figures

Fig. 1. The labels used to train the DLC networks.
a The standardized points of interest used to track the animal. The points of interest required to track the animal in the open field (b), the elevated plus maze (c) and the forced swim test (d).
Fig. 2. A comparison of basic tracking parameters in the open field test.
a Schematic showing the workflow of the comparison between systems. b, c Distance and time in center as reported by DeepLabCut (with post-hoc analysis), EthoVision XT14, and the TSE Multi Conditioning System (TSE). d, e Correlation analysis of the performance of the different systems. Data expressed as mean ± standard error of the mean. Colors represent individual animals and are consistent across analysis techniques for direct comparison (n = 20). ****p < 0.0001.
Fig. 3. A comparison of basic tracking parameters in the forced swim test and elevated plus maze.
a Schematic showing the workflow of the comparison between systems. b, d, f, h Basic tracking parameters in the forced swim test and elevated plus maze as reported by both DeepLabCut (with post-hoc analysis) and EthoVision XT14. c, e, g, i Correlation between the scores of the two systems. Data expressed as mean ± standard error of the mean. Colors represent individual animals and are consistent across analysis techniques for comparison (FST n = 29, EPM n = 24). *p < 0.05.
Fig. 4. A comparison of quantifying ethological behaviors in the forced swim test and elevated plus maze.
a Schematic of the workflow for the comparison between systems. b, c The polygon used in the definition of floating, and the body points taken into account when defining head dips. d, e Floating in the forced swim test and head dips in the elevated plus maze as reported by three human annotators (rater 1–3), DeepLabCut (with post-hoc analysis), and EthoVision XT14’s behavioral recognition module. f, g Correlation analysis for comparison between approaches. h Schematic showing the experimental design for yohimbine injections. i Time spent in the open arms after injection with yohimbine (3 mg/kg) or vehicle, as reported by DeepLabCut and EthoVision. j Head dips as reported manually, by DeepLabCut (with post-hoc analysis) and EthoVision. k Correlation analysis for comparison between approaches regarding head dips. Data expressed as mean ± standard error of the mean. Colors represent individual animals and are consistent across analysis techniques for comparison (FST n = 10, EPM n = 5). **p < 0.01, ****p < 0.0001.
Fig. 5. A comparison of complex behavioral scoring between human raters, machine learning classifiers and commercially available solutions.
a Schematic of the workflow. b, c Unsupported and supported rears in the open field test as reported by three human raters (averaged and plotted as manual scoring) and three machine learning classifiers (averaged and plotted as ML classifiers), EthoVision XT14 and the TSE Multi Conditioning System (TSE). d, e Correlation analysis for comparison. Data expressed as mean ± standard error of the mean. Colors represent individual animals and are consistent across analysis techniques for comparison (n = 20). *p < 0.05, **p < 0.01, ***p < 0.001, ****p < 0.0001.
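The caption of Fig. 4 mentions a polygon used in the definition of floating but does not give the rule itself. As a hedged illustration only, one plausible reading is that a polygon spanned by tracked body points changes little in area while the animal is immobile. The sketch below computes polygon area with the shoelace formula and flags frames with a small relative area change. The threshold, the function names, and the sample polygons are hypothetical and should not be taken as the authors' definition.

```python
def polygon_area(vertices):
    """Area of a simple polygon from ordered (x, y) vertices (shoelace formula)."""
    n = len(vertices)
    twice_area = sum(
        vertices[i][0] * vertices[(i + 1) % n][1]
        - vertices[(i + 1) % n][0] * vertices[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2.0


def low_motion_frames(polygons, max_rel_change):
    """Flag frames whose polygon area changed by less than max_rel_change
    (as a fraction of the previous frame's area). The first frame has no
    predecessor and is flagged False."""
    areas = [polygon_area(p) for p in polygons]
    flags = [False]
    for prev, cur in zip(areas, areas[1:]):
        flags.append(prev > 0 and abs(cur - prev) / prev < max_rel_change)
    return flags


# Hypothetical per-frame polygons built from four tracked body points.
frames = [
    [(0, 0), (2, 0), (2, 2), (0, 2)],  # area 4
    [(0, 0), (2, 0), (2, 2), (0, 2)],  # unchanged, candidate immobile frame
    [(0, 0), (4, 0), (4, 2), (0, 2)],  # area doubles, animal is moving
]
flags = low_motion_frames(frames, max_rel_change=0.05)
print(flags)
```

A usable detector would likely also smooth the area signal over a time window and require a minimum bout duration, which this sketch omits.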
