eLife. 2020 Dec 8;9:e61909. doi: 10.7554/eLife.61909.

Real-time, low-latency closed-loop feedback using markerless posture tracking

Gary A Kane et al.
Abstract

The ability to control a behavioral task or stimulate neural activity based on animal behavior in real time is an important tool for experimental neuroscientists. Ideally, such tools are noninvasive, low-latency, and provide interfaces to trigger external hardware based on posture. Recent advances in pose estimation with deep learning allow researchers to train deep neural networks to accurately quantify a wide variety of animal behaviors. Here, we provide a new DeepLabCut-Live! package that achieves low-latency real-time pose estimation (within 15 ms, >100 FPS), with an additional forward-prediction module that achieves zero-latency feedback, and a dynamic-cropping mode that allows for higher inference speeds. We also provide three options for using this tool with ease: (1) a stand-alone GUI (called DLC-Live! GUI), and integration into (2) Bonsai and (3) AutoPilot. Lastly, we benchmarked performance on a wide range of systems so that experimentalists can easily decide what hardware is required for their needs.
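For orientation, the real-time workflow described in the abstract amounts to initializing a trained network once and then querying a pose for each incoming camera frame. The following minimal sketch assumes the dlclive package's DLCLive and Processor interface, an exported DeepLabCut model directory, and an OpenCV-readable camera; the model path and camera index are placeholders, and exact signatures should be checked against the current dlclive documentation.

# Minimal sketch: real-time pose estimation with DeepLabCut-Live!
# Assumes an exported DeepLabCut model and an OpenCV-readable camera.
import cv2
from dlclive import DLCLive, Processor

proc = Processor()                          # base Processor: returns poses unchanged
dlc_live = DLCLive("/path/to/exported_model", processor=proc)

cap = cv2.VideoCapture(0)                   # placeholder camera index
ret, frame = cap.read()
dlc_live.init_inference(frame)              # load weights and warm up on the first frame

while True:
    ret, frame = cap.read()
    if not ret:
        break
    pose = dlc_live.get_pose(frame)         # (n_keypoints, 3) array: x, y, likelihood
    # ... use `pose` to drive feedback, e.g. via a custom Processor ...

cap.release()

In practice, feedback logic is typically placed in a custom Processor (as in Figure 6) so that it runs immediately after each pose is produced rather than inside the acquisition loop.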

Keywords: DeepLabCut; any animal; computational biology; low-latency; mouse; neuroscience; pose-estimation; real-time tracking; systems biology.


Conflict of interest statement

GK, JS, AM, MM: No competing interests declared. GL: Gonçalo Lopes is director at NeuroGEARS Ltd.

Figures

Figure 1. Overview of using DLC networks in real-time experiments within Bonsai, the DLC-Live! GUI, and AutoPilot.
Figure 2. Inference speed of different networks with different image sizes on different system configurations.
Solid lines are tests using DLC-MobileNetV2-0.35 networks and dashed lines are tests using DLC-ResNet-50 networks. Shaded regions represent the 10th–90th percentile of the distributions. N = 1000–10,000 observations per point.
Figure 2—figure supplement 1. Inference speed of different networks with different image sizes on different system configurations, using CPUs for inference.
Solid lines are tests using DLC-MobileNetV2-0.35 networks and dashed lines are tests using DLC-ResNet-50 networks. Shaded regions represent the 10th–90th percentile of the distributions. N = 1000–10,000 observations per point.
Figure 2—figure supplement 2. Accuracy of DLC networks on resized images.
Accuracy of DLC networks was tested on a range of image sizes (test image scale = 0.125–1.5, different panels). Lines indicate the root mean square error (RMSE, y-axis) between DLC-predicted keypoints, using DLC networks trained with different scale parameters (x-axis), and human labels. Different colors indicate errors on images in the training dataset (blue) vs. the test dataset (red). Note that the RMSE is not reported on the scaled image, but as it would be in the original image (640 × 480) for comparability.
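As a worked illustration of how an error measured on a downscaled image can be reported in original-image (640 × 480) coordinates, the sketch below rescales predicted keypoints before computing the RMSE against human labels. The array shapes, scale factor, and aggregation over keypoints are assumptions for illustration, not the paper's benchmarking code.

# Sketch: keypoint RMSE reported in original-image coordinates after
# inference on a downscaled image. Shapes and scale are illustrative.
import numpy as np

def rmse_in_original_coords(pred_xy_scaled, labels_xy_original, scale):
    # pred_xy_scaled: (n_keypoints, 2) predictions on the scaled image
    # labels_xy_original: (n_keypoints, 2) human labels on the original image
    # scale: resize factor applied to the image (e.g. 0.5 for half size)
    pred_xy_original = pred_xy_scaled / scale              # map back to original pixels
    err = pred_xy_original - labels_xy_original
    return np.sqrt(np.mean(np.sum(err ** 2, axis=1)))      # RMS distance over keypoints

# Example: predictions made at half resolution, error reported at 640 x 480.
pred = np.array([[100.0, 80.0], [150.0, 120.0]])           # on the 320 x 240 image
labels = np.array([[202.0, 158.0], [301.0, 243.0]])        # on the 640 x 480 image
print(rmse_in_original_coords(pred, labels, scale=0.5))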
Figure 2—figure supplement 3. Dynamic cropping increased inference speed with little sacrifice in accuracy.
(Left) Inference speeds (in FPS) for full-size images compared to dynamic cropping with a 50, 25, or 10 pixel margin around the animal's last position. Boxplots represent the 0.1, 0.3, 0.5, 0.7, and 0.9 quantiles of each distribution. (Right) The cumulative distribution of errors (i.e. the proportion of images with a smaller error) for dynamic cropping relative to full-size images.
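To make the dynamic-cropping idea concrete, the sketch below crops each new frame to a margin around the previous pose before inference and shifts the resulting keypoints back into full-frame coordinates. The helper functions and default margin are illustrative assumptions; dlclive's built-in dynamic-cropping mode is configured through the package itself and may differ in detail.

# Sketch of dynamic cropping: run inference only on a box around the
# animal's last known position, then restore full-frame coordinates.
import numpy as np

def dynamic_crop(frame, last_pose, margin=25):
    # frame: (H, W, ...) image; last_pose: (n_keypoints, 3) x, y, likelihood
    h, w = frame.shape[:2]
    x0 = max(int(last_pose[:, 0].min()) - margin, 0)
    x1 = min(int(last_pose[:, 0].max()) + margin, w)
    y0 = max(int(last_pose[:, 1].min()) - margin, 0)
    y1 = min(int(last_pose[:, 1].max()) + margin, h)
    return frame[y0:y1, x0:x1], (x0, y0)

def uncrop_pose(pose, offset):
    # shift keypoints estimated on the cropped frame back to full-frame coordinates
    x0, y0 = offset
    pose = pose.copy()
    pose[:, 0] += x0
    pose[:, 1] += y0
    return pose

Smaller margins shrink the image passed to the network (hence the speedups in the left panel), at the cost of losing the animal if it moves farther than the margin between frames.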
Figure 2—figure supplement 4. The number of keypoints in a DeepLabCut network does not affect inference speed.
Inference speeds were tested on a series of dog-tracking networks with 1 (only the nose), 7 (the face), 13 (the upper body), or 20 (the entire body) keypoints, on different image sizes (x-axis) and different DeepLabCut architectures (different panels). Lines represent the median and the ribbon represents the 0.1–0.9 quantiles of the distribution of inference speeds.
Figure 3. Pose estimation latency using the DLC-Live! GUI.
Top: A lick sequence from the test video with eight keypoints measured using the DLC-Live! GUI (using a Windows/GeForce GTX 1080 GPU). Image timestamps are shown in the bottom-left corner of each image. Bottom: Latency from image acquisition to obtaining the pose, from the last pose to the current pose, and from image acquisition when the mouse's tongue was detected to turning on an LED. The width of the violin plots indicates the probability density – the likelihood of observing a given value on the y-axis.
Figure 4. Inference speed using the Bonsai-DLC plugin.
Dashed lines are ResNet-50; solid lines are MobileNetV2-0.35. (A) Overview of using DeepLabCut within Bonsai. Both capture-to-write and capture-to-detect workflows for real-time feedback are possible. (B) Top: Direct comparison of inference speeds using the Bonsai-DLC plugin vs. the DeepLabCut-Live! package across image sizes on the same computer (OS: Windows 10, GPU: NVIDIA GeForce GTX 1080 with Max-Q Design). Bottom: Inference speeds using the Bonsai-DLC plugin while the GPU was engaged in a particle simulation. More particles indicate greater competition for GPU resources.
Figure 5. Latency of DeepLabCut-Live! with AutoPilot.
(A) Images of an LED were captured on a Raspberry Pi 4 (shaded blue) with AutoPilot and sent to an NVIDIA Jetson TX2 (shaded red) for processing with DeepLabCut-Live! and AutoPilot Transforms, which triggered a TTL pulse from the Pi when the LED was lit. (B) Latencies from LED illumination to TTL pulse (software timestamps shown; points and densities) varied by resolution (color groups) and acquisition frame rate (shaded FPS bar). Processing time at each stage of the chain in (A) contributes to latency, but pose inference on the relatively slow TX2 (shaded color areas, from Figure 2) had the largest effect. Individual measurements (n = 500 per condition) cluster around multiples of the inter-frame interval (e.g. 1/30 FPS = 33.3 ms).
Figure 6. Real-time feedback based on posture.
(A) Diagram of the workflow for the dog feedback experiment. Image acquisition and pose estimation were controlled by the DLC-Live! GUI. A DeepLabCut-Live! Processor was used to detect 'rearing' postures based on the likelihood and y-coordinate of the withers and elbows, and to turn an LED on or off accordingly. (B) An example jump sequence with LED status, labeled with keypoints measured offline using the DeepLabCut-Live! benchmarking tool. The images are cropped for visibility. (C) Example trajectories of the withers and elbows, locked to one jump sequence. Larger, transparent points represent the true trajectories – trajectories measured offline, from each image in the video. The smaller opaque points represent trajectories measured in real time, in which the time of each point reflects the delay from image acquisition to pose estimation, with and without the Kalman filter forward prediction. Without forward prediction, estimated trajectories are somewhat delayed from the true trajectories. With the Kalman filter forward prediction, trajectories are less accurate but less delayed when keypoints exhibit rapid changes in position, such as during a rearing movement. (D) The delay from the time the dog first exhibited a rearing posture (from postures measured offline) to the time the LED was turned on or off. Each point represents a single instance of the detection of a transition to or out of a rearing posture.
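A hedged sketch of this kind of posture-triggered feedback is shown below: a custom Processor checks the likelihood and y-coordinate of selected keypoints and toggles an LED. The keypoint indices, thresholds, and the led object are illustrative assumptions rather than the experiment's actual parameters; the Processor subclassing pattern follows the dlclive package.

# Sketch of a posture-triggered Processor: flag a 'rearing' posture when the
# withers and elbows are confidently detected above a height threshold, and
# toggle an LED accordingly. Indices, thresholds, and `led` are assumptions.
from dlclive import Processor

WITHERS, ELBOW_L, ELBOW_R = 0, 1, 2       # assumed keypoint indices
LIK_THRESH = 0.5                          # assumed likelihood cutoff
Y_THRESH = 200.0                          # assumed y-coordinate cutoff (pixels)

class RearingProcessor(Processor):
    def __init__(self, led):
        super().__init__()
        self.led = led                    # any object exposing .on() / .off()

    def process(self, pose, **kwargs):
        parts = pose[[WITHERS, ELBOW_L, ELBOW_R]]
        confident = (parts[:, 2] > LIK_THRESH).all()
        # In image coordinates, smaller y means higher in the frame.
        rearing = confident and (parts[:, 1] < Y_THRESH).all()
        if rearing:
            self.led.on()
        else:
            self.led.off()
        return pose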
Figure 6—figure supplement 1. The Kalman filter predictor reduces errors when predicting up to three frames into the future in a mouse reaching task.
(Top) Images at different stages of the mouse reach, with the blue circle indicating the DeepLabCut-predicted coordinate of the back of the hand. (Bottom, left) The trajectory of the x-coordinate of the back of the mouse's hand during the reach (pictured above). The thicker gray line indicates the trajectory measured by DLC. The blue line indicates the DLC trajectory as if it were delayed by 1–7 frames (indicated by the panel label on the right). The red line indicates the trajectory as if it were predicted 1–7 frames into the future by the Kalman filter predictor. (Bottom, right) The cumulative distribution of errors for the back-of-the-hand coordinate as if it were delayed 1–7 frames (blue) or as if it were predicted 1–7 frames into the future using the Kalman filter predictor (red), compared to the DLC-estimated back-of-the-hand coordinate. The dotted vertical line indicates the error at which the Kalman filter distribution intersects the delayed-pose distribution (i.e. the point at which there are more frames with this pixel error or smaller for the Kalman filter distribution than for the delayed-pose distribution).
Figure 6—figure supplement 2. The Kalman filter predictor reduces errors when predicting up to three frames into the future in a mouse open-field task.
(Top) Images at different stages of the mouse open-field video, with the blue circle indicating the DeepLabCut-predicted coordinate of the snout. (Bottom, left) The trajectory of the x-coordinate of the mouse's snout. The thicker gray line indicates the trajectory measured by DLC. The blue line indicates the DLC trajectory as if it were delayed by 1–7 frames (indicated by the panel label on the right). The red line indicates the trajectory as if it were predicted 1–7 frames into the future by the Kalman filter predictor. (Bottom, right) The cumulative distribution of errors for the snout coordinate as if it were delayed 1–7 frames (blue) or as if it were predicted 1–7 frames into the future using the Kalman filter predictor (red), compared to the DLC-estimated snout coordinate. The dotted vertical line indicates the error at which the two distributions intersect (i.e. the point at which there are more frames with this pixel error or smaller for either the delayed-pose or the Kalman filter distribution).
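The forward prediction illustrated in these supplements can be sketched with a constant-velocity Kalman filter that extrapolates each keypoint a fixed number of frames ahead to offset processing latency. The state model, noise values, and single-keypoint scope below are simplifying assumptions for illustration, not the package's actual Kalman filter predictor.

# Sketch: constant-velocity Kalman filter for one keypoint, predicting its
# position n frames ahead. Noise parameters and the 2D state are assumptions.
import numpy as np

class ConstantVelocityKF:
    def __init__(self, dt=1.0, process_var=1.0, meas_var=4.0):
        # State: [x, y, vx, vy]; measurement: [x, y].
        self.F = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)
        self.Q = process_var * np.eye(4)
        self.R = meas_var * np.eye(2)
        self.x = np.zeros(4)
        self.P = np.eye(4) * 100.0

    def update(self, xy):
        # Predict one frame forward, then correct with the new measurement.
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        innovation = xy - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ innovation
        self.P = (np.eye(4) - K @ self.H) @ self.P

    def predict_ahead(self, n_frames):
        # Extrapolate the corrected state n frames into the future.
        x = self.x.copy()
        for _ in range(n_frames):
            x = self.F @ x
        return x[:2]

# Example: feed each measured (x, y) to update(), then read a prediction a few
# frames ahead to compensate for acquisition-plus-inference latency.
# kf = ConstantVelocityKF()
# kf.update(pose[keypoint_index, :2])
# future_xy = kf.predict_ahead(3)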

Comment in

  • Real-time behavioral analysis.
Vogt N. Nat Methods. 2021 Feb;18(2):123. doi: 10.1038/s41592-021-01072-z. PMID: 33542504. No abstract available.
