Front Neurorobot. 2019 Mar 7:13:6.
doi: 10.3389/fnbot.2019.00006. eCollection 2019.

A Differentiable Physics Engine for Deep Learning in Robotics

Free PMC article

Jonas Degrave et al. Front Neurorobot. 2019.

Abstract

An important field in robotics is the optimization of controllers. Currently, robots are often treated as a black box in this optimization process, which is why derivative-free optimization methods such as evolutionary algorithms or reinforcement learning are omnipresent. When gradient-based methods are used, models are kept small or rely on finite difference approximations for the Jacobian. The finite difference approach quickly grows expensive as the number of parameters increases, as it does in deep learning. We propose the implementation of a modern physics engine that can differentiate with respect to control parameters. This engine is implemented for both CPU and GPU. First, this paper shows how such an engine speeds up the optimization process, even for small problems. Furthermore, it explains why this is an alternative approach to deep Q-learning for using deep learning in robotics. Finally, we argue that this is a big step for deep learning in robotics, as it opens up new possibilities to optimize robots, both in hardware and software.

Keywords: deep learning; differentiable physics engine; gradient descent; neural network controller; robotics.
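
The cost argument in the abstract can be illustrated with a toy example. The sketch below is not the engine described in the paper: it assumes a hypothetical one-dimensional point mass driven by an open-loop sequence of control values, integrated with semi-implicit Euler steps, and a squared-distance loss. The function names (`simulate`, `loss`, `finite_difference_grad`, `adjoint_grad`) and all constants are illustrative assumptions. It compares a finite difference gradient, which needs one extra rollout per parameter, with a hand-written backward pass through the same simulation.

```python
def simulate(controls, dt=0.1):
    """Roll out a 1D point mass from rest; each control value is an acceleration."""
    x, v = 0.0, 0.0
    for u in controls:
        v = v + dt * u   # velocity update
        x = x + dt * v   # position update uses the new velocity
    return x


def loss(controls, target=1.0, dt=0.1):
    """Squared distance between the final position and a target (assumed loss)."""
    return (simulate(controls, dt) - target) ** 2


def finite_difference_grad(controls, eps=1e-6):
    """Forward differences: one baseline rollout plus one rollout per parameter."""
    base = loss(controls)
    rollouts = 1
    grad = []
    for i in range(len(controls)):
        bumped = list(controls)
        bumped[i] += eps
        grad.append((loss(bumped) - base) / eps)
        rollouts += 1
    return grad, rollouts


def adjoint_grad(controls, target=1.0, dt=0.1):
    """Exact gradient from one forward rollout and one backward sweep."""
    x_final = simulate(controls, dt)
    gx = 2.0 * (x_final - target)  # dL/dx at the final step
    gv = 0.0                       # dL/dv, accumulated while sweeping backward
    grad = [0.0] * len(controls)
    for t in reversed(range(len(controls))):
        gv += dt * gx              # backward through x = x + dt * v
        grad[t] = dt * gv          # backward through v = v + dt * u
    return grad


if __name__ == "__main__":
    controls = [0.5] * 20
    fd, rollouts = finite_difference_grad(controls)
    exact = adjoint_grad(controls)
    print("rollouts used by finite differences:", rollouts)
    print("max abs difference:", max(abs(a - b) for a, b in zip(fd, exact)))
```

In this toy setting, finite differences need a number of rollouts that grows linearly with the number of parameters, while the backward sweep costs roughly one extra pass regardless of parameter count. An automatic differentiation framework would generate the backward sweep instead of requiring it to be written by hand.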


Figures

Figure 1
Illustration of how a closed-loop neural network controller would be used to actuate a robot. The neural network receives signals from the sensors on the robot and uses them to generate motor signals, which are sent to the servo motors. The neural network can also generate a signal that it can use at the next timestep to control the robot.
Figure 2
Illustration of the dynamic system with the robot and controller, after unrolling over time. The neural networks g_deep and h_deep with weights W receive sensor signals s_t from the sensors on the robot and use these to generate motor signals u_t, which are used by the physics engine f_ph to find the next state of the robot in the physical system. These neural networks also have a memory, implemented with recurrent connections h_t. From the state x_t of the robot, the loss L can be found. In order to find dL/dW, every block in this chart needs to be differentiable. The contribution of this paper is to implement a differentiable f_ph, which allows us to optimize W to minimize L more efficiently than was possible before.
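
The requirement that every block be differentiable can be made concrete by writing out the chain rule for the unrolled system. The derivation below is a sketch under assumptions that the caption does not state: a sensor map $\sigma$ (a hypothetical symbol) from state to sensor signals, a loss that sums a per-step term $\ell$ over a horizon $T$, and one particular wiring of the recurrent connections. Gradients are treated as column vectors.

```latex
% Assumed forward recurrences, for t = 0, ..., T-1
\begin{align*}
s_t     &= \sigma(x_t), \\
u_t     &= g_{\text{deep}}(s_t, h_t; W), \\
h_{t+1} &= h_{\text{deep}}(s_t, h_t; W), \\
x_{t+1} &= f_{\text{ph}}(x_t, u_t), \\
L       &= \sum_{t=0}^{T} \ell(x_t).
\end{align*}

% Adjoints \lambda_t = dL/dx_t and \mu_t = dL/dh_t, swept backward from
% \lambda_T = \partial \ell / \partial x_T and \mu_T = 0
\begin{align*}
\lambda_t &= \frac{\partial \ell}{\partial x_t}
  + \left(\frac{\partial f_{\text{ph}}}{\partial x_t}\right)^{\!\top} \lambda_{t+1}
  + \left(\frac{\partial \sigma}{\partial x_t}\right)^{\!\top}
    \left[
      \left(\frac{\partial g_{\text{deep}}}{\partial s_t}\right)^{\!\top}
      \left(\frac{\partial f_{\text{ph}}}{\partial u_t}\right)^{\!\top} \lambda_{t+1}
      + \left(\frac{\partial h_{\text{deep}}}{\partial s_t}\right)^{\!\top} \mu_{t+1}
    \right], \\
\mu_t &= \left(\frac{\partial g_{\text{deep}}}{\partial h_t}\right)^{\!\top}
         \left(\frac{\partial f_{\text{ph}}}{\partial u_t}\right)^{\!\top} \lambda_{t+1}
       + \left(\frac{\partial h_{\text{deep}}}{\partial h_t}\right)^{\!\top} \mu_{t+1}, \\
\frac{dL}{dW} &= \sum_{t=0}^{T-1}
  \left[
    \left(\frac{\partial g_{\text{deep}}}{\partial W}\right)^{\!\top}
    \left(\frac{\partial f_{\text{ph}}}{\partial u_t}\right)^{\!\top} \lambda_{t+1}
    + \left(\frac{\partial h_{\text{deep}}}{\partial W}\right)^{\!\top} \mu_{t+1}
  \right].
\end{align*}
```

Under these assumptions, the Jacobians of f_ph with respect to both the state and the motor signals appear in every backward step, which is why a physics engine without derivatives blocks the computation of dL/dW.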
Figure 3
(A) Illustration of the ball model used in the first task. (B) Illustration of the quadruped robot model with 8 actuated degrees of freedom, 1 in each shoulder, 1 in each elbow. The spine of the robot can collide with the ground through 4 spheres inside the cuboid. (C) Illustration of the robot arm model with 4 actuated degrees of freedom.
Figure 4
A frame captured by the differentiable camera looking at the model of the pendulum-cart system. The resolution used is 288 by 96 pixels. All the textures are made from pictures of the actual system.
Figure 5
The camera model used to convert the three-dimensional point P into a two-dimensional pixel (u, v) on the projection plane.
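
As an illustration of what such a projection can look like, the sketch below assumes a standard pinhole camera with the point expressed in camera coordinates and a single focal length; the caption does not specify the paper's exact model, so the function names (`project`, `projection_jacobian`) and the `focal_length` parameter are assumptions. The Jacobian is included because a camera used inside a gradient-based pipeline needs derivatives of the pixel position with respect to the point.

```python
def project(point, focal_length=1.0):
    """Pinhole projection of a 3D point (x, y, z) in camera coordinates to (u, v)."""
    x, y, z = point
    if z <= 0.0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    return (focal_length * x / z, focal_length * y / z)


def projection_jacobian(point, focal_length=1.0):
    """2x3 Jacobian of (u, v) with respect to (x, y, z)."""
    x, y, z = point
    if z <= 0.0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    f = focal_length
    return [
        [f / z, 0.0, -f * x / (z * z)],  # du/dx, du/dy, du/dz
        [0.0, f / z, -f * y / (z * z)],  # dv/dx, dv/dy, dv/dz
    ]


if __name__ == "__main__":
    p = (1.0, 2.0, 4.0)
    print(project(p, focal_length=2.0))
    print(projection_jacobian(p, focal_length=2.0))
```

The projection is smooth wherever the point is in front of the camera, so gradients of an image-based loss can flow back to the three-dimensional point through this Jacobian.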


