Self-supervised recurrent depth estimation with attention mechanisms

Ilya Makarov; Maria Bakhanova; Sergey Nikolenko; Olga Gerasimova

doi:10.7717/peerj-cs.865

PeerJ Comput Sci. 2022 Jan 31:8:e865. doi: 10.7717/peerj-cs.865. eCollection 2022.

Authors

Ilya Makarov^#¹ ² ³, Maria Bakhanova^#¹, Sergey Nikolenko⁴ ⁵, Olga Gerasimova¹

Affiliations

¹ HSE University, Moscow, Russia.
² Artificial Intelligence Research Institute (AIRI), Moscow, Russia.
³ Big Data Research Center, National University of Science and Technology MISIS, Moscow, Russia.
⁴ Steklov Institute of Mathematics at St. Petersburg, St. Petersburg, Russia.
⁵ St. Petersburg State University, St. Petersburg, Russia.

^# Contributed equally.

Abstract

Depth estimation has been an essential task for many computer vision applications, especially in autonomous driving, where safety is paramount. Depth can be estimated not only with traditional supervised learning but also via a self-supervised approach that relies on camera motion and does not require ground truth depth maps. Recently, major improvements have been introduced to make self-supervised depth prediction more precise. However, most existing approaches still focus on single-frame depth estimation, even in the self-supervised setting. Since most methods can operate with frame sequences, we believe that the quality of current models can be significantly improved with the help of information about previous frames. In this work, we study different ways of integrating recurrent blocks and attention mechanisms into a common self-supervised depth estimation pipeline. We propose a set of modifications that utilize temporal information from previous frames and provide new neural network architectures for monocular depth estimation in a self-supervised manner. Our experiments on the KITTI dataset show that the proposed modifications can be an effective tool for exploiting temporal information in a depth prediction pipeline.

Keywords: Attention Mechanism; Augmented Reality; Autonomous Vehicles; Computer Vision; Deep Convolutional Neural Networks; Depth Reconstruction; Recurrent Neural Networks; Self-Supervised Learning.

Grants and funding

The article was prepared within the framework of the HSE University Basic Research Program. The work of I. Makarov in the Related Work section was prepared in the framework of the federal academic leadership program Priority 2030 of NUST MISIS. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.