Gait is a unique biometric trait with several useful properties. It can be recognized remotely, without the cooperation of the individual, and with low-resolution cameras, and it is difficult to obscure. It is therefore suitable for crime investigation, surveillance, and access control. Existing approaches to gait recognition are generally supervised, requiring every sample in the dataset to be annotated. In the real world, annotation is often expensive and time-consuming. Moreover, convolutional neural networks (CNNs) have dominated the field of gait recognition for many years and have been extensively researched, while other recent methods such as the vision transformer (ViT) remain unexplored. In this manuscript, we propose a self-supervised learning (SSL) approach that pretrains the feature extractor with the DINO model, so that useful gait features are learned automatically with the vision transformer architecture and without labels. The pretrained feature extractor is then used to extract gait features, on which a fully connected neural network classifier is trained in a supervised manner. Experiments on the CASIA-B and OU-MVLP gait datasets show the effectiveness of the proposed approach.
Keywords: Gait Energy Image (GEI); gait recognition; people identification; self-supervised learning; vision transformers.
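As a rough illustration of the self-supervised pretraining step mentioned in the abstract, the sketch below shows the general form of the DINO self-distillation objective: the teacher output is centered and sharpened with a low temperature, the student output is softened with a higher temperature, and the cross-entropy between the two is minimized, while the teacher weights follow the student through an exponential moving average. This is a minimal pure-Python sketch and not the implementation used in this work; the function names (`dino_loss`, `ema_update`) and the temperature and momentum values are assumptions chosen for illustration only.

```python
import math


def softmax(logits, temperature):
    """Numerically stable softmax of logits / temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def log_softmax(logits, temperature):
    """Numerically stable log-softmax of logits / temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    log_norm = math.log(sum(math.exp(s - peak) for s in scaled))
    return [s - peak - log_norm for s in scaled]


def dino_loss(student_logits, teacher_logits, center,
              student_temp=0.1, teacher_temp=0.04):
    """Cross-entropy between the teacher and student distributions.

    The temperatures are illustrative assumptions, not values from the paper.
    """
    # Teacher: subtract the running center, then sharpen with a low temperature.
    centered = [t - c for t, c in zip(teacher_logits, center)]
    p_teacher = softmax(centered, teacher_temp)
    # Student: plain log-softmax with a higher temperature.
    log_p_student = log_softmax(student_logits, student_temp)
    return -sum(pt * lps for pt, lps in zip(p_teacher, log_p_student))


def ema_update(teacher_params, student_params, momentum=0.996):
    """Move each teacher parameter towards the matching student parameter."""
    return [momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher_params, student_params)]


if __name__ == "__main__":
    center = [0.0, 0.0, 0.0]
    teacher_view = [2.0, 0.0, -1.0]   # teacher output for one augmented view
    agreeing = [2.0, 0.0, -1.0]       # student output that agrees with the teacher
    disagreeing = [-1.0, 0.0, 2.0]    # student output that disagrees
    print("loss (agreeing student):   ", dino_loss(agreeing, teacher_view, center))
    print("loss (disagreeing student):", dino_loss(disagreeing, teacher_view, center))
```

In a full pipeline the logits would come from the projection heads of two ViT networks applied to different augmented views of the same gait image, and only the student would receive gradients. The supervised classifier described in the abstract would then be trained on features taken from the pretrained backbone.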