Recurrent 3D Hand Pose Estimation Using Cascaded Pose-Guided 3D Alignments

Xiaoming Deng; Dexin Zuo; Yinda Zhang; Zhaopeng Cui; Jian Cheng; Ping Tan; Liang Chang; Marc Pollefeys; Sean Fanello; Hongan Wang

doi:10.1109/TPAMI.2022.3159725

IEEE Trans Pattern Anal Mach Intell. 2023 Jan;45(1):932-945. doi: 10.1109/TPAMI.2022.3159725. Epub 2022 Dec 5.

PMID: 35294342

Abstract

3D hand pose estimation is a challenging problem in computer vision because of the high number of degrees of freedom in articulated hand motion and the large variation in viewpoint. As a consequence, similar poses observed from different views can look dramatically different. To deal with this issue, view-independent features are required to achieve state-of-the-art performance. In this paper, we investigate the impact of view-independent features on 3D hand pose estimation from a single depth image and propose a novel recurrent neural network for the task. A cascaded pose-guided 3D alignment strategy is designed for view-independent feature extraction, and a recurrent hand pose module is designed for modeling the dependencies among sequential aligned features. In particular, our cascaded pose-guided 3D alignments are performed in 3D space in a coarse-to-fine fashion: first, hand joints are predicted and globally transformed into a canonical reference frame; second, the palm of the hand is detected and aligned; third, local transformations are applied to the fingers to refine the final predictions. The proposed recurrent hand pose module for the aligned 3D representation extracts recurrent pose-aware features and iteratively refines the estimated hand pose. The recurrent module can be used for both single-view estimation and sequence-based estimation with 3D hand pose tracking. Experiments show that our method improves on the state of the art by a large margin on popular benchmarks, using simple yet efficient alignment and network architectures.
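
The first, global alignment step can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the canonical reference frame is built from three coarse palm joint predictions, and the joint indices `WRIST`, `INDEX_MCP`, `PINKY_MCP` and the function names `canonical_frame` and `align` are hypothetical. It only shows why expressing the hand in a pose-derived frame removes the dependence on camera viewpoint.

```python
import numpy as np

# Hypothetical joint indices; real datasets define their own joint ordering.
WRIST, INDEX_MCP, PINKY_MCP = 0, 1, 2

def canonical_frame(joints):
    """Build an assumed palm-based frame from coarse joint predictions (J x 3).

    Returns the frame origin and a 3x3 rotation whose rows are the
    canonical axes expressed in camera coordinates.
    """
    origin = joints[WRIST]
    u = joints[INDEX_MCP] - origin
    v = joints[PINKY_MCP] - origin
    x = u / np.linalg.norm(u)      # axis along wrist -> index MCP
    z = np.cross(u, v)             # palm normal
    z = z / np.linalg.norm(z)
    y = np.cross(z, x)             # completes a right-handed frame
    return origin, np.stack([x, y, z])

def align(points, joints):
    """Express 3D points (N x 3) in the pose-guided canonical frame."""
    origin, R = canonical_frame(joints)
    return (points - origin) @ R.T
```

Under these assumptions, the same hand seen from two viewpoints (related by a rigid rotation and translation) maps to identical aligned coordinates, which is the property the view-independent features rely on. The palm-level and finger-level alignments described in the abstract would repeat the same idea with more local frames.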