Face alignment is widely used in high-level face analysis applications, such as human activity recognition and human-computer interaction. However, most existing models involve a large number of parameters and are computationally expensive in practical deployments. In this paper, we aim to build a lightweight facial landmark detector by proposing a network-level architecture-slimming method. Concretely, we introduce a selective feature fusion mechanism that quantifies and prunes redundant transformation and aggregation operations in a high-resolution supernetwork. Moreover, we develop a triple knowledge distillation scheme to further refine the slimmed network, in which two peer student networks learn implicit landmark distributions from each other while absorbing knowledge from a teacher network. Extensive experiments on challenging benchmarks, including 300W, COFW, and WFLW, demonstrate that our approach achieves accuracy competitive with recent state-of-the-art methods while offering a better accuracy-efficiency trade-off, requiring only 0.98 M-1.32 M parameters and 0.59 G-0.6 G floating-point operations.
Keywords: face alignment; knowledge distillation; lightweight model; network pruning.
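To make the selective feature fusion idea concrete, the sketch below shows one common way such a mechanism can be realized: each candidate transformation or aggregation path in the supernetwork is weighted by a learnable scalar gate, and paths whose gate magnitude falls below a threshold are flagged as prunable. The sigmoid gating, the `GatedFusion` module name, and the 0.05 threshold are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Aggregates multi-branch features through learnable gates (a sketch).

    Each candidate operation (e.g., a cross-resolution fusion path in a
    high-resolution supernetwork) is weighted by a scalar gate; gates whose
    magnitude falls below a threshold mark operations that can be pruned.
    """
    def __init__(self, num_branches):
        super().__init__()
        # One learnable importance score per candidate operation.
        self.gates = nn.Parameter(torch.ones(num_branches))

    def forward(self, branch_outputs):
        # branch_outputs: list of tensors with identical shapes.
        weights = torch.sigmoid(self.gates)
        return sum(w * x for w, x in zip(weights, branch_outputs))

    def prunable(self, threshold=0.05):
        # Indices of operations whose learned contribution is negligible.
        return (torch.sigmoid(self.gates) < threshold).nonzero().flatten()

# Usage: fuse three feature maps of the same shape.
fusion = GatedFusion(num_branches=3)
feats = [torch.randn(2, 64, 32, 32) for _ in range(3)]
out = fusion(feats)
print(fusion.prunable())
```

After training the supernetwork with such gates, the flagged operations can be removed and the remaining slimmed network retrained or fine-tuned.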
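The triple distillation scheme can likewise be sketched with standard distillation losses: one peer student fits the ground truth, matches the teacher's softened landmark distributions, and mimics its peer's distributions. This assumes heatmap-based landmark prediction; the temperature `tau`, the weights `alpha` and `beta`, and the MSE/KL loss forms are illustrative assumptions rather than the paper's specification.

```python
import torch
import torch.nn.functional as F

def triple_distillation_loss(student_a, student_b, teacher, target,
                             tau=4.0, alpha=0.5, beta=0.5):
    """Combined loss for one peer student (student_a), a minimal sketch.

    student_a, student_b, teacher: predicted landmark heatmaps (N, K, H, W).
    target: ground-truth heatmaps of the same shape.
    """
    # Supervised regression term on ground-truth heatmaps.
    task = F.mse_loss(student_a, target)

    # Soften heatmaps into per-landmark spatial distributions.
    def soft_log(h):
        return F.log_softmax(h.flatten(2) / tau, dim=-1)

    def soft_prob(h):
        return F.softmax(h.flatten(2) / tau, dim=-1)

    # Teacher-to-student and peer-to-peer KL terms, with the usual tau^2
    # rescaling from temperature-based distillation.
    kd_teacher = F.kl_div(soft_log(student_a), soft_prob(teacher.detach()),
                          reduction='batchmean') * tau ** 2
    kd_peer = F.kl_div(soft_log(student_a), soft_prob(student_b.detach()),
                       reduction='batchmean') * tau ** 2

    return task + alpha * kd_teacher + beta * kd_peer
```

The symmetric loss for the other peer student is obtained by swapping `student_a` and `student_b`, so both students are optimized jointly while the teacher remains fixed.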