Elastic Knowledge Distillation by Learning From Recollection

IEEE Trans Neural Netw Learn Syst. 2023 May;34(5):2647-2658. doi: 10.1109/TNNLS.2021.3107317. Epub 2023 May 2.

Abstract

Model performance can be further improved with extra guidance beyond the one-hot ground truth. To this end, recently proposed recollection-based methods exploit the valuable information contained in the past training history and derive a "recollection" from it to provide a data-driven prior that guides training. In this article, we focus on two fundamental aspects of this approach, i.e., recollection construction and recollection utilization. Specifically, to meet the varying demands of models with different capacities and at different training periods, we propose to construct a set of recollections with diverse distributions from the same training history. All the recollections then collaborate to provide guidance that adapts to different model capacities, as well as to different training periods, according to our similarity-based elastic knowledge distillation (KD) algorithm. Without any external prior to guide the training, our method achieves a significant performance gain, outperforming methods of the same category and even matching KD with a well-trained teacher. Extensive experiments and further analysis demonstrate the effectiveness of our method.
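
To make the idea concrete, below is a minimal, hypothetical Python (PyTorch) sketch of a recollection-style self-distillation loss. The RecollectionBank class, the EMA-based construction of multiple recollections, and the cosine-similarity weighting are illustrative assumptions only, not the paper's released implementation.

# Hypothetical sketch of recollection-based self-distillation.
# The recollection construction (EMAs of past predictions at several
# smoothing rates) and the similarity-based weighting are assumptions.

import torch
import torch.nn.functional as F


class RecollectionBank:
    """Keeps several exponential moving averages of past per-sample
    predictions, one per momentum, to mimic a set of recollections
    with diverse distributions."""

    def __init__(self, num_samples, num_classes, momenta=(0.5, 0.9, 0.99)):
        self.momenta = momenta
        self.banks = [torch.full((num_samples, num_classes), 1.0 / num_classes)
                      for _ in momenta]

    @torch.no_grad()
    def update(self, indices, probs):
        # probs: softmax outputs of the current model for this mini-batch
        for m, bank in zip(self.momenta, self.banks):
            bank[indices] = m * bank[indices] + (1.0 - m) * probs.cpu()

    def recollections(self, indices, device):
        return [bank[indices].to(device) for bank in self.banks]


def elastic_kd_loss(logits, recollections, temperature=4.0):
    """Combine KL terms against each recollection, weighted per sample by
    the cosine similarity between the current prediction and that
    recollection (an assumed form of 'elastic' weighting)."""
    student = F.softmax(logits / temperature, dim=1)
    log_student = F.log_softmax(logits / temperature, dim=1)

    weights, kls = [], []
    for rec in recollections:
        sim = F.cosine_similarity(student.detach(), rec, dim=1)   # (B,)
        kl = F.kl_div(log_student, rec, reduction="none").sum(1)  # (B,)
        weights.append(sim)
        kls.append(kl)

    weights = torch.softmax(torch.stack(weights, dim=1), dim=1)   # (B, R)
    kls = torch.stack(kls, dim=1)                                 # (B, R)
    return (temperature ** 2) * (weights * kls).sum(1).mean()

In such a setup, the total training objective would typically be the cross-entropy loss on the one-hot labels plus a weighted elastic_kd_loss term, with the bank updated from the model's own predictions after each step; the actual construction and weighting used in the paper may differ.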