Contrastive Learning via Variational Information Bottleneck

IEEE Trans Pattern Anal Mach Intell. 2025 May 20:PP. doi: 10.1109/TPAMI.2025.3571990. Online ahead of print.

Abstract

Self-supervised learning has advanced considerably in recent years, especially with the introduction of contrastive learning, where the goal is to maximize the mutual information between different augmentations of the same image, i.e., positive pairs. However, because of noisy samples, such optimization does not necessarily yield an optimal representation, and the model inevitably becomes over-confident in the relevance between views. As a result, the learned model captures spurious correlations and retains superfluous information that degrades the representations. In this paper, we improve contrastive learning by reducing superfluous relevance between positive views. To this end, we add a representation entropy minimization regularizer to the objective of vanilla contrastive learning, which forces representations to retain as little information as possible, thus alleviating superfluous relevance from irrelevant views. We then derive an analytical expression for the proposed objective by converting it to an information bottleneck problem and solving it via variational approximation, which leads to a novel contrastive learning framework termed CLIMB, short for Contrastive Learning via variational InforMation Bottleneck. Experiments on multiple benchmarks demonstrate that CLIMB brings consistent improvements. Notably, using DINO as an instantiation, CLIMB achieves gains of 4.5% and 3.5% under the k-NN classification metric with EfficientNet-B0 and ResNet-50 as backbones, respectively.
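
The abstract does not give the CLIMB objective itself, so the following is only a minimal sketch of the general idea it describes: a contrastive term between positive views plus a compression term obtained from a variational information bottleneck. Everything specific here is an assumption made for illustration and is not taken from the paper: InfoNCE stands in for the contrastive loss (DINO, the instantiation named above, uses a different objective), the encoder is assumed to output a diagonal Gaussian (`mu`, `log_var`) per view, the prior is a standard normal, and the names `info_nce`, `kl_to_standard_normal`, `vib_contrastive_loss`, and the weight `beta` are hypothetical.

```python
import numpy as np


def log_softmax(x, axis=-1):
    # Numerically stable log-softmax.
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def info_nce(z1, z2, temperature=0.1):
    # Stand-in contrastive loss (assumption): row i of z1 and row i of z2
    # are a positive pair; all other rows of z2 act as negatives.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature
    return -np.mean(np.diag(log_softmax(logits, axis=1)))


def kl_to_standard_normal(mu, log_var):
    # KL( N(mu, diag(exp(log_var))) || N(0, I) ), averaged over the batch.
    # In a variational information bottleneck this upper-bounds the
    # information the representation keeps about its input.
    kl = 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=1)
    return kl.mean()


def vib_contrastive_loss(mu1, log_var1, mu2, log_var2, beta=1e-3, rng=None):
    # Hypothetical combined objective: contrastive term + beta * compression.
    rng = np.random.default_rng(0) if rng is None else rng
    # Reparameterized samples from each view's Gaussian representation.
    z1 = mu1 + np.exp(0.5 * log_var1) * rng.standard_normal(mu1.shape)
    z2 = mu2 + np.exp(0.5 * log_var2) * rng.standard_normal(mu2.shape)
    contrast = info_nce(z1, z2)
    compress = 0.5 * (kl_to_standard_normal(mu1, log_var1)
                      + kl_to_standard_normal(mu2, log_var2))
    return contrast + beta * compress
```

In this sketch, `beta` controls the trade-off: `beta = 0` recovers the plain contrastive loss on sampled representations, while larger values penalize representations that carry more information than the positive pair requires.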