Strength-Adaptive Adversarial Training

IEEE Trans Pattern Anal Mach Intell. 2026 Apr 16:PP. doi: 10.1109/TPAMI.2026.3684741. Online ahead of print.

Abstract

Adversarial training (AT) has been shown to effectively enhance a network's resilience against adversarial attacks. However, conventional AT, which relies on a fixed, pre-specified perturbation budget, suffers from several limitations when training robust models. First, enforcing the same perturbation budget across networks with different capacities leads to different gaps between natural and robust accuracy, which deviates from the desired outcome of a robust network. Second, because the perturbation budget is fixed throughout training, the attack strength fails to scale adaptively with the evolving robustness of the model. This mismatch often results in robust overfitting and further degradation of adversarial robustness. To address these limitations, we propose a novel technique called Strength-Adaptive Adversarial Training (SAAT). In SAAT, the adversary incorporates an adversarial-loss constraint to guide the generation of adversarial training data. This constraint allows the perturbation budget to adapt dynamically to the current training state, which effectively mitigates robust overfitting. Moreover, by explicitly regulating the attack strength through the adversarial loss, SAAT enables precise control over the disparity between natural accuracy and adversarial robustness. Extensive experiments demonstrate that SAAT substantially improves adversarial robustness over standard AT.
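
The abstract does not give the algorithm, so the following is only an illustrative sketch of one plausible reading of an "adversarial-loss constraint": the adversary keeps taking signed-gradient steps until the loss on the perturbed input reaches a target value, so the perturbation size it ends up using depends on the model rather than being fixed in advance. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy logistic model, the function names, the loss target `rho`, the step size, and the safety cap `eps_max`.

```python
import numpy as np


def logistic_loss(w, b, x, y):
    """Binary cross-entropy of a linear model on one example, y in {0, 1}."""
    z = float(np.dot(w, x) + b)
    # log(1 + exp(z)) - y * z, computed in a numerically stable way.
    return float(np.logaddexp(0.0, z) - y * z)


def loss_grad_x(w, b, x, y):
    """Gradient of logistic_loss with respect to the input x."""
    z = float(np.dot(w, x) + b)
    p = 1.0 / (1.0 + np.exp(-z))  # predicted probability of class 1
    return (p - y) * w


def loss_constrained_attack(w, b, x, y, rho, step=0.01, eps_max=1.0, max_steps=200):
    """Hypothetical loss-constrained adversary (illustrative, not the paper's).

    Takes signed-gradient ascent steps on the loss and stops as soon as the
    adversarial loss reaches the target rho. The L-infinity size of the
    perturbation is therefore an outcome of the search; eps_max is only a
    safety cap.
    """
    delta = np.zeros_like(x, dtype=float)
    for _ in range(max_steps):
        if logistic_loss(w, b, x + delta, y) >= rho:
            break  # loss constraint met: stop growing the perturbation
        g = loss_grad_x(w, b, x + delta, y)
        delta = np.clip(delta + step * np.sign(g), -eps_max, eps_max)
    return delta, float(np.max(np.abs(delta)))


if __name__ == "__main__":
    x = np.array([1.0, -1.0])
    y = 1
    weak_w = np.array([1.0, -1.0])   # small margin on x
    strong_w = 3.0 * weak_w          # larger margin on the same x
    for name, w in [("weak", weak_w), ("strong", strong_w)]:
        delta, eps_used = loss_constrained_attack(w, 0.0, x, y, rho=0.5)
        print(name, "effective budget:", eps_used,
              "adversarial loss:", logistic_loss(w, 0.0, x + delta, y))
```

In this toy setting the model with the larger margin requires a larger perturbation before the same loss target is reached, which is the sense in which a loss target, rather than a fixed budget, lets the attack strength follow the model's current state. How SAAT actually sets and enforces its constraint is described in the full paper, not in this sketch.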