Adapting to Evolving Adversaries with Regularized Continual Robust Training

Sihui Dai, Christian Cianfarani, Vikash Sehwag, Prateek Mittal, Arjun Bhagoji
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:11954-12000, 2025.

Abstract

Robust training methods typically defend against specific attack types, such as $\ell_p$ attacks with fixed budgets, and rarely account for the fact that defenders may encounter new attacks over time. A natural solution is to adapt the defended model to new adversaries as they arise via fine-tuning, a method which we call continual robust training (CRT). However, when implemented naively, fine-tuning on new attacks degrades robustness on previous attacks. This raises the question: how can we improve the initial training and fine-tuning of the model to simultaneously achieve robustness against previous and new attacks? We present theoretical results which show that the gap in a model’s robustness against different attacks is bounded by how far each attack perturbs a sample in the model’s logit space, suggesting that regularizing with respect to this logit space distance can help maintain robustness against previous attacks. Extensive experiments on 3 datasets (CIFAR-10, CIFAR-100, and ImageNette) and over 100 attack combinations demonstrate that the proposed regularization improves robust accuracy with little overhead in training time. Our findings and open-source code lay the groundwork for the deployment of models robust to evolving attacks.
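The regularization idea in the abstract can be illustrated with a minimal sketch. The snippet below is a hypothetical PyTorch-style fine-tuning objective, not the paper's exact method: the names regularized_crt_loss and reg_weight, the cross-entropy adversarial term, and the use of an l2 distance between clean and perturbed logits are illustrative assumptions, and x_adv_new is assumed to be produced by the new attack elsewhere in the training loop.

import torch
import torch.nn.functional as F

def regularized_crt_loss(model, x_clean, x_adv_new, y, reg_weight=1.0):
    # Hypothetical sketch: adversarial loss on the new attack plus a penalty
    # on how far the attack moves each sample in the model's logit space.
    logits_clean = model(x_clean)   # logits on unperturbed inputs
    logits_adv = model(x_adv_new)   # logits on examples from the new attack

    # Adapt to the new adversary via standard adversarial cross-entropy.
    adv_loss = F.cross_entropy(logits_adv, y)

    # Logit-space distance regularizer: per-sample l2 distance between clean
    # and perturbed logits, averaged over the batch. Keeping this distance
    # small is what, per the paper's bound, limits the gap in robustness
    # against previously seen attacks.
    logit_dist = torch.norm(logits_adv - logits_clean, p=2, dim=1).mean()

    return adv_loss + reg_weight * logit_dist

In a continual setting, the same term could be applied during both initial training and each round of fine-tuning, with reg_weight trading off adaptation to the new attack against retention of robustness to earlier ones.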

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-dai25c,
  title = {Adapting to Evolving Adversaries with Regularized Continual Robust Training},
  author = {Dai, Sihui and Cianfarani, Christian and Sehwag, Vikash and Mittal, Prateek and Bhagoji, Arjun},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages = {11954--12000},
  year = {2025},
  editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume = {267},
  series = {Proceedings of Machine Learning Research},
  month = {13--19 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/dai25c/dai25c.pdf},
  url = {https://proceedings.mlr.press/v267/dai25c.html},
  abstract = {Robust training methods typically defend against specific attack types, such as $\ell_p$ attacks with fixed budgets, and rarely account for the fact that defenders may encounter new attacks over time. A natural solution is to adapt the defended model to new adversaries as they arise via fine-tuning, a method which we call continual robust training (CRT). However, when implemented naively, fine-tuning on new attacks degrades robustness on previous attacks. This raises the question: how can we improve the initial training and fine-tuning of the model to simultaneously achieve robustness against previous and new attacks? We present theoretical results which show that the gap in a model’s robustness against different attacks is bounded by how far each attack perturbs a sample in the model’s logit space, suggesting that regularizing with respect to this logit space distance can help maintain robustness against previous attacks. Extensive experiments on 3 datasets (CIFAR-10, CIFAR-100, and ImageNette) and over 100 attack combinations demonstrate that the proposed regularization improves robust accuracy with little overhead in training time. Our findings and open-source code lay the groundwork for the deployment of models robust to evolving attacks.}
}
Endnote
%0 Conference Paper
%T Adapting to Evolving Adversaries with Regularized Continual Robust Training
%A Sihui Dai
%A Christian Cianfarani
%A Vikash Sehwag
%A Prateek Mittal
%A Arjun Bhagoji
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-dai25c
%I PMLR
%P 11954--12000
%U https://proceedings.mlr.press/v267/dai25c.html
%V 267
%X Robust training methods typically defend against specific attack types, such as $\ell_p$ attacks with fixed budgets, and rarely account for the fact that defenders may encounter new attacks over time. A natural solution is to adapt the defended model to new adversaries as they arise via fine-tuning, a method which we call continual robust training (CRT). However, when implemented naively, fine-tuning on new attacks degrades robustness on previous attacks. This raises the question: how can we improve the initial training and fine-tuning of the model to simultaneously achieve robustness against previous and new attacks? We present theoretical results which show that the gap in a model’s robustness against different attacks is bounded by how far each attack perturbs a sample in the model’s logit space, suggesting that regularizing with respect to this logit space distance can help maintain robustness against previous attacks. Extensive experiments on 3 datasets (CIFAR-10, CIFAR-100, and ImageNette) and over 100 attack combinations demonstrate that the proposed regularization improves robust accuracy with little overhead in training time. Our findings and open-source code lay the groundwork for the deployment of models robust to evolving attacks.
APA
Dai, S., Cianfarani, C., Sehwag, V., Mittal, P. & Bhagoji, A. (2025). Adapting to Evolving Adversaries with Regularized Continual Robust Training. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:11954-12000. Available from https://proceedings.mlr.press/v267/dai25c.html.