Boosting Adversarial Robustness with CLAT: Criticality Leveraged Adversarial Training

Bhavna Gopal, Huanrui Yang, Jingyang Zhang, Mark Horton, Yiran Chen
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:20142-20161, 2025.

Abstract

Adversarial training (AT) enhances neural network robustness. Typically, AT updates all trainable parameters, but doing so can lead to overfitting and increased errors on clean data. Research suggests that fine-tuning specific parameters may be more effective; however, methods for identifying these essential parameters and establishing effective optimization objectives remain inadequately addressed. We present CLAT, an adversarial fine-tuning algorithm that mitigates adversarial overfitting by integrating "criticality" into the training process. Instead of tuning the entire model, CLAT identifies and fine-tunes only the parameters in robustness-critical layers—those predominantly learning non-robust features—while keeping the rest of the model fixed. Additionally, CLAT employs a dynamic layer selection process that adapts to changes in layer criticality during training. Empirical results demonstrate that CLAT can be seamlessly integrated with existing adversarial training methods, improving clean accuracy and adversarial robustness by over 2% compared to baseline approaches.
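
For intuition, below is a minimal PyTorch-style sketch of the training scheme the abstract outlines; it is not the authors' implementation. The criticality proxy (how much a layer's activations shift under attack), the single-step FGSM adversary, the number of selected layers, and the re-selection interval are all illustrative assumptions, since the abstract does not specify the paper's actual criticality measure.

import torch
import torch.nn as nn
import torch.nn.functional as F

def fgsm_attack(model, x, y, eps=8 / 255):
    # Single-step FGSM adversary; a stronger attack (e.g., PGD) could be
    # substituted without changing the rest of the loop.
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    grad = torch.autograd.grad(loss, x_adv)[0]
    return (x_adv + eps * grad.sign()).clamp(0.0, 1.0).detach()

def layer_criticality(model, x, x_adv):
    # Assumed proxy: layers whose activations move most under attack are
    # treated as the ones dominated by non-robust features.
    acts, handles = {}, []

    def hook(name):
        def fn(module, inputs, output):
            acts[name] = output.detach()
        return fn

    for name, module in model.named_modules():
        if isinstance(module, nn.Conv2d):
            handles.append(module.register_forward_hook(hook(name)))
    with torch.no_grad():
        model(x)
        clean = dict(acts)      # snapshot clean activations
        model(x_adv)            # overwrites acts with adversarial activations
    for h in handles:
        h.remove()
    return {n: ((acts[n] - clean[n]).norm() / (clean[n].norm() + 1e-12)).item()
            for n in clean}

def select_critical_layers(model, scores, k=3):
    # Freeze the whole model, then unfreeze only the k most critical layers.
    top = sorted(scores, key=scores.get, reverse=True)[:k]
    for p in model.parameters():
        p.requires_grad_(False)
    for name, module in model.named_modules():
        if name in top:
            for p in module.parameters():
                p.requires_grad_(True)
    return top

def clat_style_finetune(model, loader, epochs=10, reselect_every=2, lr=1e-3):
    # Periodic re-selection stands in for the dynamic layer selection the
    # abstract describes: layer criticality shifts as training proceeds.
    model.train()
    opt = None
    for epoch in range(epochs):
        for step, (x, y) in enumerate(loader):
            x_adv = fgsm_attack(model, x, y)
            if opt is None or (step == 0 and epoch % reselect_every == 0):
                select_critical_layers(model, layer_criticality(model, x, x_adv))
                opt = torch.optim.SGD(
                    [p for p in model.parameters() if p.requires_grad], lr=lr)
            loss = F.cross_entropy(model(x_adv), y)
            opt.zero_grad()
            loss.backward()
            opt.step()

Because only the selected layers require gradients, each update touches a small fraction of the network while the remaining parameters stay at their adversarially pre-trained values, matching the abstract's description of keeping the rest of the model fixed.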

Cite this Paper

BibTeX
@InProceedings{pmlr-v267-gopal25a,
  title     = {Boosting Adversarial Robustness with {CLAT}: Criticality Leveraged Adversarial Training},
  author    = {Gopal, Bhavna and Yang, Huanrui and Zhang, Jingyang and Horton, Mark and Chen, Yiran},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {20142--20161},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/gopal25a/gopal25a.pdf},
  url       = {https://proceedings.mlr.press/v267/gopal25a.html},
  abstract  = {Adversarial training (AT) enhances neural network robustness. Typically, AT updates all trainable parameters, but can lead to overfitting and increased errors on clean data. Research suggests that fine-tuning specific parameters may be more effective; however, methods for identifying these essential parameters and establishing effective optimization objectives remain inadequately addressed. We present CLAT, an innovative adversarial fine-tuning algorithm that mitigates adversarial overfitting by integrating "criticality" into the training process. Instead of tuning the entire model, CLAT identifies and fine-tunes fewer parameters in robustness-critical layers—those predominantly learning non-robust features—while keeping the rest of the model fixed. Additionally, CLAT employs a dynamic layer selection process that adapts to changes in layer criticality during training. Empirical results demonstrate that CLAT can be seamlessly integrated with existing adversarial training methods, enhancing clean accuracy and adversarial robustness by over 2% compared to baseline approaches.}
}
Endnote
%0 Conference Paper
%T Boosting Adversarial Robustness with CLAT: Criticality Leveraged Adversarial Training
%A Bhavna Gopal
%A Huanrui Yang
%A Jingyang Zhang
%A Mark Horton
%A Yiran Chen
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-gopal25a
%I PMLR
%P 20142--20161
%U https://proceedings.mlr.press/v267/gopal25a.html
%V 267
%X Adversarial training (AT) enhances neural network robustness. Typically, AT updates all trainable parameters, but can lead to overfitting and increased errors on clean data. Research suggests that fine-tuning specific parameters may be more effective; however, methods for identifying these essential parameters and establishing effective optimization objectives remain inadequately addressed. We present CLAT, an innovative adversarial fine-tuning algorithm that mitigates adversarial overfitting by integrating "criticality" into the training process. Instead of tuning the entire model, CLAT identifies and fine-tunes fewer parameters in robustness-critical layers—those predominantly learning non-robust features—while keeping the rest of the model fixed. Additionally, CLAT employs a dynamic layer selection process that adapts to changes in layer criticality during training. Empirical results demonstrate that CLAT can be seamlessly integrated with existing adversarial training methods, enhancing clean accuracy and adversarial robustness by over 2% compared to baseline approaches.
APA
Gopal, B., Yang, H., Zhang, J., Horton, M. & Chen, Y. (2025). Boosting Adversarial Robustness with CLAT: Criticality Leveraged Adversarial Training. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:20142-20161. Available from https://proceedings.mlr.press/v267/gopal25a.html.
