Improving Zero-Shot Adversarial Robustness in Vision-Language Models by Closed-form Alignment of Adversarial Path Simplices
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:14061-14078, 2025.
Abstract
Vision-Language Models (VLMs) such as CLIP excel at zero-shot classification due to large-scale pre-training but are vulnerable to adversarial examples. Adversarial fine-tuning robustifies zero-shot models by aligning the prediction scores of individual adversaries with those of their clean counterparts, but this typically overlooks intermediate adversarial samples along the adversarial trajectory that crosses the decision boundary. Such intermediate adversaries and their vicinity yield informative representations that capture the decision boundary in detail. Robustness can be further improved by sampling adversarial candidates from simplices formed by joining two consecutive vertices on the adversarial trajectory and their clean counterpart. However, sampling such simplices for adversaries is very costly. To train robust VLMs, we overcome this limitation with a Taylor expansion, formulating an upper bound on the alignment loss that depends on the Jacobian/Hessian evaluated at clean samples. As the regions between clean and intermediate adversarial samples capture a larger portion of the decision landscape, we robustify VLMs with plausible adversaries drawn from simplices via our closed-form formulation, which is equivalent to infinite uniform sampling of the simplex. We obtain state-of-the-art robustness across 15 datasets and diverse vision-language tasks.
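The core idea of replacing simplex sampling with a closed-form second-order surrogate can be illustrated with a small numerical sketch. The snippet below is a hypothetical toy example, not the authors' implementation: it forms a simplex from a clean point and two assumed adversarial offsets, estimates the expected value of a toy loss over the simplex by Monte-Carlo sampling, and compares it with a Taylor-based surrogate that uses the gradient/Hessian at the clean point together with the closed-form moments of the uniform (Dirichlet) distribution over barycentric weights. All names (`loss`, `x_clean`, `d1`, `d2`) are illustrative.

```python
# Toy sketch (NumPy): Monte-Carlo simplex sampling vs. a closed-form
# second-order surrogate evaluated only at the clean vertex.
import numpy as np

rng = np.random.default_rng(0)
d = 8
a = rng.normal(size=d)                      # toy linear "classifier" direction


def loss(x):
    # softplus(a^T x); smooth stand-in for an alignment loss
    return np.log1p(np.exp(x @ a))


def grad_hess(x):
    s = 1.0 / (1.0 + np.exp(-(a @ x)))      # sigmoid of the score
    return s * a, s * (1.0 - s) * np.outer(a, a)


x_clean = rng.normal(size=d)                # clean sample (simplex vertex v0)
d1 = 0.05 * rng.normal(size=d)              # offset to adversarial vertex v1
d2 = 0.05 * rng.normal(size=d)              # offset to adversarial vertex v2

# Monte-Carlo estimate of E[loss(u)] for u uniform on the simplex
# spanned by {x_clean, x_clean + d1, x_clean + d2}
w = rng.dirichlet(np.ones(3), size=200_000)          # uniform barycentric weights
samples = x_clean + np.outer(w[:, 1], d1) + np.outer(w[:, 2], d2)
mc = loss(samples).mean()

# Closed-form surrogate: for Dirichlet(1,1,1) weights,
# E[w_i] = 1/3, E[w_i^2] = 1/6, E[w_i w_j] = 1/12 (i != j)
g, H = grad_hess(x_clean)
mean_delta = (d1 + d2) / 3.0
second_moment = (np.outer(d1, d1) + np.outer(d2, d2)) / 6.0 \
              + (np.outer(d1, d2) + np.outer(d2, d1)) / 12.0
closed = loss(x_clean) + g @ mean_delta + 0.5 * np.trace(H @ second_moment)

print(f"Monte-Carlo estimate : {mc:.6f}")
print(f"Closed-form surrogate: {closed:.6f}")
```

For small offsets the two estimates agree closely, which is why a Jacobian/Hessian-based bound evaluated at clean samples can replace explicit (and costly) sampling of adversarial candidates from the simplex.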