Approximate Nullspace Augmented Finetuning for Robust Vision Transformers

Haoyang Liu, Aditya Singh, Yijiang Li, Haohan Wang
Conference on Parsimony and Learning, PMLR 280:1-23, 2025.

Abstract

Enhancing the robustness of deep learning models, particularly vision transformers (ViTs), is crucial for their real-world deployment. In this work, we present a finetuning approach that enhances the robustness of vision transformers, inspired by the concept of the nullspace from linear algebra. Our investigation centers on whether a vision transformer can exhibit resilience to input variations akin to the nullspace property of linear mappings, meaning that perturbations sampled from this nullspace do not influence the model’s output when added to the input. First, we observe that many existing ViTs satisfy this property because a non-trivial nullspace exists in their patch embedding layers. Second, since the nullspace is a concept from linear algebra, we demonstrate that it is possible to synthesize approximate nullspace elements for the non-linear blocks of ViTs using an optimization strategy. Finally, we propose a finetuning strategy for ViTs in which we augment the training data with synthesized approximate nullspace noise. After finetuning, we find that the model demonstrates robustness to adversarial and natural image perturbations alike.
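To make the abstract’s two technical steps concrete, here is a minimal PyTorch sketch: the exact nullspace of the (linear) patch embedding falls out of an SVD of its projection matrix, and an approximate nullspace element for the full non-linear model can be synthesized by optimization. The timm model name, the fixed-norm constraint, and the squared-error objective are illustrative assumptions, not the paper’s exact procedure.

```python
# Minimal sketch, assuming a timm ViT-B/16; the fixed-norm objective below
# is one plausible way to synthesize approximate nullspace noise, not
# necessarily the paper's exact formulation.
import torch
import timm

model = timm.create_model("vit_base_patch16_224", pretrained=True).eval()
for p in model.parameters():          # only the noise is optimized below
    p.requires_grad_(False)

# (1) Exact nullspace of the linear patch embedding.
# The patch projection is a conv with 16x16 kernels over 3 channels, i.e. a
# linear map from 16*16*3 = 768 inputs per patch to embed_dim outputs.
W = model.patch_embed.proj.weight.flatten(1)       # (embed_dim, 768)
_, S, Vh = torch.linalg.svd(W)
rank = int((S > 1e-6 * S[0]).sum())
null_basis = Vh[rank:]    # rows span the nullspace (empty if W is full rank)
# Any per-patch perturbation built from these rows leaves the patch
# embeddings, and hence the whole forward pass, exactly unchanged.

# (2) Approximate nullspace noise for the full non-linear model: minimize
# the output change at a fixed noise norm (without the norm constraint, the
# trivial solution is zero noise).
def synthesize_null_noise(model, x, radius=1.0, steps=200, lr=1e-2):
    noise = torch.randn(1, *x.shape[1:])
    noise = noise * radius / noise.norm()
    noise.requires_grad_(True)
    opt = torch.optim.Adam([noise], lr=lr)
    with torch.no_grad():
        clean_out = model(x)
    for _ in range(steps):
        loss = (model(x + noise) - clean_out).pow(2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():         # project back onto the norm sphere
            noise.mul_(radius / noise.norm())
    return noise.detach()

# Fine-tuning would then augment each batch with this noise, e.g.
#   x = torch.rand(8, 3, 224, 224)
#   logits = model(x + synthesize_null_noise(model, x))
```

The norm projection keeps the search away from the trivial zero perturbation: the resulting noise is large in input space yet nearly invisible to the model, which is what makes it useful as a training augmentation.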

Cite this Paper


BibTeX
@InProceedings{pmlr-v280-liu25a,
  title     = {Approximate Nullspace Augmented Finetuning for Robust Vision Transformers},
  author    = {Liu, Haoyang and Singh, Aditya and Li, Yijiang and Wang, Haohan},
  booktitle = {Conference on Parsimony and Learning},
  pages     = {1--23},
  year      = {2025},
  editor    = {Chen, Beidi and Liu, Shijia and Pilanci, Mert and Su, Weijie and Sulam, Jeremias and Wang, Yuxiang and Zhu, Zhihui},
  volume    = {280},
  series    = {Proceedings of Machine Learning Research},
  month     = {24--27 Mar},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v280/main/assets/liu25a/liu25a.pdf},
  url       = {https://proceedings.mlr.press/v280/liu25a.html},
  abstract  = {Enhancing the robustness of deep learning models, particularly vision transformers (ViTs), is crucial for their real-world deployment. In this work, we present a finetuning approach that enhances the robustness of vision transformers, inspired by the concept of the nullspace from linear algebra. Our investigation centers on whether a vision transformer can exhibit resilience to input variations akin to the nullspace property of linear mappings, meaning that perturbations sampled from this nullspace do not influence the model’s output when added to the input. First, we observe that many existing ViTs satisfy this property because a non-trivial nullspace exists in their patch embedding layers. Second, since the nullspace is a concept from linear algebra, we demonstrate that it is possible to synthesize approximate nullspace elements for the non-linear blocks of ViTs using an optimization strategy. Finally, we propose a finetuning strategy for ViTs in which we augment the training data with synthesized approximate nullspace noise. After finetuning, we find that the model demonstrates robustness to adversarial and natural image perturbations alike.}
}
Endnote
%0 Conference Paper
%T Approximate Nullspace Augmented Finetuning for Robust Vision Transformers
%A Haoyang Liu
%A Aditya Singh
%A Yijiang Li
%A Haohan Wang
%B Conference on Parsimony and Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Beidi Chen
%E Shijia Liu
%E Mert Pilanci
%E Weijie Su
%E Jeremias Sulam
%E Yuxiang Wang
%E Zhihui Zhu
%F pmlr-v280-liu25a
%I PMLR
%P 1--23
%U https://proceedings.mlr.press/v280/liu25a.html
%V 280
%X Enhancing the robustness of deep learning models, particularly vision transformers (ViTs), is crucial for their real-world deployment. In this work, we present a finetuning approach that enhances the robustness of vision transformers, inspired by the concept of the nullspace from linear algebra. Our investigation centers on whether a vision transformer can exhibit resilience to input variations akin to the nullspace property of linear mappings, meaning that perturbations sampled from this nullspace do not influence the model’s output when added to the input. First, we observe that many existing ViTs satisfy this property because a non-trivial nullspace exists in their patch embedding layers. Second, since the nullspace is a concept from linear algebra, we demonstrate that it is possible to synthesize approximate nullspace elements for the non-linear blocks of ViTs using an optimization strategy. Finally, we propose a finetuning strategy for ViTs in which we augment the training data with synthesized approximate nullspace noise. After finetuning, we find that the model demonstrates robustness to adversarial and natural image perturbations alike.
APA
Liu, H., Singh, A., Li, Y. & Wang, H. (2025). Approximate Nullspace Augmented Finetuning for Robust Vision Transformers. Conference on Parsimony and Learning, in Proceedings of Machine Learning Research 280:1-23. Available from https://proceedings.mlr.press/v280/liu25a.html.
