Fine-grained Local Sensitivity Analysis of Standard Dot-Product Self-Attention
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:17680-17696, 2024.
Abstract
Self-attention has been widely used in various machine learning models, such as vision transformers. The standard dot-product self-attention is arguably the most popular structure, and there is growing interest in understanding the mathematical properties of such attention mechanisms. This paper presents a fine-grained local sensitivity analysis of the standard dot-product self-attention, leading to new non-vacuous certified robustness results for vision transformers. Despite the well-known fact that dot-product self-attention is not (globally) Lipschitz, we develop a new theoretical analysis of Local Fine-grained Attention Sensitivity (LoFAST) that quantifies the effect of input feature perturbations on the attention output. Our analysis reveals that the local sensitivity of dot-product self-attention to $\ell_2$ perturbations can be controlled by several key quantities associated with the attention weight matrices and the unperturbed input. We empirically validate our theoretical findings by computing non-vacuous certified $\ell_2$-robustness for vision transformers on the CIFAR-10 and SVHN datasets. The code for LoFAST is available at https://github.com/AaronHavens/LoFAST.
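To make the quantity discussed above concrete, here is a minimal sketch that implements standard single-head dot-product self-attention and numerically probes its local $\ell_2$ sensitivity with random small perturbations. This empirical estimator is purely illustrative and is not the paper's LoFAST bound; the function names, dimensions, and perturbation scheme are assumptions chosen for the example.

```python
# Illustrative sketch (NOT the paper's LoFAST bound): numerically probe the
# local l2 sensitivity of standard single-head dot-product self-attention.
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Standard dot-product self-attention: softmax(X Wq (X Wk)^T / sqrt(d)) X Wv."""
    d = Wq.shape[1]
    scores = (X @ Wq) @ (X @ Wk).T / np.sqrt(d)
    return softmax(scores, axis=-1) @ (X @ Wv)

def local_sensitivity(X, Wq, Wk, Wv, eps=1e-4, trials=100, seed=None):
    """Estimate max ||f(X+D) - f(X)|| / ||D|| over random perturbations of norm eps."""
    rng = np.random.default_rng(seed)
    base = self_attention(X, Wq, Wk, Wv)
    best = 0.0
    for _ in range(trials):
        D = rng.standard_normal(X.shape)
        D *= eps / np.linalg.norm(D)  # rescale to an l2 (Frobenius) norm of eps
        out = self_attention(X + D, Wq, Wk, Wv)
        best = max(best, np.linalg.norm(out - base) / eps)
    return best

rng = np.random.default_rng(0)
n, d = 8, 16  # sequence length and embedding dimension, arbitrary for illustration
X = rng.standard_normal((n, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
print("empirical local l2 sensitivity:", local_sensitivity(X, Wq, Wk, Wv, seed=0))
```

Because the estimate depends on the fixed unperturbed input $X$ and on the weight matrices $W_Q, W_K, W_V$, it illustrates the abstract's point that sensitivity is a local quantity even though no finite global Lipschitz constant exists.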