Fine-grained Local Sensitivity Analysis of Standard Dot-Product Self-Attention

Aaron J Havens, Alexandre Araujo, Huan Zhang, Bin Hu
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:17680-17696, 2024.

Abstract

Self-attention has been widely used in various machine learning models, such as vision transformers. The standard dot-product self-attention is arguably the most popular structure, and there is a growing interest in understanding the mathematical properties of such attention mechanisms. This paper presents a fine-grained local sensitivity analysis of the standard dot-product self-attention, leading to new non-vacuous certified robustness results for vision transformers. Despite the well-known fact that dot-product self-attention is not (globally) Lipschitz, we develop a new theoretical analysis of Local Fine-grained Attention Sensitivity (LoFAST), quantifying the effect of input feature perturbations on the attention output. Our analysis reveals that the local sensitivity of dot-product self-attention to $\ell_2$ perturbations can in fact be controlled by several key quantities associated with the attention weight matrices and the unperturbed input. We empirically validate our theoretical findings by computing non-vacuous certified $\ell_2$-robustness for vision transformers on the CIFAR-10 and SVHN datasets. The code for LoFAST is available at https://github.com/AaronHavens/LoFAST.
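For intuition, the quantity the abstract describes can be probed numerically. Below is a minimal NumPy sketch, not the paper's LoFAST method: it implements standard dot-product self-attention and estimates its local $\ell_2$ sensitivity around a fixed input by sampling random perturbations. All function names and parameter choices are illustrative assumptions, and a random-perturbation probe only lower-bounds the true local sensitivity, whereas the paper derives certified upper bounds from the attention weight matrices and the unperturbed input.

```python
# Illustrative sketch only -- not the paper's certified LoFAST bound.
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # Standard dot-product self-attention:
    #   Attn(X) = softmax(X Wq (X Wk)^T / sqrt(d)) X Wv
    d = Wq.shape[1]
    scores = (X @ Wq) @ (X @ Wk).T / np.sqrt(d)
    return softmax(scores) @ (X @ Wv)

def empirical_sensitivity(X, Wq, Wk, Wv, eps=1e-3, trials=500, seed=0):
    # Crude probe: max over random D with ||D||_F = eps of
    #   ||Attn(X + D) - Attn(X)||_F / ||D||_F.
    # This only *lower-bounds* the true local sensitivity at X.
    rng = np.random.default_rng(seed)
    base = self_attention(X, Wq, Wk, Wv)
    worst = 0.0
    for _ in range(trials):
        D = rng.standard_normal(X.shape)
        D *= eps / np.linalg.norm(D)  # project onto the eps-radius Frobenius sphere
        worst = max(worst,
                    np.linalg.norm(self_attention(X + D, Wq, Wk, Wv) - base) / eps)
    return worst

rng = np.random.default_rng(1)
n, d = 8, 16  # tokens, feature dimension
Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
for scale in (0.1, 1.0, 10.0):  # sensitivity depends on the unperturbed input X
    X = scale * rng.standard_normal((n, d))
    print(f"input scale {scale:5.1f}: empirical local sensitivity ~ "
          f"{empirical_sensitivity(X, Wq, Wk, Wv):.3f}")
```

Running the sketch at different input scales shows the measured sensitivity varying with the unperturbed input, matching the abstract's point that the sensitivity is input-dependent rather than captured by a single global Lipschitz constant.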

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-havens24a,
  title     = {Fine-grained Local Sensitivity Analysis of Standard Dot-Product Self-Attention},
  author    = {Havens, Aaron J and Araujo, Alexandre and Zhang, Huan and Hu, Bin},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {17680--17696},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/havens24a/havens24a.pdf},
  url       = {https://proceedings.mlr.press/v235/havens24a.html},
  abstract  = {Self-attention has been widely used in various machine learning models, such as vision transformers. The standard dot-product self-attention is arguably the most popular structure, and there is a growing interest in understanding the mathematical properties of such attention mechanisms. This paper presents a fine-grained local sensitivity analysis of the standard dot-product self-attention, leading to new non-vacuous certified robustness results for vision transformers. Despite the well-known fact that dot-product self-attention is not (globally) Lipschitz, we develop a new theoretical analysis of Local Fine-grained Attention Sensitivity (LoFAST), quantifying the effect of input feature perturbations on the attention output. Our analysis reveals that the local sensitivity of dot-product self-attention to $\ell_2$ perturbations can in fact be controlled by several key quantities associated with the attention weight matrices and the unperturbed input. We empirically validate our theoretical findings by computing non-vacuous certified $\ell_2$-robustness for vision transformers on the CIFAR-10 and SVHN datasets. The code for LoFAST is available at https://github.com/AaronHavens/LoFAST.}
}
Endnote
%0 Conference Paper
%T Fine-grained Local Sensitivity Analysis of Standard Dot-Product Self-Attention
%A Aaron J Havens
%A Alexandre Araujo
%A Huan Zhang
%A Bin Hu
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-havens24a
%I PMLR
%P 17680--17696
%U https://proceedings.mlr.press/v235/havens24a.html
%V 235
%X Self-attention has been widely used in various machine learning models, such as vision transformers. The standard dot-product self-attention is arguably the most popular structure, and there is a growing interest in understanding the mathematical properties of such attention mechanisms. This paper presents a fine-grained local sensitivity analysis of the standard dot-product self-attention, leading to new non-vacuous certified robustness results for vision transformers. Despite the well-known fact that dot-product self-attention is not (globally) Lipschitz, we develop a new theoretical analysis of Local Fine-grained Attention Sensitivity (LoFAST), quantifying the effect of input feature perturbations on the attention output. Our analysis reveals that the local sensitivity of dot-product self-attention to $\ell_2$ perturbations can in fact be controlled by several key quantities associated with the attention weight matrices and the unperturbed input. We empirically validate our theoretical findings by computing non-vacuous certified $\ell_2$-robustness for vision transformers on the CIFAR-10 and SVHN datasets. The code for LoFAST is available at https://github.com/AaronHavens/LoFAST.
APA
Havens, A.J., Araujo, A., Zhang, H. & Hu, B. (2024). Fine-grained Local Sensitivity Analysis of Standard Dot-Product Self-Attention. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:17680-17696. Available from https://proceedings.mlr.press/v235/havens24a.html.
