Multi Resolution Analysis (MRA) for Approximate Self-Attention

Zhanpeng Zeng; Sourav Pal; Jeffery Kline; Glenn M Fung; Vikas Singh

Proceedings of the 39th International Conference on Machine Learning, PMLR 162:25955-25972, 2022.

Abstract

Transformers have emerged as a preferred model for many tasks in natural language processing and vision. Recent efforts on training and deploying Transformers more efficiently have identified many strategies to approximate the self-attention matrix, a key module in a Transformer architecture. Effective ideas include various prespecified sparsity patterns, low-rank basis expansions and combinations thereof. In this paper, we revisit classical Multiresolution Analysis (MRA) concepts such as Wavelets, whose potential value in this setting remains underexplored thus far. We show that simple approximations based on empirical feedback and design choices informed by modern hardware and implementation challenges eventually yield an MRA-based approach for self-attention with an excellent performance profile across most criteria of interest. We undertake an extensive set of experiments and demonstrate that this multi-resolution scheme outperforms most efficient self-attention proposals and is favorable for both short and long sequences. Code is available at https://github.com/mlpen/mra-attention.

Cite this Paper

BibTeX


@InProceedings{pmlr-v162-zeng22a,
  title = 	 {Multi Resolution Analysis ({MRA}) for Approximate Self-Attention},
  author =       {Zeng, Zhanpeng and Pal, Sourav and Kline, Jeffery and Fung, Glenn M and Singh, Vikas},
  booktitle = 	 {Proceedings of the 39th International Conference on Machine Learning},
  pages = 	 {25955--25972},
  year = 	 {2022},
  editor = 	 {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume = 	 {162},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {17--23 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v162/zeng22a/zeng22a.pdf},
  url = 	 {https://proceedings.mlr.press/v162/zeng22a.html},
  abstract = 	 {Transformers have emerged as a preferred model for many tasks in natural language processing and vision. Recent efforts on training and deploying Transformers more efficiently have identified many strategies to approximate the self-attention matrix, a key module in a Transformer architecture. Effective ideas include various prespecified sparsity patterns, low-rank basis expansions and combinations thereof. In this paper, we revisit classical Multiresolution Analysis (MRA) concepts such as Wavelets, whose potential value in this setting remains underexplored thus far. We show that simple approximations based on empirical feedback and design choices informed by modern hardware and implementation challenges eventually yield an MRA-based approach for self-attention with an excellent performance profile across most criteria of interest. We undertake an extensive set of experiments and demonstrate that this multi-resolution scheme outperforms most efficient self-attention proposals and is favorable for both short and long sequences. Code is available at \url{https://github.com/mlpen/mra-attention}.}
}

Endnote

%0 Conference Paper
%T Multi Resolution Analysis (MRA) for Approximate Self-Attention
%A Zhanpeng Zeng
%A Sourav Pal
%A Jeffery Kline
%A Glenn M Fung
%A Vikas Singh
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-zeng22a
%I PMLR
%P 25955--25972
%U https://proceedings.mlr.press/v162/zeng22a.html
%V 162
%X Transformers have emerged as a preferred model for many tasks in natural language processing and vision. Recent efforts on training and deploying Transformers more efficiently have identified many strategies to approximate the self-attention matrix, a key module in a Transformer architecture. Effective ideas include various prespecified sparsity patterns, low-rank basis expansions and combinations thereof. In this paper, we revisit classical Multiresolution Analysis (MRA) concepts such as Wavelets, whose potential value in this setting remains underexplored thus far. We show that simple approximations based on empirical feedback and design choices informed by modern hardware and implementation challenges eventually yield an MRA-based approach for self-attention with an excellent performance profile across most criteria of interest. We undertake an extensive set of experiments and demonstrate that this multi-resolution scheme outperforms most efficient self-attention proposals and is favorable for both short and long sequences. Code is available at https://github.com/mlpen/mra-attention.

APA


Zeng, Z., Pal, S., Kline, J., Fung, G. M., & Singh, V. (2022). Multi Resolution Analysis (MRA) for Approximate Self-Attention. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:25955-25972. Available from https://proceedings.mlr.press/v162/zeng22a.html.
