Area Attention

Yang Li; Lukasz Kaiser; Samy Bengio; Si Si

Proceedings of the 36th International Conference on Machine Learning, PMLR 97:3846-3855, 2019.

Abstract

Existing attention mechanisms are trained to attend to individual items in a collection (the memory) with a predefined, fixed granularity, e.g., a word token or an image grid. We propose area attention: a way to attend to areas in the memory, where each area contains a group of items that are structurally adjacent, e.g., spatially for a 2D memory such as images, or temporally for a 1D memory such as natural language sentences. Importantly, the shape and the size of an area are dynamically determined via learning, which enables a model to attend to information with varying granularity. Area attention can easily work with existing model architectures such as multi-head attention for simultaneously attending to multiple areas in the memory. We evaluate area attention on two tasks: neural machine translation (both character and token-level) and image captioning, and improve upon strong (state-of-the-art) baselines in all the cases. These improvements are obtainable with a basic form of area attention that is parameter free.
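The abstract describes the mechanism only at a high level, so the following is a minimal sketch for a 1D memory and not the authors' implementation. It assumes that the parameter-free basic form represents each area by the mean of its item keys and the sum of its item values; that choice is not stated in the abstract above. The names `area_attention_1d` and `max_width` are hypothetical. Every contiguous span of up to `max_width` items is treated as a candidate area, and a softmax over dot-product scores decides how much weight each area receives.

```python
import math


def area_attention_1d(query, keys, values, max_width=2):
    """Attend over all contiguous spans (areas) of up to max_width items.

    Assumption (not stated in the abstract): an area's key is the mean of
    its item keys and an area's value is the sum of its item values.
    """
    n = len(keys)
    dim_v = len(values[0])
    area_keys, area_values, spans = [], [], []

    # Enumerate every contiguous area of width 1..max_width.
    for width in range(1, max_width + 1):
        for start in range(n - width + 1):
            span_k = keys[start:start + width]
            span_v = values[start:start + width]
            area_keys.append([sum(col) / width for col in zip(*span_k)])
            area_values.append([sum(col) for col in zip(*span_v)])
            spans.append((start, start + width))

    # Dot-product scores followed by a numerically stable softmax.
    scores = [sum(q * k for q, k in zip(query, key)) for key in area_keys]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of area values.
    output = [
        sum(w * v[d] for w, v in zip(weights, area_values))
        for d in range(dim_v)
    ]
    return output, weights, spans


if __name__ == "__main__":
    keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    values = [[1.0], [2.0], [3.0]]
    output, weights, spans = area_attention_1d([1.0, 0.0], keys, values)
    for span, weight in zip(spans, weights):
        print(span, round(weight, 3))
```

With `max_width=1` the sketch reduces to ordinary single-item dot-product attention, which is one way to see how area attention could slot into an existing architecture. A 2D memory would enumerate rectangles instead of spans under the same assumptions.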

Cite this Paper

BibTeX


@InProceedings{pmlr-v97-li19e,
  title     = {Area Attention},
  author    = {Li, Yang and Kaiser, Lukasz and Bengio, Samy and Si, Si},
  booktitle = {Proceedings of the 36th International Conference on Machine Learning},
  pages     = {3846--3855},
  year      = {2019},
  editor    = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume    = {97},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--15 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v97/li19e/li19e.pdf},
  url       = {https://proceedings.mlr.press/v97/li19e.html},
  abstract = 	 {Existing attention mechanisms are trained to attend to individual items in a collection (the memory) with a predefined, fixed granularity, e.g., a word token or an image grid. We propose area attention: a way to attend to areas in the memory, where each area contains a group of items that are structurally adjacent, e.g., spatially for a 2D memory such as images, or temporally for a 1D memory such as natural language sentences. Importantly, the shape and the size of an area are dynamically determined via learning, which enables a model to attend to information with varying granularity. Area attention can easily work with existing model architectures such as multi-head attention for simultaneously attending to multiple areas in the memory. We evaluate area attention on two tasks: neural machine translation (both character and token-level) and image captioning, and improve upon strong (state-of-the-art) baselines in all the cases. These improvements are obtainable with a basic form of area attention that is parameter free.}
}

Endnote

%0 Conference Paper
%T Area Attention
%A Yang Li
%A Lukasz Kaiser
%A Samy Bengio
%A Si Si
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-li19e
%I PMLR
%P 3846--3855
%U https://proceedings.mlr.press/v97/li19e.html
%V 97
%X Existing attention mechanisms are trained to attend to individual items in a collection (the memory) with a predefined, fixed granularity, e.g., a word token or an image grid. We propose area attention: a way to attend to areas in the memory, where each area contains a group of items that are structurally adjacent, e.g., spatially for a 2D memory such as images, or temporally for a 1D memory such as natural language sentences. Importantly, the shape and the size of an area are dynamically determined via learning, which enables a model to attend to information with varying granularity. Area attention can easily work with existing model architectures such as multi-head attention for simultaneously attending to multiple areas in the memory. We evaluate area attention on two tasks: neural machine translation (both character and token-level) and image captioning, and improve upon strong (state-of-the-art) baselines in all the cases. These improvements are obtainable with a basic form of area attention that is parameter free.

APA


Li, Y., Kaiser, L., Bengio, S. & Si, S. (2019). Area Attention. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:3846-3855. Available from https://proceedings.mlr.press/v97/li19e.html.
