General-purpose, long-context autoregressive modeling with Perceiver AR

Curtis Hawthorne; Andrew Jaegle; Cătălina Cangea; Sebastian Borgeaud; Charlie Nash; Mateusz Malinowski; Sander Dieleman; Oriol Vinyals; Matthew Botvinick; Ian Simon; Hannah Sheahan; Neil Zeghidour; Jean-Baptiste Alayrac; Joao Carreira; Jesse Engel

Proceedings of the 39th International Conference on Machine Learning, PMLR 162:8535-8558, 2022.

Abstract

Real-world data is high-dimensional: a book, image, or musical performance can easily contain hundreds of thousands of elements even after compression. However, the most commonly used autoregressive models, Transformers, are prohibitively expensive to scale to the number of inputs and layers needed to capture this long-range structure. We develop Perceiver AR, an autoregressive, modality-agnostic architecture which uses cross-attention to map long-range inputs to a small number of latents while also maintaining end-to-end causal masking. Perceiver AR can directly attend to over a hundred thousand tokens, enabling practical long-context density estimation without the need for hand-crafted sparsity patterns or memory mechanisms. When trained on images or music, Perceiver AR generates outputs with clear long-term coherence and structure. Our architecture also obtains state-of-the-art likelihood on long-sequence benchmarks, including 64x64 ImageNet images and PG-19 books.
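The central mechanism described above can be illustrated with a minimal, dependency-free sketch. It uses cross-attention to compress N inputs into M latents while keeping the computation causal. The sketch assumes, following the Perceiver AR design, that each latent is aligned with one of the last M input positions. Latent i therefore queries from position N - M + i and is masked from attending to anything later.

For brevity, it also assumes a single head, no learned query/key/value projections, and no latent self-attention stack. `causal_cross_attention` and `attention_score_counts` are hypothetical names chosen for illustration, not the authors' code. The second function gives a rough, dense count of attention scores to show why full self-attention is expensive at long context lengths.

```python
import math


def causal_cross_attention(inputs, num_latents):
    """Hypothetical sketch: compress N inputs into M latents with causal masking.

    Assumptions (not the authors' code): single head, no learned projections,
    keys = values = inputs, and latent i queries from input position
    N - M + i, so it may only attend to positions j <= N - M + i.
    """
    n = len(inputs)
    if not 0 < num_latents <= n:
        raise ValueError("need 0 < num_latents <= len(inputs)")
    d = len(inputs[0])
    scale = 1.0 / math.sqrt(d)
    offset = n - num_latents
    latents = []
    for i in range(num_latents):
        q_pos = offset + i
        query = inputs[q_pos]
        # Scaled dot-product scores, restricted to non-future positions.
        visible = range(q_pos + 1)
        scores = [scale * sum(q * x for q, x in zip(query, inputs[j])) for j in visible]
        # Numerically stable softmax over the visible positions.
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        # Weighted average of the visible input vectors.
        latents.append([
            sum(w * inputs[j][k] for j, w in zip(visible, weights)) / total
            for k in range(d)
        ])
    return latents


def attention_score_counts(n, m, layers):
    """Rough count of dense attention scores (ignores masking and MLPs).

    Decoder-only Transformer (assumed baseline): every layer attends N x N.
    Perceiver AR-style stack (assumed): one M x N cross-attention, then
    (layers - 1) latent self-attention layers of size M x M.
    """
    transformer = layers * n * n
    perceiver_ar = n * m + (layers - 1) * m * m
    return transformer, perceiver_ar


if __name__ == "__main__":
    seq = [[math.sin(i + k) for k in range(4)] for i in range(16)]
    print(len(causal_cross_attention(seq, 4)))  # one vector per latent
    full, compressed = attention_score_counts(65536, 1024, 24)
    print(full // compressed)  # how many times fewer scores the latent stack needs
```

In this sketch, perturbing only the last input changes only the final latent, and every earlier latent is bit-for-bit identical. That is the end-to-end causal property the abstract refers to. It is what allows each latent to be trained as a next-token predictor without seeing future tokens.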

Cite this Paper

BibTeX


@InProceedings{pmlr-v162-hawthorne22a,
  title = 	 {General-purpose, long-context autoregressive modeling with Perceiver {AR}},
  author =       {Hawthorne, Curtis and Jaegle, Andrew and Cangea, C{\u{a}}t{\u{a}}lina and Borgeaud, Sebastian and Nash, Charlie and Malinowski, Mateusz and Dieleman, Sander and Vinyals, Oriol and Botvinick, Matthew and Simon, Ian and Sheahan, Hannah and Zeghidour, Neil and Alayrac, Jean-Baptiste and Carreira, Joao and Engel, Jesse},
  booktitle = 	 {Proceedings of the 39th International Conference on Machine Learning},
  pages = 	 {8535--8558},
  year = 	 {2022},
  editor = 	 {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume = 	 {162},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {17--23 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v162/hawthorne22a/hawthorne22a.pdf},
  url = 	 {https://proceedings.mlr.press/v162/hawthorne22a.html},
  abstract = 	 {Real-world data is high-dimensional: a book, image, or musical performance can easily contain hundreds of thousands of elements even after compression. However, the most commonly used autoregressive models, Transformers, are prohibitively expensive to scale to the number of inputs and layers needed to capture this long-range structure. We develop Perceiver AR, an autoregressive, modality-agnostic architecture which uses cross-attention to map long-range inputs to a small number of latents while also maintaining end-to-end causal masking. Perceiver AR can directly attend to over a hundred thousand tokens, enabling practical long-context density estimation without the need for hand-crafted sparsity patterns or memory mechanisms. When trained on images or music, Perceiver AR generates outputs with clear long-term coherence and structure. Our architecture also obtains state-of-the-art likelihood on long-sequence benchmarks, including 64x64 ImageNet images and PG-19 books.}
}

Endnote

%0 Conference Paper
%T General-purpose, long-context autoregressive modeling with Perceiver AR
%A Curtis Hawthorne
%A Andrew Jaegle
%A Cătălina Cangea
%A Sebastian Borgeaud
%A Charlie Nash
%A Mateusz Malinowski
%A Sander Dieleman
%A Oriol Vinyals
%A Matthew Botvinick
%A Ian Simon
%A Hannah Sheahan
%A Neil Zeghidour
%A Jean-Baptiste Alayrac
%A Joao Carreira
%A Jesse Engel
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-hawthorne22a
%I PMLR
%P 8535--8558
%U https://proceedings.mlr.press/v162/hawthorne22a.html
%V 162
%X Real-world data is high-dimensional: a book, image, or musical performance can easily contain hundreds of thousands of elements even after compression. However, the most commonly used autoregressive models, Transformers, are prohibitively expensive to scale to the number of inputs and layers needed to capture this long-range structure. We develop Perceiver AR, an autoregressive, modality-agnostic architecture which uses cross-attention to map long-range inputs to a small number of latents while also maintaining end-to-end causal masking. Perceiver AR can directly attend to over a hundred thousand tokens, enabling practical long-context density estimation without the need for hand-crafted sparsity patterns or memory mechanisms. When trained on images or music, Perceiver AR generates outputs with clear long-term coherence and structure. Our architecture also obtains state-of-the-art likelihood on long-sequence benchmarks, including 64x64 ImageNet images and PG-19 books.

APA


Hawthorne, C., Jaegle, A., Cangea, C., Borgeaud, S., Nash, C., Malinowski, M., Dieleman, S., Vinyals, O., Botvinick, M., Simon, I., Sheahan, H., Zeghidour, N., Alayrac, J.-B., Carreira, J. & Engel, J. (2022). General-purpose, long-context autoregressive modeling with Perceiver AR. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:8535-8558. Available from https://proceedings.mlr.press/v162/hawthorne22a.html.
