Poolingformer: Long Document Modeling with Pooling Attention

Hang Zhang; Yeyun Gong; Yelong Shen; Weisheng Li; Jiancheng Lv; Nan Duan; Weizhu Chen

Proceedings of the 38th International Conference on Machine Learning, PMLR 139:12437-12446, 2021.

Abstract

In this paper, we introduce a two-level attention schema, Poolingformer, for long document modeling. Its first level uses a smaller sliding window pattern to aggregate information from neighbors. Its second level employs a larger window to increase receptive fields with pooling attention to reduce both computational cost and memory consumption. We first evaluate Poolingformer on two long sequence QA tasks: the monolingual NQ and the multilingual TyDi QA. Experimental results show that Poolingformer sits atop three official leaderboards measured by F1, outperforming previous state-of-the-art models by 1.9 points (79.8 vs. 77.9) on NQ long answer, 1.9 points (79.5 vs. 77.6) on TyDi QA passage answer, and 1.6 points (67.6 vs. 66.0) on TyDi QA minimal answer. We further evaluate Poolingformer on a long sequence summarization task. Experimental results on the arXiv benchmark continue to demonstrate its superior performance.
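The two-level pattern described in the abstract can be illustrated with a small, dependency-free sketch. This is not the authors' implementation: the function names, the window radii, the use of mean pooling, the absence of learned query/key/value projections, and the choice to feed the first-level output into the second level are all assumptions made only to keep the example short.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]

def window(seq, i, radius):
    """Items within `radius` positions of index i (clipped at the ends)."""
    return seq[max(0, i - radius): i + radius + 1]

def mean_pool(vectors, size):
    """Average consecutive groups of `size` vectors; the last group may be shorter."""
    pooled = []
    for start in range(0, len(vectors), size):
        group = vectors[start:start + size]
        pooled.append([sum(col) / len(group) for col in zip(*group)])
    return pooled

def two_level_attention(x, small_radius=1, large_radius=4, pool_size=2):
    """Hypothetical two-level attention: a small sliding window, then a pooled larger window."""
    # Level 1: each token attends to its immediate neighbours.
    level1 = []
    for i in range(len(x)):
        near = window(x, i, small_radius)
        level1.append(attend(x[i], near, near))
    # Level 2: each token attends to a larger window whose keys and values
    # are compressed by pooling, so fewer positions are scored per query.
    level2 = []
    for i in range(len(level1)):
        far = mean_pool(window(level1, i, large_radius), pool_size)
        level2.append(attend(level1[i], far, far))
    return level2

if __name__ == "__main__":
    tokens = [[float(i), float(i % 3)] for i in range(10)]
    for row in two_level_attention(tokens):
        print([round(v, 3) for v in row])
```

With these assumed defaults, an interior token scores 3 positions at the first level and 5 pooled positions (a 9-token window pooled in pairs) at the second level, rather than every token in the sequence, which is the cost reduction the abstract attributes to pooling attention.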

Cite this Paper

BibTeX


@InProceedings{pmlr-v139-zhang21h,
  title     = {Poolingformer: Long Document Modeling with Pooling Attention},
  author    = {Zhang, Hang and Gong, Yeyun and Shen, Yelong and Li, Weisheng and Lv, Jiancheng and Duan, Nan and Chen, Weizhu},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {12437--12446},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/zhang21h/zhang21h.pdf},
  url       = {https://proceedings.mlr.press/v139/zhang21h.html},
  abstract  = {In this paper, we introduce a two-level attention schema, Poolingformer, for long document modeling. Its first level uses a smaller sliding window pattern to aggregate information from neighbors. Its second level employs a larger window to increase receptive fields with pooling attention to reduce both computational cost and memory consumption. We first evaluate Poolingformer on two long sequence QA tasks: the monolingual NQ and the multilingual TyDi QA. Experimental results show that Poolingformer sits atop three official leaderboards measured by F1, outperforming previous state-of-the-art models by 1.9 points (79.8 vs. 77.9) on NQ long answer, 1.9 points (79.5 vs. 77.6) on TyDi QA passage answer, and 1.6 points (67.6 vs. 66.0) on TyDi QA minimal answer. We further evaluate Poolingformer on a long sequence summarization task. Experimental results on the arXiv benchmark continue to demonstrate its superior performance.}
}

Endnote

%0 Conference Paper
%T Poolingformer: Long Document Modeling with Pooling Attention
%A Hang Zhang
%A Yeyun Gong
%A Yelong Shen
%A Weisheng Li
%A Jiancheng Lv
%A Nan Duan
%A Weizhu Chen
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-zhang21h
%I PMLR
%P 12437--12446
%U https://proceedings.mlr.press/v139/zhang21h.html
%V 139
%X In this paper, we introduce a two-level attention schema, Poolingformer, for long document modeling. Its first level uses a smaller sliding window pattern to aggregate information from neighbors. Its second level employs a larger window to increase receptive fields with pooling attention to reduce both computational cost and memory consumption. We first evaluate Poolingformer on two long sequence QA tasks: the monolingual NQ and the multilingual TyDi QA. Experimental results show that Poolingformer sits atop three official leaderboards measured by F1, outperforming previous state-of-the-art models by 1.9 points (79.8 vs. 77.9) on NQ long answer, 1.9 points (79.5 vs. 77.6) on TyDi QA passage answer, and 1.6 points (67.6 vs. 66.0) on TyDi QA minimal answer. We further evaluate Poolingformer on a long sequence summarization task. Experimental results on the arXiv benchmark continue to demonstrate its superior performance.

APA


Zhang, H., Gong, Y., Shen, Y., Li, W., Lv, J., Duan, N., & Chen, W. (2021). Poolingformer: Long Document Modeling with Pooling Attention. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:12437-12446. Available from https://proceedings.mlr.press/v139/zhang21h.html.
