BANG: Bridging Autoregressive and Non-autoregressive Generation with Large Scale Pretraining

Weizhen Qi, Yeyun Gong, Jian Jiao, Yu Yan, Weizhu Chen, Dayiheng Liu, Kewen Tang, Houqiang Li, Jiusheng Chen, Ruofei Zhang, Ming Zhou, Nan Duan
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:8630-8639, 2021.

Abstract

In this paper, we propose BANG, a new pretraining model to Bridge the gap between Autoregressive (AR) and Non-autoregressive (NAR) Generation. AR and NAR generation can be viewed uniformly in terms of the extent to which previously generated tokens can be attended to, and BANG bridges the two by designing a novel model structure for large-scale pretraining. A pretrained BANG model can simultaneously support AR, NAR, and semi-NAR generation to meet different requirements. Experiments on question generation (SQuAD 1.1), summarization (XSum), and dialogue generation (PersonaChat) show that BANG improves NAR and semi-NAR performance significantly while attaining performance comparable to strong AR pretrained models. Compared with strong semi-NAR baselines, BANG achieves absolute improvements of 14.01 and 5.24 in the overall scores on SQuAD 1.1 and XSum, respectively. Compared with strong NAR baselines, BANG achieves absolute improvements of 10.73, 6.39, and 5.90 in the overall scores on SQuAD 1.1, XSum, and PersonaChat, respectively. Our code will be made publicly available.
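The core idea in the abstract is that AR, semi-NAR, and NAR decoding differ only in how many previously generated target tokens a position is allowed to attend to. The following minimal Python sketch illustrates that spectrum with a decoder self-attention visibility mask; the helper name `visibility_mask` and the single `visible_prefix` parameterization are illustrative assumptions for this page, not BANG's actual cross-stream attention implementation.

import numpy as np

def visibility_mask(seq_len: int, visible_prefix: int) -> np.ndarray:
    """Build a (seq_len x seq_len) boolean mask where entry [i, j] = True means
    target position i may attend to target position j.

    Illustrative only (not the paper's implementation):
      visible_prefix = seq_len -> every earlier token is visible (AR-style)
      visible_prefix = 0       -> no earlier token is visible (NAR-style)
      0 < visible_prefix < seq_len -> only a short prefix is visible (semi-NAR-style)
    """
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    for i in range(seq_len):
        # Position i sees at most the first `visible_prefix` of its preceding tokens.
        mask[i, : min(i, visible_prefix)] = True
    return mask

# AR: full left-to-right history; NAR: no target-side history; semi-NAR: a 3-token prefix.
ar_mask = visibility_mask(6, visible_prefix=6)
nar_mask = visibility_mask(6, visible_prefix=0)
semi_mask = visibility_mask(6, visible_prefix=3)
print(ar_mask.astype(int), nar_mask.astype(int), semi_mask.astype(int), sep="\n\n")

Under this reading, a single pretrained model that can operate with any such mask at inference time is what allows BANG to serve AR, semi-NAR, and NAR generation from the same parameters.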

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-qi21a,
  title     = {BANG: Bridging Autoregressive and Non-autoregressive Generation with Large Scale Pretraining},
  author    = {Qi, Weizhen and Gong, Yeyun and Jiao, Jian and Yan, Yu and Chen, Weizhu and Liu, Dayiheng and Tang, Kewen and Li, Houqiang and Chen, Jiusheng and Zhang, Ruofei and Zhou, Ming and Duan, Nan},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {8630--8639},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/qi21a/qi21a.pdf},
  url       = {https://proceedings.mlr.press/v139/qi21a.html}
}
APA
Qi, W., Gong, Y., Jiao, J., Yan, Y., Chen, W., Liu, D., Tang, K., Li, H., Chen, J., Zhang, R., Zhou, M., & Duan, N. (2021). BANG: Bridging Autoregressive and Non-autoregressive Generation with Large Scale Pretraining. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:8630-8639. Available from https://proceedings.mlr.press/v139/qi21a.html.
