Aligned Cross Entropy for Non-Autoregressive Machine Translation

Marjan Ghazvininejad, Vladimir Karpukhin, Luke Zettlemoyer, Omer Levy
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:3515-3523, 2020.

Abstract

Non-autoregressive machine translation models significantly speed up decoding by allowing for parallel prediction of the entire target sequence. However, modeling word order is more challenging due to the lack of autoregressive factors in the model. This difficulty is compounded during training with cross entropy loss, which can heavily penalize small shifts in word order. In this paper, we propose aligned cross entropy (AXE) as an alternative loss function for training of non-autoregressive models. AXE uses a differentiable dynamic program to assign loss based on the best possible monotonic alignment between target tokens and model predictions. AXE-based training of conditional masked language models (CMLMs) substantially improves performance on major WMT benchmarks, while setting a new state of the art for non-autoregressive models.
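The alignment idea in the abstract can be sketched as a small dynamic program that finds the cheapest monotonic alignment between target tokens and prediction slots. This is an illustrative reconstruction, not the paper's exact recursion: the use of a special "blank" token for unmatched slots, the local cost `-log P_j(y_i)`, and all names (`aligned_ce`, `eps_id`) are assumptions made for this sketch.

```python
import math

def aligned_ce(log_probs, target, eps_id):
    """Illustrative monotonic-alignment loss in the spirit of AXE (a sketch,
    not the paper's exact formulation).

    log_probs: T x V table, log_probs[j][v] = log-probability of token v
               at prediction slot j.
    target:    list of N target token ids.
    eps_id:    id of an assumed "blank" token paid for by unmatched slots.
    Returns the cost of the cheapest monotonic alignment of targets to slots.
    """
    N, T = len(target), len(log_probs)
    INF = float("inf")
    # dp[i][j]: min cost of aligning the first i targets to the first j slots
    dp = [[INF] * (T + 1) for _ in range(N + 1)]
    dp[0][0] = 0.0
    for j in range(1, T + 1):
        # Slots used before any target token must predict the blank token.
        dp[0][j] = dp[0][j - 1] - log_probs[j - 1][eps_id]
    for i in range(1, N + 1):
        for j in range(1, T + 1):
            match = -log_probs[j - 1][target[i - 1]]
            dp[i][j] = min(
                dp[i - 1][j - 1] + match,                 # align target i to slot j
                dp[i][j - 1] - log_probs[j - 1][eps_id],  # slot j predicts blank
                dp[i - 1][j] + match,                     # target i shares slot j
            )
    return dp[N][T]
```

Because the `min` over the three moves is taken over sums of (negative) log-probabilities, the same table computed with a soft minimum would be differentiable end to end, which is the property the abstract relies on for training.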

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-ghazvininejad20a,
  title = {Aligned Cross Entropy for Non-Autoregressive Machine Translation},
  author = {Ghazvininejad, Marjan and Karpukhin, Vladimir and Zettlemoyer, Luke and Levy, Omer},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  pages = {3515--3523},
  year = {2020},
  editor = {Hal Daumé III and Aarti Singh},
  volume = {119},
  series = {Proceedings of Machine Learning Research},
  month = {13--18 Jul},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v119/ghazvininejad20a/ghazvininejad20a.pdf},
  url = {http://proceedings.mlr.press/v119/ghazvininejad20a.html},
  abstract = {Non-autoregressive machine translation models significantly speed up decoding by allowing for parallel prediction of the entire target sequence. However, modeling word order is more challenging due to the lack of autoregressive factors in the model. This difficulty is compounded during training with cross entropy loss, which can heavily penalize small shifts in word order. In this paper, we propose aligned cross entropy (AXE) as an alternative loss function for training of non-autoregressive models. AXE uses a differentiable dynamic program to assign loss based on the best possible monotonic alignment between target tokens and model predictions. AXE-based training of conditional masked language models (CMLMs) substantially improves performance on major WMT benchmarks, while setting a new state of the art for non-autoregressive models.}
}
Endnote
%0 Conference Paper %T Aligned Cross Entropy for Non-Autoregressive Machine Translation %A Marjan Ghazvininejad %A Vladimir Karpukhin %A Luke Zettlemoyer %A Omer Levy %B Proceedings of the 37th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2020 %E Hal Daumé III %E Aarti Singh %F pmlr-v119-ghazvininejad20a %I PMLR %P 3515--3523 %U http://proceedings.mlr.press/v119/ghazvininejad20a.html %V 119 %X Non-autoregressive machine translation models significantly speed up decoding by allowing for parallel prediction of the entire target sequence. However, modeling word order is more challenging due to the lack of autoregressive factors in the model. This difficulty is compounded during training with cross entropy loss, which can heavily penalize small shifts in word order. In this paper, we propose aligned cross entropy (AXE) as an alternative loss function for training of non-autoregressive models. AXE uses a differentiable dynamic program to assign loss based on the best possible monotonic alignment between target tokens and model predictions. AXE-based training of conditional masked language models (CMLMs) substantially improves performance on major WMT benchmarks, while setting a new state of the art for non-autoregressive models.
APA
Ghazvininejad, M., Karpukhin, V., Zettlemoyer, L., & Levy, O. (2020). Aligned Cross Entropy for Non-Autoregressive Machine Translation. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:3515-3523. Available from http://proceedings.mlr.press/v119/ghazvininejad20a.html.
