Self-supervised and Supervised Joint Training for Resource-rich Machine Translation

Yong Cheng, Wei Wang, Lu Jiang, Wolfgang Macherey
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:1825-1835, 2021.

Abstract

Self-supervised pre-training of text representations has been successfully applied to low-resource Neural Machine Translation (NMT). However, it usually fails to achieve notable gains on resource-rich NMT. In this paper, we propose a joint training approach, F2-XEnDec, to combine self-supervised and supervised learning to optimize NMT models. To exploit complementary self-supervised signals for supervised learning, NMT models are trained on examples that are interbred from monolingual and parallel sentences through a new process called crossover encoder-decoder. Experiments on two resource-rich translation benchmarks, WMT’14 English-German and WMT’14 English-French, demonstrate that our approach achieves substantial improvements over several strong baseline methods and obtains a new state of the art of 46.19 BLEU on English-French when incorporating back translation. Results also show that our approach is capable of improving model robustness to input perturbations such as code-switching noise, which frequently appears on social media.
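To make the general idea of joint (rather than sequential pre-train-then-fine-tune) training concrete, the sketch below shows one training step that combines a supervised translation loss on a parallel batch with a self-supervised denoising loss on a monolingual batch. This is a minimal, generic illustration of joint self-supervised and supervised optimization, not the paper's F2-XEnDec crossover encoder-decoder procedure; the tiny GRU model, vocabulary size, and word-dropout noising function are illustrative placeholders.

```python
# Minimal sketch of joint self-supervised + supervised NMT training
# (NOT the F2-XEnDec crossover procedure; model and noise are placeholders).
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, PAD = 1000, 0

class TinySeq2Seq(nn.Module):
    """Toy encoder-decoder used only to make the joint objective concrete."""
    def __init__(self, vocab=VOCAB, dim=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim, padding_idx=PAD)
        self.enc = nn.GRU(dim, dim, batch_first=True)
        self.dec = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab)

    def forward(self, src, tgt_in):
        _, h = self.enc(self.emb(src))           # encode source tokens
        dec_out, _ = self.dec(self.emb(tgt_in), h)
        return self.out(dec_out)                 # (batch, tgt_len, vocab) logits

def word_dropout(tokens, p=0.15):
    """Self-supervised corruption: randomly blank tokens (placeholder noise)."""
    mask = (torch.rand_like(tokens, dtype=torch.float) < p) & (tokens != PAD)
    return tokens.masked_fill(mask, PAD)

def seq_xent(logits, gold):
    return F.cross_entropy(logits.reshape(-1, VOCAB), gold.reshape(-1),
                           ignore_index=PAD)

model = TinySeq2Seq()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Dummy batches standing in for parallel and monolingual data.
src, tgt = torch.randint(1, VOCAB, (8, 12)), torch.randint(1, VOCAB, (8, 12))
mono = torch.randint(1, VOCAB, (8, 12))

# Supervised term: translate src -> tgt (teacher forcing with shifted target).
sup_loss = seq_xent(model(src, tgt[:, :-1]), tgt[:, 1:])

# Self-supervised term: reconstruct the clean monolingual sentence from a
# corrupted copy (a denoising-autoencoder stand-in for the monolingual signal).
ssl_loss = seq_xent(model(word_dropout(mono), mono[:, :-1]), mono[:, 1:])

loss = sup_loss + ssl_loss   # both terms optimized jointly in every step
loss.backward()
opt.step()
```

In the paper, the two signals are combined more tightly through the crossover encoder-decoder, which interbreeds monolingual and parallel sentences into new training examples; the sketch above only conveys why a joint objective can expose the model to both kinds of supervision at once.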

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-cheng21b,
  title     = {Self-supervised and Supervised Joint Training for Resource-rich Machine Translation},
  author    = {Cheng, Yong and Wang, Wei and Jiang, Lu and Macherey, Wolfgang},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {1825--1835},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/cheng21b/cheng21b.pdf},
  url       = {https://proceedings.mlr.press/v139/cheng21b.html},
  abstract  = {Self-supervised pre-training of text representations has been successfully applied to low-resource Neural Machine Translation (NMT). However, it usually fails to achieve notable gains on resource-rich NMT. In this paper, we propose a joint training approach, F2-XEnDec, to combine self-supervised and supervised learning to optimize NMT models. To exploit complementary self-supervised signals for supervised learning, NMT models are trained on examples that are interbred from monolingual and parallel sentences through a new process called crossover encoder-decoder. Experiments on two resource-rich translation benchmarks, WMT’14 English-German and WMT’14 English-French, demonstrate that our approach achieves substantial improvements over several strong baseline methods and obtains a new state of the art of 46.19 BLEU on English-French when incorporating back translation. Results also show that our approach is capable of improving model robustness to input perturbations such as code-switching noise, which frequently appears on social media.}
}
Endnote
%0 Conference Paper
%T Self-supervised and Supervised Joint Training for Resource-rich Machine Translation
%A Yong Cheng
%A Wei Wang
%A Lu Jiang
%A Wolfgang Macherey
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-cheng21b
%I PMLR
%P 1825--1835
%U https://proceedings.mlr.press/v139/cheng21b.html
%V 139
%X Self-supervised pre-training of text representations has been successfully applied to low-resource Neural Machine Translation (NMT). However, it usually fails to achieve notable gains on resource-rich NMT. In this paper, we propose a joint training approach, F2-XEnDec, to combine self-supervised and supervised learning to optimize NMT models. To exploit complementary self-supervised signals for supervised learning, NMT models are trained on examples that are interbred from monolingual and parallel sentences through a new process called crossover encoder-decoder. Experiments on two resource-rich translation benchmarks, WMT’14 English-German and WMT’14 English-French, demonstrate that our approach achieves substantial improvements over several strong baseline methods and obtains a new state of the art of 46.19 BLEU on English-French when incorporating back translation. Results also show that our approach is capable of improving model robustness to input perturbations such as code-switching noise, which frequently appears on social media.
APA
Cheng, Y., Wang, W., Jiang, L. & Macherey, W. (2021). Self-supervised and Supervised Joint Training for Resource-rich Machine Translation. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:1825-1835. Available from https://proceedings.mlr.press/v139/cheng21b.html.