Self-supervised and Supervised Joint Training for Resource-rich Machine Translation

Yong Cheng; Wei Wang; Lu Jiang; Wolfgang Macherey

Proceedings of the 38th International Conference on Machine Learning, PMLR 139:1825-1835, 2021.

Abstract

Self-supervised pre-training of text representations has been successfully applied to low-resource Neural Machine Translation (NMT). However, it usually fails to achieve notable gains on resource-rich NMT. In this paper, we propose a joint training approach, F2-XEnDec, to combine self-supervised and supervised learning to optimize NMT models. To exploit complementary self-supervised signals for supervised learning, NMT models are trained on examples that are interbred from monolingual and parallel sentences through a new process called crossover encoder-decoder. Experiments on two resource-rich translation benchmarks, WMT’14 English-German and WMT’14 English-French, demonstrate that our approach achieves substantial improvements over several strong baseline methods and obtains a new state of the art of 46.19 BLEU on English-French when incorporating back translation. Results also show that our approach is capable of improving model robustness to input perturbations such as code-switching noise, which frequently appears on social media.

Cite this Paper

BibTeX


@InProceedings{pmlr-v139-cheng21b,
  title = 	 {Self-supervised and Supervised Joint Training for Resource-rich Machine Translation},
  author =       {Cheng, Yong and Wang, Wei and Jiang, Lu and Macherey, Wolfgang},
  booktitle = 	 {Proceedings of the 38th International Conference on Machine Learning},
  pages = 	 {1825--1835},
  year = 	 {2021},
  editor = 	 {Meila, Marina and Zhang, Tong},
  volume = 	 {139},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {18--24 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v139/cheng21b/cheng21b.pdf},
  url = 	 {https://proceedings.mlr.press/v139/cheng21b.html},
  abstract = 	 {Self-supervised pre-training of text representations has been successfully applied to low-resource Neural Machine Translation (NMT). However, it usually fails to achieve notable gains on resource-rich NMT. In this paper, we propose a joint training approach, F2-XEnDec, to combine self-supervised and supervised learning to optimize NMT models. To exploit complementary self-supervised signals for supervised learning, NMT models are trained on examples that are interbred from monolingual and parallel sentences through a new process called crossover encoder-decoder. Experiments on two resource-rich translation benchmarks, WMT’14 English-German and WMT’14 English-French, demonstrate that our approach achieves substantial improvements over several strong baseline methods and obtains a new state of the art of 46.19 BLEU on English-French when incorporating back translation. Results also show that our approach is capable of improving model robustness to input perturbations such as code-switching noise which frequently appears on the social media.}
}

Endnote

%0 Conference Paper
%T Self-supervised and Supervised Joint Training for Resource-rich Machine Translation
%A Yong Cheng
%A Wei Wang
%A Lu Jiang
%A Wolfgang Macherey
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang	
%F pmlr-v139-cheng21b
%I PMLR
%P 1825--1835
%U https://proceedings.mlr.press/v139/cheng21b.html
%V 139
%X Self-supervised pre-training of text representations has been successfully applied to low-resource Neural Machine Translation (NMT). However, it usually fails to achieve notable gains on resource-rich NMT. In this paper, we propose a joint training approach, F2-XEnDec, to combine self-supervised and supervised learning to optimize NMT models. To exploit complementary self-supervised signals for supervised learning, NMT models are trained on examples that are interbred from monolingual and parallel sentences through a new process called crossover encoder-decoder. Experiments on two resource-rich translation benchmarks, WMT’14 English-German and WMT’14 English-French, demonstrate that our approach achieves substantial improvements over several strong baseline methods and obtains a new state of the art of 46.19 BLEU on English-French when incorporating back translation. Results also show that our approach is capable of improving model robustness to input perturbations such as code-switching noise which frequently appears on the social media.

APA


Cheng, Y., Wang, W., Jiang, L. & Macherey, W. (2021). Self-supervised and Supervised Joint Training for Resource-rich Machine Translation. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:1825-1835. Available from https://proceedings.mlr.press/v139/cheng21b.html.
