The Unreasonable Effectiveness of Few-shot Learning for Machine Translation

Xavier Garcia; Yamini Bansal; Colin Cherry; George Foster; Maxim Krikun; Melvin Johnson; Orhan Firat

Proceedings of the 40th International Conference on Machine Learning, PMLR 202:10867-10878, 2023.

Abstract

We demonstrate the potential of few-shot translation systems, trained with unpaired language data, for both high- and low-resource language pairs. We show that with only five examples of high-quality translation data shown at inference, a transformer decoder-only model trained solely with self-supervised learning is able to match specialized supervised state-of-the-art models as well as more general commercial translation systems. In particular, we outperform the best-performing system on the WMT’21 English-Chinese news translation task using only five examples of English-Chinese parallel data at inference. Furthermore, the resulting models are two orders of magnitude smaller than state-of-the-art language models. We then analyze the factors that impact the performance of few-shot translation systems, and highlight that the quality of the few-shot demonstrations heavily determines the quality of the translations generated by our models. Finally, we show that the few-shot paradigm also provides a way to control certain attributes of the translation: we are able to control for regional varieties and formality using only five examples at inference, paving the way towards controllable machine translation systems.

Cite this Paper

BibTeX


@InProceedings{pmlr-v202-garcia23a,
  title = 	 {The Unreasonable Effectiveness of Few-shot Learning for Machine Translation},
  author =       {Garcia, Xavier and Bansal, Yamini and Cherry, Colin and Foster, George and Krikun, Maxim and Johnson, Melvin and Firat, Orhan},
  booktitle = 	 {Proceedings of the 40th International Conference on Machine Learning},
  pages = 	 {10867--10878},
  year = 	 {2023},
  editor = 	 {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume = 	 {202},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {23--29 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v202/garcia23a/garcia23a.pdf},
  url = 	 {https://proceedings.mlr.press/v202/garcia23a.html},
  abstract = 	 {We demonstrate the potential of few-shot translation systems, trained with unpaired language data, for both high- and low-resource language pairs. We show that with only five examples of high-quality translation data shown at inference, a transformer decoder-only model trained solely with self-supervised learning is able to match specialized supervised state-of-the-art models as well as more general commercial translation systems. In particular, we outperform the best-performing system on the WMT’21 English-Chinese news translation task using only five examples of English-Chinese parallel data at inference. Furthermore, the resulting models are two orders of magnitude smaller than state-of-the-art language models. We then analyze the factors that impact the performance of few-shot translation systems, and highlight that the quality of the few-shot demonstrations heavily determines the quality of the translations generated by our models. Finally, we show that the few-shot paradigm also provides a way to control certain attributes of the translation: we are able to control for regional varieties and formality using only five examples at inference, paving the way towards controllable machine translation systems.}
}

Endnote

%0 Conference Paper
%T The Unreasonable Effectiveness of Few-shot Learning for Machine Translation
%A Xavier Garcia
%A Yamini Bansal
%A Colin Cherry
%A George Foster
%A Maxim Krikun
%A Melvin Johnson
%A Orhan Firat
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-garcia23a
%I PMLR
%P 10867--10878
%U https://proceedings.mlr.press/v202/garcia23a.html
%V 202
%X We demonstrate the potential of few-shot translation systems, trained with unpaired language data, for both high- and low-resource language pairs. We show that with only five examples of high-quality translation data shown at inference, a transformer decoder-only model trained solely with self-supervised learning is able to match specialized supervised state-of-the-art models as well as more general commercial translation systems. In particular, we outperform the best-performing system on the WMT’21 English-Chinese news translation task using only five examples of English-Chinese parallel data at inference. Furthermore, the resulting models are two orders of magnitude smaller than state-of-the-art language models. We then analyze the factors that impact the performance of few-shot translation systems, and highlight that the quality of the few-shot demonstrations heavily determines the quality of the translations generated by our models. Finally, we show that the few-shot paradigm also provides a way to control certain attributes of the translation: we are able to control for regional varieties and formality using only five examples at inference, paving the way towards controllable machine translation systems.

APA


Garcia, X., Bansal, Y., Cherry, C., Foster, G., Krikun, M., Johnson, M. & Firat, O. (2023). The Unreasonable Effectiveness of Few-shot Learning for Machine Translation. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:10867-10878. Available from https://proceedings.mlr.press/v202/garcia23a.html.
