Scaling Laws for Multilingual Neural Machine Translation

Patrick Fernandes, Behrooz Ghorbani, Xavier Garcia, Markus Freitag, Orhan Firat
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:10053-10071, 2023.

Abstract

In this work, we provide a large-scale empirical study of the scaling properties of multilingual neural machine translation models. We examine how increases in model size affect model performance and investigate the role of individual language pair weights in the scaling behavior. We find that these weights only affect the multiplicative factor of the scaling law, and in particular, the scaling exponent is unaffected by them. Through a novel joint scaling law formulation, we compute the effective number of parameters allocated to each language pair and examine the role of language similarity in the scaling behavior of our models. We find little evidence that language similarity has any impact. In contrast, the “direction” of multilinguality plays a significant role, with models translating from multiple languages into English having a larger number of effective parameters per task than their reversed counterparts. Finally, we leverage our observations to predict the performance of multilingual models trained with any language weighting at any scale, greatly reducing the effort required for language balancing in large multilingual models. Our findings apply to both in-domain and out-of-domain test sets and to multiple evaluation metrics, such as ChrF and BLEURT.
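The abstract's notion of an "effective number of parameters" per language pair can be pictured with the standard power-law-plus-offset form used in prior NMT scaling work. Below is a minimal, hypothetical sketch (not the paper's exact joint scaling law, and with made-up numbers): fit a bilingual baseline law L(N) = β·N^(−α) + L_∞ for one language pair, then invert it at the loss a multilingual model reaches on that pair to read off an effective parameter count.

```python
# Minimal sketch, assuming the generic NMT scaling form L(N) = beta * N**(-alpha) + L_inf.
# The paper's joint scaling law formulation is a refinement of this idea; all data below
# is hypothetical and only illustrates the fitting/inversion mechanics.
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(n_params, beta, alpha, l_inf):
    """Test loss as a function of model parameter count."""
    return beta * n_params ** (-alpha) + l_inf

# Hypothetical bilingual (single language pair) baselines: model sizes and test losses.
bilingual_sizes = np.array([1e8, 3e8, 1e9, 3e9])
bilingual_losses = np.array([2.10, 1.85, 1.66, 1.53])

(beta, alpha, l_inf), _ = curve_fit(
    scaling_law, bilingual_sizes, bilingual_losses,
    p0=[1e2, 0.3, 1.0], maxfev=20_000,
)

# "Effective parameters" for this pair inside a multilingual model: invert the
# fitted bilingual law at the loss the multilingual model achieves on the pair.
multilingual_loss_on_pair = 1.70  # hypothetical measurement
n_effective = (beta / (multilingual_loss_on_pair - l_inf)) ** (1.0 / alpha)

print(f"alpha={alpha:.3f}, beta={beta:.3g}, L_inf={l_inf:.3f}")
print(f"effective parameter count for this pair: {n_effective:.3g}")
```

Under the paper's framing, changing language pair weights rescales the multiplicative factor (β above) while leaving the exponent (α) unchanged, which is what makes this kind of inversion meaningful across weightings.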

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-fernandes23a,
  title     = {Scaling Laws for Multilingual Neural Machine Translation},
  author    = {Fernandes, Patrick and Ghorbani, Behrooz and Garcia, Xavier and Freitag, Markus and Firat, Orhan},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {10053--10071},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/fernandes23a/fernandes23a.pdf},
  url       = {https://proceedings.mlr.press/v202/fernandes23a.html}
}
APA
Fernandes, P., Ghorbani, B., Garcia, X., Freitag, M. & Firat, O. (2023). Scaling Laws for Multilingual Neural Machine Translation. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:10053-10071. Available from https://proceedings.mlr.press/v202/fernandes23a.html.