Mixture Models for Diverse Machine Translation: Tricks of the Trade

Tianxiao Shen, Myle Ott, Michael Auli, Marc’Aurelio Ranzato
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:5719-5728, 2019.

Abstract

Mixture models trained via EM are among the simplest, most widely used and well understood latent variable models in the machine learning literature. Surprisingly, these models have hardly been explored in text generation applications such as machine translation. In principle, they provide a latent variable to control generation and produce a diverse set of hypotheses. In practice, however, mixture models are prone to degeneracies—often only one component gets trained or the latent variable is simply ignored. We find that disabling dropout noise in the responsibility computation is critical to successful training. In addition, the design choices of parameterization, prior distribution, hard versus soft EM, and online versus offline assignment can dramatically affect model performance. We develop an evaluation protocol to assess both the quality and diversity of generations against multiple references, and provide an extensive empirical study of several mixture model variants. Our analysis shows that certain types of mixture models are more robust and offer the best trade-off between translation quality and diversity compared to variational models and diverse decoding approaches. Code to reproduce the results in this paper is available at https://github.com/pytorch/fairseq.
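
To make the training recipe described in the abstract concrete, here is a minimal sketch (not the fairseq implementation) of one hard-EM update for a K-component mixture translation model. It assumes a hypothetical `model(src, tgt, z)` that returns the per-sentence negative log-likelihood when each sentence pair is explained by the mixture component indexed by `z` (a LongTensor with one component id per pair); the key trick from the paper—disabling dropout when computing responsibilities—is reflected in the `model.eval()` call during the E-step.

```python
import torch

def hard_em_step(model, src, tgt, num_components, optimizer):
    batch_size = tgt.size(0)
    # E-step: score every component with dropout disabled (model.eval()),
    # since dropout noise in the responsibility computation can cause the
    # mixture to degenerate (e.g., only one component ever gets selected).
    model.eval()
    with torch.no_grad():
        losses = torch.stack([
            model(src, tgt, torch.full((batch_size,), z, dtype=torch.long))
            for z in range(num_components)
        ])                              # shape: [num_components, batch_size]
        z_star = losses.argmin(dim=0)   # hard (winner-take-all) assignment
    # M-step: update only the winning component for each pair, dropout back on.
    model.train()
    loss = model(src, tgt, z_star).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item(), z_star
```

Soft EM would instead weight each component's loss by its (dropout-free) posterior responsibility rather than taking the argmin; at generation time, decoding once per component value of `z` yields a diverse set of hypotheses.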

Cite this Paper


BibTeX
@InProceedings{pmlr-v97-shen19c,
  title     = {Mixture Models for Diverse Machine Translation: Tricks of the Trade},
  author    = {Shen, Tianxiao and Ott, Myle and Auli, Michael and Ranzato, Marc'Aurelio},
  booktitle = {Proceedings of the 36th International Conference on Machine Learning},
  pages     = {5719--5728},
  year      = {2019},
  editor    = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume    = {97},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--15 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v97/shen19c/shen19c.pdf},
  url       = {https://proceedings.mlr.press/v97/shen19c.html},
  abstract  = {Mixture models trained via EM are among the simplest, most widely used and well understood latent variable models in the machine learning literature. Surprisingly, these models have been hardly explored in text generation applications such as machine translation. In principle, they provide a latent variable to control generation and produce a diverse set of hypotheses. In practice, however, mixture models are prone to degeneracies—often only one component gets trained or the latent variable is simply ignored. We find that disabling dropout noise in responsibility computation is critical to successful training. In addition, the design choices of parameterization, prior distribution, hard versus soft EM and online versus offline assignment can dramatically affect model performance. We develop an evaluation protocol to assess both quality and diversity of generations against multiple references, and provide an extensive empirical study of several mixture model variants. Our analysis shows that certain types of mixture models are more robust and offer the best trade-off between translation quality and diversity compared to variational models and diverse decoding approaches.\footnote{Code to reproduce the results in this paper is available at \url{https://github.com/pytorch/fairseq}}}
}
EndNote
%0 Conference Paper
%T Mixture Models for Diverse Machine Translation: Tricks of the Trade
%A Tianxiao Shen
%A Myle Ott
%A Michael Auli
%A Marc’Aurelio Ranzato
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-shen19c
%I PMLR
%P 5719--5728
%U https://proceedings.mlr.press/v97/shen19c.html
%V 97
%X Mixture models trained via EM are among the simplest, most widely used and well understood latent variable models in the machine learning literature. Surprisingly, these models have been hardly explored in text generation applications such as machine translation. In principle, they provide a latent variable to control generation and produce a diverse set of hypotheses. In practice, however, mixture models are prone to degeneracies—often only one component gets trained or the latent variable is simply ignored. We find that disabling dropout noise in responsibility computation is critical to successful training. In addition, the design choices of parameterization, prior distribution, hard versus soft EM and online versus offline assignment can dramatically affect model performance. We develop an evaluation protocol to assess both quality and diversity of generations against multiple references, and provide an extensive empirical study of several mixture model variants. Our analysis shows that certain types of mixture models are more robust and offer the best trade-off between translation quality and diversity compared to variational models and diverse decoding approaches. Code to reproduce the results in this paper is available at https://github.com/pytorch/fairseq
APA
Shen, T., Ott, M., Auli, M. & Ranzato, M. (2019). Mixture Models for Diverse Machine Translation: Tricks of the Trade. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:5719-5728. Available from https://proceedings.mlr.press/v97/shen19c.html.
