Stochastic Beams and Where To Find Them: The Gumbel-Top-k Trick for Sampling Sequences Without Replacement

Wouter Kool; Herke Van Hoof; Max Welling

Proceedings of the 36th International Conference on Machine Learning, PMLR 97:3499-3508, 2019.

Abstract

The well-known Gumbel-Max trick for sampling from a categorical distribution can be extended to sample $k$ elements without replacement. We show how to implicitly apply this 'Gumbel-Top-$k$' trick on a factorized distribution over sequences, which allows exact samples to be drawn without replacement using a Stochastic Beam Search. Even for exponentially large domains, the number of model evaluations grows only linearly in $k$ and the maximum sampled sequence length. The algorithm creates a theoretical connection between sampling and (deterministic) beam search and can be used as a principled intermediate alternative. In a translation task, the proposed method compares favourably against alternatives to obtain diverse yet good quality translations. We show that sequences sampled without replacement can be used to construct low-variance estimators for expected sentence-level BLEU score and model entropy.
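
As a rough illustration of the basic idea in the abstract, the sketch below applies the Gumbel-Top-$k$ trick to a small, explicitly enumerated categorical distribution: each log-probability is perturbed with independent Gumbel(0, 1) noise and the indices of the $k$ largest perturbed values are kept. It is not the paper's Stochastic Beam Search, which applies the trick implicitly over sequences; the function name `gumbel_top_k` and its parameters are illustrative assumptions, not taken from the paper.

```python
import heapq
import math
import random


def gumbel_top_k(log_probs, k, rng=None):
    """Return k distinct indices sampled without replacement.

    Hypothetical helper: perturbs each log-probability with Gumbel(0, 1)
    noise and returns the indices of the k largest perturbed values,
    ordered from largest to smallest.
    """
    rng = rng or random.Random()
    if not 0 <= k <= len(log_probs):
        raise ValueError("k must be between 0 and len(log_probs)")

    perturbed = []
    for log_p in log_probs:
        u = rng.random()
        while u == 0.0:  # random() may return 0.0; log(0) is undefined
            u = rng.random()
        gumbel = -math.log(-math.log(u))  # Gumbel(0, 1) noise
        perturbed.append(log_p + gumbel)

    # Indices of the k largest perturbed log-probabilities.
    return heapq.nlargest(k, range(len(log_probs)), key=perturbed.__getitem__)


if __name__ == "__main__":
    probs = [0.5, 0.3, 0.2]
    log_probs = [math.log(p) for p in probs]
    print(gumbel_top_k(log_probs, k=2, rng=random.Random(0)))
```

With `k=1` this reduces to the ordinary Gumbel-Max trick, so the single returned index follows the original categorical distribution.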

Cite this Paper

BibTeX


@InProceedings{pmlr-v97-kool19a,
  title = 	 {Stochastic Beams and Where To Find Them: The {G}umbel-Top-k Trick for Sampling Sequences Without Replacement},
  author =       {Kool, Wouter and Van Hoof, Herke and Welling, Max},
  booktitle = 	 {Proceedings of the 36th International Conference on Machine Learning},
  pages = 	 {3499--3508},
  year = 	 {2019},
  editor = 	 {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume = 	 {97},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {09--15 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v97/kool19a/kool19a.pdf},
  url = 	 {https://proceedings.mlr.press/v97/kool19a.html},
  abstract = 	 {The well-known Gumbel-Max trick for sampling from a categorical distribution can be extended to sample $k$ elements without replacement. We show how to implicitly apply this ’Gumbel-Top-$k$’ trick on a factorized distribution over sequences, allowing to draw exact samples without replacement using a Stochastic Beam Search. Even for exponentially large domains, the number of model evaluations grows only linear in $k$ and the maximum sampled sequence length. The algorithm creates a theoretical connection between sampling and (deterministic) beam search and can be used as a principled intermediate alternative. In a translation task, the proposed method compares favourably against alternatives to obtain diverse yet good quality translations. We show that sequences sampled without replacement can be used to construct low-variance estimators for expected sentence-level BLEU score and model entropy.}
}

Endnote

%0 Conference Paper
%T Stochastic Beams and Where To Find Them: The Gumbel-Top-k Trick for Sampling Sequences Without Replacement
%A Wouter Kool
%A Herke Van Hoof
%A Max Welling
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-kool19a
%I PMLR
%P 3499--3508
%U https://proceedings.mlr.press/v97/kool19a.html
%V 97
%X The well-known Gumbel-Max trick for sampling from a categorical distribution can be extended to sample $k$ elements without replacement. We show how to implicitly apply this ’Gumbel-Top-$k$’ trick on a factorized distribution over sequences, allowing to draw exact samples without replacement using a Stochastic Beam Search. Even for exponentially large domains, the number of model evaluations grows only linear in $k$ and the maximum sampled sequence length. The algorithm creates a theoretical connection between sampling and (deterministic) beam search and can be used as a principled intermediate alternative. In a translation task, the proposed method compares favourably against alternatives to obtain diverse yet good quality translations. We show that sequences sampled without replacement can be used to construct low-variance estimators for expected sentence-level BLEU score and model entropy.

APA


Kool, W., Van Hoof, H. & Welling, M. (2019). Stochastic Beams and Where To Find Them: The Gumbel-Top-k Trick for Sampling Sequences Without Replacement. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:3499-3508. Available from https://proceedings.mlr.press/v97/kool19a.html.
