Stochastic Beams and Where To Find Them: The GumbelTopk Trick for Sampling Sequences Without Replacement
[edit]
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:34993508, 2019.
Abstract
The wellknown GumbelMax trick for sampling from a categorical distribution can be extended to sample $k$ elements without replacement. We show how to implicitly apply this ’GumbelTop$k$’ trick on a factorized distribution over sequences, allowing to draw exact samples without replacement using a Stochastic Beam Search. Even for exponentially large domains, the number of model evaluations grows only linear in $k$ and the maximum sampled sequence length. The algorithm creates a theoretical connection between sampling and (deterministic) beam search and can be used as a principled intermediate alternative. In a translation task, the proposed method compares favourably against alternatives to obtain diverse yet good quality translations. We show that sequences sampled without replacement can be used to construct lowvariance estimators for expected sentencelevel BLEU score and model entropy.
Related Material


