Arithmetic Sampling: Parallel Diverse Decoding for Large Language Models

Luke Vilnis, Yury Zemlyanskiy, Patrick Murray, Alexandre Tachard Passos, Sumit Sanghai

Proceedings of the 40th International Conference on Machine Learning, PMLR 202:35120-35136, 2023.

Abstract

Decoding methods for large language models often trade off diversity of outputs against parallelism of computation. Methods such as beam search and Gumbel top-k sampling can guarantee a different output for each element of the beam, but are not easy to parallelize. Conversely, methods such as temperature sampling and its modifications (top-k sampling, nucleus sampling, typical decoding, and others) are embarrassingly parallel, but offer no guarantee against duplicate samples. We present a framework for sampling according to an arithmetic code book implicitly defined by a large language model. The framework is compatible with common sampling variations, provably yields diverse beams under certain conditions, is embarrassingly parallel, and provides unbiased and consistent estimates of expectations under the original model. We demonstrate the effectiveness of our approach on WMT machine translation, where it more than halves the standard deviation when estimating expected BLEU score reward and closes the BLEU score gap between independent sampling and beam search by up to 63%.
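The abstract describes sampling according to an arithmetic code book implicitly defined by the model. The following is a minimal Python sketch of that idea, not the paper's implementation. It assumes a toy next-token function (`toy_model`, hypothetical) in place of a real LLM, and it assumes the batch of codes is evenly spaced on [0, 1) with one shared uniform random offset. Each code is decoded independently: at every step the code selects the token whose probability sub-interval contains it, and the code is then rescaled into that sub-interval. The sorted token order and the helper names (`decode_code`, `arithmetic_codes`, `arithmetic_sample`, `independent_sample`) are illustrative assumptions.

```python
import random
from typing import Callable, Dict, List

EOS = "<eos>"
# Hypothetical model interface: maps a token prefix to {token: probability}.
NextTokenFn = Callable[[List[str]], Dict[str, float]]


def toy_model(prefix: List[str]) -> Dict[str, float]:
    """Hypothetical stand-in for a language model's next-token distribution."""
    if len(prefix) >= 3:
        return {EOS: 1.0}
    return {"a": 0.5, "b": 0.3, EOS: 0.2}


def decode_code(code: float, model: NextTokenFn, max_len: int = 16) -> List[str]:
    """Map a code in [0, 1) to one sequence via the model's implicit code book.

    At each step [0, 1) is split into one sub-interval per token (fixed sorted
    order, width = probability). The token whose sub-interval contains the code
    is emitted, and the code is rescaled to [0, 1) inside that sub-interval.
    Assumes every listed probability is strictly positive.
    """
    prefix: List[str] = []
    for _ in range(max_len):
        probs = model(prefix)
        tokens = sorted(probs)  # any fixed order defines a valid code book
        cum = 0.0
        for i, token in enumerate(tokens):
            p = probs[token]
            # The last token absorbs any floating-point shortfall in the sum.
            if code < cum + p or i == len(tokens) - 1:
                break
            cum += p
        code = min(max((code - cum) / p, 0.0), 1.0 - 1e-12)
        prefix.append(token)
        if token == EOS:
            break
    return prefix


def arithmetic_codes(n: int, rng: random.Random) -> List[float]:
    """n codes evenly spaced on [0, 1), all shifted by one shared uniform offset."""
    offset = rng.random()
    return [(offset + i / n) % 1.0 for i in range(n)]


def arithmetic_sample(n: int, model: NextTokenFn, rng: random.Random) -> List[List[str]]:
    # Each code is decoded on its own, so this loop is embarrassingly parallel.
    return [decode_code(c, model) for c in arithmetic_codes(n, rng)]


def independent_sample(n: int, model: NextTokenFn, rng: random.Random) -> List[List[str]]:
    # Ordinary ancestral sampling is the same decoder fed i.i.d. uniform codes.
    return [decode_code(rng.random(), model) for _ in range(n)]
```

Because each shifted code is marginally uniform on [0, 1), every decoded sequence is distributed like an ordinary ancestral sample, which is why batch averages stay unbiased. Because the codes are evenly spread, they also tend to fall into different token intervals. With four codes, each batch places two of them in the width-0.5 interval that `toy_model` assigns to "a" as a first token (up to floating-point ties), whereas `independent_sample` may place anywhere from zero to four there. This stratification is one intuition for the lower-variance estimates reported in the abstract.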

Cite this Paper

BibTeX


@InProceedings{pmlr-v202-vilnis23a,
  title     = {Arithmetic Sampling: Parallel Diverse Decoding for Large Language Models},
  author    = {Vilnis, Luke and Zemlyanskiy, Yury and Murray, Patrick and Passos, Alexandre Tachard and Sanghai, Sumit},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {35120--35136},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/vilnis23a/vilnis23a.pdf},
  url       = {https://proceedings.mlr.press/v202/vilnis23a.html},
  abstract  = {Decoding methods for large language models often trade-off between diversity of outputs and parallelism of computation. Methods such as beam search and Gumbel top-k sampling can guarantee a different output for each element of the beam, but are not easy to parallelize. Alternatively, methods such as temperature sampling and its modifications (top-k sampling, nucleus sampling, typical decoding, and others), are embarrassingly parallel, but have no guarantees about duplicate samples. We present a framework for sampling according to an arithmetic code book implicitly defined by a large language model, compatible with common sampling variations, with provable beam diversity under certain conditions, as well as being embarrassingly parallel and providing unbiased and consistent expectations from the original model. We demonstrate the effectiveness of our approach on WMT machine translation, more than halving the standard deviation when estimating expected BLEU score reward, and closing the BLEU score gap between independent sampling and beam search by up to 63%.}
}

Endnote

%0 Conference Paper
%T Arithmetic Sampling: Parallel Diverse Decoding for Large Language Models
%A Luke Vilnis
%A Yury Zemlyanskiy
%A Patrick Murray
%A Alexandre Tachard Passos
%A Sumit Sanghai
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-vilnis23a
%I PMLR
%P 35120--35136
%U https://proceedings.mlr.press/v202/vilnis23a.html
%V 202
%X Decoding methods for large language models often trade-off between diversity of outputs and parallelism of computation. Methods such as beam search and Gumbel top-k sampling can guarantee a different output for each element of the beam, but are not easy to parallelize. Alternatively, methods such as temperature sampling and its modifications (top-k sampling, nucleus sampling, typical decoding, and others), are embarrassingly parallel, but have no guarantees about duplicate samples. We present a framework for sampling according to an arithmetic code book implicitly defined by a large language model, compatible with common sampling variations, with provable beam diversity under certain conditions, as well as being embarrassingly parallel and providing unbiased and consistent expectations from the original model. We demonstrate the effectiveness of our approach on WMT machine translation, more than halving the standard deviation when estimating expected BLEU score reward, and closing the BLEU score gap between independent sampling and beam search by up to 63%.

APA


Vilnis, L., Zemlyanskiy, Y., Murray, P., Passos, A.T. & Sanghai, S. (2023). Arithmetic Sampling: Parallel Diverse Decoding for Large Language Models. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:35120-35136. Available from https://proceedings.mlr.press/v202/vilnis23a.html.
