Oops I Took A Gradient: Scalable Sampling for Discrete Distributions

Will Grathwohl; Kevin Swersky; Milad Hashemi; David Duvenaud; Chris Maddison

Proceedings of the 38th International Conference on Machine Learning, PMLR 139:3831-3841, 2021.

Abstract

We propose a general and scalable approximate sampling strategy for probabilistic models with discrete variables. Our approach uses gradients of the likelihood function with respect to its discrete inputs to propose updates in a Metropolis-Hastings sampler. We show empirically that this approach outperforms generic samplers in a number of difficult settings including Ising models, Potts models, restricted Boltzmann machines, and factorial hidden Markov models. We also demonstrate our improved sampler for training deep energy-based models on high dimensional discrete image data. This approach outperforms variational auto-encoders and existing energy-based models. Finally, we give bounds showing that our approach is near-optimal in the class of samplers which propose local updates.
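
The proposal mechanism described in the abstract can be illustrated with a minimal sketch. This is not the authors' reference implementation. It assumes binary variables, a toy quadratic model f(x) = xᵀWx + bᵀx with symmetric, zero-diagonal W, and a single-bit-flip proposal weighted by a first-order (gradient-based) estimate of how much each flip would change f. All function names, the model, and the parameter values are hypothetical and chosen only for illustration.

```python
import math
import random


def log_prob(x, W, b):
    """Unnormalized log-probability f(x) = x^T W x + b^T x for binary x."""
    n = len(x)
    quad = sum(x[i] * W[i][j] * x[j] for i in range(n) for j in range(n))
    return quad + sum(b[i] * x[i] for i in range(n))


def grad_log_prob(x, W, b):
    """Gradient of f, treating x as continuous (W assumed symmetric)."""
    n = len(x)
    return [2.0 * sum(W[i][j] * x[j] for j in range(n)) + b[i] for i in range(n)]


def flip_log_proposal(x, W, b):
    """Log-probabilities over which bit to flip: log softmax(d(x) / 2)."""
    g = grad_log_prob(x, W, b)
    # -(2 x_i - 1) * g_i is a first-order estimate of f(flip_i(x)) - f(x).
    d = [-(2 * x[i] - 1) * g[i] / 2.0 for i in range(len(x))]
    m = max(d)
    log_z = m + math.log(sum(math.exp(v - m) for v in d))
    return [v - log_z for v in d]


def mh_step(x, W, b, rng):
    """One Metropolis-Hastings step using the gradient-informed proposal."""
    log_q_fwd = flip_log_proposal(x, W, b)
    weights = [math.exp(v) for v in log_q_fwd]
    i = rng.choices(range(len(x)), weights=weights)[0]
    x_new = list(x)
    x_new[i] = 1 - x_new[i]
    # The reverse move flips the same bit back, proposed from the new state.
    log_q_rev = flip_log_proposal(x_new, W, b)
    log_accept = (log_prob(x_new, W, b) - log_prob(x, W, b)
                  + log_q_rev[i] - log_q_fwd[i])
    if rng.random() < math.exp(min(0.0, log_accept)):
        return x_new
    return x


# Hypothetical 3-variable chain model, used only to exercise the sampler.
rng = random.Random(0)
W = [[0.0, 0.5, 0.0], [0.5, 0.0, 0.5], [0.0, 0.5, 0.0]]
b = [0.1, -0.2, 0.1]
x = [0, 1, 0]
for _ in range(1000):
    x = mh_step(x, W, b, rng)
```

For this quadratic toy model the first-order estimate happens to equal the exact change in f for a single flip; for a general model it is only an approximation, and the accept/reject step is what keeps the chain targeting the intended distribution.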

Cite this Paper

BibTeX

@InProceedings{pmlr-v139-grathwohl21a,
  title     = {Oops I Took A Gradient: Scalable Sampling for Discrete Distributions},
  author    = {Grathwohl, Will and Swersky, Kevin and Hashemi, Milad and Duvenaud, David and Maddison, Chris},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {3831--3841},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/grathwohl21a/grathwohl21a.pdf},
  url       = {https://proceedings.mlr.press/v139/grathwohl21a.html},
  abstract  = {We propose a general and scalable approximate sampling strategy for probabilistic models with discrete variables. Our approach uses gradients of the likelihood function with respect to its discrete inputs to propose updates in a Metropolis-Hastings sampler. We show empirically that this approach outperforms generic samplers in a number of difficult settings including Ising models, Potts models, restricted Boltzmann machines, and factorial hidden Markov models. We also demonstrate our improved sampler for training deep energy-based models on high dimensional discrete image data. This approach outperforms variational auto-encoders and existing energy-based models. Finally, we give bounds showing that our approach is near-optimal in the class of samplers which propose local updates.}
}

Endnote

%0 Conference Paper
%T Oops I Took A Gradient: Scalable Sampling for Discrete Distributions
%A Will Grathwohl
%A Kevin Swersky
%A Milad Hashemi
%A David Duvenaud
%A Chris Maddison
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-grathwohl21a
%I PMLR
%P 3831--3841
%U https://proceedings.mlr.press/v139/grathwohl21a.html
%V 139
%X We propose a general and scalable approximate sampling strategy for probabilistic models with discrete variables. Our approach uses gradients of the likelihood function with respect to its discrete inputs to propose updates in a Metropolis-Hastings sampler. We show empirically that this approach outperforms generic samplers in a number of difficult settings including Ising models, Potts models, restricted Boltzmann machines, and factorial hidden Markov models. We also demonstrate our improved sampler for training deep energy-based models on high dimensional discrete image data. This approach outperforms variational auto-encoders and existing energy-based models. Finally, we give bounds showing that our approach is near-optimal in the class of samplers which propose local updates.

APA

Grathwohl, W., Swersky, K., Hashemi, M., Duvenaud, D., & Maddison, C. (2021). Oops I Took A Gradient: Scalable Sampling for Discrete Distributions. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:3831-3841. Available from https://proceedings.mlr.press/v139/grathwohl21a.html.
