Principled Gradient-Based MCMC for Conditional Sampling of Text

Li Du, Afra Amini, Lucas Torroba Hennigen, Xinyan Velocity Yu, Holden Lee, Jason Eisner, Ryan Cotterell
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:11663-11685, 2024.

Abstract

We consider the problem of sampling text from an energy-based model. This arises, for example, when sampling text from a neural language model subject to soft constraints. Although the target distribution is discrete, the internal computations of the energy function (given by the language model) are differentiable, so one would like to exploit gradient information within a method such as MCMC. Alas, all previous attempts to generalize gradient-based MCMC to text sampling fail to sample correctly from the target distribution. We propose a solution, along with variants, and study its theoretical properties. Through experiments on various forms of text generation, we demonstrate that our unbiased samplers are able to generate more fluent text while better adhering to the control objectives. The same methods could be used to sample from discrete energy-based models unrelated to text.
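To make the idea concrete, below is a minimal, self-contained sketch, in the style of "Gibbs with Gradients" (Grathwohl et al., 2021), of one way gradients of a differentiable energy can drive a discrete sampler while a Metropolis-Hastings accept/reject step keeps the chain unbiased for the target p(x) ∝ exp(−E(x)). This is an illustration of the general technique, not the authors' algorithm; the toy energy and all names (energy, grad_proposal, mh_step) are hypothetical. In the paper's setting, E would instead be computed by a neural language model with soft-constraint terms.

import torch

# Hypothetical toy energy over token sequences, so the sketch runs on its own.
V, L, D = 50, 8, 16                       # vocab size, sequence length, embed dim
emb = torch.randn(V, D)
W = torch.randn(D, D)

def energy(one_hot):                      # one_hot: (L, V), differentiable
    h = one_hot @ emb                     # (L, D) relaxed token embeddings
    return (h[:-1] @ W * h[1:]).sum()     # scalar energy; target p(x) ∝ exp(-E(x))

def grad_proposal(x):
    """First-order (gradient) estimate of the energy change from swapping the
    token at each position for each vocabulary item, turned into a proposal
    distribution over single-token edits."""
    one_hot = torch.nn.functional.one_hot(x, V).float().requires_grad_(True)
    E = energy(one_hot)
    (g,) = torch.autograd.grad(E, one_hot)
    delta = g - g.gather(1, x[:, None])   # (L, V): estimated E(x') - E(x)
    log_q = torch.log_softmax(-delta.flatten() / 2, dim=0)
    return E.detach(), log_q

def mh_step(x):
    E_x, log_q_fwd = grad_proposal(x)
    idx = torch.multinomial(log_q_fwd.exp(), 1).item()
    pos, tok = divmod(idx, V)             # position to edit, replacement token
    x_new = x.clone()
    x_new[pos] = tok
    E_new, log_q_rev = grad_proposal(x_new)
    rev_idx = pos * V + x[pos].item()     # the move that would undo the edit
    # Metropolis-Hastings correction: exactness despite the approximate proposal.
    log_alpha = (E_x - E_new) + log_q_rev[rev_idx] - log_q_fwd[idx]
    return x_new if torch.log(torch.rand(())) < log_alpha else x

x = torch.randint(0, V, (L,))             # arbitrary initial sequence
for _ in range(200):
    x = mh_step(x)
print(x.tolist())

The proposal only uses a first-order Taylor estimate of the energy change, so the accept/reject step is what restores exactness; omitting such a correction is the kind of bias in earlier text samplers that the paper sets out to fix.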

Cite this Paper

BibTeX
@InProceedings{pmlr-v235-du24a,
  title     = {Principled Gradient-Based {MCMC} for Conditional Sampling of Text},
  author    = {Du, Li and Amini, Afra and Torroba Hennigen, Lucas and Yu, Xinyan Velocity and Lee, Holden and Eisner, Jason and Cotterell, Ryan},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {11663--11685},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/du24a/du24a.pdf},
  url       = {https://proceedings.mlr.press/v235/du24a.html}
}
Endnote
%0 Conference Paper
%T Principled Gradient-Based MCMC for Conditional Sampling of Text
%A Li Du
%A Afra Amini
%A Lucas Torroba Hennigen
%A Xinyan Velocity Yu
%A Holden Lee
%A Jason Eisner
%A Ryan Cotterell
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-du24a
%I PMLR
%P 11663--11685
%U https://proceedings.mlr.press/v235/du24a.html
%V 235
APA
Du, L., Amini, A., Torroba Hennigen, L., Yu, X.V., Lee, H., Eisner, J. & Cotterell, R. (2024). Principled Gradient-Based MCMC for Conditional Sampling of Text. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:11663-11685. Available from https://proceedings.mlr.press/v235/du24a.html.