Controlled Decoding from Language Models

Sidharth Mudgal; Jong Lee; Harish Ganapathy; Yaguang Li; Tao Wang; Yanping Huang; Zhifeng Chen; Heng-Tze Cheng; Michael Collins; Trevor Strohman; Jilin Chen; Alex Beutel; Ahmad Beirami

Proceedings of the 41st International Conference on Machine Learning, PMLR 235:36486-36503, 2024.

Abstract

KL-regularized reinforcement learning (RL) is a popular alignment framework to control the language model responses towards high reward outcomes. We pose a tokenwise RL objective and propose a modular solver for it, called controlled decoding (CD). CD exerts control through a separate prefix scorer module, which is trained to learn a value function for the reward. The prefix scorer is used at inference time to control the generation from a frozen base model, provably sampling from a solution to the RL objective. We empirically demonstrate that CD is effective as a control mechanism on popular benchmarks. We also show that prefix scorers for multiple rewards may be combined at inference time, effectively solving a multi-objective RL problem with no additional training. We show that the benefits of applying CD transfer to an unseen base model with no further tuning as well. Finally, we show that CD can be applied in a blockwise decoding fashion at inference-time, essentially bridging the gap between the popular best-of-$K$ strategy and tokenwise control through reinforcement learning. This makes CD a promising approach for alignment of language models.
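
The two inference-time modes the abstract describes can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the `strength` parameter, and the list-based inputs are all hypothetical. The tokenwise step assumes the usual closed form for KL-regularized objectives, in which the base next-token probabilities are reweighted by the exponentiated prefix score. The blockwise step assumes the best-of-$K$-style rule of keeping the highest-scoring candidate block.

```python
import math


def cd_token_distribution(base_probs, prefix_scores, strength):
    """Tokenwise control (assumed form): p(z) * exp(strength * score(z)), renormalized.

    base_probs    -- next-token probabilities from the frozen base model
    prefix_scores -- hypothetical prefix-scorer values, one per candidate token
    strength      -- hypothetical knob trading reward against closeness to the base model
    """
    # Subtract the max score before exponentiating for numerical stability;
    # the shift cancels in the normalization.
    max_score = max(prefix_scores)
    weights = [
        p * math.exp(strength * (s - max_score))
        for p, s in zip(base_probs, prefix_scores)
    ]
    total = sum(weights)
    return [w / total for w in weights]


def blockwise_select(candidate_blocks, score_fn):
    """Blockwise control (assumed form): keep the candidate block with the top prefix score.

    candidate_blocks -- K continuations sampled from the base model
    score_fn         -- hypothetical prefix scorer mapping a block to a number
    """
    return max(candidate_blocks, key=score_fn)
```

With `strength = 0` the tokenwise step returns the base distribution unchanged, and larger values shift probability toward tokens the scorer prefers. In the blockwise step, a block that spans the whole response reduces to ordinary best-of-$K$ selection.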

Cite this Paper

BibTeX


@InProceedings{pmlr-v235-mudgal24a,
  title = 	 {Controlled Decoding from Language Models},
  author =       {Mudgal, Sidharth and Lee, Jong and Ganapathy, Harish and Li, Yaguang and Wang, Tao and Huang, Yanping and Chen, Zhifeng and Cheng, Heng-Tze and Collins, Michael and Strohman, Trevor and Chen, Jilin and Beutel, Alex and Beirami, Ahmad},
  booktitle = 	 {Proceedings of the 41st International Conference on Machine Learning},
  pages = 	 {36486--36503},
  year = 	 {2024},
  editor = 	 {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume = 	 {235},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {21--27 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v235/main/assets/mudgal24a/mudgal24a.pdf},
  url = 	 {https://proceedings.mlr.press/v235/mudgal24a.html},
  abstract = 	 {KL-regularized reinforcement learning (RL) is a popular alignment framework to control the language model responses towards high reward outcomes. We pose a tokenwise RL objective and propose a modular solver for it, called controlled decoding (CD). CD exerts control through a separate prefix scorer module, which is trained to learn a value function for the reward. The prefix scorer is used at inference time to control the generation from a frozen base model, provably sampling from a solution to the RL objective. We empirically demonstrate that CD is effective as a control mechanism on popular benchmarks. We also show that prefix scorers for multiple rewards may be combined at inference time, effectively solving a multi-objective RL problem with no additional training. We show that the benefits of applying CD transfer to an unseen base model with no further tuning as well. Finally, we show that CD can be applied in a blockwise decoding fashion at inference-time, essentially bridging the gap between the popular best-of-$K$ strategy and tokenwise control through reinforcement learning. This makes CD a promising approach for alignment of language models.}
}

Endnote

%0 Conference Paper
%T Controlled Decoding from Language Models
%A Sidharth Mudgal
%A Jong Lee
%A Harish Ganapathy
%A Yaguang Li
%A Tao Wang
%A Yanping Huang
%A Zhifeng Chen
%A Heng-Tze Cheng
%A Michael Collins
%A Trevor Strohman
%A Jilin Chen
%A Alex Beutel
%A Ahmad Beirami
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-mudgal24a
%I PMLR
%P 36486--36503
%U https://proceedings.mlr.press/v235/mudgal24a.html
%V 235
%X KL-regularized reinforcement learning (RL) is a popular alignment framework to control the language model responses towards high reward outcomes. We pose a tokenwise RL objective and propose a modular solver for it, called controlled decoding (CD). CD exerts control through a separate prefix scorer module, which is trained to learn a value function for the reward. The prefix scorer is used at inference time to control the generation from a frozen base model, provably sampling from a solution to the RL objective. We empirically demonstrate that CD is effective as a control mechanism on popular benchmarks. We also show that prefix scorers for multiple rewards may be combined at inference time, effectively solving a multi-objective RL problem with no additional training. We show that the benefits of applying CD transfer to an unseen base model with no further tuning as well. Finally, we show that CD can be applied in a blockwise decoding fashion at inference-time, essentially bridging the gap between the popular best-of-$K$ strategy and tokenwise control through reinforcement learning. This makes CD a promising approach for alignment of language models.

APA


Mudgal, S., Lee, J., Ganapathy, H., Li, Y., Wang, T., Huang, Y., Chen, Z., Cheng, H., Collins, M., Strohman, T., Chen, J., Beutel, A. & Beirami, A. (2024). Controlled Decoding from Language Models. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:36486-36503. Available from https://proceedings.mlr.press/v235/mudgal24a.html.
