Optimizing Language Models for Human Preferences is a Causal Inference Problem

Victoria Lin, Eli Ben-Michael, Louis-Philippe Morency
Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence, PMLR 244:2250-2270, 2024.

Abstract

As large language models (LLMs) see greater use in academic and commercial settings, there is increasing interest in methods that allow language models to generate texts aligned with human preferences. In this paper, we present an initial exploration of language model optimization for human preferences from *direct outcome datasets*, where each sample consists of a text and an associated numerical outcome measuring the reader’s response. We first propose that language model optimization should be viewed as a *causal problem* to ensure that the model correctly learns the relationship between the text and the outcome. We formalize this causal language optimization problem, and we develop a method, *causal preference optimization* (CPO), that solves an unbiased surrogate objective for the problem. We further extend CPO with *doubly robust* CPO (DR-CPO), which reduces the variance of the surrogate objective while retaining provably strong guarantees on bias. Finally, we empirically demonstrate the effectiveness of (DR-)CPO in optimizing state-of-the-art LLMs for human preferences on direct outcome data, and we validate the robustness of DR-CPO under difficult confounding conditions.
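For readers unfamiliar with doubly robust estimation, the following is a minimal sketch of the generic doubly robust value estimate that the DR-CPO idea builds on. The notation is illustrative, not taken from the paper: texts t_i with outcomes y_i are logged under a distribution \pi_0, \pi_\theta is the language model being optimized, and \hat{y} is a fitted outcome model.

\[
\hat{V}_{\mathrm{DR}}(\theta) \;=\; \mathbb{E}_{t \sim \pi_\theta}\big[\hat{y}(t)\big] \;+\; \frac{1}{n}\sum_{i=1}^{n} \frac{\pi_\theta(t_i)}{\pi_0(t_i)}\big(y_i - \hat{y}(t_i)\big)
\]

The estimate remains consistent if either the outcome model \hat{y} or the importance weights \pi_\theta/\pi_0 are well specified, and the outcome-model term typically lowers the variance relative to a purely importance-weighted objective; this is the bias-variance trade-off referenced in the abstract. The paper's exact DR-CPO objective is given in the full text.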

Cite this Paper


BibTeX
@InProceedings{pmlr-v244-lin24a,
  title = {Optimizing Language Models for Human Preferences is a Causal Inference Problem},
  author = {Lin, Victoria and Ben-Michael, Eli and Morency, Louis-Philippe},
  booktitle = {Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence},
  pages = {2250--2270},
  year = {2024},
  editor = {Kiyavash, Negar and Mooij, Joris M.},
  volume = {244},
  series = {Proceedings of Machine Learning Research},
  month = {15--19 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v244/main/assets/lin24a/lin24a.pdf},
  url = {https://proceedings.mlr.press/v244/lin24a.html},
  abstract = {As large language models (LLMs) see greater use in academic and commercial settings, there is increasing interest in methods that allow language models to generate texts aligned with human preferences. In this paper, we present an initial exploration of language model optimization for human preferences from *direct outcome datasets*, where each sample consists of a text and an associated numerical outcome measuring the reader’s response. We first propose that language model optimization should be viewed as a *causal problem* to ensure that the model correctly learns the relationship between the text and the outcome. We formalize this causal language optimization problem, and we develop a method{—}*causal preference optimization* (CPO){—}that solves an unbiased surrogate objective for the problem. We further extend CPO with *doubly robust* CPO (DR-CPO), which reduces the variance of the surrogate objective while retaining provably strong guarantees on bias. Finally, we empirically demonstrate the effectiveness of (DR-)CPO in optimizing state-of-the-art LLMs for human preferences on direct outcome data, and we validate the robustness of DR-CPO under difficult confounding conditions.}
}
Endnote
%0 Conference Paper
%T Optimizing Language Models for Human Preferences is a Causal Inference Problem
%A Victoria Lin
%A Eli Ben-Michael
%A Louis-Philippe Morency
%B Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence
%C Proceedings of Machine Learning Research
%D 2024
%E Negar Kiyavash
%E Joris M. Mooij
%F pmlr-v244-lin24a
%I PMLR
%P 2250--2270
%U https://proceedings.mlr.press/v244/lin24a.html
%V 244
%X As large language models (LLMs) see greater use in academic and commercial settings, there is increasing interest in methods that allow language models to generate texts aligned with human preferences. In this paper, we present an initial exploration of language model optimization for human preferences from *direct outcome datasets*, where each sample consists of a text and an associated numerical outcome measuring the reader’s response. We first propose that language model optimization should be viewed as a *causal problem* to ensure that the model correctly learns the relationship between the text and the outcome. We formalize this causal language optimization problem, and we develop a method, *causal preference optimization* (CPO), that solves an unbiased surrogate objective for the problem. We further extend CPO with *doubly robust* CPO (DR-CPO), which reduces the variance of the surrogate objective while retaining provably strong guarantees on bias. Finally, we empirically demonstrate the effectiveness of (DR-)CPO in optimizing state-of-the-art LLMs for human preferences on direct outcome data, and we validate the robustness of DR-CPO under difficult confounding conditions.
APA
Lin, V., Ben-Michael, E. & Morency, L.-P. (2024). Optimizing Language Models for Human Preferences is a Causal Inference Problem. Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence, in Proceedings of Machine Learning Research 244:2250-2270. Available from https://proceedings.mlr.press/v244/lin24a.html.