Direct Density Ratio Optimization: A Statistically Consistent Approach to Aligning Large Language Models

Rei Higuchi, Taiji Suzuki
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:23219-23234, 2025.

Abstract

Aligning large language models (LLMs) with human preferences is crucial for safe deployment, yet existing methods assume specific preference models such as the Bradley-Terry model. This assumption leads to statistical inconsistency: more data does not guarantee convergence to true human preferences. To address this critical gap, we introduce a novel alignment method, Direct Density Ratio Optimization (DDRO). DDRO directly estimates the density ratio between preferred and unpreferred output distributions, circumventing the need for explicit human preference modeling. We theoretically prove that DDRO is statistically consistent, ensuring convergence to the true preferred distribution as the data size grows, regardless of the underlying preference structure. Experiments demonstrate that DDRO outperforms existing methods, showcasing its effectiveness and potential for significant improvement. DDRO unlocks the potential for truly data-driven alignment, paving the way for more reliable and human-aligned LLMs.
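The abstract's key mechanism is estimating the density ratio between preferred and unpreferred output distributions directly from samples. As a purely illustrative aid, the sketch below shows the classical probabilistic-classification route to density ratio estimation on synthetic scalar data. It is a minimal toy example, not the paper's DDRO objective; every name and number in it is hypothetical.

import numpy as np

# Toy illustration of density ratio estimation by probabilistic
# classification, the classical statistical tool underlying the idea of
# estimating p_preferred / p_unpreferred without a preference model.
# This is NOT the paper's DDRO objective; all names are hypothetical.

rng = np.random.default_rng(0)

# Stand-ins for features of "preferred" and "unpreferred" outputs.
preferred = rng.normal(loc=1.0, scale=1.0, size=2000)     # p(x) = N(+1, 1)
unpreferred = rng.normal(loc=-1.0, scale=1.0, size=2000)  # q(x) = N(-1, 1)

# Logistic regression separating the two samples (feature plus bias).
x = np.concatenate([preferred, unpreferred])
y = np.concatenate([np.ones_like(preferred), np.zeros_like(unpreferred)])
X = np.stack([x, np.ones_like(x)], axis=1)

w = np.zeros(2)
for _ in range(2000):  # plain gradient descent on the mean logistic loss
    p = 1.0 / (1.0 + np.exp(-X @ w))
    w -= 0.5 * X.T @ (p - y) / len(y)

def density_ratio(query):
    # With balanced classes, Bayes' rule gives
    # p(x) / q(x) = P(preferred | x) / P(unpreferred | x) = exp(logit).
    return np.exp(w[0] * query + w[1])

# For these Gaussians the true ratio is exp(2x): slope ~2, ratio 1 at x=0.
print("estimated log-ratio slope:", w[0])
print("estimated ratio at x = 0 :", density_ratio(0.0))

In the paper's setting the samples would be LLM outputs rather than scalars, and the estimated ratio would drive the alignment of the policy itself. The point of the sketch is only that a density ratio can be recovered from preferred and unpreferred samples without positing a preference model such as Bradley-Terry, which is the premise behind the abstract's consistency claim.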

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-higuchi25a,
  title     = {Direct Density Ratio Optimization: A Statistically Consistent Approach to Aligning Large Language Models},
  author    = {Higuchi, Rei and Suzuki, Taiji},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {23219--23234},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/higuchi25a/higuchi25a.pdf},
  url       = {https://proceedings.mlr.press/v267/higuchi25a.html},
  abstract  = {Aligning large language models (LLMs) with human preferences is crucial for safe deployment, yet existing methods assume specific preference models such as the Bradley-Terry model. This assumption leads to statistical inconsistency: more data does not guarantee convergence to true human preferences. To address this critical gap, we introduce a novel alignment method, Direct Density Ratio Optimization (DDRO). DDRO directly estimates the density ratio between preferred and unpreferred output distributions, circumventing the need for explicit human preference modeling. We theoretically prove that DDRO is statistically consistent, ensuring convergence to the true preferred distribution as the data size grows, regardless of the underlying preference structure. Experiments demonstrate that DDRO outperforms existing methods, showcasing its effectiveness and potential for significant improvement. DDRO unlocks the potential for truly data-driven alignment, paving the way for more reliable and human-aligned LLMs.}
}
Endnote
%0 Conference Paper
%T Direct Density Ratio Optimization: A Statistically Consistent Approach to Aligning Large Language Models
%A Rei Higuchi
%A Taiji Suzuki
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-higuchi25a
%I PMLR
%P 23219--23234
%U https://proceedings.mlr.press/v267/higuchi25a.html
%V 267
%X Aligning large language models (LLMs) with human preferences is crucial for safe deployment, yet existing methods assume specific preference models such as the Bradley-Terry model. This assumption leads to statistical inconsistency: more data does not guarantee convergence to true human preferences. To address this critical gap, we introduce a novel alignment method, Direct Density Ratio Optimization (DDRO). DDRO directly estimates the density ratio between preferred and unpreferred output distributions, circumventing the need for explicit human preference modeling. We theoretically prove that DDRO is statistically consistent, ensuring convergence to the true preferred distribution as the data size grows, regardless of the underlying preference structure. Experiments demonstrate that DDRO outperforms existing methods, showcasing its effectiveness and potential for significant improvement. DDRO unlocks the potential for truly data-driven alignment, paving the way for more reliable and human-aligned LLMs.
APA
Higuchi, R. & Suzuki, T. (2025). Direct Density Ratio Optimization: A Statistically Consistent Approach to Aligning Large Language Models. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:23219-23234. Available from https://proceedings.mlr.press/v267/higuchi25a.html.
