$f$-PO: Generalizing Preference Optimization with $f$-divergence Minimization

Jiaqi Han, Mingjian Jiang, Yuxuan Song, Stefano Ermon, Minkai Xu
Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, PMLR 258:1144-1152, 2025.

Abstract

Preference optimization has made significant progress recently, with numerous methods developed to align language models with human preferences. This paper introduces $f$-divergence Preference Optimization ($f$-PO), a novel framework that generalizes and extends existing approaches. $f$-PO minimizes $f$-divergences between the optimized policy and the optimal policy, encompassing a broad family of alignment methods using various divergences. Our approach unifies previous algorithms like DPO and EXO, while offering new variants through different choices of $f$-divergences. We provide theoretical analysis of $f$-PO’s properties and conduct extensive experiments on state-of-the-art language models using benchmark datasets. Results demonstrate $f$-PO’s effectiveness across various tasks, achieving superior performance compared to existing methods on popular benchmarks such as AlpacaEval 2, Arena-Hard, MT-Bench, and Open LLM Leaderboard v2. Additionally, we present ablation studies exploring the impact of different $f$-divergences, offering insights into the trade-offs between regularization and performance in offline preference optimization. Our work contributes both practical algorithms and theoretical understanding to the field of language model alignment. Code is available at https://github.com/MinkaiXu/fPO.
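To make the objective the abstract alludes to concrete, the sketch below writes out the standard $f$-divergence definition and the usual RLHF optimal-policy form. Here $\pi_{\mathrm{ref}}$ denotes a reference policy, $r$ a reward function, and $\beta$ a temperature, none of which appear in the abstract itself; the specific divergence direction, the offline estimator, and the exact correspondence to DPO and EXO are established in the paper, so this display is an illustrative assumption rather than the paper's derivation.

\[
D_f(p \,\|\, q) \;=\; \mathbb{E}_{y \sim q}\!\left[ f\!\left(\tfrac{p(y)}{q(y)}\right) \right], \qquad f \text{ convex},\ f(1) = 0,
\]
\[
\pi^{*}(y \mid x) \;\propto\; \pi_{\mathrm{ref}}(y \mid x)\, \exp\!\big(r(x, y)/\beta\big), \qquad \min_{\theta}\ \mathbb{E}_{x}\Big[ D_f\big(\pi_{\theta}(\cdot \mid x) \,\|\, \pi^{*}(\cdot \mid x)\big) \Big].
\]

Under this definition, taking $f(t) = t \log t$ with $p = \pi_{\theta}$ and $q = \pi^{*}$ yields the reverse KL $D_{\mathrm{KL}}(\pi_{\theta} \,\|\, \pi^{*})$, while $f(t) = -\log t$ yields the forward KL $D_{\mathrm{KL}}(\pi^{*} \,\|\, \pi_{\theta})$; other convex $f$ (e.g., $\alpha$-divergences or Jensen-Shannon) give rise to the additional variants the abstract mentions.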

Cite this Paper


BibTeX
@InProceedings{pmlr-v258-han25a,
  title     = {$f$-PO: Generalizing Preference Optimization with $f$-divergence Minimization},
  author    = {Han, Jiaqi and Jiang, Mingjian and Song, Yuxuan and Ermon, Stefano and Xu, Minkai},
  booktitle = {Proceedings of The 28th International Conference on Artificial Intelligence and Statistics},
  pages     = {1144--1152},
  year      = {2025},
  editor    = {Li, Yingzhen and Mandt, Stephan and Agrawal, Shipra and Khan, Emtiyaz},
  volume    = {258},
  series    = {Proceedings of Machine Learning Research},
  month     = {03--05 May},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v258/main/assets/han25a/han25a.pdf},
  url       = {https://proceedings.mlr.press/v258/han25a.html},
  abstract  = {Preference optimization has made significant progress recently, with numerous methods developed to align language models with human preferences. This paper introduces $f$-divergence Preference Optimization ($f$-PO), a novel framework that generalizes and extends existing approaches. $f$-PO minimizes $f$-divergences between the optimized policy and the optimal policy, encompassing a broad family of alignment methods using various divergences. Our approach unifies previous algorithms like DPO and EXO, while offering new variants through different choices of $f$-divergences. We provide theoretical analysis of $f$-PO’s properties and conduct extensive experiments on state-of-the-art language models using benchmark datasets. Results demonstrate $f$-PO’s effectiveness across various tasks, achieving superior performance compared to existing methods on popular benchmarks such as AlpacaEval 2, Arena-Hard, MT-Bench, and Open LLM Leaderboard v2. Additionally, we present ablation studies exploring the impact of different $f$-divergences, offering insights into the trade-offs between regularization and performance in offline preference optimization. Our work contributes both practical algorithms and theoretical understanding to the field of language model alignment. Code is available at \url{https://github.com/MinkaiXu/fPO.}}
}
Endnote
%0 Conference Paper
%T $f$-PO: Generalizing Preference Optimization with $f$-divergence Minimization
%A Jiaqi Han
%A Mingjian Jiang
%A Yuxuan Song
%A Stefano Ermon
%A Minkai Xu
%B Proceedings of The 28th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2025
%E Yingzhen Li
%E Stephan Mandt
%E Shipra Agrawal
%E Emtiyaz Khan
%F pmlr-v258-han25a
%I PMLR
%P 1144--1152
%U https://proceedings.mlr.press/v258/han25a.html
%V 258
%X Preference optimization has made significant progress recently, with numerous methods developed to align language models with human preferences. This paper introduces $f$-divergence Preference Optimization ($f$-PO), a novel framework that generalizes and extends existing approaches. $f$-PO minimizes $f$-divergences between the optimized policy and the optimal policy, encompassing a broad family of alignment methods using various divergences. Our approach unifies previous algorithms like DPO and EXO, while offering new variants through different choices of $f$-divergences. We provide theoretical analysis of $f$-PO’s properties and conduct extensive experiments on state-of-the-art language models using benchmark datasets. Results demonstrate $f$-PO’s effectiveness across various tasks, achieving superior performance compared to existing methods on popular benchmarks such as AlpacaEval 2, Arena-Hard, MT-Bench, and Open LLM Leaderboard v2. Additionally, we present ablation studies exploring the impact of different $f$-divergences, offering insights into the trade-offs between regularization and performance in offline preference optimization. Our work contributes both practical algorithms and theoretical understanding to the field of language model alignment. Code is available at \url{https://github.com/MinkaiXu/fPO.}
APA
Han, J., Jiang, M., Song, Y., Ermon, S. & Xu, M. (2025). $f$-PO: Generalizing Preference Optimization with $f$-divergence Minimization. Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 258:1144-1152. Available from https://proceedings.mlr.press/v258/han25a.html.
