ROPO: Robust Preference Optimization for Large Language Models

Xize Liang, Chao Chen, Shuang Qiu, Jie Wang, Yue Wu, Zhihang Fu, Hanzhu Chen, Feng Wu, Jieping Ye
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:37131-37161, 2025.

Abstract

The noise prevalent in preference data inevitably poses significant challenges to the preference alignment of large language models (LLMs). Existing approaches to this problem either only marginally alleviate the impact of noise, without actually reducing it, or rely on external LLMs that incur substantial computational costs. To address these challenges, we propose RObust Preference Optimization (ROPO), an iterative alignment approach that integrates noise tolerance and noise filtering without the aid of external models. Specifically, ROPO first formulates training with adaptive noise reduction as an optimization problem, which can be solved efficiently in an iterative paradigm. Then, to equip this solving process with noise-tolerance and noise-identification capabilities, we derive a robust loss that suppresses the gradients of samples with high uncertainty. We demonstrate both empirically and theoretically that this loss is key to noise tolerance and to the effective filtering of noisy samples. The loss further inspires a robustness-guided rejection sampling technique that recovers potentially important information from discarded queries. Extensive experiments on several widely used datasets and model architectures demonstrate that ROPO significantly outperforms all baselines under four practical noise settings as well as random symmetric noise, with its advantage growing as the noise rate increases.
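To make the core mechanism concrete, the sketch below illustrates in PyTorch a DPO-style pairwise loss whose per-sample gradient is down-weighted when the model's preference probability is near 0.5, i.e., when the sample has high uncertainty. This is a minimal sketch of the general idea only: the weighting function, the exponent alpha, and the function name robust_preference_loss are illustrative assumptions, not the exact ROPO loss derived in the paper.

import torch
import torch.nn.functional as F

def robust_preference_loss(logp_chosen, logp_rejected,
                           ref_logp_chosen, ref_logp_rejected,
                           beta=0.1, alpha=2.0):
    # Implicit reward margin of the policy over a frozen reference,
    # as in DPO: beta * (log-ratio on chosen - log-ratio on rejected).
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    p = torch.sigmoid(margin)          # model's preference probability
    base_loss = -F.logsigmoid(margin)  # standard DPO loss per pair

    # Hypothetical robustness weight (an assumption, not ROPO's exact
    # formulation): close to 0 when p is near 0.5, so gradients from
    # high-uncertainty pairs are suppressed; detached so it rescales
    # gradients without being optimized itself.
    w = (torch.abs(2.0 * p - 1.0) ** alpha).detach()

    # The per-pair weights double as a noise-identification signal:
    # pairs with persistently small w are candidates for filtering.
    return (w * base_loss).mean(), w

In an iterative loop of the kind the abstract describes, one would train with such a loss, rank pairs by w, drop the lowest-weighted fraction as suspected label noise, and, per the paper's robustness-guided rejection sampling, regenerate responses for the discarded queries rather than losing them entirely.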

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-liang25d,
  title     = {{ROPO}: Robust Preference Optimization for Large Language Models},
  author    = {Liang, Xize and Chen, Chao and Qiu, Shuang and Wang, Jie and Wu, Yue and Fu, Zhihang and Chen, Hanzhu and Wu, Feng and Ye, Jieping},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {37131--37161},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/liang25d/liang25d.pdf},
  url       = {https://proceedings.mlr.press/v267/liang25d.html}
}
APA
Liang, X., Chen, C., Qiu, S., Wang, J., Wu, Y., Fu, Z., Chen, H., Wu, F., & Ye, J. (2025). ROPO: Robust Preference Optimization for Large Language Models. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:37131-37161. Available from https://proceedings.mlr.press/v267/liang25d.html.
