Principled Data Selection for Alignment: The Hidden Risks of Difficult Examples

Chengqian Gao, Haonan Li, Liu Liu, Zeke Xie, Peilin Zhao, Zhiqiang Xu
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:18386-18409, 2025.

Abstract

The alignment of large language models (LLMs) often assumes that more clean data yields better outcomes, overlooking the match between model capacity and example difficulty. Challenging this assumption, we propose a new principle: preference data vary in difficulty, and overly difficult examples hinder alignment by exceeding the model's capacity. Through systematic experimentation, we validate this principle with three key findings: (1) preference examples vary in difficulty, as evidenced by consistent learning orders across alignment runs; (2) overly difficult examples significantly degrade performance across four LLMs and two datasets; and (3) a model's capacity dictates its threshold for handling difficult examples, underscoring a critical relationship between data selection and model capacity. Building on this principle, we introduce Selective DPO, which filters out overly difficult examples. This simple adjustment improves alignment performance by 9-16% in win rate on the AlpacaEval 2 benchmark over the DPO baseline, surpassing a series of DPO variants with different algorithmic adjustments. Together, these results illuminate the importance of aligning data difficulty with model capacity, offering a transformative perspective for improving alignment strategies in LLMs. Code is available at https://github.com/glorgao/SelectiveDPO.
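The recipe the abstract describes is simple: score each preference example for difficulty, drop the hardest ones, and run standard DPO on the remainder. Below is a minimal sketch of that filtering step, not the authors' implementation. The difficulty proxy used here (the per-example DPO loss computed from reference-model log-probabilities) and all function names are illustrative assumptions; see the paper and https://github.com/glorgao/SelectiveDPO for the exact criterion.

import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_lp: torch.Tensor,
             policy_rejected_lp: torch.Tensor,
             ref_chosen_lp: torch.Tensor,
             ref_rejected_lp: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Per-example DPO loss from summed sequence log-probabilities."""
    margin = ((policy_chosen_lp - ref_chosen_lp)
              - (policy_rejected_lp - ref_rejected_lp))
    return -F.logsigmoid(beta * margin)  # shape: (num_examples,)

def select_easy_subset(difficulty: torch.Tensor,
                       keep_ratio: float = 0.5) -> torch.Tensor:
    """Indices of the easiest `keep_ratio` fraction (lowest difficulty)."""
    k = max(1, int(keep_ratio * difficulty.numel()))
    return torch.argsort(difficulty)[:k]

# Toy usage: random log-probabilities stand in for real model scores.
n = 1000
policy_c, policy_r = torch.randn(n), torch.randn(n)
ref_c, ref_r = torch.randn(n), torch.randn(n)
difficulty = dpo_loss(policy_c, policy_r, ref_c, ref_r)  # higher = harder
easy_idx = select_easy_subset(difficulty, keep_ratio=0.5)
# ...then train with the usual DPO objective on `easy_idx` only.

Note that keep_ratio is a hypothetical knob; the paper's point is that the right threshold depends on model capacity, so a fixed fraction is only a stand-in.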

Cite this Paper

BibTeX
@InProceedings{pmlr-v267-gao25f,
  title     = {Principled Data Selection for Alignment: The Hidden Risks of Difficult Examples},
  author    = {Gao, Chengqian and Li, Haonan and Liu, Liu and Xie, Zeke and Zhao, Peilin and Xu, Zhiqiang},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {18386--18409},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/gao25f/gao25f.pdf},
  url       = {https://proceedings.mlr.press/v267/gao25f.html}
}
Endnote
%0 Conference Paper
%T Principled Data Selection for Alignment: The Hidden Risks of Difficult Examples
%A Chengqian Gao
%A Haonan Li
%A Liu Liu
%A Zeke Xie
%A Peilin Zhao
%A Zhiqiang Xu
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-gao25f
%I PMLR
%P 18386--18409
%U https://proceedings.mlr.press/v267/gao25f.html
%V 267
APA
Gao, C., Li, H., Liu, L., Xie, Z., Zhao, P. & Xu, Z. (2025). Principled Data Selection for Alignment: The Hidden Risks of Difficult Examples. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:18386-18409. Available from https://proceedings.mlr.press/v267/gao25f.html.