Comparing Few to Rank Many: Active Human Preference Learning Using Randomized Frank-Wolfe Method

Kiran Koshy Thekumparampil; Gaurush Hiranandani; Kousha Kalantari; Shoham Sabach; Branislav Kveton

Comparing Few to Rank Many: Active Human Preference Learning Using Randomized Frank-Wolfe Method

Kiran Koshy Thekumparampil, Gaurush Hiranandani, Kousha Kalantari, Shoham Sabach, Branislav Kveton

Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:59355-59376, 2025.

Abstract

We study learning human preferences from limited comparison feedback, a core machine learning problem that is at the center of reinforcement learning from human feedback (RLHF). We formulate the problem as learning a Plackett-Luce (PL) model from a limited number of $K$-subset comparisons over a universe of $N$ items, where typically $K \ll N$. Our objective is to select the $K$-subsets such that all items can be ranked with minimal mistakes within the budget. We solve the problem using the D-optimal design, which minimizes the worst-case ranking loss under the estimated PL model. All known algorithms for this problem are computationally infeasible in our setting because we consider exponentially many subsets in $K$. To address this challenge, we propose a randomized Frank-Wolfe algorithm with memoization and sparse updates that has a low $O(N^2 + K^2)$ per-iteration complexity. We analyze it and demonstrate its empirical superiority on synthetic and open-source NLP datasets.

Cite this Paper

BibTeX

@InProceedings{pmlr-v267-thekumparampil25a,
  title = 	 {Comparing Few to Rank Many: Active Human Preference Learning Using Randomized Frank-{W}olfe Method},
  author =       {Thekumparampil, Kiran Koshy and Hiranandani, Gaurush and Kalantari, Kousha and Sabach, Shoham and Kveton, Branislav},
  booktitle = 	 {Proceedings of the 42nd International Conference on Machine Learning},
  pages = 	 {59355--59376},
  year = 	 {2025},
  editor = 	 {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume = 	 {267},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {13--19 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v267/main/assets/thekumparampil25a/thekumparampil25a.pdf},
  url = 	 {https://proceedings.mlr.press/v267/thekumparampil25a.html},
  abstract = 	 {We study learning human preferences from limited comparison feedback, a core machine learning problem that is at the center of reinforcement learning from human feedback (RLHF). We formulate the problem as learning a Plackett-Luce (PL) model from a limited number of $K$-subset comparisons over a universe of $N$ items, where typically $K \ll N$. Our objective is to select the $K$-subsets such that all items can be ranked with minimal mistakes within the budget. We solve the problem using the D-optimal design, which minimizes the worst-case ranking loss under the estimated PL model. All known algorithms for this problem are computationally infeasible in our setting because we consider exponentially many subsets in $K$. To address this challenge, we propose a randomized Frank-Wolfe algorithm with memoization and sparse updates that has a low $O(N^2 + K^2)$ per-iteration complexity. We analyze it and demonstrate its empirical superiority on synthetic and open-source NLP datasets.}
}

Endnote

%0 Conference Paper
%T Comparing Few to Rank Many: Active Human Preference Learning Using Randomized Frank-Wolfe Method
%A Kiran Koshy Thekumparampil
%A Gaurush Hiranandani
%A Kousha Kalantari
%A Shoham Sabach
%A Branislav Kveton
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu	
%F pmlr-v267-thekumparampil25a
%I PMLR
%P 59355--59376
%U https://proceedings.mlr.press/v267/thekumparampil25a.html
%V 267
%X We study learning human preferences from limited comparison feedback, a core machine learning problem that is at the center of reinforcement learning from human feedback (RLHF). We formulate the problem as learning a Plackett-Luce (PL) model from a limited number of $K$-subset comparisons over a universe of $N$ items, where typically $K \ll N$. Our objective is to select the $K$-subsets such that all items can be ranked with minimal mistakes within the budget. We solve the problem using the D-optimal design, which minimizes the worst-case ranking loss under the estimated PL model. All known algorithms for this problem are computationally infeasible in our setting because we consider exponentially many subsets in $K$. To address this challenge, we propose a randomized Frank-Wolfe algorithm with memoization and sparse updates that has a low $O(N^2 + K^2)$ per-iteration complexity. We analyze it and demonstrate its empirical superiority on synthetic and open-source NLP datasets.

APA

Thekumparampil, K.K., Hiranandani, G., Kalantari, K., Sabach, S. & Kveton, B.. (2025). Comparing Few to Rank Many: Active Human Preference Learning Using Randomized Frank-Wolfe Method. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:59355-59376 Available from https://proceedings.mlr.press/v267/thekumparampil25a.html.

Comparing Few to Rank Many: Active Human Preference Learning Using Randomized Frank-Wolfe Method

Abstract

Cite this Paper

Related Material