Bayesian Optimization from Human Feedback: Near-Optimal Regret Bounds

Aya Kayal, Sattar Vakili, Laura Toni, Da-Shan Shiu, Alberto Bernacchia
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:29449-29468, 2025.

Abstract

Bayesian optimization (BO) with preference-based feedback has recently garnered significant attention due to its emerging applications. We refer to this problem as Bayesian Optimization from Human Feedback (BOHF), which differs from conventional BO by learning the best actions from a reduced feedback model, where only the preference between two actions is revealed to the learner at each time step. The objective is to identify the best action using a limited number of preference queries, typically obtained through costly human feedback. Existing work, which adopts the Bradley-Terry-Luce (BTL) feedback model, provides regret bounds for the performance of several algorithms. In this work, within the same framework we develop tighter performance guarantees. Specifically, we derive regret bounds of $\tilde{\mathcal{O}}(\sqrt{\Gamma(T)T})$, where $\Gamma(T)$ represents the maximum information gain—a kernel-specific complexity term—and $T$ is the number of queries. Our results significantly improve upon existing bounds. Notably, for common kernels, we show that the order-optimal sample complexities of conventional BO—achieved with richer feedback models—are recovered. In other words, the same number of preferential samples as scalar-valued samples is sufficient to find a nearly optimal solution.
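As a minimal illustration of the reduced feedback model described in the abstract (not code from the paper; the latent utility function and all names below are illustrative assumptions), the following Python sketch simulates one round of Bradley-Terry-Luce (BTL) preferential feedback: the learner proposes two actions and observes only a binary comparison, whose win probability is the logistic function of the difference of the unknown utilities.

import numpy as np

rng = np.random.default_rng(0)

def latent_utility(x):
    # Hypothetical latent objective f; in the BOHF setting it is unknown
    # to the learner and never observed directly.
    return np.sin(3 * x) + 0.5 * x

def btl_preference(x1, x2):
    # BTL feedback: P(x1 preferred over x2) = sigmoid(f(x1) - f(x2)).
    # The learner sees only the binary outcome, not the scalar values of f.
    p = 1.0 / (1.0 + np.exp(-(latent_utility(x1) - latent_utility(x2))))
    return rng.random() < p  # True if x1 wins the comparison

# One preference query: two candidate actions, one bit of feedback.
x1, x2 = 0.2, 0.8
print(btl_preference(x1, x2))

In this sketch, T such queries would constitute the budget against which the regret bound of order sqrt(Gamma(T) T) (up to logarithmic factors) is stated.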

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-kayal25a,
  title     = {{B}ayesian Optimization from Human Feedback: Near-Optimal Regret Bounds},
  author    = {Kayal, Aya and Vakili, Sattar and Toni, Laura and Shiu, Da-Shan and Bernacchia, Alberto},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {29449--29468},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/kayal25a/kayal25a.pdf},
  url       = {https://proceedings.mlr.press/v267/kayal25a.html},
  abstract  = {Bayesian optimization (BO) with preference-based feedback has recently garnered significant attention due to its emerging applications. We refer to this problem as Bayesian Optimization from Human Feedback (BOHF), which differs from conventional BO by learning the best actions from a reduced feedback model, where only the preference between two actions is revealed to the learner at each time step. The objective is to identify the best action using a limited number of preference queries, typically obtained through costly human feedback. Existing work, which adopts the Bradley-Terry-Luce (BTL) feedback model, provides regret bounds for the performance of several algorithms. In this work, within the same framework we develop tighter performance guarantees. Specifically, we derive regret bounds of $\tilde{\mathcal{O}}(\sqrt{\Gamma(T)T})$, where $\Gamma(T)$ represents the maximum information gain—a kernel-specific complexity term—and $T$ is the number of queries. Our results significantly improve upon existing bounds. Notably, for common kernels, we show that the order-optimal sample complexities of conventional BO—achieved with richer feedback models—are recovered. In other words, the same number of preferential samples as scalar-valued samples is sufficient to find a nearly optimal solution.}
}
Endnote
%0 Conference Paper
%T Bayesian Optimization from Human Feedback: Near-Optimal Regret Bounds
%A Aya Kayal
%A Sattar Vakili
%A Laura Toni
%A Da-Shan Shiu
%A Alberto Bernacchia
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-kayal25a
%I PMLR
%P 29449--29468
%U https://proceedings.mlr.press/v267/kayal25a.html
%V 267
%X Bayesian optimization (BO) with preference-based feedback has recently garnered significant attention due to its emerging applications. We refer to this problem as Bayesian Optimization from Human Feedback (BOHF), which differs from conventional BO by learning the best actions from a reduced feedback model, where only the preference between two actions is revealed to the learner at each time step. The objective is to identify the best action using a limited number of preference queries, typically obtained through costly human feedback. Existing work, which adopts the Bradley-Terry-Luce (BTL) feedback model, provides regret bounds for the performance of several algorithms. In this work, within the same framework we develop tighter performance guarantees. Specifically, we derive regret bounds of $\tilde{\mathcal{O}}(\sqrt{\Gamma(T)T})$, where $\Gamma(T)$ represents the maximum information gain—a kernel-specific complexity term—and $T$ is the number of queries. Our results significantly improve upon existing bounds. Notably, for common kernels, we show that the order-optimal sample complexities of conventional BO—achieved with richer feedback models—are recovered. In other words, the same number of preferential samples as scalar-valued samples is sufficient to find a nearly optimal solution.
APA
Kayal, A., Vakili, S., Toni, L., Shiu, D.-S., & Bernacchia, A. (2025). Bayesian Optimization from Human Feedback: Near-Optimal Regret Bounds. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:29449-29468. Available from https://proceedings.mlr.press/v267/kayal25a.html.
