Multi-Player Approaches for Dueling Bandits

Or Raveh, Junya Honda, Masashi Sugiyama
Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, PMLR 258:1540-1548, 2025.

Abstract

Fine-tuning large deep networks with preference-based human feedback has seen promising results. As user numbers grow and tasks shift to complex datasets like images or videos, distributed approaches become essential for efficiently gathering feedback. To address this, we introduce a multiplayer dueling bandit problem, highlighting that exploring non-informative candidate pairs becomes especially challenging in a collaborative environment. We demonstrate that the use of a Follow Your Leader black-box approach matches the asymptotic regret lower bound when utilizing known dueling bandit algorithms as a foundation. Additionally, we propose and analyze a message-passing fully distributed approach with a novel Condorcet-Winner recommendation protocol, resulting in expedited exploration in the non-asymptotic regime, which reduces regret. Our experimental comparisons reveal that our multiplayer algorithms surpass single-player benchmark algorithms, underscoring their efficacy in addressing the nuanced challenges of this setting.
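The recommendation protocol above centers on the Condorcet winner: in a dueling bandit with preference matrix P, where P[i, j] is the probability that arm i beats arm j in a duel, the Condorcet winner is the arm that beats every other arm with probability above 1/2. As a minimal illustration (not code from the paper), a sketch of this definition:

```python
import numpy as np

def condorcet_winner(P):
    """Return the index of the Condorcet winner of preference
    matrix P, where P[i, j] = Pr(arm i beats arm j), or None
    if no arm beats every other arm with probability > 1/2."""
    K = P.shape[0]
    for i in range(K):
        if all(P[i, j] > 0.5 for j in range(K) if j != i):
            return i
    return None

# Toy 3-arm preference matrix: arm 0 beats both other arms.
P = np.array([
    [0.5, 0.7, 0.6],
    [0.3, 0.5, 0.8],
    [0.4, 0.2, 0.5],
])
print(condorcet_winner(P))  # → 0
```

Note that a Condorcet winner need not exist: with cyclic preferences (0 beats 1, 1 beats 2, 2 beats 0) the function returns None, which is why dueling bandit algorithms of this kind assume its existence.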

Cite this Paper


BibTeX
@InProceedings{pmlr-v258-raveh25a,
  title = {Multi-Player Approaches for Dueling Bandits},
  author = {Raveh, Or and Honda, Junya and Sugiyama, Masashi},
  booktitle = {Proceedings of The 28th International Conference on Artificial Intelligence and Statistics},
  pages = {1540--1548},
  year = {2025},
  editor = {Li, Yingzhen and Mandt, Stephan and Agrawal, Shipra and Khan, Emtiyaz},
  volume = {258},
  series = {Proceedings of Machine Learning Research},
  month = {03--05 May},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v258/main/assets/raveh25a/raveh25a.pdf},
  url = {https://proceedings.mlr.press/v258/raveh25a.html},
  abstract = {Fine-tuning large deep networks with preference-based human feedback has seen promising results. As user numbers grow and tasks shift to complex datasets like images or videos, distributed approaches become essential for efficiently gathering feedback. To address this, we introduce a multiplayer dueling bandit problem, highlighting that exploring non-informative candidate pairs becomes especially challenging in a collaborative environment. We demonstrate that the use of a Follow Your Leader black-box approach matches the asymptotic regret lower bound when utilizing known dueling bandit algorithms as a foundation. Additionally, we propose and analyze a message-passing fully distributed approach with a novel Condorcet-Winner recommendation protocol, resulting in expedited exploration in the non-asymptotic regime, which reduces regret. Our experimental comparisons reveal that our multiplayer algorithms surpass single-player benchmark algorithms, underscoring their efficacy in addressing the nuanced challenges of this setting.}
}
Endnote
%0 Conference Paper
%T Multi-Player Approaches for Dueling Bandits
%A Or Raveh
%A Junya Honda
%A Masashi Sugiyama
%B Proceedings of The 28th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2025
%E Yingzhen Li
%E Stephan Mandt
%E Shipra Agrawal
%E Emtiyaz Khan
%F pmlr-v258-raveh25a
%I PMLR
%P 1540--1548
%U https://proceedings.mlr.press/v258/raveh25a.html
%V 258
%X Fine-tuning large deep networks with preference-based human feedback has seen promising results. As user numbers grow and tasks shift to complex datasets like images or videos, distributed approaches become essential for efficiently gathering feedback. To address this, we introduce a multiplayer dueling bandit problem, highlighting that exploring non-informative candidate pairs becomes especially challenging in a collaborative environment. We demonstrate that the use of a Follow Your Leader black-box approach matches the asymptotic regret lower bound when utilizing known dueling bandit algorithms as a foundation. Additionally, we propose and analyze a message-passing fully distributed approach with a novel Condorcet-Winner recommendation protocol, resulting in expedited exploration in the non-asymptotic regime, which reduces regret. Our experimental comparisons reveal that our multiplayer algorithms surpass single-player benchmark algorithms, underscoring their efficacy in addressing the nuanced challenges of this setting.
APA
Raveh, O., Honda, J. &amp; Sugiyama, M. (2025). Multi-Player Approaches for Dueling Bandits. Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 258:1540-1548. Available from https://proceedings.mlr.press/v258/raveh25a.html.