Order-Optimal Regret in Distributed Kernel Bandits using Uniform Sampling with Shared Randomness

Nikola Pavlovic, Sudeep Salgia, Qing Zhao
Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, PMLR 258:4699-4707, 2025.

Abstract

We consider distributed kernel bandits where $N$ agents aim to collaboratively maximize an unknown reward function that lies in a reproducing kernel Hilbert space. Each agent sequentially queries the function to obtain noisy observations at the query points. Agents can share information through a central server, with the objective of minimizing regret accumulated over time $T$ and aggregated over agents. We develop the first algorithm that achieves the optimal regret order (as defined by centralized learning) with a communication cost that is sublinear in both $N$ and $T$. The key features of the proposed algorithm are uniform exploration at the local agents and shared randomness with the central server. Working together with a sparse approximation of the GP model, these two key components make it possible to preserve the learning rate of the centralized setting at a diminishing rate of communication.
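The shared-randomness idea from the abstract can be illustrated with a minimal sketch (hypothetical, not the paper's actual algorithm): if each agent and the central server hold a common random seed, the server can regenerate an agent's uniformly sampled query points on its own, so the agent only needs to transmit its scalar noisy observations, never the $d$-dimensional query locations. The function and dimension names below are illustrative.

```python
import random

def uniform_queries(seed: int, n: int, d: int) -> list:
    """Uniform exploration: draw n query points in [0, 1]^d from a given seed."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(d)] for _ in range(n)]

# Agent side: sample query points from the seed shared with the server.
shared_seed, n, d = 2025, 4, 3
agent_points = uniform_queries(shared_seed, n, d)

# Server side: reconstruct the identical points from the shared seed alone,
# without the agent ever sending the query locations over the network.
server_points = uniform_queries(shared_seed, n, d)

assert agent_points == server_points  # identical points, zero bits sent for them
```

Under this sketch, per-round communication is reduced to the observations themselves (plus occasional model updates), which is the intuition behind keeping the communication cost sublinear in $N$ and $T$.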

Cite this Paper


BibTeX
@InProceedings{pmlr-v258-pavlovic25b,
  title     = {Order-Optimal Regret in Distributed Kernel Bandits using Uniform Sampling with Shared Randomness},
  author    = {Pavlovic, Nikola and Salgia, Sudeep and Zhao, Qing},
  booktitle = {Proceedings of The 28th International Conference on Artificial Intelligence and Statistics},
  pages     = {4699--4707},
  year      = {2025},
  editor    = {Li, Yingzhen and Mandt, Stephan and Agrawal, Shipra and Khan, Emtiyaz},
  volume    = {258},
  series    = {Proceedings of Machine Learning Research},
  month     = {03--05 May},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v258/main/assets/pavlovic25b/pavlovic25b.pdf},
  url       = {https://proceedings.mlr.press/v258/pavlovic25b.html},
  abstract  = {We consider distributed kernel bandits where $N$ agents aim to collaboratively maximize an unknown reward function that lies in a reproducing kernel Hilbert space. Each agent sequentially queries the function to obtain noisy observations at the query points. Agents can share information through a central server, with the objective of minimizing regret that is accumulating over time $T$ and aggregating over agents. We develop the first algorithm that achieves the optimal regret order (as defined by centralized learning) with a communication cost that is sublinear in both $N$ and $T$. The key features of the proposed algorithm are the uniform exploration at the local agents and shared randomness with the central server. Working together with the sparse approximation of the GP model, these two key components make it possible to preserve the learning rate of the centralized setting at a diminishing rate of communication.}
}
Endnote
%0 Conference Paper
%T Order-Optimal Regret in Distributed Kernel Bandits using Uniform Sampling with Shared Randomness
%A Nikola Pavlovic
%A Sudeep Salgia
%A Qing Zhao
%B Proceedings of The 28th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2025
%E Yingzhen Li
%E Stephan Mandt
%E Shipra Agrawal
%E Emtiyaz Khan
%F pmlr-v258-pavlovic25b
%I PMLR
%P 4699--4707
%U https://proceedings.mlr.press/v258/pavlovic25b.html
%V 258
%X We consider distributed kernel bandits where $N$ agents aim to collaboratively maximize an unknown reward function that lies in a reproducing kernel Hilbert space. Each agent sequentially queries the function to obtain noisy observations at the query points. Agents can share information through a central server, with the objective of minimizing regret that is accumulating over time $T$ and aggregating over agents. We develop the first algorithm that achieves the optimal regret order (as defined by centralized learning) with a communication cost that is sublinear in both $N$ and $T$. The key features of the proposed algorithm are the uniform exploration at the local agents and shared randomness with the central server. Working together with the sparse approximation of the GP model, these two key components make it possible to preserve the learning rate of the centralized setting at a diminishing rate of communication.
APA
Pavlovic, N., Salgia, S. & Zhao, Q. (2025). Order-Optimal Regret in Distributed Kernel Bandits using Uniform Sampling with Shared Randomness. Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 258:4699-4707. Available from https://proceedings.mlr.press/v258/pavlovic25b.html.

Related Material