Solving Multi-Arm Bandit Using a Few Bits of Communication

Osama A. Hanna, Lin Yang, Christina Fragouli
Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, PMLR 151:11215-11236, 2022.

Abstract

The multi-armed bandit (MAB) problem is an active learning framework that aims to select the best among a set of actions by sequentially observing rewards. Recently, it has become popular for a number of applications over wireless networks, where communication constraints can form a bottleneck. Existing works usually fail to address this issue and can become infeasible in certain applications. In this paper we address the communication problem by optimizing the communication of rewards collected by distributed agents. By providing nearly matching upper and lower bounds, we tightly characterize the number of bits needed per reward for the learner to accurately learn without suffering additional regret. In particular, we establish a generic reward quantization algorithm, QuBan, that can be applied on top of any (no-regret) MAB algorithm to form a new communication-efficient counterpart that requires only a few (as low as 3) bits to be sent per iteration while preserving the same regret bound. Our lower bound is established via constructing hard instances from a subgaussian distribution. Our theory is further corroborated by numerical experiments.
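To make the quantization idea concrete, here is a minimal sketch of the general scheme the abstract describes: the agent centers the observed reward around a reference value shared with the learner, stochastically rounds the centered value to an integer, and sends only that small integer. This is an illustration of the idea, not the paper's QuBan algorithm; the class name, the running-mean reference, and the unit quantization step are all assumptions made for this sketch.

    import math
    import random

    class QuantizedRewardChannel:
        # Illustrative sketch (hypothetical, not the authors' QuBan):
        # the agent sends a small integer instead of the raw reward and
        # the learner reconstructs an unbiased reward estimate.

        def __init__(self):
            self.mean = 0.0   # shared reference; both sides can track it
            self.count = 0

        def encode(self, reward):
            # Stochastically round the centered reward to an integer.
            # E[q] = reward - mean, so the decoded estimate is unbiased.
            x = reward - self.mean
            lo = math.floor(x)
            q = lo + (1 if random.random() < x - lo else 0)
            return q  # near-zero integers dominate, so a variable-length
                      # code spends only a few bits per reward on average

        def decode(self, q):
            r_hat = self.mean + q           # unbiased reward estimate
            # Update the reference from transmitted data only, so the
            # agent and the learner stay synchronized without extra bits.
            self.count += 1
            self.mean += (r_hat - self.mean) / self.count
            return r_hat

Any no-regret bandit algorithm (e.g., UCB) can then run unchanged on the decoded estimates; only the integer q crosses the channel:

    ch = QuantizedRewardChannel()
    q = ch.encode(0.73)      # transmitted: a small integer
    r_hat = ch.decode(q)     # fed to the bandit algorithm as the reward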

Cite this Paper


BibTeX
@InProceedings{pmlr-v151-hanna22a,
  title     = {Solving Multi-Arm Bandit Using a Few Bits of Communication},
  author    = {Hanna, Osama A. and Yang, Lin and Fragouli, Christina},
  booktitle = {Proceedings of The 25th International Conference on Artificial Intelligence and Statistics},
  pages     = {11215--11236},
  year      = {2022},
  editor    = {Camps-Valls, Gustau and Ruiz, Francisco J. R. and Valera, Isabel},
  volume    = {151},
  series    = {Proceedings of Machine Learning Research},
  month     = {28--30 Mar},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v151/hanna22a/hanna22a.pdf},
  url       = {https://proceedings.mlr.press/v151/hanna22a.html},
  abstract  = {The multi-armed bandit (MAB) problem is an active learning framework that aims to select the best among a set of actions by sequentially observing rewards. Recently, it has become popular for a number of applications over wireless networks, where communication constraints can form a bottleneck. Existing works usually fail to address this issue and can become infeasible in certain applications. In this paper we address the communication problem by optimizing the communication of rewards collected by distributed agents. By providing nearly matching upper and lower bounds, we tightly characterize the number of bits needed per reward for the learner to accurately learn without suffering additional regret. In particular, we establish a generic reward quantization algorithm, QuBan, that can be applied on top of any (no-regret) MAB algorithm to form a new communication-efficient counterpart that requires only a few (as low as 3) bits to be sent per iteration while preserving the same regret bound. Our lower bound is established via constructing hard instances from a subgaussian distribution. Our theory is further corroborated by numerical experiments.}
}
Endnote
%0 Conference Paper
%T Solving Multi-Arm Bandit Using a Few Bits of Communication
%A Osama A. Hanna
%A Lin Yang
%A Christina Fragouli
%B Proceedings of The 25th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2022
%E Gustau Camps-Valls
%E Francisco J. R. Ruiz
%E Isabel Valera
%F pmlr-v151-hanna22a
%I PMLR
%P 11215--11236
%U https://proceedings.mlr.press/v151/hanna22a.html
%V 151
%X The multi-armed bandit (MAB) problem is an active learning framework that aims to select the best among a set of actions by sequentially observing rewards. Recently, it has become popular for a number of applications over wireless networks, where communication constraints can form a bottleneck. Existing works usually fail to address this issue and can become infeasible in certain applications. In this paper we address the communication problem by optimizing the communication of rewards collected by distributed agents. By providing nearly matching upper and lower bounds, we tightly characterize the number of bits needed per reward for the learner to accurately learn without suffering additional regret. In particular, we establish a generic reward quantization algorithm, QuBan, that can be applied on top of any (no-regret) MAB algorithm to form a new communication-efficient counterpart that requires only a few (as low as 3) bits to be sent per iteration while preserving the same regret bound. Our lower bound is established via constructing hard instances from a subgaussian distribution. Our theory is further corroborated by numerical experiments.
APA
Hanna, O.A., Yang, L. & Fragouli, C. (2022). Solving Multi-Arm Bandit Using a Few Bits of Communication. Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 151:11215-11236. Available from https://proceedings.mlr.press/v151/hanna22a.html.