Problem-dependent Regret Bounds for Online Learning with Feedback Graphs

Bingshan Hu, Nishant A. Mehta, Jianping Pan
Proceedings of The 35th Uncertainty in Artificial Intelligence Conference, PMLR 115:852-861, 2020.

Abstract

This paper addresses the stochastic multi-armed bandit problem with an undirected feedback graph. We devise a UCB-based algorithm, UCB-NE, to provide a problem-dependent regret bound that depends on a clique covering. Our algorithm obtains regret which provably scales linearly with the clique covering number. Additionally, we provide problem-dependent regret bounds for a Thompson Sampling-based algorithm, TS-N, where again the bounds are linear in the clique covering number. Finally, we present experimental results to see how UCB-NE, TS-N, and a few related algorithms perform practically.

Cite this Paper


BibTeX
@InProceedings{pmlr-v115-hu20b, title = {Problem-dependent Regret Bounds for Online Learning with Feedback Graphs}, author = {Hu, Bingshan and Mehta, Nishant A. and Pan, Jianping}, booktitle = {Proceedings of The 35th Uncertainty in Artificial Intelligence Conference}, pages = {852--861}, year = {2020}, editor = {Adams, Ryan P. and Gogate, Vibhav}, volume = {115}, series = {Proceedings of Machine Learning Research}, month = {22--25 Jul}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v115/hu20b/hu20b.pdf}, url = {https://proceedings.mlr.press/v115/hu20b.html}, abstract = {This paper addresses the stochastic multi-armed bandit problem with an undirected feedback graph. We devise a UCB-based algorithm, UCB-NE, to provide a problem-dependent regret bound that depends on a clique covering. Our algorithm obtains regret which provably scales linearly with the clique covering number. Additionally, we provide problem-dependent regret bounds for a Thompson Sampling-based algorithm, TS-N, where again the bounds are linear in the clique covering number. Finally, we present experimental results to see how UCB-NE, TS-N, and a few related algorithms perform practically.} }
Endnote
%0 Conference Paper %T Problem-dependent Regret Bounds for Online Learning with Feedback Graphs %A Bingshan Hu %A Nishant A. Mehta %A Jianping Pan %B Proceedings of The 35th Uncertainty in Artificial Intelligence Conference %C Proceedings of Machine Learning Research %D 2020 %E Ryan P. Adams %E Vibhav Gogate %F pmlr-v115-hu20b %I PMLR %P 852--861 %U https://proceedings.mlr.press/v115/hu20b.html %V 115 %X This paper addresses the stochastic multi-armed bandit problem with an undirected feedback graph. We devise a UCB-based algorithm, UCB-NE, to provide a problem-dependent regret bound that depends on a clique covering. Our algorithm obtains regret which provably scales linearly with the clique covering number. Additionally, we provide problem-dependent regret bounds for a Thompson Sampling-based algorithm, TS-N, where again the bounds are linear in the clique covering number. Finally, we present experimental results to see how UCB-NE, TS-N, and a few related algorithms perform practically.
APA
Hu, B., Mehta, N.A. & Pan, J.. (2020). Problem-dependent Regret Bounds for Online Learning with Feedback Graphs. Proceedings of The 35th Uncertainty in Artificial Intelligence Conference, in Proceedings of Machine Learning Research 115:852-861 Available from https://proceedings.mlr.press/v115/hu20b.html.

Related Material