Stochastic Graphical Bandits with Heavy-Tailed Rewards
Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, PMLR 216:734-744, 2023.
Abstract
We consider stochastic graphical bandits, where after pulling an arm, the decision maker observes the rewards of not only the chosen arm but also its neighbors in a feedback graph. Most existing work assumes that the rewards are drawn from bounded or at least sub-Gaussian distributions, an assumption that may be violated in many practical scenarios such as social advertising and financial markets. To address this issue, we investigate stochastic graphical bandits with heavy-tailed rewards, where the reward distributions have finite moments of order $1+\epsilon$ for some $\epsilon\in(0, 1]$. First, we develop a UCB-type algorithm whose expected regret is upper bounded by a sum of gap-based quantities over a clique covering of the feedback graph. The key idea is to estimate the reward means of the selected arm's neighbors with more refined robust estimators, and to construct a graph-based upper confidence bound for selecting candidates. Second, we design an elimination-based strategy that improves the regret bound to a gap-based sum whose size is controlled by the independence number of the feedback graph. For benign graphs, the independence number can be smaller than the size of the clique covering, yielding tighter regret bounds. Finally, we conduct experiments on synthetic data to demonstrate the effectiveness of our methods.
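The abstract does not specify which robust estimator the paper uses, but a standard choice in the heavy-tailed bandit literature is the truncated empirical mean (Bubeck, Cesa-Bianchi, and Lugosi, 2013), which discards samples exceeding a growing threshold so that only a finite $(1+\epsilon)$-th moment is required. The sketch below is an illustrative assumption, not the paper's actual algorithm: it pairs a truncated-mean estimate with a matching confidence radius to form a UCB index, and a hypothetical `select_arm` helper picks the arm with the largest index (under graph feedback, pulling that arm would then reveal rewards for it and all its neighbors).

```python
import math

def truncated_mean(samples, u, eps, delta):
    """Truncated empirical mean for heavy-tailed rewards whose raw moment
    of order 1+eps is bounded by u. Sample i (1-indexed) is kept only if
    |x_i| <= (u * i / log(1/delta)) ** (1 / (1 + eps))."""
    n = len(samples)
    if n == 0:
        return 0.0
    total = 0.0
    for i, x in enumerate(samples, start=1):
        threshold = (u * i / math.log(1.0 / delta)) ** (1.0 / (1.0 + eps))
        if abs(x) <= threshold:
            total += x
    return total / n

def confidence_radius(n, u, eps, delta):
    """Deviation bound matching the truncated mean:
    O((u^{1/eps} * log(1/delta) / n) ** (eps / (1 + eps))) up to constants."""
    return 4.0 * u ** (1.0 / (1.0 + eps)) * (
        math.log(1.0 / delta) / n) ** (eps / (1.0 + eps))

def select_arm(history, u, eps, t):
    """Pick the arm with the largest robust UCB index at round t.
    `history` maps each arm to the rewards observed for it so far
    (under graph feedback, neighbors' observations accumulate too)."""
    delta = 1.0 / (t * t)  # a common anytime confidence schedule
    best, best_ucb = None, -math.inf
    for arm, samples in history.items():
        if not samples:
            return arm  # force exploration of never-observed arms
        ucb = (truncated_mean(samples, u, eps, delta)
               + confidence_radius(len(samples), u, eps, delta))
        if ucb > best_ucb:
            best, best_ucb = arm, ucb
    return best
```

After the selected arm is pulled, the rewards of the arm and of every neighbor in the feedback graph would be appended to their respective histories, which is what lets graph-based bounds replace the number of arms with a clique-cover or independence-number quantity.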