Be Greedy: How Chromatic Number meets Regret Minimization in Graph Bandits
Proceedings of The 35th Uncertainty in Artificial Intelligence Conference, PMLR 115:595-605, 2020.
Abstract
We study the classical linear bandit problem on \emph{graphs}, modelling arm rewards through an underlying graph structure $G(N, E)$ such that the rewards of neighboring nodes are similar. Previous attempts along this line have primarily assumed the arm rewards to be a smooth function over the graph Laplacian, which, however, fails to characterize the inherent problem complexity in terms of the graph structure. We bridge this gap by showing a regret guarantee of $\tilde{O}(\chi(\bar{G})\sqrt{T})$ (the $\tilde{O}(\cdot)$ notation hides dependencies on $\log T$), which scales only with the chromatic number of the complement graph, $\chi(\bar{G})$, assuming the rewards to be a smooth function over a general class of graph embeddings: \emph{Orthonormal Representations}. Our proposed algorithms yield a regret guarantee of $\tilde{O}(r\sqrt{T})$ for any general embedding of rank $r$. Furthermore, if the rewards correspond to a minimum-rank embedding, the regret boils down to $O(\chi(\bar{G})\sqrt{T})$; none of the existing works bring out such influences of the graph structure on arm rewards. Finally, noting that computing the above minimum-rank embedding is NP-hard, we also propose an alternative embedding scheme, \emph{Greedy Embeddings}, computable in $O(N + |E|)$ time and based on greedy graph coloring, on which our algorithms perform optimally for a large family of graphs, e.g., unions of cliques, complements of $k$-colorable graphs, and regular graphs, and are also shown to outperform state-of-the-art methods on real datasets. Our findings open up new avenues for exploiting graph structure to improve regret performance.
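The abstract does not spell out how the Greedy Embeddings are constructed; as context, the sketch below shows the first-fit greedy coloring primitive they are stated to rest on, which runs in $O(N + |E|)$ time and uses at most $\Delta + 1$ colors ($\Delta$ being the maximum degree). This is a minimal illustration in Python with hypothetical names (`greedy_coloring`, `adj`); the mapping from a coloring to an orthonormal embedding is the paper's contribution and is not reproduced here.

```python
def greedy_coloring(adj):
    """First-fit greedy coloring: assign each node the smallest color
    not already used by its neighbors. One pass over all nodes and
    edges, hence O(N + |E|) time.
    `adj` maps each node to an iterable of its neighbors."""
    colors = {}
    for v in adj:                       # any fixed vertex order
        taken = {colors[u] for u in adj[v] if u in colors}
        c = 0
        while c in taken:               # smallest color free among neighbors
            c += 1
        colors[v] = c
    return colors

# Example: a union of two disjoint triangles (cliques) is colored
# greedily with exactly 3 colors, matching its chromatic number.
graph = {
    0: [1, 2], 1: [0, 2], 2: [0, 1],    # triangle 1
    3: [4, 5], 4: [3, 5], 5: [3, 4],    # triangle 2
}
coloring = greedy_coloring(graph)
print(coloring, max(coloring.values()) + 1)   # -> 3 colors
```

On graph families like unions of cliques, first-fit coloring attains the chromatic number exactly, which is consistent with the abstract's claim that the resulting embeddings are optimal for such families.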