Laplacian-Regularized Graph Bandits: Algorithms and Theoretical Analysis

Kaige Yang, Laura Toni, Xiaowen Dong
Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:3133-3143, 2020.

Abstract

We consider a stochastic linear bandit problem with multiple users, where the relationship between users is captured by an underlying graph and user preferences are represented as smooth signals on the graph. We introduce a novel bandit algorithm where the smoothness prior is imposed via the random-walk graph Laplacian, which leads to a single-user cumulative regret scaling as $\Tilde{\mathcal{O}}(\Psi d \sqrt{T})$ with time horizon $T$, feature dimensionality $d$, and the scalar parameter $\Psi \in (0,1)$ that depends on the graph connectivity. This is an improvement over $\Tilde{\mathcal{O}}(d \sqrt{T})$ in \algo{LinUCB} \Ccite{li2010contextual}, where user relationship is not taken into account.In terms of network regret (sum of cumulative regret over $n$ users), the proposed algorithm leads to a scaling as $\Tilde{\mathcal{O}}(\Psi d\sqrt{nT})$, which is a significant improvement over $\Tilde{\mathcal{O}}(nd\sqrt{T})$ in the state-of-the-art algorithm \algo{Gob.Lin} \Ccite{cesa2013gang}. To improve scalability, we further propose a simplified algorithm with a linear computational complexity with respect to the number of users, while maintaining the same regret. Finally, we present a finite-time analysis on the proposed algorithms, and demonstrate their advantage in comparison with state-of-the-art graph-based bandit algorithms on both synthetic and real-world data.

Cite this Paper


BibTeX
@InProceedings{pmlr-v108-yang20c, title = {Laplacian-Regularized Graph Bandits: Algorithms and Theoretical Analysis}, author = {Yang, Kaige and Toni, Laura and Dong, Xiaowen}, booktitle = {Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics}, pages = {3133--3143}, year = {2020}, editor = {Chiappa, Silvia and Calandra, Roberto}, volume = {108}, series = {Proceedings of Machine Learning Research}, month = {26--28 Aug}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v108/yang20c/yang20c.pdf}, url = {https://proceedings.mlr.press/v108/yang20c.html}, abstract = {We consider a stochastic linear bandit problem with multiple users, where the relationship between users is captured by an underlying graph and user preferences are represented as smooth signals on the graph. We introduce a novel bandit algorithm where the smoothness prior is imposed via the random-walk graph Laplacian, which leads to a single-user cumulative regret scaling as $\Tilde{\mathcal{O}}(\Psi d \sqrt{T})$ with time horizon $T$, feature dimensionality $d$, and the scalar parameter $\Psi \in (0,1)$ that depends on the graph connectivity. This is an improvement over $\Tilde{\mathcal{O}}(d \sqrt{T})$ in \algo{LinUCB} \Ccite{li2010contextual}, where user relationship is not taken into account.In terms of network regret (sum of cumulative regret over $n$ users), the proposed algorithm leads to a scaling as $\Tilde{\mathcal{O}}(\Psi d\sqrt{nT})$, which is a significant improvement over $\Tilde{\mathcal{O}}(nd\sqrt{T})$ in the state-of-the-art algorithm \algo{Gob.Lin} \Ccite{cesa2013gang}. To improve scalability, we further propose a simplified algorithm with a linear computational complexity with respect to the number of users, while maintaining the same regret. Finally, we present a finite-time analysis on the proposed algorithms, and demonstrate their advantage in comparison with state-of-the-art graph-based bandit algorithms on both synthetic and real-world data.} }
Endnote
%0 Conference Paper %T Laplacian-Regularized Graph Bandits: Algorithms and Theoretical Analysis %A Kaige Yang %A Laura Toni %A Xiaowen Dong %B Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 2020 %E Silvia Chiappa %E Roberto Calandra %F pmlr-v108-yang20c %I PMLR %P 3133--3143 %U https://proceedings.mlr.press/v108/yang20c.html %V 108 %X We consider a stochastic linear bandit problem with multiple users, where the relationship between users is captured by an underlying graph and user preferences are represented as smooth signals on the graph. We introduce a novel bandit algorithm where the smoothness prior is imposed via the random-walk graph Laplacian, which leads to a single-user cumulative regret scaling as $\Tilde{\mathcal{O}}(\Psi d \sqrt{T})$ with time horizon $T$, feature dimensionality $d$, and the scalar parameter $\Psi \in (0,1)$ that depends on the graph connectivity. This is an improvement over $\Tilde{\mathcal{O}}(d \sqrt{T})$ in \algo{LinUCB} \Ccite{li2010contextual}, where user relationship is not taken into account.In terms of network regret (sum of cumulative regret over $n$ users), the proposed algorithm leads to a scaling as $\Tilde{\mathcal{O}}(\Psi d\sqrt{nT})$, which is a significant improvement over $\Tilde{\mathcal{O}}(nd\sqrt{T})$ in the state-of-the-art algorithm \algo{Gob.Lin} \Ccite{cesa2013gang}. To improve scalability, we further propose a simplified algorithm with a linear computational complexity with respect to the number of users, while maintaining the same regret. Finally, we present a finite-time analysis on the proposed algorithms, and demonstrate their advantage in comparison with state-of-the-art graph-based bandit algorithms on both synthetic and real-world data.
APA
Yang, K., Toni, L. & Dong, X.. (2020). Laplacian-Regularized Graph Bandits: Algorithms and Theoretical Analysis. Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 108:3133-3143 Available from https://proceedings.mlr.press/v108/yang20c.html.

Related Material