Optimal Algorithms for Latent Bandits with Cluster Structure

Soumyabrata Pal, Arun Sai Suggala, Karthikeyan Shanmugam, Prateek Jain
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:7540-7577, 2023.

Abstract

We consider the problem of latent bandits with cluster structure where there are multiple users, each with an associated multi-armed bandit problem. These users are grouped into latent clusters such that the mean reward vectors of users within the same cluster are identical. At each round, a user, selected uniformly at random, pulls an arm and observes a corresponding noisy reward. The goal of the users is to maximize their cumulative rewards. This problem is central to practical recommendation systems and has received wide attention of late Gentile et al. (2014), Maillard and Mannor (2014). Now, if each user acts independently, then they would have to explore each arm independently and a regret of $\Omega(\sqrt{\mathrm{MNT}})$ is unavoidable, where M, N are the number of arms and users, respectively. Instead, we propose LATTICE (Latent bAndiTs via maTrIx ComplEtion) which allows exploration of the latent cluster structure to provide the minimax optimal regret of $\widetilde{O}(\sqrt{(M+N)T})$ when the number of clusters is $\tilde O(1)$. This is the first algorithm to guarantee such strong regret bound. LATTICE is based on a careful exploitation of arm information within a cluster while simultaneously clustering users. Furthermore, it is computationally efficient and requires only $O(\log \mathrm{T})$ calls to an offline matrix completion oracle across all T rounds.

Cite this Paper


BibTeX
@InProceedings{pmlr-v206-pal23a, title = {Optimal Algorithms for Latent Bandits with Cluster Structure}, author = {Pal, Soumyabrata and Sai Suggala, Arun and Shanmugam, Karthikeyan and Jain, Prateek}, booktitle = {Proceedings of The 26th International Conference on Artificial Intelligence and Statistics}, pages = {7540--7577}, year = {2023}, editor = {Ruiz, Francisco and Dy, Jennifer and van de Meent, Jan-Willem}, volume = {206}, series = {Proceedings of Machine Learning Research}, month = {25--27 Apr}, publisher = {PMLR}, pdf = {https://proceedings.mlr.press/v206/pal23a/pal23a.pdf}, url = {https://proceedings.mlr.press/v206/pal23a.html}, abstract = {We consider the problem of latent bandits with cluster structure where there are multiple users, each with an associated multi-armed bandit problem. These users are grouped into latent clusters such that the mean reward vectors of users within the same cluster are identical. At each round, a user, selected uniformly at random, pulls an arm and observes a corresponding noisy reward. The goal of the users is to maximize their cumulative rewards. This problem is central to practical recommendation systems and has received wide attention of late Gentile et al. (2014), Maillard and Mannor (2014). Now, if each user acts independently, then they would have to explore each arm independently and a regret of $\Omega(\sqrt{\mathrm{MNT}})$ is unavoidable, where M, N are the number of arms and users, respectively. Instead, we propose LATTICE (Latent bAndiTs via maTrIx ComplEtion) which allows exploration of the latent cluster structure to provide the minimax optimal regret of $\widetilde{O}(\sqrt{(M+N)T})$ when the number of clusters is $\tilde O(1)$. This is the first algorithm to guarantee such strong regret bound. LATTICE is based on a careful exploitation of arm information within a cluster while simultaneously clustering users. Furthermore, it is computationally efficient and requires only $O(\log \mathrm{T})$ calls to an offline matrix completion oracle across all T rounds.} }
Endnote
%0 Conference Paper %T Optimal Algorithms for Latent Bandits with Cluster Structure %A Soumyabrata Pal %A Arun Sai Suggala %A Karthikeyan Shanmugam %A Prateek Jain %B Proceedings of The 26th International Conference on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 2023 %E Francisco Ruiz %E Jennifer Dy %E Jan-Willem van de Meent %F pmlr-v206-pal23a %I PMLR %P 7540--7577 %U https://proceedings.mlr.press/v206/pal23a.html %V 206 %X We consider the problem of latent bandits with cluster structure where there are multiple users, each with an associated multi-armed bandit problem. These users are grouped into latent clusters such that the mean reward vectors of users within the same cluster are identical. At each round, a user, selected uniformly at random, pulls an arm and observes a corresponding noisy reward. The goal of the users is to maximize their cumulative rewards. This problem is central to practical recommendation systems and has received wide attention of late Gentile et al. (2014), Maillard and Mannor (2014). Now, if each user acts independently, then they would have to explore each arm independently and a regret of $\Omega(\sqrt{\mathrm{MNT}})$ is unavoidable, where M, N are the number of arms and users, respectively. Instead, we propose LATTICE (Latent bAndiTs via maTrIx ComplEtion) which allows exploration of the latent cluster structure to provide the minimax optimal regret of $\widetilde{O}(\sqrt{(M+N)T})$ when the number of clusters is $\tilde O(1)$. This is the first algorithm to guarantee such strong regret bound. LATTICE is based on a careful exploitation of arm information within a cluster while simultaneously clustering users. Furthermore, it is computationally efficient and requires only $O(\log \mathrm{T})$ calls to an offline matrix completion oracle across all T rounds.
APA
Pal, S., Sai Suggala, A., Shanmugam, K. & Jain, P.. (2023). Optimal Algorithms for Latent Bandits with Cluster Structure. Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 206:7540-7577 Available from https://proceedings.mlr.press/v206/pal23a.html.

Related Material