Linear Bandit Algorithms with Sublinear Time Complexity

Shuo Yang, Tongzheng Ren, Sanjay Shakkottai, Eric Price, Inderjit S. Dhillon, Sujay Sanghavi
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:25241-25260, 2022.

Abstract

We propose two linear bandit algorithms with per-step complexity sublinear in the number of arms $K$. The algorithms are designed for applications where the arm set is extremely large and slowly changing. Our key realization is that choosing an arm reduces to a maximum inner product search (MIPS) problem, which can be solved approximately without breaking the regret guarantees. Existing approximate MIPS solvers run in sublinear time. We extend those solvers and present theoretical guarantees for online learning problems, where adaptivity (i.e., a later step depends on the feedback from previous steps) becomes a unique challenge. We then explicitly characterize the tradeoff between per-step complexity and regret. For sufficiently large $K$, our algorithms have sublinear per-step complexity and $\widetilde O(\sqrt{T})$ regret. Empirically, we evaluate the proposed algorithms in a synthetic environment and on a real-world online movie recommendation problem. They deliver a speedup of more than 72 times over the linear-time baselines while retaining similar regret.
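To make the MIPS reduction concrete, here is a minimal sketch (not the authors' implementation; all names and sizes are ours) showing that greedy arm selection in a linear bandit is exactly a maximum inner product search over the arm vectors. The random candidate subset below is only a placeholder for a real sublinear approximate MIPS index (e.g., an LSH- or graph-based solver); it does not carry the paper's guarantees.

    # Sketch: linear bandit arm selection as a MIPS query (illustrative only).
    import numpy as np

    rng = np.random.default_rng(0)
    K, d = 100_000, 16
    arms = rng.normal(size=(K, d))       # large, slowly changing arm set
    theta_hat = rng.normal(size=d)       # current linear parameter estimate

    # Exact selection: an O(K) scan of inner products per step.
    exact_arm = int(np.argmax(arms @ theta_hat))

    # Stand-in for an approximate MIPS solver: rank a random candidate
    # subset. A real sublinear solver would query a prebuilt MIPS index
    # instead of subsampling; this is only to show the query structure.
    cand = rng.choice(K, size=1_000, replace=False)
    approx_arm = int(cand[np.argmax(arms[cand] @ theta_hat)])

    print(exact_arm, approx_arm)

The point of the sketch is the query shape: once the per-step decision is phrased as $\arg\max_k \langle \hat\theta, x_k \rangle$, any approximate MIPS data structure with sublinear query time can replace the O(K) scan, which is the tradeoff the paper characterizes.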

Cite this Paper

BibTeX
@InProceedings{pmlr-v162-yang22m,
  title     = {Linear Bandit Algorithms with Sublinear Time Complexity},
  author    = {Yang, Shuo and Ren, Tongzheng and Shakkottai, Sanjay and Price, Eric and Dhillon, Inderjit S. and Sanghavi, Sujay},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages     = {25241--25260},
  year      = {2022},
  editor    = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume    = {162},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--23 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v162/yang22m/yang22m.pdf},
  url       = {https://proceedings.mlr.press/v162/yang22m.html},
  abstract  = {We propose two linear bandits algorithms with per-step complexity sublinear in the number of arms $K$. The algorithms are designed for applications where the arm set is extremely large and slowly changing. Our key realization is that choosing an arm reduces to a maximum inner product search (MIPS) problem, which can be solved approximately without breaking regret guarantees. Existing approximate MIPS solvers run in sublinear time. We extend those solvers and present theoretical guarantees for online learning problems, where adaptivity (i.e., a later step depends on the feedback in previous steps) becomes a unique challenge. We then explicitly characterize the tradeoff between the per-step complexity and regret. For sufficiently large $K$, our algorithms have sublinear per-step complexity and $\widetilde O(\sqrt{T})$ regret. Empirically, we evaluate our proposed algorithms in a synthetic environment and a real-world online movie recommendation problem. Our proposed algorithms can deliver a more than 72 times speedup compared to the linear time baselines while retaining similar regret.}
}
Endnote
%0 Conference Paper
%T Linear Bandit Algorithms with Sublinear Time Complexity
%A Shuo Yang
%A Tongzheng Ren
%A Sanjay Shakkottai
%A Eric Price
%A Inderjit S. Dhillon
%A Sujay Sanghavi
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-yang22m
%I PMLR
%P 25241--25260
%U https://proceedings.mlr.press/v162/yang22m.html
%V 162
%X We propose two linear bandits algorithms with per-step complexity sublinear in the number of arms $K$. The algorithms are designed for applications where the arm set is extremely large and slowly changing. Our key realization is that choosing an arm reduces to a maximum inner product search (MIPS) problem, which can be solved approximately without breaking regret guarantees. Existing approximate MIPS solvers run in sublinear time. We extend those solvers and present theoretical guarantees for online learning problems, where adaptivity (i.e., a later step depends on the feedback in previous steps) becomes a unique challenge. We then explicitly characterize the tradeoff between the per-step complexity and regret. For sufficiently large $K$, our algorithms have sublinear per-step complexity and $\widetilde O(\sqrt{T})$ regret. Empirically, we evaluate our proposed algorithms in a synthetic environment and a real-world online movie recommendation problem. Our proposed algorithms can deliver a more than 72 times speedup compared to the linear time baselines while retaining similar regret.
APA
Yang, S., Ren, T., Shakkottai, S., Price, E., Dhillon, I. S., & Sanghavi, S. (2022). Linear Bandit Algorithms with Sublinear Time Complexity. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:25241-25260. Available from https://proceedings.mlr.press/v162/yang22m.html.
