Squeeze All: Novel Estimator and Self-Normalized Bound for Linear Contextual Bandits

Wonyoung Kim, Myunghee Cho Paik, Min-Hwan Oh
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:3098-3124, 2023.

Abstract

We propose an algorithm for linear contextual bandits with an $O(\sqrt{dT \log T})$ regret bound, where $d$ is the dimension of contexts and $T$ is the time horizon. Our proposed algorithm is equipped with a novel estimator in which exploration is embedded through explicit randomization. Depending on the randomization, our proposed estimator takes contributions either from contexts of all arms or from selected contexts. We establish a self-normalized bound for our estimator, which allows a novel decomposition of the cumulative regret into additive dimension-dependent terms instead of multiplicative terms. We also prove a novel lower bound of $\Omega(\sqrt{dT})$ under our problem setting. Hence, the regret of our proposed algorithm matches the lower bound up to logarithmic factors. Numerical experiments support the theoretical guarantees and show that our proposed method outperforms existing linear bandit algorithms.
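The general setting the abstract refers to can be illustrated with a minimal sketch of a linear contextual bandit loop that embeds exploration through explicit randomization. This is a generic illustration, not the paper's estimator or algorithm; the exploration probability, ridge penalty, noise level, and problem sizes below are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, T = 5, 10, 2000                    # context dimension, arms, horizon
theta_star = rng.normal(size=d) / np.sqrt(d)  # unknown true parameter

A = np.eye(d)                            # ridge-regularized Gram matrix (lambda = 1)
b = np.zeros(d)                          # accumulated reward-weighted contexts
p_explore = 0.1                          # randomization probability (illustrative)

regret = 0.0
for t in range(T):
    X = rng.normal(size=(K, d))          # observed contexts, one per arm
    theta_hat = np.linalg.solve(A, b)    # ridge estimate of theta_star
    if rng.random() < p_explore:
        a = int(rng.integers(K))         # explicit randomization: uniform arm
    else:
        a = int(np.argmax(X @ theta_hat))  # greedy arm under current estimate
    reward = X[a] @ theta_star + 0.1 * rng.normal()  # linear reward + noise
    A += np.outer(X[a], X[a])            # update only with the selected context
    b += reward * X[a]
    regret += np.max(X @ theta_star) - X[a] @ theta_star

print(f"cumulative regret after {T} rounds: {regret:.1f}")
```

In this generic scheme only the selected arm's context enters the estimator; the paper's contribution is precisely a different estimator that, depending on the randomization, can also take contributions from the contexts of all arms, which enables the self-normalized bound and the additive regret decomposition described above.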

Cite this Paper


BibTeX
@InProceedings{pmlr-v206-kim23d,
  title     = {Squeeze All: Novel Estimator and Self-Normalized Bound for Linear Contextual Bandits},
  author    = {Kim, Wonyoung and Paik, Myunghee Cho and Oh, Min-Hwan},
  booktitle = {Proceedings of The 26th International Conference on Artificial Intelligence and Statistics},
  pages     = {3098--3124},
  year      = {2023},
  editor    = {Ruiz, Francisco and Dy, Jennifer and van de Meent, Jan-Willem},
  volume    = {206},
  series    = {Proceedings of Machine Learning Research},
  month     = {25--27 Apr},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v206/kim23d/kim23d.pdf},
  url       = {https://proceedings.mlr.press/v206/kim23d.html},
  abstract  = {We propose a linear contextual bandit algorithm for linear contextual bandits with $O(\sqrt{dT \log T})$ regret bound, where $d$ is the dimension of contexts and $T$ is the time horizon. Our proposed algorithm is equipped with a novel estimator in which exploration is embedded through explicit randomization. Depending on the randomization, our proposed estimator takes contribution either from contexts of all arms or from selected contexts. We establish a self-normalized bound for our estimator, which allows a novel decomposition of the cumulative regret into additive dimension-dependent terms instead of multiplicative terms. We also prove a novel lower bound of $\Omega(\sqrt{dT})$ under our problem setting. Hence, the regret of our proposed algorithm matches the lower bound up to logarithmic factors. The numerical experiments support the theoretical guarantees and show that our proposed method outperforms the existing linear bandit algorithms.}
}
Endnote
%0 Conference Paper
%T Squeeze All: Novel Estimator and Self-Normalized Bound for Linear Contextual Bandits
%A Wonyoung Kim
%A Myunghee Cho Paik
%A Min-Hwan Oh
%B Proceedings of The 26th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2023
%E Francisco Ruiz
%E Jennifer Dy
%E Jan-Willem van de Meent
%F pmlr-v206-kim23d
%I PMLR
%P 3098--3124
%U https://proceedings.mlr.press/v206/kim23d.html
%V 206
%X We propose a linear contextual bandit algorithm for linear contextual bandits with $O(\sqrt{dT \log T})$ regret bound, where $d$ is the dimension of contexts and $T$ is the time horizon. Our proposed algorithm is equipped with a novel estimator in which exploration is embedded through explicit randomization. Depending on the randomization, our proposed estimator takes contribution either from contexts of all arms or from selected contexts. We establish a self-normalized bound for our estimator, which allows a novel decomposition of the cumulative regret into additive dimension-dependent terms instead of multiplicative terms. We also prove a novel lower bound of $\Omega(\sqrt{dT})$ under our problem setting. Hence, the regret of our proposed algorithm matches the lower bound up to logarithmic factors. The numerical experiments support the theoretical guarantees and show that our proposed method outperforms the existing linear bandit algorithms.
APA
Kim, W., Paik, M.C. & Oh, M.-H. (2023). Squeeze All: Novel Estimator and Self-Normalized Bound for Linear Contextual Bandits. Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 206:3098-3124. Available from https://proceedings.mlr.press/v206/kim23d.html.
