Structured Linear Contextual Bandits: A Sharp and Geometric Smoothed Analysis

Vidyashankar Sivakumar, Steven Wu, Arindam Banerjee
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:9026-9035, 2020.

Abstract

Bandit learning algorithms typically must balance exploration and exploitation. However, in many practical applications, worst-case scenarios that require systematic exploration are seldom encountered. In this work, we consider a smoothed setting for structured linear contextual bandits in which the adversarial contexts are perturbed by Gaussian noise and the unknown parameter $\theta^*$ has structure, e.g., sparsity, group sparsity, low rank, etc. We propose simple greedy algorithms for both the single- and multi-parameter (i.e., a different parameter for each context) settings and provide a unified regret analysis for $\theta^*$ with any assumed structure. The regret bounds are expressed in terms of geometric quantities, such as Gaussian widths, associated with the structure of $\theta^*$. As a consequence of our improved analysis, we also obtain sharper regret bounds than earlier work for the unstructured $\theta^*$ setting. Finally, we show that the smoothed setting provides implicit exploration, which is why a simple greedy algorithm works.
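As a concrete illustration, below is a minimal sketch of the greedy single-parameter scheme the abstract describes, under assumptions of our own: the dimensions, noise scales, and the plain least-squares estimator are illustrative placeholders, not the paper's exact algorithm (for structured $\theta^*$, a structure-aware estimator such as an $\ell_1$-regularized one for sparse $\theta^*$ would take the place of least squares). Each round, adversarial base contexts are smoothed with Gaussian noise, and the learner greedily plays the arm with the highest estimated reward, with no explicit exploration step; the perturbation supplies the exploration implicitly.

# A toy smoothed-bandit loop: illustrative only, not the authors' exact algorithm.
import numpy as np

rng = np.random.default_rng(0)
d, K, T = 20, 5, 2000       # ambient dimension, arms per round, horizon (assumed values)
sigma = 0.5                 # std. dev. of the Gaussian smoothing of contexts
theta_star = np.zeros(d)    # structured parameter: 3-sparse in this example
theta_star[:3] = 1.0

X_hist, y_hist, regret = [], [], 0.0
for t in range(T):
    base = rng.uniform(-1, 1, size=(K, d))            # adversarial base contexts
    ctx = base + sigma * rng.standard_normal((K, d))  # smoothed (perturbed) contexts
    if t < d:
        a = t % K   # brief warm-up so least squares is well-posed
    else:
        X, y = np.asarray(X_hist), np.asarray(y_hist)
        theta_hat = np.linalg.lstsq(X, y, rcond=None)[0]  # estimate theta* from history
        a = int(np.argmax(ctx @ theta_hat))               # greedy: best estimated arm
    reward = ctx[a] @ theta_star + 0.1 * rng.standard_normal()  # noisy linear reward
    X_hist.append(ctx[a]); y_hist.append(reward)
    regret += np.max(ctx @ theta_star) - ctx[a] @ theta_star    # instantaneous regret
print(f"cumulative regret after {T} rounds: {regret:.1f}")

The geometric quantity appearing in the regret bounds, the Gaussian width $w(A) = \mathbb{E}_{g \sim N(0, I_d)}\left[\sup_{a \in A} \langle a, g \rangle\right]$, measures the size of the structure set $A$; for unit-norm $s$-sparse vectors in $\mathbb{R}^d$ it is of order $\sqrt{s \log d}$ rather than $\sqrt{d}$, which is how structure tightens the bounds.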

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-sivakumar20a,
  title     = {Structured Linear Contextual Bandits: A Sharp and Geometric Smoothed Analysis},
  author    = {Sivakumar, Vidyashankar and Wu, Steven and Banerjee, Arindam},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  pages     = {9026--9035},
  year      = {2020},
  editor    = {III, Hal Daumé and Singh, Aarti},
  volume    = {119},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--18 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v119/sivakumar20a/sivakumar20a.pdf},
  url       = {https://proceedings.mlr.press/v119/sivakumar20a.html}
}
Endnote
%0 Conference Paper
%T Structured Linear Contextual Bandits: A Sharp and Geometric Smoothed Analysis
%A Vidyashankar Sivakumar
%A Steven Wu
%A Arindam Banerjee
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-sivakumar20a
%I PMLR
%P 9026--9035
%U https://proceedings.mlr.press/v119/sivakumar20a.html
%V 119
APA
Sivakumar, V., Wu, S., & Banerjee, A. (2020). Structured Linear Contextual Bandits: A Sharp and Geometric Smoothed Analysis. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:9026-9035. Available from https://proceedings.mlr.press/v119/sivakumar20a.html.