OSOM: A simultaneously optimal algorithm for multi-armed and linear contextual bandits

Niladri Chatterji; Vidya Muthukumar; Peter Bartlett

OSOM: A simultaneously optimal algorithm for multi-armed and linear contextual bandits

Niladri Chatterji, Vidya Muthukumar, Peter Bartlett

Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:1844-1854, 2020.

Abstract

We consider the stochastic linear (multi-armed) contextual bandit problem with the possibility of hidden simple multi-armed bandit structure in which the rewards are independent of the contextual information. Algorithms that are designed solely for one of the regimes are known to be sub-optimal for their alternate regime. We design a single computationally efficient algorithm that simultaneously obtains problem-dependent optimal regret rates in the simple multi-armed bandit regime and minimax optimal regret rates in the linear contextual bandit regime, without knowing a priori which of the two models generates the rewards. These results are proved under the condition of stochasticity of contextual information over multiple rounds. Our results should be viewed as a step towards principled data-dependent policy class selection for contextual bandits.

Cite this Paper

BibTeX


@InProceedings{pmlr-v108-chatterji20b,
  title = 	 {OSOM: A simultaneously optimal algorithm for multi-armed and linear contextual bandits},
  author =       {Chatterji, Niladri and Muthukumar, Vidya and Bartlett, Peter},
  booktitle = 	 {Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics},
  pages = 	 {1844--1854},
  year = 	 {2020},
  editor = 	 {Chiappa, Silvia and Calandra, Roberto},
  volume = 	 {108},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {26--28 Aug},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v108/chatterji20b/chatterji20b.pdf},
  url = 	 {https://proceedings.mlr.press/v108/chatterji20b.html},
  abstract = 	 {We consider the stochastic linear (multi-armed) contextual bandit problem with the possibility of hidden simple multi-armed bandit structure in which the rewards are independent of the contextual information. Algorithms that are designed solely for one of the regimes are known to be sub-optimal for their alternate regime. We design a single computationally efficient algorithm that simultaneously obtains problem-dependent optimal regret rates in the simple multi-armed bandit regime and minimax optimal regret rates in the linear contextual bandit regime, without knowing a priori which of the two models generates the rewards. These results are proved under the condition of stochasticity of contextual information over multiple rounds. Our results should be viewed as a step towards principled data-dependent policy class selection for contextual bandits.}
}

Endnote

%0 Conference Paper
%T OSOM: A simultaneously optimal algorithm for multi-armed and linear contextual bandits
%A Niladri Chatterji
%A Vidya Muthukumar
%A Peter Bartlett
%B Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2020
%E Silvia Chiappa
%E Roberto Calandra	
%F pmlr-v108-chatterji20b
%I PMLR
%P 1844--1854
%U https://proceedings.mlr.press/v108/chatterji20b.html
%V 108
%X We consider the stochastic linear (multi-armed) contextual bandit problem with the possibility of hidden simple multi-armed bandit structure in which the rewards are independent of the contextual information. Algorithms that are designed solely for one of the regimes are known to be sub-optimal for their alternate regime. We design a single computationally efficient algorithm that simultaneously obtains problem-dependent optimal regret rates in the simple multi-armed bandit regime and minimax optimal regret rates in the linear contextual bandit regime, without knowing a priori which of the two models generates the rewards. These results are proved under the condition of stochasticity of contextual information over multiple rounds. Our results should be viewed as a step towards principled data-dependent policy class selection for contextual bandits.

APA


Chatterji, N., Muthukumar, V. & Bartlett, P.. (2020). OSOM: A simultaneously optimal algorithm for multi-armed and linear contextual bandits. Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 108:1844-1854 Available from https://proceedings.mlr.press/v108/chatterji20b.html.

OSOM: A simultaneously optimal algorithm for multi-armed and linear contextual bandits

Abstract

Cite this Paper

Related Material