Tractable contextual bandits beyond realizability

Sanath Kumar Krishnamurthy; Vitor Hadad; Susan Athey

Tractable contextual bandits beyond realizability

Sanath Kumar Krishnamurthy, Vitor Hadad, Susan Athey

Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, PMLR 130:1423-1431, 2021.

Abstract

Tractable contextual bandit algorithms often rely on the realizability assumption – i.e., that the true expected reward model belongs to a known class, such as linear functions. In this work, we present a tractable bandit algorithm that is not sensitive to the realizability assumption and computationally reduces to solving a constrained regression problem in every epoch. When realizability does not hold, our algorithm ensures the same guarantees on regret achieved by realizability-based algorithms under realizability, up to an additive term that accounts for the misspecification error. This extra term is proportional to T times a function of the mean squared error between the best model in the class and the true model, where T is the total number of time-steps. Our work sheds light on the bias-variance trade-off for tractable contextual bandits. This trade-off is not captured by algorithms that assume realizability, since under this assumption there exists an estimator in the class that attains zero bias.

Cite this Paper

BibTeX


@InProceedings{pmlr-v130-kumar-krishnamurthy21a,
  title = 	 { Tractable contextual bandits beyond realizability },
  author =       {Kumar Krishnamurthy, Sanath and Hadad, Vitor and Athey, Susan},
  booktitle = 	 {Proceedings of The 24th International Conference on Artificial Intelligence and Statistics},
  pages = 	 {1423--1431},
  year = 	 {2021},
  editor = 	 {Banerjee, Arindam and Fukumizu, Kenji},
  volume = 	 {130},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {13--15 Apr},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v130/kumar-krishnamurthy21a/kumar-krishnamurthy21a.pdf},
  url = 	 {https://proceedings.mlr.press/v130/kumar-krishnamurthy21a.html},
  abstract = 	 { Tractable contextual bandit algorithms often rely on the realizability assumption – i.e., that the true expected reward model belongs to a known class, such as linear functions. In this work, we present a tractable bandit algorithm that is not sensitive to the realizability assumption and computationally reduces to solving a constrained regression problem in every epoch. When realizability does not hold, our algorithm ensures the same guarantees on regret achieved by realizability-based algorithms under realizability, up to an additive term that accounts for the misspecification error. This extra term is proportional to T times a function of the mean squared error between the best model in the class and the true model, where T is the total number of time-steps. Our work sheds light on the bias-variance trade-off for tractable contextual bandits. This trade-off is not captured by algorithms that assume realizability, since under this assumption there exists an estimator in the class that attains zero bias. }
}

Endnote

%0 Conference Paper
%T  Tractable contextual bandits beyond realizability 
%A Sanath Kumar Krishnamurthy
%A Vitor Hadad
%A Susan Athey
%B Proceedings of The 24th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2021
%E Arindam Banerjee
%E Kenji Fukumizu	
%F pmlr-v130-kumar-krishnamurthy21a
%I PMLR
%P 1423--1431
%U https://proceedings.mlr.press/v130/kumar-krishnamurthy21a.html
%V 130
%X  Tractable contextual bandit algorithms often rely on the realizability assumption – i.e., that the true expected reward model belongs to a known class, such as linear functions. In this work, we present a tractable bandit algorithm that is not sensitive to the realizability assumption and computationally reduces to solving a constrained regression problem in every epoch. When realizability does not hold, our algorithm ensures the same guarantees on regret achieved by realizability-based algorithms under realizability, up to an additive term that accounts for the misspecification error. This extra term is proportional to T times a function of the mean squared error between the best model in the class and the true model, where T is the total number of time-steps. Our work sheds light on the bias-variance trade-off for tractable contextual bandits. This trade-off is not captured by algorithms that assume realizability, since under this assumption there exists an estimator in the class that attains zero bias.

APA


Kumar Krishnamurthy, S., Hadad, V. & Athey, S.. (2021).  Tractable contextual bandits beyond realizability . Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 130:1423-1431 Available from https://proceedings.mlr.press/v130/kumar-krishnamurthy21a.html.

Tractable contextual bandits beyond realizability

Abstract

Cite this Paper

Related Material