Tractable contextual bandits beyond realizability

Sanath Kumar Krishnamurthy, Vitor Hadad, Susan Athey
Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, PMLR 130:1423-1431, 2021.

Abstract

Tractable contextual bandit algorithms often rely on the realizability assumption – i.e., that the true expected reward model belongs to a known class, such as linear functions. In this work, we present a tractable bandit algorithm that is not sensitive to the realizability assumption and computationally reduces to solving a constrained regression problem in every epoch. When realizability does not hold, our algorithm ensures the same guarantees on regret achieved by realizability-based algorithms under realizability, up to an additive term that accounts for the misspecification error. This extra term is proportional to T times a function of the mean squared error between the best model in the class and the true model, where T is the total number of time-steps. Our work sheds light on the bias-variance trade-off for tractable contextual bandits. This trade-off is not captured by algorithms that assume realizability, since under this assumption there exists an estimator in the class that attains zero bias.
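As a rough illustration of the regret statement (this is a schematic reading of the abstract, not a formula taken from the paper; the exact dependence on the misspecification error is left unspecified here, so the function $g$ and the symbols below are placeholders), the guarantee can be sketched as

$$\mathrm{Regret}(T) \;\le\; \underbrace{\mathrm{Regret}_{\mathcal{F}}(T)}_{\text{realizability-based rate}} \;+\; c \cdot T \cdot g(\varepsilon), \qquad \varepsilon \;=\; \min_{f \in \mathcal{F}} \, \mathbb{E}\big[\big(f(x,a) - \mu(x,a)\big)^2\big],$$

where $\mathcal{F}$ is the given model class, $\mu(x,a)$ the true expected reward, $T$ the number of time-steps, and $c$ a constant. Under realizability, $\varepsilon = 0$ and the additive term vanishes, recovering the usual realizability-based guarantee; otherwise the bias term scales with $T$, which is the bias-variance trade-off the abstract refers to.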

Cite this Paper


BibTeX
@InProceedings{pmlr-v130-kumar-krishnamurthy21a,
  title     = {Tractable contextual bandits beyond realizability},
  author    = {Kumar Krishnamurthy, Sanath and Hadad, Vitor and Athey, Susan},
  booktitle = {Proceedings of The 24th International Conference on Artificial Intelligence and Statistics},
  pages     = {1423--1431},
  year      = {2021},
  editor    = {Banerjee, Arindam and Fukumizu, Kenji},
  volume    = {130},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--15 Apr},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v130/kumar-krishnamurthy21a/kumar-krishnamurthy21a.pdf},
  url       = {https://proceedings.mlr.press/v130/kumar-krishnamurthy21a.html},
  abstract  = {Tractable contextual bandit algorithms often rely on the realizability assumption – i.e., that the true expected reward model belongs to a known class, such as linear functions. In this work, we present a tractable bandit algorithm that is not sensitive to the realizability assumption and computationally reduces to solving a constrained regression problem in every epoch. When realizability does not hold, our algorithm ensures the same guarantees on regret achieved by realizability-based algorithms under realizability, up to an additive term that accounts for the misspecification error. This extra term is proportional to T times a function of the mean squared error between the best model in the class and the true model, where T is the total number of time-steps. Our work sheds light on the bias-variance trade-off for tractable contextual bandits. This trade-off is not captured by algorithms that assume realizability, since under this assumption there exists an estimator in the class that attains zero bias.}
}
Endnote
%0 Conference Paper
%T Tractable contextual bandits beyond realizability
%A Sanath Kumar Krishnamurthy
%A Vitor Hadad
%A Susan Athey
%B Proceedings of The 24th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2021
%E Arindam Banerjee
%E Kenji Fukumizu
%F pmlr-v130-kumar-krishnamurthy21a
%I PMLR
%P 1423--1431
%U https://proceedings.mlr.press/v130/kumar-krishnamurthy21a.html
%V 130
%X Tractable contextual bandit algorithms often rely on the realizability assumption – i.e., that the true expected reward model belongs to a known class, such as linear functions. In this work, we present a tractable bandit algorithm that is not sensitive to the realizability assumption and computationally reduces to solving a constrained regression problem in every epoch. When realizability does not hold, our algorithm ensures the same guarantees on regret achieved by realizability-based algorithms under realizability, up to an additive term that accounts for the misspecification error. This extra term is proportional to T times a function of the mean squared error between the best model in the class and the true model, where T is the total number of time-steps. Our work sheds light on the bias-variance trade-off for tractable contextual bandits. This trade-off is not captured by algorithms that assume realizability, since under this assumption there exists an estimator in the class that attains zero bias.
APA
Kumar Krishnamurthy, S., Hadad, V. & Athey, S. (2021). Tractable contextual bandits beyond realizability. Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 130:1423-1431. Available from https://proceedings.mlr.press/v130/kumar-krishnamurthy21a.html.

Related Material

Download PDF: http://proceedings.mlr.press/v130/kumar-krishnamurthy21a/kumar-krishnamurthy21a.pdf