Semiparametric Contextual Bandits

Akshay Krishnamurthy; Zhiwei Steven Wu; Vasilis Syrgkanis

Semiparametric Contextual Bandits

Akshay Krishnamurthy, Zhiwei Steven Wu, Vasilis Syrgkanis

Proceedings of the 35th International Conference on Machine Learning, PMLR 80:2776-2785, 2018.

Abstract

This paper studies semiparametric contextual bandits, a generalization of the linear stochastic bandit problem where the reward for a chosen action is modeled as a linear function of known action features confounded by a non-linear action-independent term. We design new algorithms that achieve $\tilde{O}(d\sqrt{T})$ regret over $T$ rounds, when the linear function is $d$-dimensional, which matches the best known bounds for the simpler unconfounded case and improves on a recent result of Greenwald et al. (2017). Via an empirical evaluation, we show that our algorithms outperform prior approaches when there are non-linear confounding effects on the rewards. Technically, our algorithms use a new reward estimator inspired by doubly-robust approaches and our proofs require new concentration inequalities for self-normalized martingales.

Cite this Paper

BibTeX


@InProceedings{pmlr-v80-krishnamurthy18a,
  title = 	 {Semiparametric Contextual Bandits},
  author =       {Krishnamurthy, Akshay and Wu, Zhiwei Steven and Syrgkanis, Vasilis},
  booktitle = 	 {Proceedings of the 35th International Conference on Machine Learning},
  pages = 	 {2776--2785},
  year = 	 {2018},
  editor = 	 {Dy, Jennifer and Krause, Andreas},
  volume = 	 {80},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {10--15 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v80/krishnamurthy18a/krishnamurthy18a.pdf},
  url = 	 {https://proceedings.mlr.press/v80/krishnamurthy18a.html},
  abstract = 	 {This paper studies semiparametric contextual bandits, a generalization of the linear stochastic bandit problem where the reward for a chosen action is modeled as a linear function of known action features confounded by a non-linear action-independent term. We design new algorithms that achieve $\tilde{O}(d\sqrt{T})$ regret over $T$ rounds, when the linear function is $d$-dimensional, which matches the best known bounds for the simpler unconfounded case and improves on a recent result of Greenwald et al. (2017). Via an empirical evaluation, we show that our algorithms outperform prior approaches when there are non-linear confounding effects on the rewards. Technically, our algorithms use a new reward estimator inspired by doubly-robust approaches and our proofs require new concentration inequalities for self-normalized martingales.}
}

Endnote

%0 Conference Paper
%T Semiparametric Contextual Bandits
%A Akshay Krishnamurthy
%A Zhiwei Steven Wu
%A Vasilis Syrgkanis
%B Proceedings of the 35th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Jennifer Dy
%E Andreas Krause	
%F pmlr-v80-krishnamurthy18a
%I PMLR
%P 2776--2785
%U https://proceedings.mlr.press/v80/krishnamurthy18a.html
%V 80
%X This paper studies semiparametric contextual bandits, a generalization of the linear stochastic bandit problem where the reward for a chosen action is modeled as a linear function of known action features confounded by a non-linear action-independent term. We design new algorithms that achieve $\tilde{O}(d\sqrt{T})$ regret over $T$ rounds, when the linear function is $d$-dimensional, which matches the best known bounds for the simpler unconfounded case and improves on a recent result of Greenwald et al. (2017). Via an empirical evaluation, we show that our algorithms outperform prior approaches when there are non-linear confounding effects on the rewards. Technically, our algorithms use a new reward estimator inspired by doubly-robust approaches and our proofs require new concentration inequalities for self-normalized martingales.

APA


Krishnamurthy, A., Wu, Z.S. & Syrgkanis, V.. (2018). Semiparametric Contextual Bandits. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:2776-2785 Available from https://proceedings.mlr.press/v80/krishnamurthy18a.html.

Semiparametric Contextual Bandits

Abstract

Cite this Paper

Related Material