Make the Minority Great Again: First-Order Regret Bound for Contextual Bandits
Proceedings of the 35th International Conference on Machine Learning, PMLR 80:186-194, 2018.
Abstract
Regret bounds in online learning compare the player’s performance to $L^*$, the optimal performance in hindsight with a fixed strategy. Typically such bounds scale with the square root of the time horizon $T$. The more refined concept of first-order regret bound replaces this with a scaling $\sqrt{L^*}$, which may be much smaller than $\sqrt{T}$. It is well known that minor variants of standard algorithms satisfy first-order regret bounds in the full information and multi-armed bandit settings. In a COLT 2017 open problem, Agarwal, Krishnamurthy, Langford, Luo, and Schapire raised the issue that existing techniques do not seem sufficient to obtain first-order regret bounds for the contextual bandit problem. In the present paper, we resolve this open problem by presenting a new strategy based on augmenting the policy space.
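For concreteness, the notions the abstract refers to can be sketched in standard contextual-bandit notation (the symbols $K$, $\Pi$, $x_t$, $\ell_t$, and $a_t$ are not defined in the abstract and are introduced here only for illustration):

```latex
% Contextual bandits over T rounds with K actions and policy class \Pi:
% at round t the player sees context x_t, picks action a_t, and suffers
% loss \ell_t(a_t). The benchmark is the best fixed policy in hindsight:
L^* \;=\; \min_{\pi \in \Pi} \; \sum_{t=1}^{T} \ell_t\bigl(\pi(x_t)\bigr),
\qquad
R_T \;=\; \sum_{t=1}^{T} \ell_t(a_t) \;-\; L^*.

% A worst-case (zeroth-order) guarantee scales with the horizon,
%   R_T = \widetilde{O}\bigl(\sqrt{K T}\bigr),
% whereas a first-order guarantee replaces T with the benchmark loss,
%   R_T = \widetilde{O}\bigl(\sqrt{K L^*}\bigr),
% which is much smaller whenever L^* \ll T, i.e. when some policy
% performs very well in hindsight.
```

Since $L^* \le T$ for losses in $[0,1]$, a first-order bound is never worse than the worst-case bound, and it improves on it precisely in the favorable regime $L^* \ll T$.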