Make the Minority Great Again: First-Order Regret Bound for Contextual Bandits

Zeyuan Allen-Zhu, Sebastien Bubeck, Yuanzhi Li
Proceedings of the 35th International Conference on Machine Learning, PMLR 80:186-194, 2018.

Abstract

Regret bounds in online learning compare the player’s performance to $L^*$, the optimal performance in hindsight with a fixed strategy. Typically such bounds scale with the square root of the time horizon $T$. The more refined concept of first-order regret bound replaces this with a scaling $\sqrt{L^*}$, which may be much smaller than $\sqrt{T}$. It is well known that minor variants of standard algorithms satisfy first-order regret bounds in the full information and multi-armed bandit settings. In a COLT 2017 open problem, Agarwal, Krishnamurthy, Langford, Luo, and Schapire raised the issue that existing techniques do not seem sufficient to obtain first-order regret bounds for the contextual bandit problem. In the present paper, we resolve this open problem by presenting a new strategy based on augmenting the policy space.
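
To make the notion concrete (a schematic illustration, not a statement of the paper’s exact theorem): writing $\ell_t(a) \in [0,1]$ for the loss of action $a$ at round $t$, $x_t$ for the context, and $\Pi$ for the policy set, the player’s cumulative loss and the benchmark are
$$L_T = \sum_{t=1}^{T} \ell_t(a_t), \qquad L^* = \min_{\pi \in \Pi} \sum_{t=1}^{T} \ell_t(\pi(x_t)),$$
and the regret is $L_T - L^*$. A classical contextual bandit bound such as Exp4’s scales as $O(\sqrt{K T \log |\Pi|})$ with $K$ actions; a first-order bound has $\sqrt{L^*}$ in place of $\sqrt{T}$, an improvement whenever the best policy suffers little loss, i.e. $L^* \ll T$.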

Cite this Paper


BibTeX
@InProceedings{pmlr-v80-allen-zhu18b,
  title     = {Make the Minority Great Again: First-Order Regret Bound for Contextual Bandits},
  author    = {Allen-Zhu, Zeyuan and Bubeck, Sebastien and Li, Yuanzhi},
  booktitle = {Proceedings of the 35th International Conference on Machine Learning},
  pages     = {186--194},
  year      = {2018},
  editor    = {Dy, Jennifer and Krause, Andreas},
  volume    = {80},
  series    = {Proceedings of Machine Learning Research},
  month     = {10--15 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v80/allen-zhu18b/allen-zhu18b.pdf},
  url       = {https://proceedings.mlr.press/v80/allen-zhu18b.html}
}
APA
Allen-Zhu, Z., Bubeck, S. & Li, Y. (2018). Make the Minority Great Again: First-Order Regret Bound for Contextual Bandits. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:186-194. Available from https://proceedings.mlr.press/v80/allen-zhu18b.html.
