Improved Regret Guarantees for Online Smooth Convex Optimization with Bandit Feedback

Ankan Saha, Ambuj Tewari
Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, PMLR 15:636-642, 2011.

Abstract

The study of online convex optimization in the bandit setting was initiated by Kleinberg (2004) and Flaxman et al. (2005). Such a setting models a decision maker that has to make decisions in the face of adversarially chosen convex loss functions. Moreover, the only information the decision maker receives is the losses. The identity of the loss functions themselves is not revealed. In this setting, we reduce the gap between the best known lower and upper bounds for the class of smooth convex functions, i.e. convex functions with a Lipschitz continuous gradient. Building upon existing work on self-concordant regularizers and one-point gradient estimation, we give the first algorithm whose expected regret, ignoring constant and logarithmic factors, is $O(T^{2/3})$.
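The one-point gradient estimation the abstract refers to (in the style of Flaxman et al., 2005) can be sketched as follows: querying the loss at a single randomly perturbed point $x + \delta u$, with $u$ uniform on the unit sphere, gives an unbiased estimate of the gradient of a smoothed version of the loss. This is a generic illustration of that estimator, not the paper's algorithm; the function names and the quadratic test function are hypothetical choices for the demo.

```python
import random

# One-point gradient estimator: E[(d/delta) * f(x + delta*u) * u] equals the
# gradient of the smoothed function f_hat(x) = E_{v ~ unit ball}[f(x + delta*v)].

def unit_sphere_sample(d, rng):
    """Uniform sample from the unit sphere in R^d (normalized Gaussian)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = sum(c * c for c in v) ** 0.5
    return [c / norm for c in v]

def one_point_gradient_estimate(f, x, delta, rng):
    """Bandit-feedback gradient estimate from a single loss query."""
    d = len(x)
    u = unit_sphere_sample(d, rng)
    fx = f([xi + delta * ui for xi, ui in zip(x, u)])
    return [(d / delta) * fx * ui for ui in u]

# Sanity check on f(x) = ||x||^2: smoothing only adds a constant, so the
# averaged estimate should approach the true gradient 2x.
rng = random.Random(0)
f = lambda x: sum(xi * xi for xi in x)
x, delta, n = [0.5, -0.3, 0.2], 0.5, 200_000
acc = [0.0, 0.0, 0.0]
for _ in range(n):
    g = one_point_gradient_estimate(f, x, delta, rng)
    acc = [a + gi for a, gi in zip(acc, g)]
avg = [a / n for a in acc]
print(avg)  # close to the true gradient [1.0, -0.6, 0.4]
```

Note the bias-variance trade-off visible here: the estimator's variance scales like $1/\delta^2$, while the smoothing bias grows with $\delta$; balancing the two is what drives regret rates such as the $O(T^{2/3})$ bound above.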

Cite this Paper


BibTeX
@InProceedings{pmlr-v15-saha11a,
  title     = {Improved Regret Guarantees for Online Smooth Convex Optimization with Bandit Feedback},
  author    = {Saha, Ankan and Tewari, Ambuj},
  booktitle = {Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics},
  pages     = {636--642},
  year      = {2011},
  editor    = {Gordon, Geoffrey and Dunson, David and Dudík, Miroslav},
  volume    = {15},
  series    = {Proceedings of Machine Learning Research},
  address   = {Fort Lauderdale, FL, USA},
  month     = {11--13 Apr},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v15/saha11a/saha11a.pdf},
  url       = {https://proceedings.mlr.press/v15/saha11a.html},
  abstract  = {The study of online convex optimization in the bandit setting was initiated by Kleinberg (2004) and Flaxman et al. (2005). Such a setting models a decision maker that has to make decisions in the face of adversarially chosen convex loss functions. Moreover, the only information the decision maker receives are the losses. The identity of the loss functions themselves is not revealed. In this setting, we reduce the gap between the best known lower and upper bounds for the class of smooth convex functions, i.e. convex functions with a Lipschitz continuous gradient. Building upon existing work on self-concordant regularizers and one-point gradient estimation, we give the first algorithm whose expected regret, ignoring constant and logarithmic factors, is $O(T^{2/3})$.}
}
Endnote
%0 Conference Paper
%T Improved Regret Guarantees for Online Smooth Convex Optimization with Bandit Feedback
%A Ankan Saha
%A Ambuj Tewari
%B Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2011
%E Geoffrey Gordon
%E David Dunson
%E Miroslav Dudík
%F pmlr-v15-saha11a
%I PMLR
%P 636--642
%U https://proceedings.mlr.press/v15/saha11a.html
%V 15
%X The study of online convex optimization in the bandit setting was initiated by Kleinberg (2004) and Flaxman et al. (2005). Such a setting models a decision maker that has to make decisions in the face of adversarially chosen convex loss functions. Moreover, the only information the decision maker receives are the losses. The identity of the loss functions themselves is not revealed. In this setting, we reduce the gap between the best known lower and upper bounds for the class of smooth convex functions, i.e. convex functions with a Lipschitz continuous gradient. Building upon existing work on self-concordant regularizers and one-point gradient estimation, we give the first algorithm whose expected regret, ignoring constant and logarithmic factors, is $O(T^{2/3})$.
RIS
TY  - CPAPER
TI  - Improved Regret Guarantees for Online Smooth Convex Optimization with Bandit Feedback
AU  - Ankan Saha
AU  - Ambuj Tewari
BT  - Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics
DA  - 2011/06/14
ED  - Geoffrey Gordon
ED  - David Dunson
ED  - Miroslav Dudík
ID  - pmlr-v15-saha11a
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 15
SP  - 636
EP  - 642
L1  - http://proceedings.mlr.press/v15/saha11a/saha11a.pdf
UR  - https://proceedings.mlr.press/v15/saha11a.html
AB  - The study of online convex optimization in the bandit setting was initiated by Kleinberg (2004) and Flaxman et al. (2005). Such a setting models a decision maker that has to make decisions in the face of adversarially chosen convex loss functions. Moreover, the only information the decision maker receives are the losses. The identity of the loss functions themselves is not revealed. In this setting, we reduce the gap between the best known lower and upper bounds for the class of smooth convex functions, i.e. convex functions with a Lipschitz continuous gradient. Building upon existing work on self-concordant regularizers and one-point gradient estimation, we give the first algorithm whose expected regret, ignoring constant and logarithmic factors, is $O(T^{2/3})$.
ER  -
APA
Saha, A. & Tewari, A. (2011). Improved Regret Guarantees for Online Smooth Convex Optimization with Bandit Feedback. Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 15:636-642. Available from https://proceedings.mlr.press/v15/saha11a.html.

Related Material