Online Learning with Composite Loss Functions

Ofer Dekel; Jian Ding; Tomer Koren; Yuval Peres

Online Learning with Composite Loss Functions

Ofer Dekel, Jian Ding, Tomer Koren, Yuval Peres

Proceedings of The 27th Conference on Learning Theory, PMLR 35:1214-1231, 2014.

Abstract

We study a new class of online learning problems where each of the online algorithm’s actions is assigned an adversarial value, and the loss of the algorithm at each step is a known and deterministic function of the values assigned to its recent actions. This class includes problems where the algorithm’s loss is the \emphminimum over the recent adversarial values, the \emphmaximum over the recent values, or a \emphlinear combination of the recent values. We analyze the minimax regret of this class of problems when the algorithm receives bandit feedback, and prove that when the \emphminimum or \emphmaximum functions are used, the minimax regret is \widetilde Ω(T^2/3) (so called \emphhard online learning problems), and when a linear function is used, the minimax regret is \widetilde O(\sqrtT) (so called \empheasy learning problems). Previously, the only online learning problem that was known to be provably hard was the multi-armed bandit with switching costs.

Cite this Paper

BibTeX


@InProceedings{pmlr-v35-dekel14,
  title = 	 {Online Learning with Composite Loss Functions},
  author = 	 {Dekel, Ofer and Ding, Jian and Koren, Tomer and Peres, Yuval},
  booktitle = 	 {Proceedings of The 27th Conference on Learning Theory},
  pages = 	 {1214--1231},
  year = 	 {2014},
  editor = 	 {Balcan, Maria Florina and Feldman, Vitaly and Szepesvári, Csaba},
  volume = 	 {35},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Barcelona, Spain},
  month = 	 {13--15 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v35/dekel14.pdf},
  url = 	 {https://proceedings.mlr.press/v35/dekel14.html},
  abstract = 	 {We study a new class of online learning problems where each of the online algorithm’s actions is assigned an adversarial value, and the loss of the algorithm at each step is a known and deterministic function of the values assigned to its recent actions. This class includes problems where the algorithm’s loss is the \emphminimum over the recent adversarial values, the \emphmaximum over the recent values, or a \emphlinear combination of the recent values. We analyze the minimax regret of this class of problems when the algorithm receives bandit feedback, and prove that when the    \emphminimum or \emphmaximum functions are used, the minimax regret is \widetilde Ω(T^2/3) (so called \emphhard online learning problems), and when a linear function is used, the minimax regret is \widetilde O(\sqrtT) (so called \empheasy learning problems). Previously, the only online learning problem that was known to be provably hard was the multi-armed bandit with switching costs.}
}

Endnote

%0 Conference Paper
%T Online Learning with Composite Loss Functions
%A Ofer Dekel
%A Jian Ding
%A Tomer Koren
%A Yuval Peres
%B Proceedings of The 27th Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2014
%E Maria Florina Balcan
%E Vitaly Feldman
%E Csaba Szepesvári	
%F pmlr-v35-dekel14
%I PMLR
%P 1214--1231
%U https://proceedings.mlr.press/v35/dekel14.html
%V 35
%X We study a new class of online learning problems where each of the online algorithm’s actions is assigned an adversarial value, and the loss of the algorithm at each step is a known and deterministic function of the values assigned to its recent actions. This class includes problems where the algorithm’s loss is the \emphminimum over the recent adversarial values, the \emphmaximum over the recent values, or a \emphlinear combination of the recent values. We analyze the minimax regret of this class of problems when the algorithm receives bandit feedback, and prove that when the    \emphminimum or \emphmaximum functions are used, the minimax regret is \widetilde Ω(T^2/3) (so called \emphhard online learning problems), and when a linear function is used, the minimax regret is \widetilde O(\sqrtT) (so called \empheasy learning problems). Previously, the only online learning problem that was known to be provably hard was the multi-armed bandit with switching costs.

RIS


TY  - CPAPER
TI  - Online Learning with Composite Loss Functions
AU  - Ofer Dekel
AU  - Jian Ding
AU  - Tomer Koren
AU  - Yuval Peres
BT  - Proceedings of The 27th Conference on Learning Theory
DA  - 2014/05/29
ED  - Maria Florina Balcan
ED  - Vitaly Feldman
ED  - Csaba Szepesvári	
ID  - pmlr-v35-dekel14
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 35
SP  - 1214
EP  - 1231
L1  - http://proceedings.mlr.press/v35/dekel14.pdf
UR  - https://proceedings.mlr.press/v35/dekel14.html
AB  - We study a new class of online learning problems where each of the online algorithm’s actions is assigned an adversarial value, and the loss of the algorithm at each step is a known and deterministic function of the values assigned to its recent actions. This class includes problems where the algorithm’s loss is the \emphminimum over the recent adversarial values, the \emphmaximum over the recent values, or a \emphlinear combination of the recent values. We analyze the minimax regret of this class of problems when the algorithm receives bandit feedback, and prove that when the    \emphminimum or \emphmaximum functions are used, the minimax regret is \widetilde Ω(T^2/3) (so called \emphhard online learning problems), and when a linear function is used, the minimax regret is \widetilde O(\sqrtT) (so called \empheasy learning problems). Previously, the only online learning problem that was known to be provably hard was the multi-armed bandit with switching costs.
ER  -

APA


Dekel, O., Ding, J., Koren, T. & Peres, Y.. (2014). Online Learning with Composite Loss Functions. Proceedings of The 27th Conference on Learning Theory, in Proceedings of Machine Learning Research 35:1214-1231 Available from https://proceedings.mlr.press/v35/dekel14.html.

Related Material

Download PDF