Online Learning with Feedback Graphs: Beyond Bandits

Noga Alon; Nicolò Cesa-Bianchi; Ofer Dekel; Tomer Koren

Online Learning with Feedback Graphs: Beyond Bandits

Noga Alon, Nicolò Cesa-Bianchi, Ofer Dekel, Tomer Koren

Proceedings of The 28th Conference on Learning Theory, PMLR 40:23-35, 2015.

Abstract

We study a general class of online learning problems where the feedback is specified by a graph. This class includes online prediction with expert advice and the multi-armed bandit problem, but also several learning problems where the online player does not necessarily observe his own loss. We analyze how the structure of the feedback graph controls the inherent difficulty of the induced T-round learning problem. Specifically, we show that any feedback graph belongs to one of three classes: \emphstrongly observable graphs, \emphweakly observable graphs, and \emphunobservable graphs. We prove that the first class induces learning problems with \widetildeΘ(α^1/2 T^1/2) minimax regret, where αis the independence number of the underlying graph; the second class induces problems with \widetildeΘ(δ^1/3T^2/3) minimax regret, where δis the domination number of a certain portion of the graph; and the third class induces problems with linear minimax regret. Our results subsume much of the previous work on learning with feedback graphs and reveal new connections to partial monitoring games. We also show how the regret is affected if the graphs are allowed to vary with time.

Cite this Paper

BibTeX


@InProceedings{pmlr-v40-Alon15,
  title = 	 {Online Learning with Feedback Graphs: Beyond Bandits},
  author = 	 {Alon, Noga and Cesa-Bianchi, Nicolò and Dekel, Ofer and Koren, Tomer},
  booktitle = 	 {Proceedings of The 28th Conference on Learning Theory},
  pages = 	 {23--35},
  year = 	 {2015},
  editor = 	 {Grünwald, Peter and Hazan, Elad and Kale, Satyen},
  volume = 	 {40},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Paris, France},
  month = 	 {03--06 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v40/Alon15.pdf},
  url = 	 {https://proceedings.mlr.press/v40/Alon15.html},
  abstract = 	 {We study a general class of online learning problems where the feedback is specified by a graph. This class includes online prediction with expert advice and the multi-armed bandit problem, but also several learning problems where the online player does not necessarily observe his own loss. We analyze how the structure of the feedback graph controls the inherent difficulty of the induced T-round learning problem. Specifically, we show that any feedback graph belongs to one of three classes: \emphstrongly observable graphs, \emphweakly observable graphs, and \emphunobservable graphs. We prove that the first class induces learning problems with \widetildeΘ(α^1/2 T^1/2) minimax regret, where αis the independence number of the underlying graph; the second class induces problems with \widetildeΘ(δ^1/3T^2/3) minimax regret, where δis the domination number of a certain portion of the graph; and the third class induces problems with linear minimax regret. Our results subsume much of the previous work on learning with feedback graphs and reveal new connections to partial monitoring games. We also show how the regret is affected if the graphs are allowed to vary with time. }
}

Endnote

%0 Conference Paper
%T Online Learning with Feedback Graphs: Beyond Bandits
%A Noga Alon
%A Nicolò Cesa-Bianchi
%A Ofer Dekel
%A Tomer Koren
%B Proceedings of The 28th Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2015
%E Peter Grünwald
%E Elad Hazan
%E Satyen Kale	
%F pmlr-v40-Alon15
%I PMLR
%P 23--35
%U https://proceedings.mlr.press/v40/Alon15.html
%V 40
%X We study a general class of online learning problems where the feedback is specified by a graph. This class includes online prediction with expert advice and the multi-armed bandit problem, but also several learning problems where the online player does not necessarily observe his own loss. We analyze how the structure of the feedback graph controls the inherent difficulty of the induced T-round learning problem. Specifically, we show that any feedback graph belongs to one of three classes: \emphstrongly observable graphs, \emphweakly observable graphs, and \emphunobservable graphs. We prove that the first class induces learning problems with \widetildeΘ(α^1/2 T^1/2) minimax regret, where αis the independence number of the underlying graph; the second class induces problems with \widetildeΘ(δ^1/3T^2/3) minimax regret, where δis the domination number of a certain portion of the graph; and the third class induces problems with linear minimax regret. Our results subsume much of the previous work on learning with feedback graphs and reveal new connections to partial monitoring games. We also show how the regret is affected if the graphs are allowed to vary with time.

RIS


TY  - CPAPER
TI  - Online Learning with Feedback Graphs: Beyond Bandits
AU  - Noga Alon
AU  - Nicolò Cesa-Bianchi
AU  - Ofer Dekel
AU  - Tomer Koren
BT  - Proceedings of The 28th Conference on Learning Theory
DA  - 2015/06/26
ED  - Peter Grünwald
ED  - Elad Hazan
ED  - Satyen Kale	
ID  - pmlr-v40-Alon15
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 40
SP  - 23
EP  - 35
L1  - http://proceedings.mlr.press/v40/Alon15.pdf
UR  - https://proceedings.mlr.press/v40/Alon15.html
AB  - We study a general class of online learning problems where the feedback is specified by a graph. This class includes online prediction with expert advice and the multi-armed bandit problem, but also several learning problems where the online player does not necessarily observe his own loss. We analyze how the structure of the feedback graph controls the inherent difficulty of the induced T-round learning problem. Specifically, we show that any feedback graph belongs to one of three classes: \emphstrongly observable graphs, \emphweakly observable graphs, and \emphunobservable graphs. We prove that the first class induces learning problems with \widetildeΘ(α^1/2 T^1/2) minimax regret, where αis the independence number of the underlying graph; the second class induces problems with \widetildeΘ(δ^1/3T^2/3) minimax regret, where δis the domination number of a certain portion of the graph; and the third class induces problems with linear minimax regret. Our results subsume much of the previous work on learning with feedback graphs and reveal new connections to partial monitoring games. We also show how the regret is affected if the graphs are allowed to vary with time. 
ER  -

APA


Alon, N., Cesa-Bianchi, N., Dekel, O. & Koren, T.. (2015). Online Learning with Feedback Graphs: Beyond Bandits. Proceedings of The 28th Conference on Learning Theory, in Proceedings of Machine Learning Research 40:23-35 Available from https://proceedings.mlr.press/v40/Alon15.html.

Related Material

Download PDF