Approachability in unknown games: Online learning meets multi-objective optimization

Shie Mannor; Vianney Perchet; Gilles Stoltz

Proceedings of The 27th Conference on Learning Theory, PMLR 35:339-355, 2014.

Abstract

In the standard setting of approachability, there are two players and a target set. The players play a repeated vector-valued game in which one of them wants the average vector-valued payoff to converge to the target set, which the other player tries to exclude. We revisit this classical setting and consider the case where the player has a preference relation over target sets: she wishes to approach the smallest (“best”) set possible given the observed average payoffs in hindsight. Moreover, as opposed to previous works on approachability, and in the spirit of online learning, we do not assume that there is a known game structure with actions for two players. Rather, the player receives an arbitrary reward vector at every round. We show that it is impossible, in general, to approach the best target set in hindsight. We further propose a concrete strategy that approaches a non-trivial relaxation of the best-in-hindsight set given the actual rewards. Our approach does not require projection onto a target set and amounts to switching between scalar regret minimization algorithms that are run in episodes.
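To make the vocabulary of the abstract concrete, here is a minimal Python sketch of two generic ingredients it mentions: a scalar regret minimizer (the exponentially weighted average forecaster, Hedge) restarted in episodes, and the Euclidean distance of the average vector-valued reward to a target set, which is what "approaching" a set measures. It rests on loudly stated assumptions that are not taken from the paper: full-information feedback with a reward vector in [0, 1]^d revealed for every action, a single fixed scalarization vector `lam`, episodes of doubling length, and a target set of the orthant form {x : x ≥ lower}. All names (`episodic_hedge`, `hedge_weights`, `distance_to_orthant`, `lam`, `lower`) are hypothetical. It does not reproduce the authors' strategy or its rule for switching between regret minimizers; the distance is computed only to monitor the outcome, not to project.

```python
import math
import random
from typing import List, Sequence, Tuple

# rewards[t][a] is the d-dimensional reward vector of action a at round t.
Rewards = Sequence[Sequence[Sequence[float]]]


def hedge_weights(cum_rewards: Sequence[float], eta: float) -> List[float]:
    """Exponential-weights (Hedge) distribution from cumulative scalar rewards."""
    m = max(cum_rewards)  # shift by the max to avoid overflow in exp
    w = [math.exp(eta * (r - m)) for r in cum_rewards]
    s = sum(w)
    return [x / s for x in w]


def distance_to_orthant(point: Sequence[float], lower: Sequence[float]) -> float:
    """Euclidean distance from `point` to the set {x : x_i >= lower_i for every i}."""
    return math.sqrt(sum(max(low - p, 0.0) ** 2 for p, low in zip(point, lower)))


def episodic_hedge(
    rewards: Rewards, lam: Sequence[float], lower: Sequence[float], seed: int = 0
) -> Tuple[List[float], float]:
    """Hedge on the fixed scalarization <lam, r>, restarted in episodes of length 1, 2, 4, ...

    Assumes (hypothetically) full information and rewards in [0, 1]^d with `lam`
    in the simplex, so scalarized rewards lie in [0, 1]. Returns the average
    vector reward actually obtained and its distance to the illustrative target
    orthant defined by `lower`.
    """
    rng = random.Random(seed)
    horizon = len(rewards)
    n_actions = len(rewards[0])
    d = len(rewards[0][0])
    total = [0.0] * d
    t, length = 0, 1
    while t < horizon:
        # New episode: forget past rewards and tune eta to the episode length.
        cum = [0.0] * n_actions
        eta = math.sqrt(8 * math.log(n_actions) / length)
        for _ in range(min(length, horizon - t)):
            p = hedge_weights(cum, eta)
            a = rng.choices(range(n_actions), weights=p)[0]
            for i in range(d):
                total[i] += rewards[t][a][i]
            # Update every action with its scalarized reward <lam, r_t(b)>.
            for b in range(n_actions):
                cum[b] += sum(wi * ri for wi, ri in zip(lam, rewards[t][b]))
            t += 1
        length *= 2
    avg = [x / horizon for x in total]
    return avg, distance_to_orthant(avg, lower)


if __name__ == "__main__":
    # Toy input: action 0 always pays (1, 1), action 1 always pays (0, 0).
    toy = [[[1.0, 1.0], [0.0, 0.0]] for _ in range(1000)]
    print(episodic_hedge(toy, lam=[0.5, 0.5], lower=[0.5, 0.5]))
```

Restarting in doubling episodes is the standard "doubling trick" for tuning `eta` without knowing the horizon in advance; it is used here only as a simple way to illustrate "regret minimization performed in episodes".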

Cite this Paper

BibTeX


@InProceedings{pmlr-v35-mannor14,
  title     = {Approachability in unknown games: {O}nline learning meets multi-objective optimization},
  author    = {Mannor, Shie and Perchet, Vianney and Stoltz, Gilles},
  booktitle = {Proceedings of The 27th Conference on Learning Theory},
  pages     = {339--355},
  year      = {2014},
  editor    = {Balcan, Maria Florina and Feldman, Vitaly and Szepesvári, Csaba},
  volume    = {35},
  series    = {Proceedings of Machine Learning Research},
  address   = {Barcelona, Spain},
  month     = {13--15 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v35/mannor14.pdf},
  url       = {https://proceedings.mlr.press/v35/mannor14.html},
  abstract  = {In the standard setting of approachability there are two players and a target set. The players play a repeated vector-valued game where one of them wants to have the average vector-valued payoff converge to the target set which the other player tries to exclude. We revisit the classical setting and consider the setting where the player has a preference relation between target sets: she wishes to approach the smallest (“best”) set possible given the observed average payoffs in hindsight. Moreover, as opposed to previous works on approachability, and in the spirit of online learning, we do not assume that there is a known game structure with actions for two players. Rather, the player receives an arbitrary vector-valued reward vector at every round. We show that it is impossible, in general, to approach the best target set in hindsight. We further propose a concrete strategy that approaches a non-trivial relaxation of the best-in-hindsight given the actual rewards. Our approach does not require projection onto a target set and amounts to switching between scalar regret minimization algorithms that are performed in episodes.}
}

Endnote

%0 Conference Paper
%T Approachability in unknown games: Online learning meets multi-objective optimization
%A Shie Mannor
%A Vianney Perchet
%A Gilles Stoltz
%B Proceedings of The 27th Conference on Learning Theory
%S Proceedings of Machine Learning Research
%D 2014
%E Maria Florina Balcan
%E Vitaly Feldman
%E Csaba Szepesvári
%F pmlr-v35-mannor14
%I PMLR
%P 339-355
%U https://proceedings.mlr.press/v35/mannor14.html
%V 35
%X In the standard setting of approachability there are two players and a target set. The players play a repeated vector-valued game where one of them wants to have the average vector-valued payoff converge to the target set which the other player tries to exclude. We revisit the classical setting and consider the setting where the player has a preference relation between target sets: she wishes to approach the smallest (“best”) set possible given the observed average payoffs in hindsight. Moreover, as opposed to previous works on approachability, and in the spirit of online learning, we do not assume that there is a known game structure with actions for two players. Rather, the player receives an arbitrary vector-valued reward vector at every round. We show that it is impossible, in general, to approach the best target set in hindsight. We further propose a concrete strategy that approaches a non-trivial relaxation of the best-in-hindsight given the actual rewards. Our approach does not require projection onto a target set and amounts to switching between scalar regret minimization algorithms that are performed in episodes.

RIS


TY  - CPAPER
TI  - Approachability in unknown games: Online learning meets multi-objective optimization
AU  - Shie Mannor
AU  - Vianney Perchet
AU  - Gilles Stoltz
BT  - Proceedings of The 27th Conference on Learning Theory
DA  - 2014/05/29
ED  - Maria Florina Balcan
ED  - Vitaly Feldman
ED  - Csaba Szepesvári
ID  - pmlr-v35-mannor14
PB  - PMLR
T3  - Proceedings of Machine Learning Research
VL  - 35
SP  - 339
EP  - 355
L1  - http://proceedings.mlr.press/v35/mannor14.pdf
UR  - https://proceedings.mlr.press/v35/mannor14.html
AB  - In the standard setting of approachability there are two players and a target set. The players play a repeated vector-valued game where one of them wants to have the average vector-valued payoff converge to the target set which the other player tries to exclude. We revisit the classical setting and consider the setting where the player has a preference relation between target sets: she wishes to approach the smallest (“best”) set possible given the observed average payoffs in hindsight. Moreover, as opposed to previous works on approachability, and in the spirit of online learning, we do not assume that there is a known game structure with actions for two players. Rather, the player receives an arbitrary vector-valued reward vector at every round. We show that it is impossible, in general, to approach the best target set in hindsight. We further propose a concrete strategy that approaches a non-trivial relaxation of the best-in-hindsight given the actual rewards. Our approach does not require projection onto a target set and amounts to switching between scalar regret minimization algorithms that are performed in episodes.
ER  -

APA


Mannor, S., Perchet, V., & Stoltz, G. (2014). Approachability in unknown games: Online learning meets multi-objective optimization. Proceedings of The 27th Conference on Learning Theory, in Proceedings of Machine Learning Research 35:339-355. Available from https://proceedings.mlr.press/v35/mannor14.html.
