Combinatorial Partial Monitoring Game with Linear Feedback and Its Applications

Tian Lin; Bruno Abrahao; Robert Kleinberg; John Lui; Wei Chen

Combinatorial Partial Monitoring Game with Linear Feedback and Its Applications

Tian Lin, Bruno Abrahao, Robert Kleinberg, John Lui, Wei Chen

Proceedings of the 31st International Conference on Machine Learning, PMLR 32(2):901-909, 2014.

Abstract

In online learning, a player chooses actions to play and receives reward and feedback from the environment with the goal of maximizing her reward over time. In this paper, we propose the model of combinatorial partial monitoring games with linear feedback, a model which simultaneously addresses limited feedback, infinite outcome space of the environment and exponentially large action space of the player. We present the Global Confidence Bound (GCB) algorithm, which integrates ideas from both combinatorial multi-armed bandits and finite partial monitoring games to handle all the above issues. GCB only requires feedback on a small set of actions and achieves O(T^\frac23\log T) distribution-independent regret and O(\log T) distribution-dependent regret (the latter assuming unique optimal action), where T is the total time steps played. Moreover, the regret bounds only depend linearly on \log |X| rather than |X|, where X is the action space. GCB isolates offline optimization tasks from online learning and avoids explicit enumeration of all actions in the online learning part. We demonstrate that our model and algorithm can be applied to a crowdsourcing application leading to both an efficient learning algorithm and low regret, and argue that they can be applied to a wide range of combinatorial applications constrained with limited feedback.

Cite this Paper

BibTeX


@InProceedings{pmlr-v32-lind14,
  title = 	 {Combinatorial Partial Monitoring Game with Linear Feedback and Its Applications},
  author = 	 {Lin, Tian and Abrahao, Bruno and Kleinberg, Robert and Lui, John and Chen, Wei},
  booktitle = 	 {Proceedings of the 31st International Conference on Machine Learning},
  pages = 	 {901--909},
  year = 	 {2014},
  editor = 	 {Xing, Eric P. and Jebara, Tony},
  volume = 	 {32},
  number =       {2},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Bejing, China},
  month = 	 {22--24 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v32/lind14.pdf},
  url = 	 {https://proceedings.mlr.press/v32/lind14.html},
  abstract = 	 {In online learning, a player chooses actions to play and receives reward and feedback from the environment with the goal of maximizing her reward over time. In this paper, we propose the model of combinatorial partial monitoring games with linear feedback, a model which simultaneously addresses limited feedback, infinite outcome space of the environment and exponentially large action space of the player. We present the Global Confidence Bound (GCB) algorithm, which integrates ideas from both combinatorial multi-armed bandits and finite partial monitoring games to handle all the above issues. GCB only requires feedback on a small set of actions and achieves O(T^\frac23\log T) distribution-independent regret and O(\log T) distribution-dependent regret (the latter assuming unique optimal action), where T is the total time steps played. Moreover, the regret bounds only depend linearly on \log |X| rather than |X|, where X is the action space. GCB isolates offline optimization tasks from online learning and avoids explicit enumeration of all actions in the online learning part. We demonstrate that our model and algorithm can be applied to a crowdsourcing application leading to both an efficient learning algorithm and low regret, and argue that they can be applied to a wide range of combinatorial applications constrained with limited feedback.}
}

Endnote

%0 Conference Paper
%T Combinatorial Partial Monitoring Game with Linear Feedback and Its Applications
%A Tian Lin
%A Bruno Abrahao
%A Robert Kleinberg
%A John Lui
%A Wei Chen
%B Proceedings of the 31st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2014
%E Eric P. Xing
%E Tony Jebara	
%F pmlr-v32-lind14
%I PMLR
%P 901--909
%U https://proceedings.mlr.press/v32/lind14.html
%V 32
%N 2
%X In online learning, a player chooses actions to play and receives reward and feedback from the environment with the goal of maximizing her reward over time. In this paper, we propose the model of combinatorial partial monitoring games with linear feedback, a model which simultaneously addresses limited feedback, infinite outcome space of the environment and exponentially large action space of the player. We present the Global Confidence Bound (GCB) algorithm, which integrates ideas from both combinatorial multi-armed bandits and finite partial monitoring games to handle all the above issues. GCB only requires feedback on a small set of actions and achieves O(T^\frac23\log T) distribution-independent regret and O(\log T) distribution-dependent regret (the latter assuming unique optimal action), where T is the total time steps played. Moreover, the regret bounds only depend linearly on \log |X| rather than |X|, where X is the action space. GCB isolates offline optimization tasks from online learning and avoids explicit enumeration of all actions in the online learning part. We demonstrate that our model and algorithm can be applied to a crowdsourcing application leading to both an efficient learning algorithm and low regret, and argue that they can be applied to a wide range of combinatorial applications constrained with limited feedback.

RIS


TY  - CPAPER
TI  - Combinatorial Partial Monitoring Game with Linear Feedback and Its Applications
AU  - Tian Lin
AU  - Bruno Abrahao
AU  - Robert Kleinberg
AU  - John Lui
AU  - Wei Chen
BT  - Proceedings of the 31st International Conference on Machine Learning
DA  - 2014/06/18
ED  - Eric P. Xing
ED  - Tony Jebara	
ID  - pmlr-v32-lind14
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 32
IS  - 2
SP  - 901
EP  - 909
L1  - http://proceedings.mlr.press/v32/lind14.pdf
UR  - https://proceedings.mlr.press/v32/lind14.html
AB  - In online learning, a player chooses actions to play and receives reward and feedback from the environment with the goal of maximizing her reward over time. In this paper, we propose the model of combinatorial partial monitoring games with linear feedback, a model which simultaneously addresses limited feedback, infinite outcome space of the environment and exponentially large action space of the player. We present the Global Confidence Bound (GCB) algorithm, which integrates ideas from both combinatorial multi-armed bandits and finite partial monitoring games to handle all the above issues. GCB only requires feedback on a small set of actions and achieves O(T^\frac23\log T) distribution-independent regret and O(\log T) distribution-dependent regret (the latter assuming unique optimal action), where T is the total time steps played. Moreover, the regret bounds only depend linearly on \log |X| rather than |X|, where X is the action space. GCB isolates offline optimization tasks from online learning and avoids explicit enumeration of all actions in the online learning part. We demonstrate that our model and algorithm can be applied to a crowdsourcing application leading to both an efficient learning algorithm and low regret, and argue that they can be applied to a wide range of combinatorial applications constrained with limited feedback.
ER  -

APA


Lin, T., Abrahao, B., Kleinberg, R., Lui, J. & Chen, W.. (2014). Combinatorial Partial Monitoring Game with Linear Feedback and Its Applications. Proceedings of the 31st International Conference on Machine Learning, in Proceedings of Machine Learning Research 32(2):901-909 Available from https://proceedings.mlr.press/v32/lind14.html.

Combinatorial Partial Monitoring Game with Linear Feedback and Its Applications

Abstract

Cite this Paper

Related Material