Latent Bandits.

Odalric-Ambrym Maillard; Shie Mannor

Latent Bandits.

Odalric-Ambrym Maillard, Shie Mannor

Proceedings of the 31st International Conference on Machine Learning, PMLR 32(1):136-144, 2014.

Abstract

We consider a multi-armed bandit problem where the reward distributions are indexed by two sets –one for arms, one for type– and can be partitioned into a small number of clusters according to the type. First, we consider the setting where all reward distributions are known and all types have the same underlying cluster, the type’s identity is, however, unknown. Second, we study the case where types may come from different classes, which is significantly more challenging. Finally, we tackle the case where the reward distributions are completely unknown. In each setting, we introduce specific algorithms and derive non-trivial regret performance. Numerical experiments show that, in the most challenging agnostic case, the proposed algorithm achieves excellent performance in several difficult scenarios.

Cite this Paper

BibTeX


@InProceedings{pmlr-v32-maillard14,
  title = 	 {Latent Bandits.},
  author = 	 {Maillard, Odalric-Ambrym and Mannor, Shie},
  booktitle = 	 {Proceedings of the 31st International Conference on Machine Learning},
  pages = 	 {136--144},
  year = 	 {2014},
  editor = 	 {Xing, Eric P. and Jebara, Tony},
  volume = 	 {32},
  number =       {1},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Bejing, China},
  month = 	 {22--24 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v32/maillard14.pdf},
  url = 	 {https://proceedings.mlr.press/v32/maillard14.html},
  abstract = 	 {We consider a multi-armed bandit problem where the reward distributions are indexed by two sets –one for arms, one for type– and can be partitioned into a small number of clusters according to the type. First, we consider the setting where all reward distributions are known and all types have the same underlying cluster, the type’s identity is, however, unknown. Second, we study the case  where types may come from different classes, which is significantly more challenging. Finally, we tackle the case where the reward distributions are completely unknown. In each setting, we introduce specific algorithms and derive non-trivial regret performance. Numerical experiments show that,  in the most challenging agnostic case, the proposed algorithm  achieves excellent performance in several difficult scenarios.}
}

Endnote

%0 Conference Paper
%T Latent Bandits.
%A Odalric-Ambrym Maillard
%A Shie Mannor
%B Proceedings of the 31st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2014
%E Eric P. Xing
%E Tony Jebara	
%F pmlr-v32-maillard14
%I PMLR
%P 136--144
%U https://proceedings.mlr.press/v32/maillard14.html
%V 32
%N 1
%X We consider a multi-armed bandit problem where the reward distributions are indexed by two sets –one for arms, one for type– and can be partitioned into a small number of clusters according to the type. First, we consider the setting where all reward distributions are known and all types have the same underlying cluster, the type’s identity is, however, unknown. Second, we study the case  where types may come from different classes, which is significantly more challenging. Finally, we tackle the case where the reward distributions are completely unknown. In each setting, we introduce specific algorithms and derive non-trivial regret performance. Numerical experiments show that,  in the most challenging agnostic case, the proposed algorithm  achieves excellent performance in several difficult scenarios.

RIS


TY  - CPAPER
TI  - Latent Bandits.
AU  - Odalric-Ambrym Maillard
AU  - Shie Mannor
BT  - Proceedings of the 31st International Conference on Machine Learning
DA  - 2014/01/27
ED  - Eric P. Xing
ED  - Tony Jebara	
ID  - pmlr-v32-maillard14
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 32
IS  - 1
SP  - 136
EP  - 144
L1  - http://proceedings.mlr.press/v32/maillard14.pdf
UR  - https://proceedings.mlr.press/v32/maillard14.html
AB  - We consider a multi-armed bandit problem where the reward distributions are indexed by two sets –one for arms, one for type– and can be partitioned into a small number of clusters according to the type. First, we consider the setting where all reward distributions are known and all types have the same underlying cluster, the type’s identity is, however, unknown. Second, we study the case  where types may come from different classes, which is significantly more challenging. Finally, we tackle the case where the reward distributions are completely unknown. In each setting, we introduce specific algorithms and derive non-trivial regret performance. Numerical experiments show that,  in the most challenging agnostic case, the proposed algorithm  achieves excellent performance in several difficult scenarios.
ER  -

APA


Maillard, O. & Mannor, S.. (2014). Latent Bandits.. Proceedings of the 31st International Conference on Machine Learning, in Proceedings of Machine Learning Research 32(1):136-144 Available from https://proceedings.mlr.press/v32/maillard14.html.

Related Material

Download PDF