Generic Exploration and K-armed Voting Bandits

Tanguy Urvoy; Fabrice Clerot; Raphael Féraud; Sami Naamane

Proceedings of the 30th International Conference on Machine Learning, PMLR 28(2):91-99, 2013.

Abstract

We study a stochastic online learning scheme with partial feedback where the utility of decisions is only observable through an estimation of the environment parameters. We propose a generic pure-exploration algorithm, able to cope with various utility functions from multi-armed bandit settings to dueling bandits. The primary application of this setting is to offer a natural generalization of dueling bandits for situations where the environment parameters reflect the idiosyncratic preferences of a mixed crowd.
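As an illustration only, the Python sketch below applies the abstract's idea to one dueling-bandit target. Here the environment parameters are pairwise preference probabilities, and the utility is whether an arm is a Condorcet winner, meaning it beats every other arm with probability above 1/2.

The sketch makes several assumptions of its own:

- It keeps anytime Hoeffding confidence intervals on each pairwise probability.
- It samples only the pairs whose outcome can still change the answer.
- It returns an arm only after certifying that the arm beats every other arm.

The function name `condorcet_elimination`, the interval choice, and the stopping rule are hypothetical. This is not the algorithm from the paper.

```python
import math
import random
from itertools import combinations


def condorcet_elimination(duel, k, delta=0.05, max_rounds=100_000):
    """Identify a Condorcet winner among k arms from noisy pairwise duels.

    duel(i, j) returns True when arm i beats arm j in one comparison.
    Returns the certified winner's index, or None if every arm was
    eliminated (no Condorcet winner) or the round budget ran out.
    """
    pairs = list(combinations(range(k), 2))  # unordered pairs, i < j
    m = max(len(pairs), 1)
    wins = dict.fromkeys(pairs, 0)   # times arm i beat arm j
    plays = dict.fromkeys(pairs, 0)  # times the pair was compared

    def resolved_winner(i, j):
        """Arm that beats the other w.h.p., or None while still uncertain."""
        n = plays[(i, j)]
        if n == 0:
            return None
        mean = wins[(i, j)] / n
        # Anytime Hoeffding radius: union bound over m pairs and all n >= 1.
        radius = math.sqrt(math.log(4 * m * n * n / delta) / (2 * n))
        if mean - radius > 0.5:
            return i
        if mean + radius < 0.5:
            return j
        return None

    def beats(a, b):
        return resolved_winner(min(a, b), max(a, b)) == a

    candidates = set(range(k))
    for _ in range(max_rounds):
        if not candidates:
            return None
        if len(candidates) == 1:
            (c,) = candidates
            # Certify the survivor: it must provably beat every other arm.
            if all(beats(c, o) for o in range(k) if o != c):
                return c
        for i, j in pairs:
            # Only sample parameters that can still change the answer,
            # i.e. unresolved pairs that touch a remaining candidate.
            if (i in candidates or j in candidates) and resolved_winner(i, j) is None:
                plays[(i, j)] += 1
                wins[(i, j)] += int(duel(i, j))
        # An arm that provably loses any duel cannot be a Condorcet winner.
        for i, j in pairs:
            w = resolved_winner(i, j)
            if w is not None:
                candidates.discard(j if w == i else i)
    return None


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical preference matrix: P[i][j] = probability that arm i beats arm j.
    P = [[0.5, 0.6, 0.1],
         [0.4, 0.5, 0.2],
         [0.9, 0.8, 0.5]]
    print(condorcet_elimination(lambda i, j: rng.random() < P[i][j], k=3, delta=0.01))
```

The sampling filter carries the "generic exploration" flavour of this sketch. A pair between two eliminated arms is never compared again, because its outcome cannot affect which arm is the winner. Targeting a different utility, such as a Copeland winner, would mean changing only the elimination and certification rules.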

Cite this Paper

BibTeX

@InProceedings{pmlr-v28-urvoy13,
  title = 	 {Generic Exploration and {K}-armed Voting Bandits},
  author = 	 {Urvoy, Tanguy and Clerot, Fabrice and Féraud, Raphael and Naamane, Sami},
  booktitle = 	 {Proceedings of the 30th International Conference on Machine Learning},
  pages = 	 {91--99},
  year = 	 {2013},
  editor = 	 {Dasgupta, Sanjoy and McAllester, David},
  volume = 	 {28},
  number =       {2},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Atlanta, Georgia, USA},
  month = 	 {17--19 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v28/urvoy13.pdf},
  url = 	 {https://proceedings.mlr.press/v28/urvoy13.html},
  abstract = 	 {We study a stochastic online learning scheme with partial feedback where the utility of decisions is only observable through an estimation of the environment parameters. We propose a generic pure-exploration algorithm, able to cope with various utility functions from multi-armed bandits settings to dueling bandits. The primary application of this setting is to offer a natural generalization of dueling bandits for situations where the environment parameters reflect the idiosyncratic preferences of a mixed crowd.}
}

Endnote

%0 Conference Paper
%T Generic Exploration and K-armed Voting Bandits
%A Tanguy Urvoy
%A Fabrice Clerot
%A Raphael Féraud
%A Sami Naamane
%B Proceedings of the 30th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2013
%E Sanjoy Dasgupta
%E David McAllester
%F pmlr-v28-urvoy13
%I PMLR
%P 91--99
%U https://proceedings.mlr.press/v28/urvoy13.html
%V 28
%N 2
%X We study a stochastic online learning scheme with partial feedback where the utility of decisions is only observable through an estimation of the environment parameters. We propose a generic pure-exploration algorithm, able to cope with various utility functions from multi-armed bandits settings to dueling bandits. The primary application of this setting is to offer a natural generalization of dueling bandits for situations where the environment parameters reflect the idiosyncratic preferences of a mixed crowd.

RIS

TY  - CPAPER
TI  - Generic Exploration and K-armed Voting Bandits
AU  - Tanguy Urvoy
AU  - Fabrice Clerot
AU  - Raphael Féraud
AU  - Sami Naamane
BT  - Proceedings of the 30th International Conference on Machine Learning
DA  - 2013/05/13
ED  - Sanjoy Dasgupta
ED  - David McAllester
ID  - pmlr-v28-urvoy13
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 28
IS  - 2
SP  - 91
EP  - 99
L1  - http://proceedings.mlr.press/v28/urvoy13.pdf
UR  - https://proceedings.mlr.press/v28/urvoy13.html
AB  - We study a stochastic online learning scheme with partial feedback where the utility of decisions is only observable through an estimation of the environment parameters. We propose a generic pure-exploration algorithm, able to cope with various utility functions from multi-armed bandits settings to dueling bandits. The primary application of this setting is to offer a natural generalization of dueling bandits for situations where the environment parameters reflect the idiosyncratic preferences of a mixed crowd.
ER  -

APA

Urvoy, T., Clerot, F., Féraud, R. & Naamane, S. (2013). Generic Exploration and K-armed Voting Bandits. Proceedings of the 30th International Conference on Machine Learning, in Proceedings of Machine Learning Research 28(2):91-99. Available from https://proceedings.mlr.press/v28/urvoy13.html.
