Generic Exploration and K-armed Voting Bandits

Tanguy Urvoy, Fabrice Clerot, Raphael Féraud, Sami Naamane
Proceedings of the 30th International Conference on Machine Learning, PMLR 28(2):91-99, 2013.

Abstract

We study a stochastic online learning scheme with partial feedback where the utility of decisions is only observable through an estimation of the environment parameters. We propose a generic pure-exploration algorithm, able to cope with various utility functions from multi-armed bandits settings to dueling bandits. The primary application of this setting is to offer a natural generalization of dueling bandits for situations where the environment parameters reflect the idiosyncratic preferences of a mixed crowd.

Cite this Paper


BibTeX
@InProceedings{pmlr-v28-urvoy13,
  title = {Generic Exploration and {K}-armed Voting Bandits},
  author = {Urvoy, Tanguy and Clerot, Fabrice and Féraud, Raphael and Naamane, Sami},
  booktitle = {Proceedings of the 30th International Conference on Machine Learning},
  pages = {91--99},
  year = {2013},
  editor = {Dasgupta, Sanjoy and McAllester, David},
  volume = {28},
  number = {2},
  series = {Proceedings of Machine Learning Research},
  address = {Atlanta, Georgia, USA},
  month = {17--19 Jun},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v28/urvoy13.pdf},
  url = {https://proceedings.mlr.press/v28/urvoy13.html},
  abstract = {We study a stochastic online learning scheme with partial feedback where the utility of decisions is only observable through an estimation of the environment parameters. We propose a generic pure-exploration algorithm, able to cope with various utility functions from multi-armed bandits settings to dueling bandits. The primary application of this setting is to offer a natural generalization of dueling bandits for situations where the environment parameters reflect the idiosyncratic preferences of a mixed crowd.}
}
Endnote
%0 Conference Paper
%T Generic Exploration and K-armed Voting Bandits
%A Tanguy Urvoy
%A Fabrice Clerot
%A Raphael Féraud
%A Sami Naamane
%B Proceedings of the 30th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2013
%E Sanjoy Dasgupta
%E David McAllester
%F pmlr-v28-urvoy13
%I PMLR
%P 91--99
%U https://proceedings.mlr.press/v28/urvoy13.html
%V 28
%N 2
%X We study a stochastic online learning scheme with partial feedback where the utility of decisions is only observable through an estimation of the environment parameters. We propose a generic pure-exploration algorithm, able to cope with various utility functions from multi-armed bandits settings to dueling bandits. The primary application of this setting is to offer a natural generalization of dueling bandits for situations where the environment parameters reflect the idiosyncratic preferences of a mixed crowd.
RIS
TY  - CPAPER
TI  - Generic Exploration and K-armed Voting Bandits
AU  - Tanguy Urvoy
AU  - Fabrice Clerot
AU  - Raphael Féraud
AU  - Sami Naamane
BT  - Proceedings of the 30th International Conference on Machine Learning
DA  - 2013/05/13
ED  - Sanjoy Dasgupta
ED  - David McAllester
ID  - pmlr-v28-urvoy13
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 28
IS  - 2
SP  - 91
EP  - 99
L1  - http://proceedings.mlr.press/v28/urvoy13.pdf
UR  - https://proceedings.mlr.press/v28/urvoy13.html
AB  - We study a stochastic online learning scheme with partial feedback where the utility of decisions is only observable through an estimation of the environment parameters. We propose a generic pure-exploration algorithm, able to cope with various utility functions from multi-armed bandits settings to dueling bandits. The primary application of this setting is to offer a natural generalization of dueling bandits for situations where the environment parameters reflect the idiosyncratic preferences of a mixed crowd.
ER  -
APA
Urvoy, T., Clerot, F., Féraud, R. & Naamane, S. (2013). Generic Exploration and K-armed Voting Bandits. Proceedings of the 30th International Conference on Machine Learning, in Proceedings of Machine Learning Research 28(2):91-99. Available from https://proceedings.mlr.press/v28/urvoy13.html.
