Qualitative Multi-Armed Bandits: A Quantile-Based Approach

Balazs Szorenyi; Robert Busa-Fekete; Paul Weng; Eyke Hüllermeier

Qualitative Multi-Armed Bandits: A Quantile-Based Approach

Balazs Szorenyi, Robert Busa-Fekete, Paul Weng, Eyke Hüllermeier

Proceedings of the 32nd International Conference on Machine Learning, PMLR 37:1660-1668, 2015.

Abstract

We formalize and study the multi-armed bandit (MAB) problem in a generalized stochastic setting, in which rewards are not assumed to be numerical. Instead, rewards are measured on a qualitative scale that allows for comparison but invalidates arithmetic operations such as averaging. Correspondingly, instead of characterizing an arm in terms of the mean of the underlying distribution, we opt for using a quantile of that distribution as a representative value. We address the problem of quantile-based online learning both for the case of a finite (pure exploration) and infinite time horizon (cumulative regret minimization). For both cases, we propose suitable algorithms and analyze their properties. These properties are also illustrated by means of first experimental studies.

Cite this Paper

BibTeX


@InProceedings{pmlr-v37-szorenyi15,
  title = 	 {Qualitative Multi-Armed Bandits: A Quantile-Based Approach},
  author = 	 {Szorenyi, Balazs and Busa-Fekete, Robert and Weng, Paul and Hüllermeier, Eyke},
  booktitle = 	 {Proceedings of the 32nd International Conference on Machine Learning},
  pages = 	 {1660--1668},
  year = 	 {2015},
  editor = 	 {Bach, Francis and Blei, David},
  volume = 	 {37},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Lille, France},
  month = 	 {07--09 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v37/szorenyi15.pdf},
  url = 	 {https://proceedings.mlr.press/v37/szorenyi15.html},
  abstract = 	 {We formalize and study the multi-armed bandit (MAB) problem in a generalized stochastic setting, in which rewards are not assumed to be numerical. Instead, rewards are measured on a qualitative scale that allows for comparison but invalidates arithmetic operations such as averaging. Correspondingly, instead of characterizing an arm in terms of the mean of the underlying distribution, we opt for using a quantile of that distribution as a representative value. We address the problem of quantile-based online learning both for the case of a finite (pure exploration) and infinite time horizon (cumulative regret minimization). For both cases, we propose suitable algorithms and analyze their properties. These properties are also illustrated by means of first experimental studies.}
}

Endnote

%0 Conference Paper
%T Qualitative Multi-Armed Bandits: A Quantile-Based Approach
%A Balazs Szorenyi
%A Robert Busa-Fekete
%A Paul Weng
%A Eyke Hüllermeier
%B Proceedings of the 32nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2015
%E Francis Bach
%E David Blei	
%F pmlr-v37-szorenyi15
%I PMLR
%P 1660--1668
%U https://proceedings.mlr.press/v37/szorenyi15.html
%V 37
%X We formalize and study the multi-armed bandit (MAB) problem in a generalized stochastic setting, in which rewards are not assumed to be numerical. Instead, rewards are measured on a qualitative scale that allows for comparison but invalidates arithmetic operations such as averaging. Correspondingly, instead of characterizing an arm in terms of the mean of the underlying distribution, we opt for using a quantile of that distribution as a representative value. We address the problem of quantile-based online learning both for the case of a finite (pure exploration) and infinite time horizon (cumulative regret minimization). For both cases, we propose suitable algorithms and analyze their properties. These properties are also illustrated by means of first experimental studies.

RIS


TY  - CPAPER
TI  - Qualitative Multi-Armed Bandits: A Quantile-Based Approach
AU  - Balazs Szorenyi
AU  - Robert Busa-Fekete
AU  - Paul Weng
AU  - Eyke Hüllermeier
BT  - Proceedings of the 32nd International Conference on Machine Learning
DA  - 2015/06/01
ED  - Francis Bach
ED  - David Blei	
ID  - pmlr-v37-szorenyi15
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 37
SP  - 1660
EP  - 1668
L1  - http://proceedings.mlr.press/v37/szorenyi15.pdf
UR  - https://proceedings.mlr.press/v37/szorenyi15.html
AB  - We formalize and study the multi-armed bandit (MAB) problem in a generalized stochastic setting, in which rewards are not assumed to be numerical. Instead, rewards are measured on a qualitative scale that allows for comparison but invalidates arithmetic operations such as averaging. Correspondingly, instead of characterizing an arm in terms of the mean of the underlying distribution, we opt for using a quantile of that distribution as a representative value. We address the problem of quantile-based online learning both for the case of a finite (pure exploration) and infinite time horizon (cumulative regret minimization). For both cases, we propose suitable algorithms and analyze their properties. These properties are also illustrated by means of first experimental studies.
ER  -

APA


Szorenyi, B., Busa-Fekete, R., Weng, P. & Hüllermeier, E.. (2015). Qualitative Multi-Armed Bandits: A Quantile-Based Approach. Proceedings of the 32nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 37:1660-1668 Available from https://proceedings.mlr.press/v37/szorenyi15.html.

Qualitative Multi-Armed Bandits: A Quantile-Based Approach

Abstract

Cite this Paper

Related Material