A General Approach to Multi-Armed Bandits Under Risk Criteria
Proceedings of the 31st Conference On Learning Theory, PMLR 75:1295-1306, 2018.
Abstract
Different risk-related criteria have received recent interest in learning problems, where typically each case is treated in a customized manner. In this paper we provide a more systematic approach to analyzing such risk criteria within a stochastic multi-armed bandit (MAB) formulation. We identify a set of general conditions that yield a simple characterization of the oracle rule (which serves as the regret benchmark) and facilitate the design of upper confidence bound (UCB) learning policies. The conditions are derived from problem primitives, primarily focusing on the relation between the arm reward distributions and the (risk-criteria) performance metric. Among other things, the work highlights some (possibly non-intuitive) subtleties that differentiate various criteria in conjunction with the statistical properties of the arms. Our main findings are illustrated on several widely used objectives, including conditional value-at-risk, mean-variance, and the Sharpe ratio.
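To make the named criteria concrete, the following is a minimal sketch, not the paper's algorithm: plug-in empirical estimators for conditional value-at-risk, mean-variance, and the Sharpe ratio, wrapped in a generic UCB-style loop. The estimators, the exploration bonus, and parameters such as `alpha`, `rho`, and `bonus_scale` are illustrative assumptions rather than the construction analyzed in the paper.

```python
# Illustrative sketch only: empirical plug-in estimators for the risk
# criteria mentioned in the abstract, plus a generic UCB-style index.
# Parameter choices (alpha, rho, bonus_scale) are hypothetical.
import numpy as np

def empirical_cvar(samples, alpha=0.05):
    """Conditional value-at-risk: mean of the worst alpha-fraction of rewards."""
    sorted_s = np.sort(samples)
    k = max(1, int(np.ceil(alpha * len(samples))))
    return sorted_s[:k].mean()

def empirical_mean_variance(samples, rho=1.0):
    """Mean-variance criterion: mean reward penalized by variance."""
    return samples.mean() - rho * samples.var()

def empirical_sharpe(samples, eps=1e-8):
    """Sharpe ratio: mean reward divided by its standard deviation."""
    return samples.mean() / (samples.std() + eps)

def risk_ucb(arms, metric, horizon, bonus_scale=1.0, rng=None):
    """Generic loop: play each arm once, then pick the arm maximizing its
    empirical risk metric plus a UCB-style exploration bonus."""
    rng = rng or np.random.default_rng(0)
    history = [[] for _ in arms]
    for t in range(horizon):
        if t < len(arms):  # initial round-robin over the arms
            a = t
        else:
            scores = [
                metric(np.asarray(h)) + bonus_scale * np.sqrt(np.log(t) / len(h))
                for h in history
            ]
            a = int(np.argmax(scores))
        history[a].append(arms[a](rng))  # sample a reward from the chosen arm
    return history

# Usage: two Gaussian arms with equal means but different variances; a
# risk-sensitive metric such as mean-variance favors the low-variance arm.
arms = [lambda r: r.normal(1.0, 0.1), lambda r: r.normal(1.0, 1.0)]
history = risk_ucb(arms, empirical_mean_variance, horizon=2000)
print([len(h) for h in history])  # pull counts per arm
```

Note how swapping `metric` between the three estimators changes which arm is "optimal", which is precisely the sense in which the oracle rule depends on the risk criterion.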