Exploration vs Exploitation vs Safety: Risk-Aware Multi-Armed Bandits

Nicolas Galichet; Michèle Sebag; Olivier Teytaud

Proceedings of the 5th Asian Conference on Machine Learning, PMLR 29:245-260, 2013.

Abstract

Motivated by applications in energy management, this paper presents the Multi-Armed Risk-Aware Bandit (MaRaB) algorithm. To limit the exploration of risky arms, MaRaB measures the quality of each arm by its conditional value at risk. As the user-supplied risk level goes to 0, this arm quality tends toward the essential infimum of the arm's reward distribution, and MaRaB tends toward the MIN multi-armed bandit algorithm, which targets the arm with the maximal minimal value. As a first contribution, this paper presents a theoretical analysis of the MIN algorithm under mild assumptions, establishing its robustness relative to UCB. The analysis is supported by extensive experimental validation of MIN and MaRaB, compared to UCB and state-of-the-art risk-aware MAB algorithms, on artificial and real-world problems.
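The abstract's central idea can be illustrated with a small sketch. Ranking arms by conditional value at risk (CVaR) penalizes arms with a heavy lower tail, and the score collapses to the worst observed reward as the risk level goes to 0. The sketch assumes a simple order-statistic estimator (the mean of the worst ceil(alpha * n) rewards) and a purely greedy choice. The names `empirical_cvar` and `select_arm_by_cvar` and the reward values are hypothetical. MaRaB's actual index, including how it handles exploration, is defined in the paper, not here.

```python
import math
from typing import Mapping, Sequence


def empirical_cvar(rewards: Sequence[float], alpha: float) -> float:
    """Empirical lower-tail CVaR: mean of the worst ceil(alpha * n) rewards.

    Assumed estimator for illustration; the paper's estimator may differ.
    With alpha = 1.0 this is simply the sample mean.
    """
    if not rewards:
        raise ValueError("need at least one observed reward")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    worst_first = sorted(rewards)
    # At least one sample is always kept, so a tiny alpha yields the minimum.
    k = max(1, math.ceil(alpha * len(worst_first)))
    return sum(worst_first[:k]) / k


def select_arm_by_cvar(history: Mapping[str, Sequence[float]], alpha: float) -> str:
    """Hypothetical greedy rule: return the arm whose empirical CVaR is highest.

    No exploration bonus is added here; MaRaB's own index is defined in the paper.
    """
    return max(history, key=lambda arm: empirical_cvar(history[arm], alpha))


if __name__ == "__main__":
    # Made-up rewards: "risky" has the higher mean but a much worse lower tail.
    history = {
        "safe": [0.4, 0.5, 0.45, 0.5],
        "risky": [0.0, 0.9, 1.0, 0.95],
    }
    for alpha in (1.0, 0.25, 0.01):
        print(alpha, select_arm_by_cvar(history, alpha))
```

With alpha = 1.0 the score is the sample mean (about 0.46 versus 0.71), so the risky arm is chosen. At alpha = 0.25 or below, only each arm's single worst reward counts (0.4 versus 0.0), so the rule behaves like MIN and picks the safe arm.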

Cite this Paper

BibTeX


@InProceedings{pmlr-v29-Galichet13,
  title     = {Exploration vs Exploitation vs Safety: Risk-Aware Multi-Armed Bandits},
  author    = {Galichet, Nicolas and Sebag, Michèle and Teytaud, Olivier},
  booktitle = {Proceedings of the 5th Asian Conference on Machine Learning},
  pages     = {245--260},
  year      = {2013},
  editor    = {Ong, Cheng Soon and Ho, Tu Bao},
  volume    = {29},
  series    = {Proceedings of Machine Learning Research},
  address   = {Australian National University, Canberra, Australia},
  month     = {13--15 Nov},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v29/Galichet13.pdf},
  url       = {https://proceedings.mlr.press/v29/Galichet13.html},
  abstract  = {Motivated by applications in energy management, this paper presents the Multi-Armed Risk-Aware Bandit (MaRaB) algorithm. With the goal of limiting the exploration of risky arms, MaRaB takes as arm quality its conditional value at risk. When the user-supplied risk level goes to 0, the arm quality tends toward the essential infimum of the arm distribution density, and MaRaB tends toward the MIN multi-armed bandit algorithm, aimed at the arm with maximal minimal value. As a first contribution, this paper presents a theoretical analysis of the MIN algorithm under mild assumptions, establishing its robustness comparatively to UCB. The analysis is supported by extensive experimental validation of MIN and MaRaB compared to UCB and state-of-art risk-aware MAB algorithms on artificial and real-world problems.}
}

Endnote

%0 Conference Paper
%T Exploration vs Exploitation vs Safety: Risk-Aware Multi-Armed Bandits
%A Nicolas Galichet
%A Michèle Sebag
%A Olivier Teytaud
%B Proceedings of the 5th Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2013
%E Cheng Soon Ong
%E Tu Bao Ho
%F pmlr-v29-Galichet13
%I PMLR
%P 245--260
%U https://proceedings.mlr.press/v29/Galichet13.html
%V 29
%X Motivated by applications in energy management, this paper presents the Multi-Armed Risk-Aware Bandit (MaRaB) algorithm. With the goal of limiting the exploration of risky arms, MaRaB takes as arm quality its conditional value at risk. When the user-supplied risk level goes to 0, the arm quality tends toward the essential infimum of the arm distribution density, and MaRaB tends toward the MIN multi-armed bandit algorithm, aimed at the arm with maximal minimal value. As a first contribution, this paper presents a theoretical analysis of the MIN algorithm under mild assumptions, establishing its robustness comparatively to UCB. The analysis is supported by extensive experimental validation of MIN and MaRaB compared to UCB and state-of-art risk-aware MAB algorithms on artificial and real-world problems.

RIS


TY  - CPAPER
TI  - Exploration vs Exploitation vs Safety: Risk-Aware Multi-Armed Bandits
AU  - Nicolas Galichet
AU  - Michèle Sebag
AU  - Olivier Teytaud
BT  - Proceedings of the 5th Asian Conference on Machine Learning
DA  - 2013/10/21
ED  - Cheng Soon Ong
ED  - Tu Bao Ho
ID  - pmlr-v29-Galichet13
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 29
SP  - 245
EP  - 260
L1  - http://proceedings.mlr.press/v29/Galichet13.pdf
UR  - https://proceedings.mlr.press/v29/Galichet13.html
AB  - Motivated by applications in energy management, this paper presents the Multi-Armed Risk-Aware Bandit (MaRaB) algorithm. With the goal of limiting the exploration of risky arms, MaRaB takes as arm quality its conditional value at risk. When the user-supplied risk level goes to 0, the arm quality tends toward the essential infimum of the arm distribution density, and MaRaB tends toward the MIN multi-armed bandit algorithm, aimed at the arm with maximal minimal value. As a first contribution, this paper presents a theoretical analysis of the MIN algorithm under mild assumptions, establishing its robustness comparatively to UCB. The analysis is supported by extensive experimental validation of MIN and MaRaB compared to UCB and state-of-art risk-aware MAB algorithms on artificial and real-world problems.
ER  -

APA


Galichet, N., Sebag, M., & Teytaud, O. (2013). Exploration vs Exploitation vs Safety: Risk-Aware Multi-Armed Bandits. Proceedings of the 5th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 29:245-260. Available from https://proceedings.mlr.press/v29/Galichet13.html.
