Exploration vs Exploitation vs Safety: Risk-Aware Multi-Armed Bandits

Nicolas Galichet, Michèle Sebag, Olivier Teytaud
Proceedings of the 5th Asian Conference on Machine Learning, PMLR 29:245-260, 2013.

Abstract

Motivated by applications in energy management, this paper presents the Multi-Armed Risk-Aware Bandit (MaRaB) algorithm. With the goal of limiting the exploration of risky arms, MaRaB takes as arm quality its conditional value at risk. When the user-supplied risk level goes to 0, the arm quality tends toward the essential infimum of the arm distribution, and MaRaB tends toward the MIN multi-armed bandit algorithm, which targets the arm with the maximal minimal value. As a first contribution, this paper presents a theoretical analysis of the MIN algorithm under mild assumptions, establishing its robustness relative to UCB. The analysis is supported by extensive experimental validation of MIN and MaRaB, compared to UCB and state-of-the-art risk-aware MAB algorithms, on artificial and real-world problems.
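As an illustration of the CVaR criterion described above (a minimal sketch, not the paper's exact pseudocode; the function names and toy data are assumptions for demonstration), a greedy risk-aware arm choice in Python could look like this. The empirical CVaR at level alpha is the mean of the lowest alpha-fraction of observed rewards; as alpha goes to 0 the rule approaches MIN, i.e. maximizing the minimal observed reward.

```python
import numpy as np

def empirical_cvar(rewards, alpha):
    """Empirical conditional value at risk at level alpha:
    the mean of the lowest alpha-fraction of observed rewards."""
    sorted_r = np.sort(np.asarray(rewards, dtype=float))
    k = max(1, int(np.ceil(alpha * len(sorted_r))))  # keep at least one sample
    return sorted_r[:k].mean()

def select_arm(history, alpha):
    """Greedy risk-aware choice: play the arm whose empirical CVaR_alpha
    is largest. As alpha -> 0 this tends to the MIN rule (maximize the
    minimal observed reward); at alpha = 1 it reduces to the mean."""
    return max(range(len(history)), key=lambda a: empirical_cvar(history[a], alpha))

# Toy example (hypothetical data): arm 0 has the higher mean but a heavy
# lower tail; arm 1 is safe with a lower mean.
history = [[1.0, 0.0, 1.0, 0.9], [0.6, 0.55, 0.6, 0.58]]
print(select_arm(history, alpha=0.25))  # -> 1: the safe arm wins at low risk level
print(select_arm(history, alpha=1.0))   # -> 0: the high-mean arm wins when risk-neutral
```

Note that this sketch omits the exploration term that a full bandit algorithm such as MaRaB would add on top of the CVaR estimate.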

Cite this Paper


BibTeX
@InProceedings{pmlr-v29-Galichet13,
  title     = {Exploration vs Exploitation vs Safety: Risk-Aware Multi-Armed Bandits},
  author    = {Galichet, Nicolas and Sebag, Michèle and Teytaud, Olivier},
  booktitle = {Proceedings of the 5th Asian Conference on Machine Learning},
  pages     = {245--260},
  year      = {2013},
  editor    = {Ong, Cheng Soon and Ho, Tu Bao},
  volume    = {29},
  series    = {Proceedings of Machine Learning Research},
  address   = {Australian National University, Canberra, Australia},
  month     = {13--15 Nov},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v29/Galichet13.pdf},
  url       = {https://proceedings.mlr.press/v29/Galichet13.html},
  abstract  = {Motivated by applications in energy management, this paper presents the Multi-Armed Risk-Aware Bandit (MaRaB) algorithm. With the goal of limiting the exploration of risky arms, MaRaB takes as arm quality its conditional value at risk. When the user-supplied risk level goes to 0, the arm quality tends toward the essential infimum of the arm distribution, and MaRaB tends toward the MIN multi-armed bandit algorithm, which targets the arm with the maximal minimal value. As a first contribution, this paper presents a theoretical analysis of the MIN algorithm under mild assumptions, establishing its robustness relative to UCB. The analysis is supported by extensive experimental validation of MIN and MaRaB, compared to UCB and state-of-the-art risk-aware MAB algorithms, on artificial and real-world problems.}
}
Endnote
%0 Conference Paper
%T Exploration vs Exploitation vs Safety: Risk-Aware Multi-Armed Bandits
%A Nicolas Galichet
%A Michèle Sebag
%A Olivier Teytaud
%B Proceedings of the 5th Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2013
%E Cheng Soon Ong
%E Tu Bao Ho
%F pmlr-v29-Galichet13
%I PMLR
%P 245--260
%U https://proceedings.mlr.press/v29/Galichet13.html
%V 29
%X Motivated by applications in energy management, this paper presents the Multi-Armed Risk-Aware Bandit (MaRaB) algorithm. With the goal of limiting the exploration of risky arms, MaRaB takes as arm quality its conditional value at risk. When the user-supplied risk level goes to 0, the arm quality tends toward the essential infimum of the arm distribution, and MaRaB tends toward the MIN multi-armed bandit algorithm, which targets the arm with the maximal minimal value. As a first contribution, this paper presents a theoretical analysis of the MIN algorithm under mild assumptions, establishing its robustness relative to UCB. The analysis is supported by extensive experimental validation of MIN and MaRaB, compared to UCB and state-of-the-art risk-aware MAB algorithms, on artificial and real-world problems.
RIS
TY  - CPAPER
TI  - Exploration vs Exploitation vs Safety: Risk-Aware Multi-Armed Bandits
AU  - Nicolas Galichet
AU  - Michèle Sebag
AU  - Olivier Teytaud
BT  - Proceedings of the 5th Asian Conference on Machine Learning
DA  - 2013/10/21
ED  - Cheng Soon Ong
ED  - Tu Bao Ho
ID  - pmlr-v29-Galichet13
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 29
SP  - 245
EP  - 260
L1  - http://proceedings.mlr.press/v29/Galichet13.pdf
UR  - https://proceedings.mlr.press/v29/Galichet13.html
AB  - Motivated by applications in energy management, this paper presents the Multi-Armed Risk-Aware Bandit (MaRaB) algorithm. With the goal of limiting the exploration of risky arms, MaRaB takes as arm quality its conditional value at risk. When the user-supplied risk level goes to 0, the arm quality tends toward the essential infimum of the arm distribution, and MaRaB tends toward the MIN multi-armed bandit algorithm, which targets the arm with the maximal minimal value. As a first contribution, this paper presents a theoretical analysis of the MIN algorithm under mild assumptions, establishing its robustness relative to UCB. The analysis is supported by extensive experimental validation of MIN and MaRaB, compared to UCB and state-of-the-art risk-aware MAB algorithms, on artificial and real-world problems.
ER  -
APA
Galichet, N., Sebag, M. & Teytaud, O. (2013). Exploration vs Exploitation vs Safety: Risk-Aware Multi-Armed Bandits. Proceedings of the 5th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 29:245-260. Available from https://proceedings.mlr.press/v29/Galichet13.html.