$\mathcal{X}$-Armed Bandits: Optimizing Quantiles, CVaR and Other Risks

Léonard Torossian, Aurélien Garivier, Victor Picheny
Proceedings of The Eleventh Asian Conference on Machine Learning, PMLR 101:252-267, 2019.

Abstract

We propose and analyze StoROO, an algorithm for risk optimization on stochastic black-box functions derived from StoOO. Motivated by risk-averse decision-making in fields such as agriculture, medicine, biology, and finance, we do not focus on the mean payoff but on generic functionals of the return distribution. We provide a generic regret analysis of StoROO and illustrate its applicability with two examples: the optimization of quantiles and CVaR. Inspired by the bandit literature and black-box mean optimizers, StoROO relies on the ability to construct confidence intervals for the targeted functional based on random-size samples. We detail their construction in the case of quantiles, providing tight bounds based on the Kullback-Leibler divergence. We finally present numerical experiments that show the dramatic impact of tight bounds on the optimization of quantiles and CVaR.
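As a hypothetical illustration (not the authors' code), the two risk functionals targeted in the abstract can be estimated from a sample of returns by simple plug-in estimators; the KL-divergence-based confidence bounds developed in the paper are what turn such point estimates into the intervals StoROO needs:

```python
import numpy as np

def empirical_quantile(samples, tau):
    """Plug-in estimate of the tau-quantile of the return distribution."""
    return float(np.quantile(samples, tau))

def empirical_cvar(samples, alpha):
    """Plug-in estimate of CVaR at level alpha: the mean of the outcomes
    at or above the alpha-quantile, i.e. the worst (1 - alpha) tail
    (here losses are taken as large values)."""
    q = np.quantile(samples, alpha)
    tail = samples[samples >= q]
    return float(tail.mean())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=10_000)
    # For a standard normal, the 0.9-quantile is about 1.2816 and the
    # estimates below should fall close to the theoretical values.
    print(empirical_quantile(x, 0.9))
    print(empirical_cvar(x, 0.9))
```

This is only the estimation step; StoROO additionally partitions the input space and uses the confidence intervals around such estimates to decide which cell to sample next.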

Cite this Paper


BibTeX
@InProceedings{pmlr-v101-torossian19a,
  title     = {$\mathcal{X}$-Armed Bandits: Optimizing Quantiles, CVaR and Other Risks},
  author    = {Torossian, L\'eonard and Garivier, Aur\'elien and Picheny, Victor},
  pages     = {252--267},
  year      = {2019},
  editor    = {Wee Sun Lee and Taiji Suzuki},
  volume    = {101},
  series    = {Proceedings of Machine Learning Research},
  address   = {Nagoya, Japan},
  month     = {17--19 Nov},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v101/torossian19a/torossian19a.pdf},
  url       = {http://proceedings.mlr.press/v101/torossian19a.html},
  abstract  = {We propose and analyze StoROO, an algorithm for risk optimization on stochastic black-box functions derived from StoOO. Motivated by risk-averse decision making fields like agriculture, medicine, biology or finance, we do not focus on the mean payoff but on generic functionals of the return distribution. We provide a generic regret analysis of StoROO and illustrate its applicability with two examples: the optimization of quantiles and CVaR. Inspired by the bandit literature and black-box mean optimizers, StoROO relies on the possibility to construct confidence intervals for the targeted functional based on random-size samples. We detail their construction in the case of quantiles, providing tight bounds based on Kullback-Leibler divergence. We finally present numerical experiments that show a dramatic impact of tight bounds for the optimization of quantiles and CVaR.}
}
Endnote
%0 Conference Paper
%T $\mathcal{X}$-Armed Bandits: Optimizing Quantiles, CVaR and Other Risks
%A Léonard Torossian
%A Aurélien Garivier
%A Victor Picheny
%B Proceedings of The Eleventh Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Wee Sun Lee
%E Taiji Suzuki
%F pmlr-v101-torossian19a
%I PMLR
%J Proceedings of Machine Learning Research
%P 252--267
%U http://proceedings.mlr.press
%V 101
%W PMLR
%X We propose and analyze StoROO, an algorithm for risk optimization on stochastic black-box functions derived from StoOO. Motivated by risk-averse decision making fields like agriculture, medicine, biology or finance, we do not focus on the mean payoff but on generic functionals of the return distribution. We provide a generic regret analysis of StoROO and illustrate its applicability with two examples: the optimization of quantiles and CVaR. Inspired by the bandit literature and black-box mean optimizers, StoROO relies on the possibility to construct confidence intervals for the targeted functional based on random-size samples. We detail their construction in the case of quantiles, providing tight bounds based on Kullback-Leibler divergence. We finally present numerical experiments that show a dramatic impact of tight bounds for the optimization of quantiles and CVaR.
APA
Torossian, L., Garivier, A. & Picheny, V. (2019). $\mathcal{X}$-Armed Bandits: Optimizing Quantiles, CVaR and Other Risks. Proceedings of The Eleventh Asian Conference on Machine Learning, in PMLR 101:252-267.