Risk-Averse Stochastic Convex Bandit

Adrian Rivera Cardoso, Huan Xu
Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, PMLR 89:39-47, 2019.

Abstract

Motivated by applications in clinical trials and finance, we study the problem of online convex optimization (with bandit feedback) where the decision maker is risk-averse. We provide two algorithms to solve this problem. The first one is a descent-type algorithm which is easy to implement. The second algorithm, which combines the ellipsoid method and a center point device, achieves (almost) optimal regret bounds with respect to the number of rounds. To the best of our knowledge this is the first attempt to address risk-aversion in the online convex bandit problem.

Cite this Paper


BibTeX
@InProceedings{pmlr-v89-cardoso19a,
  title = {Risk-Averse Stochastic Convex Bandit},
  author = {Cardoso, Adrian Rivera and Xu, Huan},
  booktitle = {Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics},
  pages = {39--47},
  year = {2019},
  editor = {Chaudhuri, Kamalika and Sugiyama, Masashi},
  volume = {89},
  series = {Proceedings of Machine Learning Research},
  month = {16--18 Apr},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v89/cardoso19a/cardoso19a.pdf},
  url = {https://proceedings.mlr.press/v89/cardoso19a.html},
  abstract = {Motivated by applications in clinical trials and finance, we study the problem of online convex optimization (with bandit feedback) where the decision maker is risk-averse. We provide two algorithms to solve this problem. The first one is a descent-type algorithm which is easy to implement. The second algorithm, which combines the ellipsoid method and a center point device, achieves (almost) optimal regret bounds with respect to the number of rounds. To the best of our knowledge this is the first attempt to address risk-aversion in the online convex bandit problem.}
}
Endnote
%0 Conference Paper
%T Risk-Averse Stochastic Convex Bandit
%A Adrian Rivera Cardoso
%A Huan Xu
%B Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Masashi Sugiyama
%F pmlr-v89-cardoso19a
%I PMLR
%P 39--47
%U https://proceedings.mlr.press/v89/cardoso19a.html
%V 89
%X Motivated by applications in clinical trials and finance, we study the problem of online convex optimization (with bandit feedback) where the decision maker is risk-averse. We provide two algorithms to solve this problem. The first one is a descent-type algorithm which is easy to implement. The second algorithm, which combines the ellipsoid method and a center point device, achieves (almost) optimal regret bounds with respect to the number of rounds. To the best of our knowledge this is the first attempt to address risk-aversion in the online convex bandit problem.
APA
Cardoso, A.R. & Xu, H. (2019). Risk-Averse Stochastic Convex Bandit. Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 89:39-47. Available from https://proceedings.mlr.press/v89/cardoso19a.html.
