Risk-Averse Stochastic Convex Bandit

Adrian Rivera Cardoso; Huan Xu

Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, PMLR 89:39-47, 2019.

Abstract

Motivated by applications in clinical trials and finance, we study the problem of online convex optimization (with bandit feedback) where the decision maker is risk-averse. We provide two algorithms to solve this problem. The first one is a descent-type algorithm which is easy to implement. The second algorithm, which combines the ellipsoid method and a center point device, achieves (almost) optimal regret bounds with respect to the number of rounds. To the best of our knowledge this is the first attempt to address risk-aversion in the online convex bandit problem.
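To make the setting in the abstract concrete, the following is a minimal sketch of a descent-type, risk-averse bandit loop. It is not the authors' algorithm. Everything specific in it is an assumption made for illustration: the abstract does not name a risk measure, so the sketch uses an empirical Conditional Value at Risk (CVaR); the loss `noisy_loss`, the interval [-5, 5], the finite-difference gradient estimate, and all parameter values are hypothetical. Each call to `noisy_loss` stands for one round of bandit feedback, so one descent step here spends `2 * samples` rounds.

```python
import random


def empirical_cvar(losses, alpha):
    """Mean of the worst (1 - alpha) fraction of the sampled losses."""
    ordered = sorted(losses, reverse=True)
    k = max(1, int(round((1 - alpha) * len(ordered))))
    return sum(ordered[:k]) / k


def noisy_loss(x, rng):
    # Hypothetical convex loss: the mean is minimized at x = 1,
    # but the noise level grows with |x|, so larger x is riskier.
    return (x - 1.0) ** 2 + rng.gauss(0.0, 0.1 + 0.5 * abs(x))


def risk_averse_descent(steps=200, samples=100, alpha=0.9,
                        step_size=0.1, delta=0.2, seed=0):
    rng = random.Random(seed)
    x = 3.0
    for t in range(1, steps + 1):
        # Only noisy function values are observed (bandit feedback), so the
        # slope of the risk is estimated from two nearby query points.
        up = empirical_cvar(
            [noisy_loss(x + delta, rng) for _ in range(samples)], alpha)
        down = empirical_cvar(
            [noisy_loss(x - delta, rng) for _ in range(samples)], alpha)
        grad = (up - down) / (2 * delta)
        # Decaying step, then projection back onto the interval [-5, 5].
        x = min(5.0, max(-5.0, x - step_size / t ** 0.5 * grad))
    return x


if __name__ == "__main__":
    print(risk_averse_descent())
```

Under these assumptions the iterate should settle below x = 1, the minimizer of the mean loss, because the CVaR objective also penalizes the larger noise at larger |x|. That gap between the risk-neutral and risk-averse solutions is the kind of behavior a risk-averse decision maker is after.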

Cite this Paper

BibTeX

@InProceedings{pmlr-v89-cardoso19a,
  title     = {Risk-Averse Stochastic Convex Bandit},
  author    = {Cardoso, Adrian Rivera and Xu, Huan},
  booktitle = {Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics},
  pages     = {39--47},
  year      = {2019},
  editor    = {Chaudhuri, Kamalika and Sugiyama, Masashi},
  volume    = {89},
  series    = {Proceedings of Machine Learning Research},
  month     = {16--18 Apr},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v89/cardoso19a/cardoso19a.pdf},
  url       = {https://proceedings.mlr.press/v89/cardoso19a.html},
  abstract  = {Motivated by applications in clinical trials and finance, we study the problem of online convex optimization (with bandit feedback) where the decision maker is risk-averse. We provide two algorithms to solve this problem. The first one is a descent-type algorithm which is easy to implement. The second algorithm, which combines the ellipsoid method and a center point device, achieves (almost) optimal regret bounds with respect to the number of rounds. To the best of our knowledge this is the first attempt to address risk-aversion in the online convex bandit problem.}
}

Endnote

%0 Conference Paper
%T Risk-Averse Stochastic Convex Bandit
%A Adrian Rivera Cardoso
%A Huan Xu
%B Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Masashi Sugiyama
%F pmlr-v89-cardoso19a
%I PMLR
%P 39--47
%U https://proceedings.mlr.press/v89/cardoso19a.html
%V 89
%X Motivated by applications in clinical trials and finance, we study the problem of online convex optimization (with bandit feedback) where the decision maker is risk-averse. We provide two algorithms to solve this problem. The first one is a descent-type algorithm which is easy to implement. The second algorithm, which combines the ellipsoid method and a center point device, achieves (almost) optimal regret bounds with respect to the number of rounds. To the best of our knowledge this is the first attempt to address risk-aversion in the online convex bandit problem.

APA

Cardoso, A.R. & Xu, H. (2019). Risk-Averse Stochastic Convex Bandit. Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 89:39-47. Available from https://proceedings.mlr.press/v89/cardoso19a.html.
