Adaptive Bandit Convex Optimization with Heterogeneous Curvature

Haipeng Luo; Mengxiao Zhang; Peng Zhao

Proceedings of Thirty Fifth Conference on Learning Theory, PMLR 178:1576-1612, 2022.

Abstract

We consider the problem of adversarial bandit convex optimization, that is, online learning over a sequence of arbitrary convex loss functions with only one function evaluation for each of them. While all previous works assume known and homogeneous curvature on these loss functions, we study a heterogeneous setting where each function has its own curvature that is only revealed after the learner makes a decision. We develop an efficient algorithm that is able to adapt to the curvature on the fly. Specifically, our algorithm not only recovers or even improves existing results for several homogeneous settings, but also leads to surprising results for some heterogeneous settings. For example, while Hazan and Levy (2014) showed that $\tilde{O}(d^{\frac{3}{2}}\sqrt{T})$ regret is achievable for a sequence of $T$ smooth and strongly convex $d$-dimensional functions, our algorithm reveals that the same is achievable even if $T^{\frac{3}{4}}$ of them are not strongly convex, and sometimes even if a constant fraction of them are not strongly convex. Our approach is inspired by the framework of Bartlett et al. (2007), who studied a similar heterogeneous setting but with stronger gradient feedback. Extending their framework to the bandit feedback setting requires novel ideas such as lifting the feasible domain and using a logarithmically homogeneous self-concordant barrier regularizer.
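
For readers new to the setting, the regret that these bounds refer to can be written out as follows. This is only a sketch in standard notation that the abstract itself does not fix: the feasible domain $\mathcal{X} \subset \mathbb{R}^d$, the learner's decisions $x_t$, and the loss functions $f_t$ are assumed symbols, and the expectation is over the learner's randomness.

$$
\mathrm{Reg}_T = \mathbb{E}\left[\sum_{t=1}^{T} f_t(x_t)\right] - \min_{x \in \mathcal{X}} \sum_{t=1}^{T} f_t(x)
$$

Under bandit feedback the learner observes only the single value $f_t(x_t)$ in each round. Note also that $T^{\frac{3}{4}}$ functions make up a $T^{\frac{3}{4}}/T = T^{-\frac{1}{4}}$ fraction of the sequence, which vanishes as $T$ grows, whereas a constant fraction does not.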

Cite this Paper

BibTeX


@InProceedings{pmlr-v178-luo22a,
  title     = {Adaptive Bandit Convex Optimization with Heterogeneous Curvature},
  author    = {Luo, Haipeng and Zhang, Mengxiao and Zhao, Peng},
  booktitle = {Proceedings of Thirty Fifth Conference on Learning Theory},
  pages     = {1576--1612},
  year      = {2022},
  editor    = {Loh, Po-Ling and Raginsky, Maxim},
  volume    = {178},
  series    = {Proceedings of Machine Learning Research},
  month     = {02--05 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v178/luo22a/luo22a.pdf},
  url       = {https://proceedings.mlr.press/v178/luo22a.html},
  abstract  = {We consider the problem of adversarial bandit convex optimization, that is, online learning over a sequence of arbitrary convex loss functions with only one function evaluation for each of them. While all previous works assume known and homogeneous curvature on these loss functions, we study a heterogeneous setting where each function has its own curvature that is only revealed after the learner makes a decision. We develop an efficient algorithm that is able to adapt to the curvature on the fly. Specifically, our algorithm not only recovers or \emph{even improves} existing results for several homogeneous settings, but also leads to surprising results for some heterogeneous settings. For example, while Hazan and Levy (2014) showed that $\tilde{O}(d^{\frac{3}{2}}\sqrt{T})$ regret is achievable for a sequence of $T$ smooth and strongly convex $d$-dimensional functions, our algorithm reveals that the same is achievable even if $T^{\frac{3}{4}}$ of them are not strongly convex, and sometimes even if a constant fraction of them are not strongly convex. Our approach is inspired by the framework of Bartlett et al. (2007), who studied a similar heterogeneous setting but with stronger gradient feedback. Extending their framework to the bandit feedback setting requires novel ideas such as lifting the feasible domain and using a logarithmically homogeneous self-concordant barrier regularizer.}
}

Endnote

%0 Conference Paper
%T Adaptive Bandit Convex Optimization with Heterogeneous Curvature
%A Haipeng Luo
%A Mengxiao Zhang
%A Peng Zhao
%B Proceedings of Thirty Fifth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2022
%E Po-Ling Loh
%E Maxim Raginsky
%F pmlr-v178-luo22a
%I PMLR
%P 1576--1612
%U https://proceedings.mlr.press/v178/luo22a.html
%V 178
%X We consider the problem of adversarial bandit convex optimization, that is, online learning over a sequence of arbitrary convex loss functions with only one function evaluation for each of them. While all previous works assume known and homogeneous curvature on these loss functions, we study a heterogeneous setting where each function has its own curvature that is only revealed after the learner makes a decision. We develop an efficient algorithm that is able to adapt to the curvature on the fly. Specifically, our algorithm not only recovers or \emph{even improves} existing results for several homogeneous settings, but also leads to surprising results for some heterogeneous settings. For example, while Hazan and Levy (2014) showed that $\tilde{O}(d^{\frac{3}{2}}\sqrt{T})$ regret is achievable for a sequence of $T$ smooth and strongly convex $d$-dimensional functions, our algorithm reveals that the same is achievable even if $T^{\frac{3}{4}}$ of them are not strongly convex, and sometimes even if a constant fraction of them are not strongly convex. Our approach is inspired by the framework of Bartlett et al. (2007), who studied a similar heterogeneous setting but with stronger gradient feedback. Extending their framework to the bandit feedback setting requires novel ideas such as lifting the feasible domain and using a logarithmically homogeneous self-concordant barrier regularizer.

APA


Luo, H., Zhang, M. & Zhao, P. (2022). Adaptive Bandit Convex Optimization with Heterogeneous Curvature. Proceedings of Thirty Fifth Conference on Learning Theory, in Proceedings of Machine Learning Research 178:1576-1612. Available from https://proceedings.mlr.press/v178/luo22a.html.
