An Optimal Algorithm for Bandit Convex Optimization with Strongly-Convex and Smooth Loss

Shinji Ito
Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:2229-2239, 2020.

Abstract

We consider non-stochastic bandit convex optimization with strongly-convex and smooth loss functions. For this problem, Hazan and Levy have proposed an algorithm with a regret bound of $\tilde{O}(d^{3/2}\sqrt{T})$ given access to an $O(d)$-self-concordant barrier over the feasible region, where $d$ and $T$ stand for the dimensionality of the feasible region and the number of rounds, respectively. However, there is no known efficient way to construct self-concordant barriers for general convex sets, and an $\tilde{O}(\sqrt{d})$ gap has remained between the upper and lower bounds, as the known regret lower bound is $\Omega(d\sqrt{T})$. Our study resolves these two issues by introducing an algorithm that achieves an optimal regret bound of $\tilde{O}(d\sqrt{T})$ under a mild assumption, without self-concordant barriers. More precisely, the algorithm requires only a membership oracle for the feasible region, and it achieves the optimal regret bound of $\tilde{O}(d\sqrt{T})$ under the assumption that the optimal solution lies in the interior of the feasible region. Even without this assumption, our algorithm achieves $\tilde{O}(d^{3/2}\sqrt{T})$-regret.
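For reference (the page itself does not define it), the regret in these bounds is the standard adversarial regret of bandit convex optimization. A minimal LaTeX statement, assuming the usual setup in which an adversary picks strongly-convex and smooth losses $f_1, \dots, f_T$ over the feasible region $\mathcal{K} \subseteq \mathbb{R}^d$ and the learner observes only the scalar feedback $f_t(x_t)$:

% Regret against the best fixed point in hindsight; \mathcal{K} and the
% losses f_t are assumptions of the standard setup, not stated on this page.
\[
  \mathrm{Regret}_T \;=\; \sum_{t=1}^{T} f_t(x_t) \;-\; \min_{x \in \mathcal{K}} \sum_{t=1}^{T} f_t(x)
\]

An upper bound of $\tilde{O}(d\sqrt{T})$ on this quantity matches the $\Omega(d\sqrt{T})$ lower bound up to the logarithmic factors hidden by the $\tilde{O}$ notation, which is the sense in which the algorithm is optimal.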

Cite this Paper

BibTeX
@InProceedings{pmlr-v108-ito20a,
  title     = {An Optimal Algorithm for Bandit Convex Optimization with Strongly-Convex and Smooth Loss},
  author    = {Ito, Shinji},
  booktitle = {Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics},
  pages     = {2229--2239},
  year      = {2020},
  editor    = {Chiappa, Silvia and Calandra, Roberto},
  volume    = {108},
  series    = {Proceedings of Machine Learning Research},
  month     = {26--28 Aug},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v108/ito20a/ito20a.pdf},
  url       = {https://proceedings.mlr.press/v108/ito20a.html},
  abstract  = {We consider non-stochastic bandit convex optimization with strongly-convex and smooth loss functions. For this problem, Hazan and Levy have proposed an algorithm with a regret bound of $\tilde{O}(d^{3/2} \sqrt{T})$ given access to an $O(d)$-self-concordant barrier over the feasible region, where $d$ and $T$ stand for the dimensionality of the feasible region and the number of rounds, respectively. However, there are no known efficient ways for constructing self-concordant barriers for general convex sets, and a $\tilde{O}(\sqrt{d})$ gap has remained between the upper and lower bounds, as the known regret lower bound is $\Omega(d\sqrt{T})$. Our study resolves these two issues by introducing an algorithm that achieves an optimal regret bound of $\tilde{O}(d \sqrt{T})$ under a mild assumption, without self-concordant barriers. More precisely, the algorithm requires only a membership oracle for the feasible region, and it achieves an optimal regret bound of $\tilde{O}(d\sqrt{T})$ under the assumption that the optimal solution is an interior of the feasible region. Even without this assumption, our algorithm achieves $\tilde{O}(d^{3/2}\sqrt{T})$-regret.}
}
Endnote
%0 Conference Paper
%T An Optimal Algorithm for Bandit Convex Optimization with Strongly-Convex and Smooth Loss
%A Shinji Ito
%B Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2020
%E Silvia Chiappa
%E Roberto Calandra
%F pmlr-v108-ito20a
%I PMLR
%P 2229--2239
%U https://proceedings.mlr.press/v108/ito20a.html
%V 108
%X We consider non-stochastic bandit convex optimization with strongly-convex and smooth loss functions. For this problem, Hazan and Levy have proposed an algorithm with a regret bound of $\tilde{O}(d^{3/2} \sqrt{T})$ given access to an $O(d)$-self-concordant barrier over the feasible region, where $d$ and $T$ stand for the dimensionality of the feasible region and the number of rounds, respectively. However, there are no known efficient ways for constructing self-concordant barriers for general convex sets, and a $\tilde{O}(\sqrt{d})$ gap has remained between the upper and lower bounds, as the known regret lower bound is $\Omega(d\sqrt{T})$. Our study resolves these two issues by introducing an algorithm that achieves an optimal regret bound of $\tilde{O}(d \sqrt{T})$ under a mild assumption, without self-concordant barriers. More precisely, the algorithm requires only a membership oracle for the feasible region, and it achieves an optimal regret bound of $\tilde{O}(d\sqrt{T})$ under the assumption that the optimal solution is an interior of the feasible region. Even without this assumption, our algorithm achieves $\tilde{O}(d^{3/2}\sqrt{T})$-regret.
APA
Ito, S. (2020). An Optimal Algorithm for Bandit Convex Optimization with Strongly-Convex and Smooth Loss. Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 108:2229-2239. Available from https://proceedings.mlr.press/v108/ito20a.html.
