Second Order Methods for Bandit Optimization and Control

Arun Suggala, Y Jennifer Sun, Praneeth Netrapalli, Elad Hazan
Proceedings of Thirty Seventh Conference on Learning Theory, PMLR 247:4691-4763, 2024.

Abstract

Bandit convex optimization (BCO) is a general framework for online decision making under uncertainty. While tight regret bounds for general convex losses have been established, existing algorithms achieving these bounds have prohibitive computational costs for high dimensional data. In this paper, we propose a simple and practical BCO algorithm inspired by the online Newton step algorithm. We show that our algorithm achieves optimal (in terms of horizon) regret bounds for a large class of convex functions that satisfy a condition we call $\kappa$-convexity. This class contains a wide range of practically relevant loss functions including linear losses, quadratic losses, and generalized linear models. In addition to optimal regret, this method is the most efficient known algorithm for several well-studied applications including bandit logistic regression. Furthermore, we investigate the adaptation of our second-order bandit algorithm to online convex optimization with memory. We show that for loss functions with a certain affine structure, the extended algorithm attains optimal regret. This leads to an algorithm with optimal regret for the bandit LQ problem under a fully adversarial noise model, thereby resolving an open question posed in Gradu et al. 2020 and Sun et al. 2023. Finally, we show that the more general problem of BCO with (non-affine) memory is harder. We derive a $\tilde{\Omega}(T^{2/3})$ regret lower bound, even under the assumption of smooth and quadratic losses.
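For context on the method the abstract builds on: the classic full-information online Newton step (ONS) maintains a running sum of gradient outer products $A_t = A_{t-1} + \nabla_t \nabla_t^\top$ and updates $x_{t+1} = x_t - \frac{1}{\gamma} A_t^{-1} \nabla_t$. The sketch below is the standard ONS update on simple quadratic losses, not the paper's bandit variant (which only observes loss values, not gradients); the projection step and all parameter values are illustrative simplifications.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T = 3, 200            # dimension and horizon (illustrative values)
gamma, eps = 0.5, 1.0    # ONS step parameter and initial curvature

A = eps * np.eye(d)      # A_0 = eps * I; accumulates gradient outer products
x = np.zeros(d)          # current iterate x_1
centers = rng.normal(size=(T, d))  # round-t loss f_t(x) = ||x - c_t||^2
alg_loss = 0.0

for c in centers:
    alg_loss += float((x - c) @ (x - c))   # suffer f_t(x_t)
    g = 2.0 * (x - c)                      # full gradient; a bandit method must estimate this
    A += np.outer(g, g)                    # A_t = A_{t-1} + g_t g_t^T (rank-one update)
    x = x - (1.0 / gamma) * np.linalg.solve(A, g)  # x_{t+1} = x_t - (1/gamma) A_t^{-1} g_t

best = centers.mean(axis=0)                # best fixed point in hindsight
best_loss = float(((centers - best) ** 2).sum())
regret = alg_loss - best_loss
print(f"regret after {T} rounds: {regret:.3f}")
```

For exp-concave losses, ONS attains $O(d \log T)$ regret; the paper's contribution is recovering second-order behavior when only bandit (zeroth-order) feedback is available.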

Cite this Paper


BibTeX
@InProceedings{pmlr-v247-suggala24a, title = {Second Order Methods for Bandit Optimization and Control}, author = {Suggala, Arun and Sun, Y Jennifer and Netrapalli, Praneeth and Hazan, Elad}, booktitle = {Proceedings of Thirty Seventh Conference on Learning Theory}, pages = {4691--4763}, year = {2024}, editor = {Agrawal, Shipra and Roth, Aaron}, volume = {247}, series = {Proceedings of Machine Learning Research}, month = {30 Jun--03 Jul}, publisher = {PMLR}, pdf = {https://proceedings.mlr.press/v247/suggala24a/suggala24a.pdf}, url = {https://proceedings.mlr.press/v247/suggala24a.html}, abstract = {Bandit convex optimization (BCO) is a general framework for online decision making under uncertainty. While tight regret bounds for general convex losses have been established, existing algorithms achieving these bounds have prohibitive computational costs for high dimensional data. In this paper, we propose a simple and practical BCO algorithm inspired by the online Newton step algorithm. We show that our algorithm achieves optimal (in terms of horizon) regret bounds for a large class of convex functions that satisfy a condition we call $\kappa$-convexity. This class contains a wide range of practically relevant loss functions including linear losses, quadratic losses, and generalized linear models. In addition to optimal regret, this method is the most efficient known algorithm for several well-studied applications including bandit logistic regression. Furthermore, we investigate the adaptation of our second-order bandit algorithm to online convex optimization with memory. We show that for loss functions with a certain affine structure, the extended algorithm attains optimal regret. This leads to an algorithm with optimal regret for the bandit LQ problem under a fully adversarial noise model, thereby resolving an open question posed in Gradu et al. 2020 and Sun et al. 2023. Finally, we show that the more general problem of BCO with (non-affine) memory is harder. We derive a $\tilde{\Omega}(T^{2/3})$ regret lower bound, even under the assumption of smooth and quadratic losses.} }
Endnote
%0 Conference Paper %T Second Order Methods for Bandit Optimization and Control %A Arun Suggala %A Y Jennifer Sun %A Praneeth Netrapalli %A Elad Hazan %B Proceedings of Thirty Seventh Conference on Learning Theory %C Proceedings of Machine Learning Research %D 2024 %E Shipra Agrawal %E Aaron Roth %F pmlr-v247-suggala24a %I PMLR %P 4691--4763 %U https://proceedings.mlr.press/v247/suggala24a.html %V 247 %X Bandit convex optimization (BCO) is a general framework for online decision making under uncertainty. While tight regret bounds for general convex losses have been established, existing algorithms achieving these bounds have prohibitive computational costs for high dimensional data. In this paper, we propose a simple and practical BCO algorithm inspired by the online Newton step algorithm. We show that our algorithm achieves optimal (in terms of horizon) regret bounds for a large class of convex functions that satisfy a condition we call $\kappa$-convexity. This class contains a wide range of practically relevant loss functions including linear losses, quadratic losses, and generalized linear models. In addition to optimal regret, this method is the most efficient known algorithm for several well-studied applications including bandit logistic regression. Furthermore, we investigate the adaptation of our second-order bandit algorithm to online convex optimization with memory. We show that for loss functions with a certain affine structure, the extended algorithm attains optimal regret. This leads to an algorithm with optimal regret for the bandit LQ problem under a fully adversarial noise model, thereby resolving an open question posed in Gradu et al. 2020 and Sun et al. 2023. Finally, we show that the more general problem of BCO with (non-affine) memory is harder. We derive a $\tilde{\Omega}(T^{2/3})$ regret lower bound, even under the assumption of smooth and quadratic losses.
APA
Suggala, A., Sun, Y.J., Netrapalli, P. & Hazan, E. (2024). Second Order Methods for Bandit Optimization and Control. Proceedings of Thirty Seventh Conference on Learning Theory, in Proceedings of Machine Learning Research 247:4691-4763. Available from https://proceedings.mlr.press/v247/suggala24a.html.
