Handling Heterogeneous Curvatures in Bandit LQR Control

Yu-Hu Yan, Jing Wang, Peng Zhao
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:55839-55858, 2024.

Abstract

We investigate online Linear Quadratic Regulator (LQR) with bandit feedback and semi-adversarial disturbances. Previous works assume costs with homogeneous curvatures (i.e., with a uniform strong convexity lower bound), which can be hard to satisfy in many real scenarios and prohibits adapting to true curvatures for better performance. In this paper, we initiate the study of bandit LQR control with heterogeneous cost curvatures, aiming to strengthen the algorithm’s adaptivity. To achieve this, we reduce the problem to bandit convex optimization with memory via a “with-history” reduction to avoid hard-to-control truncation errors. Then we provide a novel analysis for an important stability term that appeared in both regret and memory, using Newton decrement developed in interior-point methods. The analysis enables us to guarantee memory-related terms introduced in the reduction and also provide a simplified analysis for handling heterogeneous curvatures in bandit convex optimization. Finally, we achieve interpolated guarantees that can not only recover existing bounds for convex and quadratic costs but also attain new implications for cases of corrupted and decaying quadraticity.

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-yan24f, title = {Handling Heterogeneous Curvatures in Bandit {LQR} Control}, author = {Yan, Yu-Hu and Wang, Jing and Zhao, Peng}, booktitle = {Proceedings of the 41st International Conference on Machine Learning}, pages = {55839--55858}, year = {2024}, editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix}, volume = {235}, series = {Proceedings of Machine Learning Research}, month = {21--27 Jul}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/yan24f/yan24f.pdf}, url = {https://proceedings.mlr.press/v235/yan24f.html}, abstract = {We investigate online Linear Quadratic Regulator (LQR) with bandit feedback and semi-adversarial disturbances. Previous works assume costs with homogeneous curvatures (i.e., with a uniform strong convexity lower bound), which can be hard to satisfy in many real scenarios and prohibits adapting to true curvatures for better performance. In this paper, we initiate the study of bandit LQR control with heterogeneous cost curvatures, aiming to strengthen the algorithm’s adaptivity. To achieve this, we reduce the problem to bandit convex optimization with memory via a “with-history” reduction to avoid hard-to-control truncation errors. Then we provide a novel analysis for an important stability term that appeared in both regret and memory, using Newton decrement developed in interior-point methods. The analysis enables us to guarantee memory-related terms introduced in the reduction and also provide a simplified analysis for handling heterogeneous curvatures in bandit convex optimization. Finally, we achieve interpolated guarantees that can not only recover existing bounds for convex and quadratic costs but also attain new implications for cases of corrupted and decaying quadraticity.} }
Endnote
%0 Conference Paper %T Handling Heterogeneous Curvatures in Bandit LQR Control %A Yu-Hu Yan %A Jing Wang %A Peng Zhao %B Proceedings of the 41st International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2024 %E Ruslan Salakhutdinov %E Zico Kolter %E Katherine Heller %E Adrian Weller %E Nuria Oliver %E Jonathan Scarlett %E Felix Berkenkamp %F pmlr-v235-yan24f %I PMLR %P 55839--55858 %U https://proceedings.mlr.press/v235/yan24f.html %V 235 %X We investigate online Linear Quadratic Regulator (LQR) with bandit feedback and semi-adversarial disturbances. Previous works assume costs with homogeneous curvatures (i.e., with a uniform strong convexity lower bound), which can be hard to satisfy in many real scenarios and prohibits adapting to true curvatures for better performance. In this paper, we initiate the study of bandit LQR control with heterogeneous cost curvatures, aiming to strengthen the algorithm’s adaptivity. To achieve this, we reduce the problem to bandit convex optimization with memory via a “with-history” reduction to avoid hard-to-control truncation errors. Then we provide a novel analysis for an important stability term that appeared in both regret and memory, using Newton decrement developed in interior-point methods. The analysis enables us to guarantee memory-related terms introduced in the reduction and also provide a simplified analysis for handling heterogeneous curvatures in bandit convex optimization. Finally, we achieve interpolated guarantees that can not only recover existing bounds for convex and quadratic costs but also attain new implications for cases of corrupted and decaying quadraticity.
APA
Yan, Y., Wang, J. & Zhao, P.. (2024). Handling Heterogeneous Curvatures in Bandit LQR Control. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:55839-55858 Available from https://proceedings.mlr.press/v235/yan24f.html.

Related Material