Handling Heterogeneous Curvatures in Bandit LQR Control
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:55839-55858, 2024.
Abstract
We investigate online Linear Quadratic Regulator (LQR) control with bandit feedback and semi-adversarial disturbances. Previous works assume costs with homogeneous curvatures (i.e., with a uniform strong-convexity lower bound), an assumption that can be hard to satisfy in many real scenarios and that prevents the algorithm from adapting to the true curvatures for better performance. In this paper, we initiate the study of bandit LQR control with heterogeneous cost curvatures, aiming to strengthen the algorithm’s adaptivity. To achieve this, we reduce the problem to bandit convex optimization with memory via a “with-history” reduction that avoids hard-to-control truncation errors. We then provide a novel analysis of an important stability term that appears in both the regret and the memory, using the Newton decrement developed in interior-point methods. This analysis enables us to bound the memory-related terms introduced by the reduction and also yields a simplified analysis for handling heterogeneous curvatures in bandit convex optimization. Finally, we achieve interpolated guarantees that not only recover existing bounds for convex and quadratic costs but also provide new implications for the cases of corrupted and decaying quadraticity.
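For orientation, the following is a minimal sketch of the standard objects the abstract refers to; the notation ($A$, $B$, $w_t$, $c_t$) is the usual online LQR convention and is an assumption here, not necessarily the paper’s exact formulation. In online LQR with disturbances, the state typically evolves as
\[
x_{t+1} = A x_t + B u_t + w_t,
\]
where the learner picks the control $u_t$, suffers a cost $c_t(x_t, u_t)$, and under bandit feedback observes only the scalar cost value. The Newton decrement mentioned in the analysis is the standard quantity from interior-point methods: for a twice-differentiable function $f$ with positive-definite Hessian at $x$,
\[
\lambda(f, x) = \sqrt{\nabla f(x)^{\top} \big(\nabla^{2} f(x)\big)^{-1} \nabla f(x)},
\]
which measures the size of the Newton step in the local norm induced by the Hessian and, unlike a fixed strong-convexity constant, adapts to the curvature at the current point.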