Online Estimation and Control with Optimal Pathlength Regret

Gautam Goel, Babak Hassibi
Proceedings of The 4th Annual Learning for Dynamics and Control Conference, PMLR 168:404-414, 2022.

Abstract

A natural goal when designing online learning algorithms for non-stationary environments is to bound the regret of the algorithm in terms of the temporal variation of the input sequence. Intuitively, when the variation is small, it should be easier for the algorithm to achieve low regret, since past observations are predictive of future inputs. Such data-dependent "pathlength" regret bounds have recently been obtained for a wide variety of online learning problems, including online convex optimization (OCO) and bandits. We obtain the first pathlength regret bounds for online control and estimation (e.g. Kalman filtering) in linear dynamical systems. The key idea in our derivation is to reduce pathlength-optimal filtering and control to certain variational problems in robust estimation and control; these reductions may be of independent interest. Numerical simulations confirm that our pathlength-optimal algorithms outperform traditional H-2 and H-infinity algorithms when the environment varies over time.

Cite this Paper


BibTeX
@InProceedings{pmlr-v168-goel22a, title = {Online Estimation and Control with Optimal Pathlength Regret}, author = {Goel, Gautam and Hassibi, Babak}, booktitle = {Proceedings of The 4th Annual Learning for Dynamics and Control Conference}, pages = {404--414}, year = {2022}, editor = {Firoozi, Roya and Mehr, Negar and Yel, Esen and Antonova, Rika and Bohg, Jeannette and Schwager, Mac and Kochenderfer, Mykel}, volume = {168}, series = {Proceedings of Machine Learning Research}, month = {23--24 Jun}, publisher = {PMLR}, pdf = {https://proceedings.mlr.press/v168/goel22a/goel22a.pdf}, url = {https://proceedings.mlr.press/v168/goel22a.html}, abstract = {A natural goal when designing online learning algorithms for non-stationary environments is to bound the regret of the algorithm in terms of the temporal variation of the input sequence. Intuitively, when the variation is small, it should be easier for the algorithm to achieve low regret, since past observations are predictive of future inputs. Such data-dependent "pathlength" regret bounds have recently been obtained for a wide variety of online learning problems, including online convex optimization (OCO) and bandits. We obtain the first pathlength regret bounds for online control and estimation (e.g. Kalman filtering) in linear dynamical systems. The key idea in our derivation is to reduce pathlength-optimal filtering and control to certain variational problems in robust estimation and control; these reductions may be of independent interest. Numerical simulations confirm that our pathlength-optimal algorithms outperform traditional H-2 and H-infinity algorithms when the environment varies over time.} }
Endnote
%0 Conference Paper %T Online Estimation and Control with Optimal Pathlength Regret %A Gautam Goel %A Babak Hassibi %B Proceedings of The 4th Annual Learning for Dynamics and Control Conference %C Proceedings of Machine Learning Research %D 2022 %E Roya Firoozi %E Negar Mehr %E Esen Yel %E Rika Antonova %E Jeannette Bohg %E Mac Schwager %E Mykel Kochenderfer %F pmlr-v168-goel22a %I PMLR %P 404--414 %U https://proceedings.mlr.press/v168/goel22a.html %V 168 %X A natural goal when designing online learning algorithms for non-stationary environments is to bound the regret of the algorithm in terms of the temporal variation of the input sequence. Intuitively, when the variation is small, it should be easier for the algorithm to achieve low regret, since past observations are predictive of future inputs. Such data-dependent "pathlength" regret bounds have recently been obtained for a wide variety of online learning problems, including online convex optimization (OCO) and bandits. We obtain the first pathlength regret bounds for online control and estimation (e.g. Kalman filtering) in linear dynamical systems. The key idea in our derivation is to reduce pathlength-optimal filtering and control to certain variational problems in robust estimation and control; these reductions may be of independent interest. Numerical simulations confirm that our pathlength-optimal algorithms outperform traditional H-2 and H-infinity algorithms when the environment varies over time.
APA
Goel, G. & Hassibi, B.. (2022). Online Estimation and Control with Optimal Pathlength Regret. Proceedings of The 4th Annual Learning for Dynamics and Control Conference, in Proceedings of Machine Learning Research 168:404-414 Available from https://proceedings.mlr.press/v168/goel22a.html.

Related Material