Finite Time Logarithmic Regret Bounds for Self-Tuning Regulation

Rahul Singh, Akshay Mete, Avik Kar, Panganamala Kumar
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:45571-45636, 2024.

Abstract

We establish the first finite-time logarithmic regret bounds for the self-tuning regulation problem. We introduce a modified version of the certainty equivalence algorithm, which we call PIECE, that clips inputs in addition to utilizing probing inputs for exploration. We show that it has a $C \log T$ upper bound on the regret after $T$ time-steps for bounded noise, and $C\log^3 T$ in the case of sub-Gaussian noise, unlike the LQ problem where logarithmic regret is shown to be not possible. The PIECE algorithm is also designed to address the critical challenge of poor initial transient performance of reinforcement learning algorithms for linear systems. Comparative simulation results illustrate the improved performance of PIECE.

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-singh24b, title = {Finite Time Logarithmic Regret Bounds for Self-Tuning Regulation}, author = {Singh, Rahul and Mete, Akshay and Kar, Avik and Kumar, Panganamala}, booktitle = {Proceedings of the 41st International Conference on Machine Learning}, pages = {45571--45636}, year = {2024}, editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix}, volume = {235}, series = {Proceedings of Machine Learning Research}, month = {21--27 Jul}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/singh24b/singh24b.pdf}, url = {https://proceedings.mlr.press/v235/singh24b.html}, abstract = {We establish the first finite-time logarithmic regret bounds for the self-tuning regulation problem. We introduce a modified version of the certainty equivalence algorithm, which we call PIECE, that clips inputs in addition to utilizing probing inputs for exploration. We show that it has a $C \log T$ upper bound on the regret after $T$ time-steps for bounded noise, and $C\log^3 T$ in the case of sub-Gaussian noise, unlike the LQ problem where logarithmic regret is shown to be not possible. The PIECE algorithm is also designed to address the critical challenge of poor initial transient performance of reinforcement learning algorithms for linear systems. Comparative simulation results illustrate the improved performance of PIECE.} }
Endnote
%0 Conference Paper %T Finite Time Logarithmic Regret Bounds for Self-Tuning Regulation %A Rahul Singh %A Akshay Mete %A Avik Kar %A Panganamala Kumar %B Proceedings of the 41st International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2024 %E Ruslan Salakhutdinov %E Zico Kolter %E Katherine Heller %E Adrian Weller %E Nuria Oliver %E Jonathan Scarlett %E Felix Berkenkamp %F pmlr-v235-singh24b %I PMLR %P 45571--45636 %U https://proceedings.mlr.press/v235/singh24b.html %V 235 %X We establish the first finite-time logarithmic regret bounds for the self-tuning regulation problem. We introduce a modified version of the certainty equivalence algorithm, which we call PIECE, that clips inputs in addition to utilizing probing inputs for exploration. We show that it has a $C \log T$ upper bound on the regret after $T$ time-steps for bounded noise, and $C\log^3 T$ in the case of sub-Gaussian noise, unlike the LQ problem where logarithmic regret is shown to be not possible. The PIECE algorithm is also designed to address the critical challenge of poor initial transient performance of reinforcement learning algorithms for linear systems. Comparative simulation results illustrate the improved performance of PIECE.
APA
Singh, R., Mete, A., Kar, A. & Kumar, P.. (2024). Finite Time Logarithmic Regret Bounds for Self-Tuning Regulation. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:45571-45636 Available from https://proceedings.mlr.press/v235/singh24b.html.

Related Material