Control-Tutored Reinforcement Learning: Towards the Integration of Data-Driven and Model-Based Control

Francesco De Lellis, Marco Coraggio, Giovanni Russo, Mirco Musolesi, Mario di Bernardo
Proceedings of The 4th Annual Learning for Dynamics and Control Conference, PMLR 168:1048-1059, 2022.

Abstract

We present an architecture where a feedback controller, derived from an approximate model of the environment, assists the learning process to enhance its data efficiency. This architecture, which we term Control-Tutored Q-learning (CTQL), is presented in two alternative flavours. The former is based on defining the reward function so that a Boolean condition can be used to determine when the control tutor policy is adopted, while the latter, termed probabilistic CTQL (pCTQL), is based instead on executing calls to the tutor with a certain probability during learning. Both approaches are validated and thoroughly benchmarked against Q-learning on a representative problem: the stabilization of an inverted pendulum as defined in OpenAI Gym.
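As a rough illustration of the pCTQL mechanism described above (not the authors' implementation, whose details and parameter values are given in the paper), the Python sketch below shows an action-selection rule that calls a hypothetical tutor_policy with probability p_tutor and otherwise behaves as plain epsilon-greedy tabular Q-learning; all names and default values here are illustrative assumptions.

import numpy as np
from collections import defaultdict

def pctql_action(Q, state, n_actions, tutor_policy,
                 p_tutor=0.3, eps=0.1, rng=None):
    # With probability p_tutor, defer to the model-based control tutor;
    # otherwise act epsilon-greedily on the learned Q-values.
    rng = rng if rng is not None else np.random.default_rng()
    if rng.random() < p_tutor:
        return tutor_policy(state)             # control-tutor call
    if rng.random() < eps:
        return int(rng.integers(n_actions))    # random exploration
    values = [Q[state, a] for a in range(n_actions)]
    return int(np.argmax(values))              # greedy exploitation

def q_update(Q, s, a, r, s_next, n_actions, alpha=0.1, gamma=0.99):
    # Standard tabular Q-learning update; the tutor only affects which
    # actions are executed, not how the Q-values are updated.
    best_next = max(Q[s_next, a2] for a2 in range(n_actions))
    Q[s, a] += alpha * (r + gamma * best_next - Q[s, a])

Q = defaultdict(float)  # tabular Q over a discretized state space

For the pendulum benchmark, tutor_policy could be, for instance, a PD-type feedback law designed on an approximate pendulum model and mapped to the discrete action set; the CTQL variant would instead gate the tutor call on a Boolean condition tied to the reward function rather than on a fixed probability.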

Cite this Paper


BibTeX
@InProceedings{pmlr-v168-lellis22a,
  title     = {Control-Tutored Reinforcement Learning: Towards the Integration of Data-Driven and Model-Based Control},
  author    = {De Lellis, Francesco and Coraggio, Marco and Russo, Giovanni and Musolesi, Mirco and di Bernardo, Mario},
  booktitle = {Proceedings of The 4th Annual Learning for Dynamics and Control Conference},
  pages     = {1048--1059},
  year      = {2022},
  editor    = {Firoozi, Roya and Mehr, Negar and Yel, Esen and Antonova, Rika and Bohg, Jeannette and Schwager, Mac and Kochenderfer, Mykel},
  volume    = {168},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--24 Jun},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v168/lellis22a/lellis22a.pdf},
  url       = {https://proceedings.mlr.press/v168/lellis22a.html},
  abstract  = {We present an architecture where a feedback controller, derived from an approximate model of the environment, assists the learning process to enhance its data efficiency. This architecture, which we term Control-Tutored Q-learning (CTQL), is presented in two alternative flavours. The former is based on defining the reward function so that a Boolean condition can be used to determine when the control tutor policy is adopted, while the latter, termed probabilistic CTQL (pCTQL), is based instead on executing calls to the tutor with a certain probability during learning. Both approaches are validated and thoroughly benchmarked against Q-learning on a representative problem: the stabilization of an inverted pendulum as defined in OpenAI Gym.}
}
Endnote
%0 Conference Paper
%T Control-Tutored Reinforcement Learning: Towards the Integration of Data-Driven and Model-Based Control
%A Francesco De Lellis
%A Marco Coraggio
%A Giovanni Russo
%A Mirco Musolesi
%A Mario di Bernardo
%B Proceedings of The 4th Annual Learning for Dynamics and Control Conference
%C Proceedings of Machine Learning Research
%D 2022
%E Roya Firoozi
%E Negar Mehr
%E Esen Yel
%E Rika Antonova
%E Jeannette Bohg
%E Mac Schwager
%E Mykel Kochenderfer
%F pmlr-v168-lellis22a
%I PMLR
%P 1048--1059
%U https://proceedings.mlr.press/v168/lellis22a.html
%V 168
%X We present an architecture where a feedback controller, derived from an approximate model of the environment, assists the learning process to enhance its data efficiency. This architecture, which we term Control-Tutored Q-learning (CTQL), is presented in two alternative flavours. The former is based on defining the reward function so that a Boolean condition can be used to determine when the control tutor policy is adopted, while the latter, termed probabilistic CTQL (pCTQL), is based instead on executing calls to the tutor with a certain probability during learning. Both approaches are validated and thoroughly benchmarked against Q-learning on a representative problem: the stabilization of an inverted pendulum as defined in OpenAI Gym.
APA
De Lellis, F., Coraggio, M., Russo, G., Musolesi, M. & di Bernardo, M. (2022). Control-Tutored Reinforcement Learning: Towards the Integration of Data-Driven and Model-Based Control. Proceedings of The 4th Annual Learning for Dynamics and Control Conference, in Proceedings of Machine Learning Research 168:1048-1059. Available from https://proceedings.mlr.press/v168/lellis22a.html.