CT-DQN: Control-Tutored Deep Reinforcement Learning

Francesco De Lellis, Marco Coraggio, Giovanni Russo, Mirco Musolesi, Mario di Bernardo
Proceedings of The 5th Annual Learning for Dynamics and Control Conference, PMLR 211:941-953, 2023.

Abstract

One of the major challenges in Deep Reinforcement Learning for control is the need for extensive training to learn the policy. Motivated by this, we present the design of the Control-Tutored Deep Q-Networks (CT-DQN) algorithm, a Deep Reinforcement Learning algorithm that leverages a control tutor, i.e., an exogenous control law, to reduce learning time. The tutor can be designed using an approximate model of the system, without any assumption about the knowledge of the system’s dynamics. There is no expectation that it will be able to achieve the control objective if used stand-alone. During learning, the tutor occasionally suggests an action, thus partially guiding exploration. We validate our approach on three scenarios from OpenAI Gym: the inverted pendulum, lunar lander, and car racing. We demonstrate that CT-DQN is able to achieve better or equivalent data efficiency with respect to the classic function approximation solutions.
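
As an illustration of the mechanism described in the abstract, the sketch below shows how a control tutor might be interleaved with standard epsilon-greedy exploration in a DQN-style agent. This is a minimal, hypothetical sketch and not the authors' implementation: the names q_network, tutor_control_law, and the switching probability tutor_prob are assumptions introduced here for illustration, and the actual rule CT-DQN uses to decide when the tutor suggests an action may differ.

    import random
    import numpy as np

    def select_action(state, q_network, tutor_control_law, n_actions,
                      epsilon=0.1, tutor_prob=0.2):
        # Hypothetical control-tutored exploration step.
        # With probability tutor_prob, the exogenous control tutor proposes
        # the action; otherwise fall back to standard epsilon-greedy
        # selection on the learned Q-network.
        if random.random() < tutor_prob:
            return tutor_control_law(state)      # action suggested by the control tutor
        if random.random() < epsilon:
            return random.randrange(n_actions)   # random exploratory action
        q_values = q_network(state)              # greedy action w.r.t. the Q-network
        return int(np.argmax(q_values))

Since the tutor can be designed from a rough approximate model of the system, even occasional suggestions of this kind can bias exploration toward useful regions of the state space, which is how the paper aims to reduce learning time.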

Cite this Paper


BibTeX
@InProceedings{pmlr-v211-de-lellis23a,
  title     = {CT-DQN: Control-Tutored Deep Reinforcement Learning},
  author    = {{De Lellis}, Francesco and Coraggio, Marco and Russo, Giovanni and Musolesi, Mirco and Bernardo, Mario di},
  booktitle = {Proceedings of The 5th Annual Learning for Dynamics and Control Conference},
  pages     = {941--953},
  year      = {2023},
  editor    = {Matni, Nikolai and Morari, Manfred and Pappas, George J.},
  volume    = {211},
  series    = {Proceedings of Machine Learning Research},
  month     = {15--16 Jun},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v211/de-lellis23a/de-lellis23a.pdf},
  url       = {https://proceedings.mlr.press/v211/de-lellis23a.html},
  abstract  = {One of the major challenges in Deep Reinforcement Learning for control is the need for extensive training to learn the policy. Motivated by this, we present the design of the Control-Tutored Deep Q-Networks (CT-DQN) algorithm, a Deep Reinforcement Learning algorithm that leverages a control tutor, i.e., an exogenous control law, to reduce learning time. The tutor can be designed using an approximate model of the system, without any assumption about the knowledge of the system’s dynamics. There is no expectation that it will be able to achieve the control objective if used stand-alone. During learning, the tutor occasionally suggests an action, thus partially guiding exploration. We validate our approach on three scenarios from OpenAI Gym: the inverted pendulum, lunar lander, and car racing. We demonstrate that CT-DQN is able to achieve better or equivalent data efficiency with respect to the classic function approximation solutions.}
}
Endnote
%0 Conference Paper
%T CT-DQN: Control-Tutored Deep Reinforcement Learning
%A Francesco De Lellis
%A Marco Coraggio
%A Giovanni Russo
%A Mirco Musolesi
%A Mario di Bernardo
%B Proceedings of The 5th Annual Learning for Dynamics and Control Conference
%C Proceedings of Machine Learning Research
%D 2023
%E Nikolai Matni
%E Manfred Morari
%E George J. Pappas
%F pmlr-v211-de-lellis23a
%I PMLR
%P 941--953
%U https://proceedings.mlr.press/v211/de-lellis23a.html
%V 211
%X One of the major challenges in Deep Reinforcement Learning for control is the need for extensive training to learn the policy. Motivated by this, we present the design of the Control-Tutored Deep Q-Networks (CT-DQN) algorithm, a Deep Reinforcement Learning algorithm that leverages a control tutor, i.e., an exogenous control law, to reduce learning time. The tutor can be designed using an approximate model of the system, without any assumption about the knowledge of the system’s dynamics. There is no expectation that it will be able to achieve the control objective if used stand-alone. During learning, the tutor occasionally suggests an action, thus partially guiding exploration. We validate our approach on three scenarios from OpenAI Gym: the inverted pendulum, lunar lander, and car racing. We demonstrate that CT-DQN is able to achieve better or equivalent data efficiency with respect to the classic function approximation solutions.
APA
De Lellis, F., Coraggio, M., Russo, G., Musolesi, M. & di Bernardo, M. (2023). CT-DQN: Control-Tutored Deep Reinforcement Learning. Proceedings of The 5th Annual Learning for Dynamics and Control Conference, in Proceedings of Machine Learning Research 211:941-953. Available from https://proceedings.mlr.press/v211/de-lellis23a.html.
