CACTO-SL: Using Sobolev learning to improve continuous actor-critic with trajectory optimization

Elisa Alboni, Gianluigi Grandesso, Gastone Pietro Rosati Papini, Justin Carpentier, Andrea Del Prete
Proceedings of the 6th Annual Learning for Dynamics & Control Conference, PMLR 242:1452-1463, 2024.

Abstract

Trajectory Optimization (TO) and Reinforcement Learning (RL) are powerful and complementary tools to solve optimal control problems. On the one hand, TO can efficiently compute locally-optimal solutions, but it tends to get stuck in local minima if the problem is not convex. On the other hand, RL is typically less sensitive to non-convexity, but it requires a much higher computational effort. Recently, we have proposed CACTO (Continuous Actor-Critic with Trajectory Optimization), an algorithm that uses TO to guide the exploration of an actor-critic RL algorithm. In turn, the policy encoded by the actor is used to warm-start TO, closing the loop between TO and RL. In this work, we present an extension of CACTO exploiting the idea of Sobolev learning. To make the training of the critic network faster and more data efficient, we enrich it with the gradient of the Value function, computed via a backward pass of the differential dynamic programming algorithm. Our results show that the new algorithm is more efficient than the original CACTO, reducing the number of TO episodes by a factor ranging from 3 to 10, and consequently the computation time. Moreover, we show that CACTO-SL helps TO to find better minima and to produce more consistent results.
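
To illustrate the Sobolev-learning idea described above, here is a minimal sketch (assumed names and structure, not the authors' implementation) of a critic loss that fits both value targets and value-gradient targets, the latter assumed to come from the DDP backward pass along the TO trajectories. The functions critic_fn and sobolev_critic_loss and the weight w_grad are placeholders for illustration.

# Sketch of a Sobolev-learning critic loss in JAX (illustrative only).
import jax
import jax.numpy as jnp

def sobolev_critic_loss(params, critic_fn, x_batch, V_target, dVdx_target, w_grad=1.0):
    # critic_fn(params, x) -> scalar value estimate for a single state x.
    V_pred = jax.vmap(critic_fn, in_axes=(None, 0))(params, x_batch)
    # Gradient of the critic w.r.t. the state, computed per sample via autodiff.
    dVdx_pred = jax.vmap(jax.grad(critic_fn, argnums=1), in_axes=(None, 0))(params, x_batch)
    value_loss = jnp.mean((V_pred - V_target) ** 2)
    grad_loss = jnp.mean(jnp.sum((dVdx_pred - dVdx_target) ** 2, axis=-1))
    # Standard critic regression plus the weighted Sobolev (gradient-matching) term.
    return value_loss + w_grad * grad_loss

# Example update direction for the critic parameters (e.g. fed to SGD/Adam):
# g = jax.grad(sobolev_critic_loss)(params, critic_fn, x_batch, V_target, dVdx_target)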

Cite this Paper


BibTeX
@InProceedings{pmlr-v242-alboni24a,
  title     = {{CACTO-SL}: {U}sing {S}obolev learning to improve continuous actor-critic with trajectory optimization},
  author    = {Alboni, Elisa and Grandesso, Gianluigi and Rosati Papini, Gastone Pietro and Carpentier, Justin and Del Prete, Andrea},
  booktitle = {Proceedings of the 6th Annual Learning for Dynamics \& Control Conference},
  pages     = {1452--1463},
  year      = {2024},
  editor    = {Abate, Alessandro and Cannon, Mark and Margellos, Kostas and Papachristodoulou, Antonis},
  volume    = {242},
  series    = {Proceedings of Machine Learning Research},
  month     = {15--17 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v242/alboni24a/alboni24a.pdf},
  url       = {https://proceedings.mlr.press/v242/alboni24a.html},
  abstract  = {Trajectory Optimization (TO) and Reinforcement Learning (RL) are powerful and complementary tools to solve optimal control problems. On the one hand, TO can efficiently compute locally-optimal solutions, but it tends to get stuck in local minima if the problem is not convex. On the other hand, RL is typically less sensitive to non-convexity, but it requires a much higher computational effort. Recently, we have proposed CACTO (Continuous Actor-Critic with Trajectory Optimization), an algorithm that uses TO to guide the exploration of an actor-critic RL algorithm. In turn, the policy encoded by the actor is used to warm-start TO, closing the loop between TO and RL. In this work, we present an extension of CACTO exploiting the idea of Sobolev learning. To make the training of the critic network faster and more data efficient, we enrich it with the gradient of the Value function, computed via a backward pass of the differential dynamic programming algorithm. Our results show that the new algorithm is more efficient than the original CACTO, reducing the number of TO episodes by a factor ranging from 3 to 10, and consequently the computation time. Moreover, we show that CACTO-SL helps TO to find better minima and to produce more consistent results.}
}
Endnote
%0 Conference Paper
%T CACTO-SL: Using Sobolev learning to improve continuous actor-critic with trajectory optimization
%A Elisa Alboni
%A Gianluigi Grandesso
%A Gastone Pietro Rosati Papini
%A Justin Carpentier
%A Andrea Del Prete
%B Proceedings of the 6th Annual Learning for Dynamics & Control Conference
%C Proceedings of Machine Learning Research
%D 2024
%E Alessandro Abate
%E Mark Cannon
%E Kostas Margellos
%E Antonis Papachristodoulou
%F pmlr-v242-alboni24a
%I PMLR
%P 1452--1463
%U https://proceedings.mlr.press/v242/alboni24a.html
%V 242
%X Trajectory Optimization (TO) and Reinforcement Learning (RL) are powerful and complementary tools to solve optimal control problems. On the one hand, TO can efficiently compute locally-optimal solutions, but it tends to get stuck in local minima if the problem is not convex. On the other hand, RL is typically less sensitive to non-convexity, but it requires a much higher computational effort. Recently, we have proposed CACTO (Continuous Actor-Critic with Trajectory Optimization), an algorithm that uses TO to guide the exploration of an actor-critic RL algorithm. In turn, the policy encoded by the actor is used to warm-start TO, closing the loop between TO and RL. In this work, we present an extension of CACTO exploiting the idea of Sobolev learning. To make the training of the critic network faster and more data efficient, we enrich it with the gradient of the Value function, computed via a backward pass of the differential dynamic programming algorithm. Our results show that the new algorithm is more efficient than the original CACTO, reducing the number of TO episodes by a factor ranging from 3 to 10, and consequently the computation time. Moreover, we show that CACTO-SL helps TO to find better minima and to produce more consistent results.
APA
Alboni, E., Grandesso, G., Rosati Papini, G.P., Carpentier, J. & Del Prete, A. (2024). CACTO-SL: Using Sobolev learning to improve continuous actor-critic with trajectory optimization. Proceedings of the 6th Annual Learning for Dynamics & Control Conference, in Proceedings of Machine Learning Research 242:1452-1463. Available from https://proceedings.mlr.press/v242/alboni24a.html.
