Finite-time complexity of incremental policy gradient methods for solving multi-task reinforcement learning

Yitao Bai, Thinh Doan
Proceedings of the 6th Annual Learning for Dynamics & Control Conference, PMLR 242:1046-1057, 2024.

Abstract

We consider a multi-task learning problem, where an agent is presented with $N$ reinforcement learning tasks. To solve this problem, we study the gradient approach, which iteratively updates an estimate of the optimal policy using the gradients of the value functions. The classic policy gradient method, however, may be expensive to implement in the multi-task setting, as it requires access to the gradients of all the tasks at every iteration. To circumvent this issue, we propose to study an incremental policy gradient method, where the agent uses the gradient of only one task at each iteration. Our main contribution is a set of theoretical results characterizing the performance of the proposed method. In particular, we show that incremental policy gradient methods converge to the optimal value of the multi-task reinforcement learning objective at a sublinear rate $O(1/\sqrt{k})$, where $k$ is the number of iterations. To illustrate its performance, we apply the proposed method to a simple multi-task variant of the GridWorld problem, where an agent seeks a policy that navigates effectively in different environments.
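The sketch below is a minimal illustration (not the authors' implementation) of the incremental idea described in the abstract: at iteration $k$, a shared softmax policy is updated with the policy gradient of only one task, cycling through the $N$ tasks, using a step size on the order of $1/\sqrt{k}$. The tiny random MDPs, the step-size constant, and the exact gradient computation are assumptions made purely for illustration; they stand in for the paper's GridWorld tasks and analysis conditions.

```python
# Minimal sketch of an incremental policy gradient update for N tasks.
# Assumptions (not from the paper): small random tabular MDPs, exact gradients
# via the policy gradient theorem, and a 0.5/sqrt(k+1) step size.
import numpy as np

rng = np.random.default_rng(0)
S, A, N, gamma = 4, 2, 3, 0.9          # states, actions, tasks, discount

# Each task: transition kernel P[s, a, s'] and reward R[s, a]
tasks = []
for _ in range(N):
    P = rng.dirichlet(np.ones(S), size=(S, A))
    R = rng.uniform(0.0, 1.0, size=(S, A))
    tasks.append((P, R))

rho = np.full(S, 1.0 / S)              # initial-state distribution

def softmax_policy(theta):
    z = np.exp(theta - theta.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)          # pi[s, a]

def task_gradient(theta, P, R):
    """Exact gradient of one task's discounted return w.r.t. theta."""
    pi = softmax_policy(theta)
    P_pi = np.einsum('sap,sa->sp', P, pi)            # state transitions under pi
    r_pi = (pi * R).sum(axis=1)
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    Q = R + gamma * P @ V                            # Q[s, a]
    # unnormalized discounted state-visitation distribution d_rho^pi
    d = np.linalg.solve(np.eye(S) - gamma * P_pi.T, rho)
    # softmax policy gradient: grad[s, a] = d(s) * pi(a|s) * (Q(s, a) - V(s))
    return d[:, None] * pi * (Q - V[:, None])

theta = np.zeros((S, A))
K = 2000
for k in range(K):
    P, R = tasks[k % N]                              # gradient of ONE task per iteration
    alpha = 0.5 / np.sqrt(k + 1)                     # O(1/sqrt(k)) step size
    theta += alpha * task_gradient(theta, P, R)

# Average discounted return across all tasks under the final policy
pi = softmax_policy(theta)
returns = []
for P, R in tasks:
    P_pi = np.einsum('sap,sa->sp', P, pi)
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, (pi * R).sum(axis=1))
    returns.append(rho @ V)
print(f"average return across tasks after {K} iterations: {np.mean(returns):.3f}")
```

Under these assumptions, each iteration touches a single task's gradient, which is the cost-saving property the abstract highlights relative to the classic policy gradient method that sums gradients over all $N$ tasks per update.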

Cite this Paper


BibTeX
@InProceedings{pmlr-v242-bai24b,
  title     = {Finite-time complexity of incremental policy gradient methods for solving multi-task reinforcement learning},
  author    = {Bai, Yitao and Doan, Thinh},
  booktitle = {Proceedings of the 6th Annual Learning for Dynamics \& Control Conference},
  pages     = {1046--1057},
  year      = {2024},
  editor    = {Abate, Alessandro and Cannon, Mark and Margellos, Kostas and Papachristodoulou, Antonis},
  volume    = {242},
  series    = {Proceedings of Machine Learning Research},
  month     = {15--17 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v242/bai24b/bai24b.pdf},
  url       = {https://proceedings.mlr.press/v242/bai24b.html}
}
Endnote
%0 Conference Paper
%T Finite-time complexity of incremental policy gradient methods for solving multi-task reinforcement learning
%A Yitao Bai
%A Thinh Doan
%B Proceedings of the 6th Annual Learning for Dynamics & Control Conference
%C Proceedings of Machine Learning Research
%D 2024
%E Alessandro Abate
%E Mark Cannon
%E Kostas Margellos
%E Antonis Papachristodoulou
%F pmlr-v242-bai24b
%I PMLR
%P 1046--1057
%U https://proceedings.mlr.press/v242/bai24b.html
%V 242
APA
Bai, Y. & Doan, T. (2024). Finite-time complexity of incremental policy gradient methods for solving multi-task reinforcement learning. Proceedings of the 6th Annual Learning for Dynamics & Control Conference, in Proceedings of Machine Learning Research 242:1046-1057. Available from https://proceedings.mlr.press/v242/bai24b.html.