Finite-time complexity of incremental policy gradient methods for solving multi-task reinforcement learning
Proceedings of the 6th Annual Learning for Dynamics & Control Conference, PMLR 242:1046-1057, 2024.
Abstract
We consider a multi-task learning problem, where an agent is presented with $N$ reinforcement learning tasks. To solve this problem, we are interested in studying the gradient approach, which iteratively updates an estimate of the optimal policy using the gradients of the value functions. The classic policy gradient method, however, may be expensive to implement in the multi-task setting as it requires access to the gradients of all the tasks at every iteration. To circumvent this issue, in this paper we propose to study an incremental policy gradient method, where the agent uses the gradient of only one task at each iteration. Our main contribution is to provide theoretical results characterizing the performance of the proposed method. In particular, we show that incremental policy gradient methods converge to the optimal value of the multi-task reinforcement learning objectives at a sublinear rate $O(1/\sqrt{k})$, where $k$ is the number of iterations. To illustrate its performance, we apply the proposed method to solve a simple multi-task variant of the GridWorld problem, where an agent seeks to find a policy to navigate effectively in different environments.
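To make the incremental update concrete, the following is a minimal sketch, not the authors' implementation: each task is reduced to a one-state MDP (a bandit) with its own reward vector, and a shared softmax policy is updated with the exact gradient of a single task per iteration, cycling over the $N$ tasks with a $1/\sqrt{k}$ stepsize. The task setup, reward values, and stepsize schedule are illustrative assumptions, chosen only to show the one-task-per-iteration structure described in the abstract.

```python
# Sketch of an incremental policy gradient update for multi-task RL.
# Assumption: each "task" is a one-state MDP (bandit) with reward vector r_i,
# so the value gradient is available in closed form for a softmax policy.
import numpy as np

rng = np.random.default_rng(0)
N, num_actions = 5, 4                                    # tasks, actions
rewards = rng.uniform(0.0, 1.0, size=(N, num_actions))   # r_i(a) per task i

def policy(theta):
    """Softmax policy pi_theta(a) over actions."""
    z = np.exp(theta - theta.max())
    return z / z.sum()

def task_value(theta, i):
    """Expected reward J_i(theta) = sum_a pi_theta(a) r_i(a) of task i."""
    return policy(theta) @ rewards[i]

def task_value_grad(theta, i):
    """Exact gradient of J_i w.r.t. theta for the softmax policy."""
    pi = policy(theta)
    # d pi(a) / d theta(b) = pi(a) * (1[a == b] - pi(b))
    jac = np.diag(pi) - np.outer(pi, pi)
    return jac @ rewards[i]

theta = np.zeros(num_actions)
num_iters = 5000
for k in range(1, num_iters + 1):
    i = (k - 1) % N                    # use the gradient of only one task
    alpha = 1.0 / np.sqrt(k)           # decaying stepsize (assumed schedule)
    theta += alpha * task_value_grad(theta, i)   # gradient ascent on J_i

avg_value = np.mean([task_value(theta, i) for i in range(N)])
print(f"average value across tasks after {num_iters} iterations: {avg_value:.3f}")
```

In a full multi-task GridWorld setting the closed-form gradient would be replaced by a sampled policy gradient estimate for the selected task, but the incremental structure of the loop, one task's gradient per iteration, stays the same.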