The Impact of On-Policy Parallelized Data Collection on Deep Reinforcement Learning Networks

Walter Mayor, Johan Obando-Ceron, Aaron Courville, Pablo Samuel Castro
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:43331-43352, 2025.

Abstract

The use of parallel actors for data collection is an effective technique in reinforcement learning (RL) algorithms. The manner in which data is collected in these algorithms, controlled via the number of parallel environments and the rollout length, induces a form of bias-variance trade-off; the number of training passes over the collected data, on the other hand, must strike a balance between sample efficiency and overfitting. We conduct an empirical analysis of these trade-offs on PPO, one of the most popular RL algorithms that uses parallel actors, and establish connections to network plasticity and, more generally, optimization stability. We examine the impact of these choices on network architectures, as well as hyper-parameter sensitivity when scaling data. Our analyses indicate that larger dataset sizes can increase final performance across a variety of settings, and that scaling parallel environments is more effective than increasing rollout lengths. These findings highlight the critical role of data collection strategies in improving agent performance.
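As a rough illustration of the knobs the abstract refers to, the following minimal Python sketch (hypothetical variable names and values, not the authors' code) shows how the number of parallel environments, the rollout length, and the number of training passes interact in a typical PPO-style loop: the product of the first two gives the size of each on-policy batch, and the epoch count controls how many times that batch is reused before fresh data is collected.

    import numpy as np

    # Hypothetical values for the data-collection knobs discussed above.
    num_envs = 64          # parallel environments (breadth of data collection)
    rollout_length = 128   # steps per environment before each update (depth)
    update_epochs = 4      # training passes over the collected batch
    minibatch_size = 512

    # Each iteration gathers num_envs * rollout_length on-policy transitions.
    batch_size = num_envs * rollout_length  # 8192 transitions here

    def ppo_update(batch, epochs, minibatch_size):
        """Sketch of the reuse loop: more epochs improve sample efficiency
        but risk overfitting to the current on-policy batch."""
        indices = np.arange(len(batch))
        for _ in range(epochs):
            np.random.shuffle(indices)
            for start in range(0, len(batch), minibatch_size):
                minibatch = batch[indices[start:start + minibatch_size]]
                # ...compute the PPO clipped-surrogate loss on `minibatch`
                # and take one gradient step (omitted in this sketch)...

Under this framing, the paper's comparison of scaling parallel environments versus rollout lengths amounts to different ways of growing batch_size while holding the total number of environment steps fixed.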

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-mayor25a,
  title     = {The Impact of On-Policy Parallelized Data Collection on Deep Reinforcement Learning Networks},
  author    = {Mayor, Walter and Obando-Ceron, Johan and Courville, Aaron and Castro, Pablo Samuel},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {43331--43352},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/mayor25a/mayor25a.pdf},
  url       = {https://proceedings.mlr.press/v267/mayor25a.html},
  abstract  = {The use of parallel actors for data collection has been an effective technique used in reinforcement learning (RL) algorithms. The manner in which data is collected in these algorithms, controlled via the number of parallel environments and the rollout length, induces a form of bias-variance trade-off; the number of training passes over the collected data, on the other hand, must strike a balance between sample efficiency and overfitting. We conduct an empirical analysis of these trade-offs on PPO, one of the most popular RL algorithms that uses parallel actors, and establish connections to network plasticity and, more generally, optimization stability. We examine its impact on network architectures, as well as the hyper-parameter sensitivity when scaling data. Our analyses indicate that larger dataset sizes can increase final performance across a variety of settings, and that scaling parallel environments is more effective than increasing rollout lengths. These findings highlight the critical role of data collection strategies in improving agent performance.}
}
Endnote
%0 Conference Paper
%T The Impact of On-Policy Parallelized Data Collection on Deep Reinforcement Learning Networks
%A Walter Mayor
%A Johan Obando-Ceron
%A Aaron Courville
%A Pablo Samuel Castro
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-mayor25a
%I PMLR
%P 43331--43352
%U https://proceedings.mlr.press/v267/mayor25a.html
%V 267
%X The use of parallel actors for data collection has been an effective technique used in reinforcement learning (RL) algorithms. The manner in which data is collected in these algorithms, controlled via the number of parallel environments and the rollout length, induces a form of bias-variance trade-off; the number of training passes over the collected data, on the other hand, must strike a balance between sample efficiency and overfitting. We conduct an empirical analysis of these trade-offs on PPO, one of the most popular RL algorithms that uses parallel actors, and establish connections to network plasticity and, more generally, optimization stability. We examine its impact on network architectures, as well as the hyper-parameter sensitivity when scaling data. Our analyses indicate that larger dataset sizes can increase final performance across a variety of settings, and that scaling parallel environments is more effective than increasing rollout lengths. These findings highlight the critical role of data collection strategies in improving agent performance.
APA
Mayor, W., Obando-Ceron, J., Courville, A. & Castro, P.S. (2025). The Impact of On-Policy Parallelized Data Collection on Deep Reinforcement Learning Networks. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:43331-43352. Available from https://proceedings.mlr.press/v267/mayor25a.html.
