GEAR: A GPU-Centric Experience Replay System for Large Reinforcement Learning Models

Hanjing Wang, Man-Kit Sit, Congjie He, Ying Wen, Weinan Zhang, Jun Wang, Yaodong Yang, Luo Mai
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:36380-36390, 2023.

Abstract

This paper introduces a distributed, GPU-centric experience replay system, GEAR, designed to perform scalable reinforcement learning (RL) with large sequence models (such as transformers). With such models, existing systems such as Reverb face considerable bottlenecks in memory, computation, and communication. GEAR, however, optimizes memory efficiency by using the memory resources on GPU servers (including host memory and device memory) to manage trajectory data. Furthermore, it enables decentralized GPU devices to expedite various trajectory selection strategies, circumventing computational bottlenecks. GEAR is equipped with GPU kernels capable of collecting trajectories using zero-copy access to host memory, along with remote direct memory access (RDMA) over InfiniBand, improving communication efficiency. Cluster experiments have shown that GEAR can achieve performance levels up to 6× greater than Reverb when training state-of-the-art large RL models. GEAR is open-sourced at https://github.com/bigrl-team/gear.
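
To illustrate the zero-copy host-memory access the abstract refers to, the sketch below shows the standard CUDA pattern for letting a GPU kernel read data that resides in pinned, mapped host memory without an explicit device copy. This is a minimal, self-contained example of the general technique, not GEAR's actual kernels; the sum_trajectory kernel, buffer size, and data layout are hypothetical.

// Minimal zero-copy sketch (assumed illustration, not GEAR's implementation):
// a kernel reads a "trajectory" buffer that lives in pinned, mapped host memory.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void sum_trajectory(const float* traj, int n, float* out) {
    // Each thread accumulates a strided slice of the host-resident buffer;
    // reads are served over the interconnect on demand, not from device DRAM.
    float acc = 0.f;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x)
        acc += traj[i];
    atomicAdd(out, acc);
}

int main() {
    const int n = 1 << 20;                       // hypothetical trajectory length
    float *h_traj, *d_traj, *d_out;

    cudaSetDeviceFlags(cudaDeviceMapHost);       // allow mapped host allocations

    // Pinned host memory mapped into the device address space (zero-copy).
    cudaHostAlloc((void**)&h_traj, n * sizeof(float), cudaHostAllocMapped);
    for (int i = 0; i < n; ++i) h_traj[i] = 1.f;

    // Device-side pointer aliasing the same host buffer.
    cudaHostGetDevicePointer((void**)&d_traj, h_traj, 0);
    cudaMalloc((void**)&d_out, sizeof(float));
    cudaMemset(d_out, 0, sizeof(float));

    sum_trajectory<<<64, 256>>>(d_traj, n, d_out);

    float result;
    cudaMemcpy(&result, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    printf("sum = %.0f (expected %d)\n", result, n);

    cudaFree(d_out);
    cudaFreeHost(h_traj);
    return 0;
}

In this pattern, only the trajectory elements a kernel actually touches cross the interconnect, which is one way to avoid staging an entire replay buffer in device memory.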

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-wang23aj,
  title = {{GEAR}: A {GPU}-Centric Experience Replay System for Large Reinforcement Learning Models},
  author = {Wang, Hanjing and Sit, Man-Kit and He, Congjie and Wen, Ying and Zhang, Weinan and Wang, Jun and Yang, Yaodong and Mai, Luo},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages = {36380--36390},
  year = {2023},
  editor = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume = {202},
  series = {Proceedings of Machine Learning Research},
  month = {23--29 Jul},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v202/wang23aj/wang23aj.pdf},
  url = {https://proceedings.mlr.press/v202/wang23aj.html},
  abstract = {This paper introduces a distributed, GPU-centric experience replay system, GEAR, designed to perform scalable reinforcement learning (RL) with large sequence models (such as transformers). With such models, existing systems such as Reverb face considerable bottlenecks in memory, computation, and communication. GEAR, however, optimizes memory efficiency by using the memory resources on GPU servers (including host memory and device memory) to manage trajectory data. Furthermore, it enables decentralized GPU devices to expedite various trajectory selection strategies, circumventing computational bottlenecks. GEAR is equipped with GPU kernels capable of collecting trajectories using zero-copy access to host memory, along with remote direct memory access (RDMA) over InfiniBand, improving communication efficiency. Cluster experiments have shown that GEAR can achieve performance levels up to 6× greater than Reverb when training state-of-the-art large RL models. GEAR is open-sourced at https://github.com/bigrl-team/gear.}
}
Endnote
%0 Conference Paper
%T GEAR: A GPU-Centric Experience Replay System for Large Reinforcement Learning Models
%A Hanjing Wang
%A Man-Kit Sit
%A Congjie He
%A Ying Wen
%A Weinan Zhang
%A Jun Wang
%A Yaodong Yang
%A Luo Mai
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-wang23aj
%I PMLR
%P 36380--36390
%U https://proceedings.mlr.press/v202/wang23aj.html
%V 202
%X This paper introduces a distributed, GPU-centric experience replay system, GEAR, designed to perform scalable reinforcement learning (RL) with large sequence models (such as transformers). With such models, existing systems such as Reverb face considerable bottlenecks in memory, computation, and communication. GEAR, however, optimizes memory efficiency by using the memory resources on GPU servers (including host memory and device memory) to manage trajectory data. Furthermore, it enables decentralized GPU devices to expedite various trajectory selection strategies, circumventing computational bottlenecks. GEAR is equipped with GPU kernels capable of collecting trajectories using zero-copy access to host memory, along with remote direct memory access (RDMA) over InfiniBand, improving communication efficiency. Cluster experiments have shown that GEAR can achieve performance levels up to 6× greater than Reverb when training state-of-the-art large RL models. GEAR is open-sourced at https://github.com/bigrl-team/gear.
APA
Wang, H., Sit, M., He, C., Wen, Y., Zhang, W., Wang, J., Yang, Y., & Mai, L. (2023). GEAR: A GPU-Centric Experience Replay System for Large Reinforcement Learning Models. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:36380-36390. Available from https://proceedings.mlr.press/v202/wang23aj.html.
