EfficientZero V2: Mastering Discrete and Continuous Control with Limited Data

Shengjie Wang, Shaohuai Liu, Weirui Ye, Jiacheng You, Yang Gao
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:51041-51062, 2024.

Abstract

Sample efficiency remains a crucial challenge in applying Reinforcement Learning (RL) to real-world tasks. While recent algorithms have made significant strides in improving sample efficiency, none have achieved consistently superior performance across diverse domains. In this paper, we introduce EfficientZero V2, a general framework designed for sample-efficient RL algorithms. We have expanded the performance of EfficientZero to multiple domains, encompassing both continuous and discrete actions, as well as visual and low-dimensional inputs. With a series of improvements we propose, EfficientZero V2 outperforms the current state-of-the-art (SoTA) by a significant margin in diverse tasks under the limited data setting. EfficientZero V2 exhibits a notable advancement over the prevailing general algorithm, DreamerV3, achieving superior outcomes in 50 of 66 evaluated tasks across multiple benchmarks, including Atari 100k, Proprio Control, and Vision Control.

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-wang24at, title = {{E}fficient{Z}ero V2: Mastering Discrete and Continuous Control with Limited Data}, author = {Wang, Shengjie and Liu, Shaohuai and Ye, Weirui and You, Jiacheng and Gao, Yang}, booktitle = {Proceedings of the 41st International Conference on Machine Learning}, pages = {51041--51062}, year = {2024}, editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix}, volume = {235}, series = {Proceedings of Machine Learning Research}, month = {21--27 Jul}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/wang24at/wang24at.pdf}, url = {https://proceedings.mlr.press/v235/wang24at.html}, abstract = {Sample efficiency remains a crucial challenge in applying Reinforcement Learning (RL) to real-world tasks. While recent algorithms have made significant strides in improving sample efficiency, none have achieved consistently superior performance across diverse domains. In this paper, we introduce EfficientZero V2, a general framework designed for sample-efficient RL algorithms. We have expanded the performance of EfficientZero to multiple domains, encompassing both continuous and discrete actions, as well as visual and low-dimensional inputs. With a series of improvements we propose, EfficientZero V2 outperforms the current state-of-the-art (SoTA) by a significant margin in diverse tasks under the limited data setting. EfficientZero V2 exhibits a notable advancement over the prevailing general algorithm, DreamerV3, achieving superior outcomes in 50 of 66 evaluated tasks across multiple benchmarks, including Atari 100k, Proprio Control, and Vision Control.} }
Endnote
%0 Conference Paper %T EfficientZero V2: Mastering Discrete and Continuous Control with Limited Data %A Shengjie Wang %A Shaohuai Liu %A Weirui Ye %A Jiacheng You %A Yang Gao %B Proceedings of the 41st International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2024 %E Ruslan Salakhutdinov %E Zico Kolter %E Katherine Heller %E Adrian Weller %E Nuria Oliver %E Jonathan Scarlett %E Felix Berkenkamp %F pmlr-v235-wang24at %I PMLR %P 51041--51062 %U https://proceedings.mlr.press/v235/wang24at.html %V 235 %X Sample efficiency remains a crucial challenge in applying Reinforcement Learning (RL) to real-world tasks. While recent algorithms have made significant strides in improving sample efficiency, none have achieved consistently superior performance across diverse domains. In this paper, we introduce EfficientZero V2, a general framework designed for sample-efficient RL algorithms. We have expanded the performance of EfficientZero to multiple domains, encompassing both continuous and discrete actions, as well as visual and low-dimensional inputs. With a series of improvements we propose, EfficientZero V2 outperforms the current state-of-the-art (SoTA) by a significant margin in diverse tasks under the limited data setting. EfficientZero V2 exhibits a notable advancement over the prevailing general algorithm, DreamerV3, achieving superior outcomes in 50 of 66 evaluated tasks across multiple benchmarks, including Atari 100k, Proprio Control, and Vision Control.
APA
Wang, S., Liu, S., Ye, W., You, J. & Gao, Y.. (2024). EfficientZero V2: Mastering Discrete and Continuous Control with Limited Data. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:51041-51062 Available from https://proceedings.mlr.press/v235/wang24at.html.

Related Material