HPO-RL-Bench: A Zero-Cost Benchmark for HPO in Reinforcement Learning

Gresa Shala, Sebastian Pineda Arango, André Biedenkapp, Frank Hutter, Josif Grabocka
Proceedings of the Third International Conference on Automated Machine Learning, PMLR 256:18/1-31, 2024.

Abstract

Despite the undeniable importance of optimizing the hyperparameters of RL algorithms, existing state-of-the-art Hyperparameter Optimization (HPO) techniques are not frequently utilized by RL researchers. To catalyze HPO research in RL, we present a new large-scale benchmark that includes pre-computed reward curve evaluations of hyperparameter configurations for six established RL algorithms (PPO, DDPG, A2C, SAC, TD3, DQN) on 22 environments (Atari, Mujoco, Control), repeated for multiple seeds. We exhaustively computed the reward curves of all possible combinations of hyperparameters for the considered hyperparameter spaces for each RL algorithm in each environment. As a result, our benchmark permits zero-cost experiments for deploying and comparing new HPO methods. In addition, the benchmark offers a set of integrated HPO methods, enabling plug-and-play tuning of the hyperparameters of new RL algorithms, while pre-computed evaluations allow a zero-cost comparison of a new RL algorithm against the tuned RL baselines in our benchmark.
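To illustrate the zero-cost protocol the abstract describes, the sketch below runs a toy random search over a hand-written table of pre-computed reward curves: "evaluating" a configuration is just a table lookup rather than training an RL agent. The table contents and the load_reward_curves / random_search helpers are hypothetical stand-ins for illustration only, not the benchmark's actual data or API.

# A minimal sketch of zero-cost (tabular) HPO on pre-computed reward curves.
# The data layout and helper names below are hypothetical illustrations,
# not HPO-RL-Bench's actual API.
import random

def load_reward_curves():
    # Hypothetical table mapping a hyperparameter configuration (here, a
    # learning rate and a discount factor) to a reward curve averaged over
    # seeds. In the benchmark such curves come from exhaustive training runs.
    return {
        (1e-3, 0.99): [10.0, 55.0, 120.0],
        (1e-4, 0.99): [8.0, 40.0, 150.0],
        (1e-3, 0.95): [12.0, 30.0, 90.0],
    }

def random_search(table, n_trials=2, seed=0):
    # Zero-cost HPO: each trial is a lookup of the configuration's final
    # return in the pre-computed table, so comparing HPO methods costs
    # no RL training time.
    rng = random.Random(seed)
    candidates = rng.sample(list(table), n_trials)
    best = max(candidates, key=lambda cfg: table[cfg][-1])
    return best, table[best][-1]

if __name__ == "__main__":
    curves = load_reward_curves()
    best_config, best_return = random_search(curves, n_trials=2)
    print(f"best config: {best_config}, final return: {best_return}")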

Cite this Paper

BibTeX
@InProceedings{pmlr-v256-shala24a,
  title     = {HPO-RL-Bench: A Zero-Cost Benchmark for HPO in Reinforcement Learning},
  author    = {Shala, Gresa and Arango, Sebastian Pineda and Biedenkapp, Andr\'e and Hutter, Frank and Grabocka, Josif},
  booktitle = {Proceedings of the Third International Conference on Automated Machine Learning},
  pages     = {18/1--31},
  year      = {2024},
  editor    = {Eggensperger, Katharina and Garnett, Roman and Vanschoren, Joaquin and Lindauer, Marius and Gardner, Jacob R.},
  volume    = {256},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--12 Sep},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v256/main/assets/shala24a/shala24a.pdf},
  url       = {https://proceedings.mlr.press/v256/shala24a.html},
  abstract  = {Despite the undeniable importance of optimizing the hyperparameters of RL algorithms, existing state-of-the-art Hyperparameter Optimization (HPO) techniques are not frequently utilized by RL researchers. To catalyze HPO research in RL, we present a new large-scale benchmark that includes pre-computed reward curve evaluations of hyperparameter configurations for six established RL algorithms (PPO, DDPG, A2C, SAC, TD3, DQN) on 22 environments (Atari, Mujoco, Control), repeated for multiple seeds. We exhaustively computed the reward curves of all possible combinations of hyperparameters for the considered hyperparameter spaces for each RL algorithm in each environment. As a result, our benchmark permits zero-cost experiments for deploying and comparing new HPO methods. In addition, the benchmark offers a set of integrated HPO methods, enabling plug-and-play tuning of the hyperparameters of new RL algorithms, while pre-computed evaluations allow a zero-cost comparison of a new RL algorithm against the tuned RL baselines in our benchmark.}
}
Endnote
%0 Conference Paper
%T HPO-RL-Bench: A Zero-Cost Benchmark for HPO in Reinforcement Learning
%A Gresa Shala
%A Sebastian Pineda Arango
%A André Biedenkapp
%A Frank Hutter
%A Josif Grabocka
%B Proceedings of the Third International Conference on Automated Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Katharina Eggensperger
%E Roman Garnett
%E Joaquin Vanschoren
%E Marius Lindauer
%E Jacob R. Gardner
%F pmlr-v256-shala24a
%I PMLR
%P 18/1--31
%U https://proceedings.mlr.press/v256/shala24a.html
%V 256
%X Despite the undeniable importance of optimizing the hyperparameters of RL algorithms, existing state-of-the-art Hyperparameter Optimization (HPO) techniques are not frequently utilized by RL researchers. To catalyze HPO research in RL, we present a new large-scale benchmark that includes pre-computed reward curve evaluations of hyperparameter configurations for six established RL algorithms (PPO, DDPG, A2C, SAC, TD3, DQN) on 22 environments (Atari, Mujoco, Control), repeated for multiple seeds. We exhaustively computed the reward curves of all possible combinations of hyperparameters for the considered hyperparameter spaces for each RL algorithm in each environment. As a result, our benchmark permits zero-cost experiments for deploying and comparing new HPO methods. In addition, the benchmark offers a set of integrated HPO methods, enabling plug-and-play tuning of the hyperparameters of new RL algorithms, while pre-computed evaluations allow a zero-cost comparison of a new RL algorithm against the tuned RL baselines in our benchmark.
APA
Shala, G., Arango, S.P., Biedenkapp, A., Hutter, F. & Grabocka, J. (2024). HPO-RL-Bench: A Zero-Cost Benchmark for HPO in Reinforcement Learning. Proceedings of the Third International Conference on Automated Machine Learning, in Proceedings of Machine Learning Research 256:18/1-31. Available from https://proceedings.mlr.press/v256/shala24a.html.
