Bayesian Generational Population-Based Training

Xingchen Wan, Cong Lu, Jack Parker-Holder, Philip J. Ball, Vu Nguyen, Binxin Ru, Michael Osborne
Proceedings of the First International Conference on Automated Machine Learning, PMLR 188:14/1-27, 2022.

Abstract

Reinforcement learning (RL) offers the potential for training generally capable agents that can interact autonomously in the real world. However, one key limitation is the brittleness of RL algorithms to core hyperparameters and network architecture choice. Furthermore, non-stationarities such as evolving training data and increased agent complexity mean that different hyperparameters and architectures may be optimal at different points of training. This motivates AutoRL, a class of methods seeking to automate these design choices. One prominent class of AutoRL methods is Population-Based Training (PBT), which have led to impressive performance in several large scale settings. In this paper, we introduce two new innovations in PBT-style methods. First, we employ trust-region based Bayesian Optimization, enabling full coverage of the high-dimensional mixed hyperparameter search space. Second, we show that using a generational approach, we can also learn both architectures and hyperparameters jointly on-the-fly in a single training run. Leveraging the new highly parallelizable Brax physics engine, we show that these innovations lead to dramatic performance gains, significantly outperforming the tuned baseline while learning entire configurations on the fly.

Cite this Paper


BibTeX
@InProceedings{pmlr-v188-wan22a, title = {Bayesian Generational Population-Based Training}, author = {Wan, Xingchen and Lu, Cong and Parker-Holder, Jack and Ball, Philip J. and Nguyen, Vu and Ru, Binxin and Osborne, Michael}, booktitle = {Proceedings of the First International Conference on Automated Machine Learning}, pages = {14/1--27}, year = {2022}, editor = {Guyon, Isabelle and Lindauer, Marius and van der Schaar, Mihaela and Hutter, Frank and Garnett, Roman}, volume = {188}, series = {Proceedings of Machine Learning Research}, month = {25--27 Jul}, publisher = {PMLR}, pdf = {https://proceedings.mlr.press/v188/wan22a/wan22a.pdf}, url = {https://proceedings.mlr.press/v188/wan22a.html}, abstract = {Reinforcement learning (RL) offers the potential for training generally capable agents that can interact autonomously in the real world. However, one key limitation is the brittleness of RL algorithms to core hyperparameters and network architecture choice. Furthermore, non-stationarities such as evolving training data and increased agent complexity mean that different hyperparameters and architectures may be optimal at different points of training. This motivates AutoRL, a class of methods seeking to automate these design choices. One prominent class of AutoRL methods is Population-Based Training (PBT), which have led to impressive performance in several large scale settings. In this paper, we introduce two new innovations in PBT-style methods. First, we employ trust-region based Bayesian Optimization, enabling full coverage of the high-dimensional mixed hyperparameter search space. Second, we show that using a generational approach, we can also learn both architectures and hyperparameters jointly on-the-fly in a single training run. Leveraging the new highly parallelizable Brax physics engine, we show that these innovations lead to dramatic performance gains, significantly outperforming the tuned baseline while learning entire configurations on the fly.} }
Endnote
%0 Conference Paper %T Bayesian Generational Population-Based Training %A Xingchen Wan %A Cong Lu %A Jack Parker-Holder %A Philip J. Ball %A Vu Nguyen %A Binxin Ru %A Michael Osborne %B Proceedings of the First International Conference on Automated Machine Learning %C Proceedings of Machine Learning Research %D 2022 %E Isabelle Guyon %E Marius Lindauer %E Mihaela van der Schaar %E Frank Hutter %E Roman Garnett %F pmlr-v188-wan22a %I PMLR %P 14/1--27 %U https://proceedings.mlr.press/v188/wan22a.html %V 188 %X Reinforcement learning (RL) offers the potential for training generally capable agents that can interact autonomously in the real world. However, one key limitation is the brittleness of RL algorithms to core hyperparameters and network architecture choice. Furthermore, non-stationarities such as evolving training data and increased agent complexity mean that different hyperparameters and architectures may be optimal at different points of training. This motivates AutoRL, a class of methods seeking to automate these design choices. One prominent class of AutoRL methods is Population-Based Training (PBT), which have led to impressive performance in several large scale settings. In this paper, we introduce two new innovations in PBT-style methods. First, we employ trust-region based Bayesian Optimization, enabling full coverage of the high-dimensional mixed hyperparameter search space. Second, we show that using a generational approach, we can also learn both architectures and hyperparameters jointly on-the-fly in a single training run. Leveraging the new highly parallelizable Brax physics engine, we show that these innovations lead to dramatic performance gains, significantly outperforming the tuned baseline while learning entire configurations on the fly.
APA
Wan, X., Lu, C., Parker-Holder, J., Ball, P.J., Nguyen, V., Ru, B. & Osborne, M.. (2022). Bayesian Generational Population-Based Training. Proceedings of the First International Conference on Automated Machine Learning, in Proceedings of Machine Learning Research 188:14/1-27 Available from https://proceedings.mlr.press/v188/wan22a.html.

Related Material