Mixtures of Experts Unlock Parameter Scaling for Deep RL

Johan Samir Obando Ceron, Ghada Sokar, Timon Willi, Clare Lyle, Jesse Farebrother, Jakob Nicolaus Foerster, Gintare Karolina Dziugaite, Doina Precup, Pablo Samuel Castro
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:38520-38540, 2024.

Abstract

The recent rapid progress in (self) supervised learning models is in large part predicted by empirical scaling laws: a model’s performance scales proportionally to its size. Analogous scaling laws remain elusive for reinforcement learning domains, however, where increasing the parameter count of a model often hurts its final performance. In this paper, we demonstrate that incorporating Mixture-of-Expert (MoE) modules, and in particular Soft MoEs (Puigcerver et al., 2023), into value-based networks results in more parameter-scalable models, evidenced by substantial performance increases across a variety of training regimes and model sizes. This work thus provides strong empirical evidence towards developing scaling laws for reinforcement learning.
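To make the architectural idea in the abstract concrete, the following is a minimal NumPy sketch of a Soft MoE layer (in the style of Puigcerver et al., 2023) of the kind the paper inserts into value-based networks. The function name, the linear per-expert transforms, the tokenization of the encoder output, and all shapes are illustrative assumptions for this sketch, not the authors' exact implementation.

import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def soft_moe_layer(x, phi, expert_weights):
    """Soft MoE forward pass with simple linear experts (illustrative sketch).

    x:              (num_tokens, d)       tokens, e.g. from a flattened conv encoder
    phi:            (d, num_slots)        learnable slot parameters
    expert_weights: (num_experts, d, d)   one linear expert per group of slots
    """
    num_experts = expert_weights.shape[0]
    d = x.shape[1]
    logits = x @ phi                                    # (tokens, slots)
    dispatch = softmax(logits, axis=0)                  # normalize over tokens per slot
    combine = softmax(logits, axis=1)                   # normalize over slots per token
    slots = dispatch.T @ x                              # (slots, d): each slot is a soft mix of tokens
    slots = slots.reshape(num_experts, -1, d)           # group slots by expert
    outs = np.einsum('esd,edk->esk', slots, expert_weights)  # each expert processes its own slots
    outs = outs.reshape(-1, d)                          # (slots, d)
    return combine @ outs                               # (tokens, d): soft mix of slot outputs per token

# Hypothetical usage: 36 tokens of width 64, 4 experts with 2 slots each;
# the output would then feed the Q-value head of a value-based agent.
rng = np.random.default_rng(0)
x = rng.normal(size=(36, 64))
phi = rng.normal(size=(64, 8))
experts = rng.normal(size=(4, 64, 64))
y = soft_moe_layer(x, phi, experts)   # shape (36, 64)

Because every slot is a soft (differentiable) combination of tokens, this layer avoids the hard routing of classical MoEs, which is part of why the paper highlights Soft MoEs in particular.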

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-obando-ceron24b,
  title     = {Mixtures of Experts Unlock Parameter Scaling for Deep {RL}},
  author    = {Obando Ceron, Johan Samir and Sokar, Ghada and Willi, Timon and Lyle, Clare and Farebrother, Jesse and Foerster, Jakob Nicolaus and Dziugaite, Gintare Karolina and Precup, Doina and Castro, Pablo Samuel},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {38520--38540},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/obando-ceron24b/obando-ceron24b.pdf},
  url       = {https://proceedings.mlr.press/v235/obando-ceron24b.html},
  abstract  = {The recent rapid progress in (self) supervised learning models is in large part predicted by empirical scaling laws: a model’s performance scales proportionally to its size. Analogous scaling laws remain elusive for reinforcement learning domains, however, where increasing the parameter count of a model often hurts its final performance. In this paper, we demonstrate that incorporating Mixture-of-Expert (MoE) modules, and in particular Soft MoEs (Puigcerver et al., 2023), into value-based networks results in more parameter-scalable models, evidenced by substantial performance increases across a variety of training regimes and model sizes. This work thus provides strong empirical evidence towards developing scaling laws for reinforcement learning.}
}
Endnote
%0 Conference Paper
%T Mixtures of Experts Unlock Parameter Scaling for Deep RL
%A Johan Samir Obando Ceron
%A Ghada Sokar
%A Timon Willi
%A Clare Lyle
%A Jesse Farebrother
%A Jakob Nicolaus Foerster
%A Gintare Karolina Dziugaite
%A Doina Precup
%A Pablo Samuel Castro
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-obando-ceron24b
%I PMLR
%P 38520--38540
%U https://proceedings.mlr.press/v235/obando-ceron24b.html
%V 235
%X The recent rapid progress in (self) supervised learning models is in large part predicted by empirical scaling laws: a model’s performance scales proportionally to its size. Analogous scaling laws remain elusive for reinforcement learning domains, however, where increasing the parameter count of a model often hurts its final performance. In this paper, we demonstrate that incorporating Mixture-of-Expert (MoE) modules, and in particular Soft MoEs (Puigcerver et al., 2023), into value-based networks results in more parameter-scalable models, evidenced by substantial performance increases across a variety of training regimes and model sizes. This work thus provides strong empirical evidence towards developing scaling laws for reinforcement learning.
APA
Obando Ceron, J.S., Sokar, G., Willi, T., Lyle, C., Farebrother, J., Foerster, J.N., Dziugaite, G.K., Precup, D. & Castro, P.S. (2024). Mixtures of Experts Unlock Parameter Scaling for Deep RL. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:38520-38540. Available from https://proceedings.mlr.press/v235/obando-ceron24b.html.