Value-Based Deep RL Scales Predictably

Oleh Rybkin, Michal Nauman, Preston Fu, Charlie Victor Snell, Pieter Abbeel, Sergey Levine, Aviral Kumar
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:52422-52443, 2025.

Abstract

Scaling data and compute is critical in modern machine learning. However, scaling also demands predictability: we want methods not only to perform well with more compute or data, but also to have performance that is predictable from low-compute or low-data runs, without ever running the large-scale experiment. In this paper, we show that value-based off-policy deep RL is predictable. First, we show that the data and compute required to reach a given performance level lie on a Pareto frontier controlled by the updates-to-data (UTD) ratio. By estimating this frontier, we can extrapolate data requirements into a higher-compute regime, and compute requirements into a higher-data regime. Second, we determine the optimal allocation of a total budget across data and compute for a given performance level and use it to determine hyperparameters that maximize performance for a given budget. Third, this scaling behavior is enabled by first estimating predictable relationships between hyperparameters, which we use to counteract the effects of overfitting and plasticity loss unique to RL. We validate our approach using three algorithms, SAC, BRO, and PQL, on DeepMind Control, OpenAI Gym, and IsaacGym, extrapolating to higher levels of data, compute, budget, or performance.
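As an illustration of the kind of extrapolation the abstract describes, here is a minimal sketch in Python that fits a data-requirement curve as a function of the UTD ratio from cheap runs, extrapolates it into a higher-compute regime, and picks a budget-optimal UTD. The power-law functional form, the measurements, and the cost model below are all assumptions made for illustration; the paper's actual fitted forms and procedure are given in the full text.

# Illustrative sketch only (not the authors' code). Assumed power-law
# form for the data needed to reach a fixed performance level as a
# function of the updates-to-data (UTD) ratio; all numbers are made up.
import numpy as np
from scipy.optimize import curve_fit

def data_to_target(utd, alpha, beta, d_min):
    # Data requirement shrinks as UTD grows, saturating at d_min.
    return beta * utd ** (-alpha) + d_min

# Hypothetical low-budget measurements: env steps needed to reach a
# fixed return at each UTD ratio.
utd = np.array([1.0, 2.0, 4.0, 8.0])
steps = np.array([9.0e5, 5.5e5, 3.6e5, 2.7e5])

params, _ = curve_fit(data_to_target, utd, steps, p0=(1.0, 1e6, 1e5))

# Extrapolate the data requirement into a higher-compute regime.
print(f"predicted steps at UTD=32: {data_to_target(32.0, *params):.3g}")

# Compute scales with (env steps) x (updates per step), so under an
# assumed cost model, total budget ~ steps * (1 + lam * utd), where lam
# prices one gradient update relative to one env step.
lam = 0.2  # hypothetical relative price
grid = np.linspace(1.0, 64.0, 256)
cost = data_to_target(grid, *params) * (1.0 + lam * grid)
print(f"budget-optimal UTD under this cost model: {grid[np.argmin(cost)]:.1f}")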

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-rybkin25a,
  title     = {Value-Based Deep {RL} Scales Predictably},
  author    = {Rybkin, Oleh and Nauman, Michal and Fu, Preston and Snell, Charlie Victor and Abbeel, Pieter and Levine, Sergey and Kumar, Aviral},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {52422--52443},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/rybkin25a/rybkin25a.pdf},
  url       = {https://proceedings.mlr.press/v267/rybkin25a.html}
}
Endnote
%0 Conference Paper
%T Value-Based Deep RL Scales Predictably
%A Oleh Rybkin
%A Michal Nauman
%A Preston Fu
%A Charlie Victor Snell
%A Pieter Abbeel
%A Sergey Levine
%A Aviral Kumar
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-rybkin25a
%I PMLR
%P 52422--52443
%U https://proceedings.mlr.press/v267/rybkin25a.html
%V 267
APA
Rybkin, O., Nauman, M., Fu, P., Snell, C.V., Abbeel, P., Levine, S. & Kumar, A. (2025). Value-Based Deep RL Scales Predictably. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:52422-52443. Available from https://proceedings.mlr.press/v267/rybkin25a.html.
