Hyperspherical Normalization for Scalable Deep Reinforcement Learning

Hojoon Lee, Youngdo Lee, Takuma Seno, Donghu Kim, Peter Stone, Jaegul Choo
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:33352-33403, 2025.

Abstract

Scaling up model size and computation has brought consistent performance improvements in supervised learning. However, this lesson often fails to apply to reinforcement learning (RL), because training a model on non-stationary data easily leads to overfitting and unstable optimization. In response, we introduce SimbaV2, a novel RL architecture designed to stabilize optimization by (i) constraining the growth of weight and feature norms via hyperspherical normalization; and (ii) using distributional value estimation with reward scaling to maintain stable gradients under varying reward magnitudes. Using soft actor-critic as the base algorithm, SimbaV2 scales effectively with larger models and greater compute, achieving state-of-the-art performance on 57 continuous control tasks across four domains.
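As a rough illustration of ingredient (i): hyperspherical normalization can be read as projecting both the weight vectors and the intermediate features onto the unit hypersphere via l2 normalization, so neither norm can grow during training. The PyTorch sketch below follows that reading; it is not the paper's implementation, and the module name HypersphericalLinear and the exact points at which normalization is applied are illustrative assumptions.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class HypersphericalLinear(nn.Module):
        # Illustrative sketch (not the authors' code): a linear layer whose
        # weight rows and output features are l2-normalized, keeping both on
        # the unit hypersphere so their norms are bounded by construction.
        def __init__(self, in_dim: int, out_dim: int):
            super().__init__()
            self.weight = nn.Parameter(torch.randn(out_dim, in_dim))

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            x = F.normalize(x, dim=-1)            # project input features onto the unit sphere
            w = F.normalize(self.weight, dim=-1)  # project each weight row onto the unit sphere
            out = x @ w.t()                       # entries are cosine similarities in [-1, 1]
            return F.normalize(out, dim=-1)       # keep the output feature norm fixed as well

    # Usage: output features keep unit norm regardless of training dynamics.
    layer = HypersphericalLinear(in_dim=64, out_dim=128)
    y = layer(torch.randn(32, 64))
    assert torch.allclose(y.norm(dim=-1), torch.ones(32), atol=1e-5)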

Cite this Paper

BibTeX
@InProceedings{pmlr-v267-lee25u,
  title     = {Hyperspherical Normalization for Scalable Deep Reinforcement Learning},
  author    = {Lee, Hojoon and Lee, Youngdo and Seno, Takuma and Kim, Donghu and Stone, Peter and Choo, Jaegul},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {33352--33403},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/lee25u/lee25u.pdf},
  url       = {https://proceedings.mlr.press/v267/lee25u.html}
}
Endnote
%0 Conference Paper
%T Hyperspherical Normalization for Scalable Deep Reinforcement Learning
%A Hojoon Lee
%A Youngdo Lee
%A Takuma Seno
%A Donghu Kim
%A Peter Stone
%A Jaegul Choo
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-lee25u
%I PMLR
%P 33352--33403
%U https://proceedings.mlr.press/v267/lee25u.html
%V 267
APA
Lee, H., Lee, Y., Seno, T., Kim, D., Stone, P., & Choo, J. (2025). Hyperspherical Normalization for Scalable Deep Reinforcement Learning. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:33352-33403. Available from https://proceedings.mlr.press/v267/lee25u.html.
