Low-Precision Reinforcement Learning: Running Soft Actor-Critic in Half Precision

Johan Björck, Xiangyu Chen, Christopher De Sa, Carla P Gomes, Kilian Weinberger
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:980-991, 2021.

Abstract

Low-precision training has become a popular approach to reduce compute requirements, memory footprint, and energy consumption in supervised learning. In contrast, this promising approach has not yet enjoyed similarly widespread adoption within the reinforcement learning (RL) community, partly because RL agents can be notoriously hard to train even in full precision. In this paper we consider continuous control with the state-of-the-art SAC agent and demonstrate that a naïve adaptation of low-precision methods from supervised learning fails. We propose a set of six modifications, all straightforward to implement, that leaves the underlying agent and its hyperparameters unchanged but improves the numerical stability dramatically. The resulting modified SAC agent has lower memory and compute requirements while matching full-precision rewards, demonstrating that low-precision training can substantially accelerate state-of-the-art RL without parameter tuning.

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-bjorck21a,
  title = {Low-Precision Reinforcement Learning: Running Soft Actor-Critic in Half Precision},
  author = {Bj{\"o}rck, Johan and Chen, Xiangyu and De Sa, Christopher and Gomes, Carla P and Weinberger, Kilian},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages = {980--991},
  year = {2021},
  editor = {Meila, Marina and Zhang, Tong},
  volume = {139},
  series = {Proceedings of Machine Learning Research},
  month = {18--24 Jul},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v139/bjorck21a/bjorck21a.pdf},
  url = {http://proceedings.mlr.press/v139/bjorck21a.html},
  abstract = {Low-precision training has become a popular approach to reduce compute requirements, memory footprint, and energy consumption in supervised learning. In contrast, this promising approach has not yet enjoyed similarly widespread adoption within the reinforcement learning (RL) community, partly because RL agents can be notoriously hard to train even in full precision. In this paper we consider continuous control with the state-of-the-art SAC agent and demonstrate that a naïve adaptation of low-precision methods from supervised learning fails. We propose a set of six modifications, all straightforward to implement, that leaves the underlying agent and its hyperparameters unchanged but improves the numerical stability dramatically. The resulting modified SAC agent has lower memory and compute requirements while matching full-precision rewards, demonstrating that low-precision training can substantially accelerate state-of-the-art RL without parameter tuning.}
}
Endnote
%0 Conference Paper
%T Low-Precision Reinforcement Learning: Running Soft Actor-Critic in Half Precision
%A Johan Björck
%A Xiangyu Chen
%A Christopher De Sa
%A Carla P Gomes
%A Kilian Weinberger
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-bjorck21a
%I PMLR
%P 980--991
%U http://proceedings.mlr.press/v139/bjorck21a.html
%V 139
%X Low-precision training has become a popular approach to reduce compute requirements, memory footprint, and energy consumption in supervised learning. In contrast, this promising approach has not yet enjoyed similarly widespread adoption within the reinforcement learning (RL) community, partly because RL agents can be notoriously hard to train even in full precision. In this paper we consider continuous control with the state-of-the-art SAC agent and demonstrate that a naïve adaptation of low-precision methods from supervised learning fails. We propose a set of six modifications, all straightforward to implement, that leaves the underlying agent and its hyperparameters unchanged but improves the numerical stability dramatically. The resulting modified SAC agent has lower memory and compute requirements while matching full-precision rewards, demonstrating that low-precision training can substantially accelerate state-of-the-art RL without parameter tuning.
APA
Björck, J., Chen, X., De Sa, C., Gomes, C. P., & Weinberger, K. (2021). Low-Precision Reinforcement Learning: Running Soft Actor-Critic in Half Precision. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:980-991. Available from http://proceedings.mlr.press/v139/bjorck21a.html.

Related Material