Momentum in Reinforcement Learning

Nino Vieillard, Bruno Scherrer, Olivier Pietquin, Matthieu Geist
Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:2529-2538, 2020.

Abstract

We adapt the concept of momentum from optimization to reinforcement learning. Seeing state-action value functions as an analog of gradients in optimization, we interpret momentum as an average of consecutive $q$-functions. We derive Momentum Value Iteration (MoVI), a variation of Value Iteration that incorporates this momentum idea. Our analysis shows that this allows MoVI to average errors over successive iterations. We show that the proposed approach can be readily extended to deep learning. Specifically, we propose a simple improvement on DQN based on MoVI and evaluate it experimentally on Atari games.

Cite this Paper


BibTeX
@InProceedings{pmlr-v108-vieillard20a,
  title = {Momentum in Reinforcement Learning},
  author = {Vieillard, Nino and Scherrer, Bruno and Pietquin, Olivier and Geist, Matthieu},
  booktitle = {Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics},
  pages = {2529--2538},
  year = {2020},
  editor = {Chiappa, Silvia and Calandra, Roberto},
  volume = {108},
  series = {Proceedings of Machine Learning Research},
  month = {26--28 Aug},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v108/vieillard20a/vieillard20a.pdf},
  url = {https://proceedings.mlr.press/v108/vieillard20a.html},
  abstract = {We adapt the concept of momentum from optimization to reinforcement learning. Seeing state-action value functions as an analog of gradients in optimization, we interpret momentum as an average of consecutive $q$-functions. We derive Momentum Value Iteration (MoVI), a variation of Value Iteration that incorporates this momentum idea. Our analysis shows that this allows MoVI to average errors over successive iterations. We show that the proposed approach can be readily extended to deep learning. Specifically, we propose a simple improvement on DQN based on MoVI and evaluate it experimentally on Atari games.}
}
Endnote
%0 Conference Paper
%T Momentum in Reinforcement Learning
%A Nino Vieillard
%A Bruno Scherrer
%A Olivier Pietquin
%A Matthieu Geist
%B Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2020
%E Silvia Chiappa
%E Roberto Calandra
%F pmlr-v108-vieillard20a
%I PMLR
%P 2529--2538
%U https://proceedings.mlr.press/v108/vieillard20a.html
%V 108
%X We adapt the concept of momentum from optimization to reinforcement learning. Seeing state-action value functions as an analog of gradients in optimization, we interpret momentum as an average of consecutive $q$-functions. We derive Momentum Value Iteration (MoVI), a variation of Value Iteration that incorporates this momentum idea. Our analysis shows that this allows MoVI to average errors over successive iterations. We show that the proposed approach can be readily extended to deep learning. Specifically, we propose a simple improvement on DQN based on MoVI and evaluate it experimentally on Atari games.
APA
Vieillard, N., Scherrer, B., Pietquin, O. & Geist, M. (2020). Momentum in Reinforcement Learning. Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 108:2529-2538. Available from https://proceedings.mlr.press/v108/vieillard20a.html.
