GMAC: A Distributional Perspective on Actor-Critic Framework

Daniel W Nam, Younghoon Kim, Chan Y Park
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:7927-7936, 2021.

Abstract

In this paper, we devise a distributional framework for actor-critic methods as a solution to distributional instability, action type restriction, and conflation between samples and statistics. We propose a new method that minimizes the Cramér distance to the multi-step Bellman target distribution generated by a novel Sample-Replacement algorithm, denoted SR(λ), which learns the correct value distribution under multiple Bellman operations. Parameterizing the value distribution with a Gaussian Mixture Model further improves the efficiency and performance of the method, which we name GMAC. We empirically show that GMAC captures the correct representation of value distributions and improves the performance of a conventional actor-critic method at low computational cost, in both discrete and continuous action spaces, using the Arcade Learning Environment (ALE) and PyBullet environments.
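The two mechanisms named in the abstract can be illustrated with a short Python sketch. This is a hypothetical example and not the authors' implementation. The first part computes the squared Cramér (ℓ2) distance between two one-dimensional Gaussian mixtures in closed form. It relies on the standard 1-D identity 2E|X−Y| − E|X−X′| − E|Y−Y′| = 2∫(F−G)² dx, which relates the energy distance to the Cramér distance. The second part is one way to draw samples from a λ-weighted mixture of n-step Bellman targets by replacing samples during a single backward pass over a trajectory. The paper's actual SR(λ) procedure may differ in its details. The following are all assumptions made for illustration: the (weight, mean, std) mixture representation, the function names `cramer_sq` and `lambda_mixture_targets`, the `next_samplers` callables, and the absence of terminal states inside the trajectory.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical representation of a 1-D Gaussian mixture: (weight, mean, std) triples.
GMM = List[Tuple[float, float, float]]


def _folded_normal_mean(mu: float, sigma: float) -> float:
    """E|Z| for Z ~ N(mu, sigma^2)."""
    if sigma == 0.0:
        return abs(mu)
    return (sigma * math.sqrt(2.0 / math.pi) * math.exp(-mu * mu / (2.0 * sigma * sigma))
            + mu * math.erf(mu / (sigma * math.sqrt(2.0))))


def _mixture_abs_diff(p: GMM, q: GMM) -> float:
    """E|X - Y| for independent X ~ p, Y ~ q: a weighted sum over component pairs,
    since X_i - Y_j ~ N(m_i - m_j, s_i^2 + s_j^2)."""
    return sum(wp * wq * _folded_normal_mean(mp - mq, math.hypot(sp, sq))
               for wp, mp, sp in p for wq, mq, sq in q)


def cramer_sq(p: GMM, q: GMM) -> float:
    """Squared Cramer (l2) distance, integral of (F_p - F_q)^2 dx, in closed form.

    Uses the 1-D identity 2E|X-Y| - E|X-X'| - E|Y-Y'| = 2 * integral (F - G)^2 dx.
    """
    energy = 2.0 * _mixture_abs_diff(p, q) - _mixture_abs_diff(p, p) - _mixture_abs_diff(q, q)
    return 0.5 * energy


def lambda_mixture_targets(rewards: Sequence[float],
                           next_samplers: Sequence[Callable[[], float]],
                           gamma: float, lam: float,
                           rng: random.Random) -> List[float]:
    """Draw one multi-step target sample per step t = 0..T-1 in a single backward pass.

    next_samplers[t]() samples the current value-distribution estimate at s_{t+1}.
    With probability 1 - lam the sample carried back from t+1 is replaced by a fresh
    bootstrap sample, so step t receives the n-step target with probability
    (1 - lam) * lam**(n - 1); the leftover mass lam**(T - t - 1) falls on the longest
    return. Assumes no terminal state inside the trajectory.
    """
    T = len(rewards)
    if T == 0:
        return []
    targets = [0.0] * T
    targets[T - 1] = rewards[T - 1] + gamma * next_samplers[T - 1]()
    for t in range(T - 2, -1, -1):
        if rng.random() >= lam:          # replace with a fresh bootstrap sample (prob. 1 - lam)
            nxt = next_samplers[t]()
        else:                            # keep the sample-based target from step t + 1
            nxt = targets[t + 1]
        targets[t] = rewards[t] + gamma * nxt
    return targets
```

Under these assumptions, the two extremes behave as expected. With `lam=0.0`, every step falls back to a one-step target r_t + γz. With `lam=1.0`, no sample is ever replaced, and each step receives the full discounted return that bootstraps only from the last state. The closed-form distance needs one term per pair of mixture components, so it costs O(K·L) for mixtures with K and L components. It also avoids the sampling noise that a sample-based estimate of the same distance would add.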

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-nam21a,
  title     = {{GMAC}: A Distributional Perspective on Actor-Critic Framework},
  author    = {Nam, Daniel W and Kim, Younghoon and Park, Chan Y},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {7927--7936},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/nam21a/nam21a.pdf},
  url       = {https://proceedings.mlr.press/v139/nam21a.html},
  abstract  = {In this paper, we devise a distributional framework on actor-critic as a solution to distributional instability, action type restriction, and conflation between samples and statistics. We propose a new method that minimizes the Cram{\'e}r distance with the multi-step Bellman target distribution generated from a novel Sample-Replacement algorithm denoted SR($\lambda$), which learns the correct value distribution under multiple Bellman operations. Parameterizing a value distribution with Gaussian Mixture Model further improves the efficiency and the performance of the method, which we name GMAC. We empirically show that GMAC captures the correct representation of value distributions and improves the performance of a conventional actor-critic method with low computational cost, in both discrete and continuous action spaces using Arcade Learning Environment (ALE) and PyBullet environment.}
}
EndNote
%0 Conference Paper
%T GMAC: A Distributional Perspective on Actor-Critic Framework
%A Daniel W Nam
%A Younghoon Kim
%A Chan Y Park
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-nam21a
%I PMLR
%P 7927--7936
%U https://proceedings.mlr.press/v139/nam21a.html
%V 139
%X In this paper, we devise a distributional framework on actor-critic as a solution to distributional instability, action type restriction, and conflation between samples and statistics. We propose a new method that minimizes the Cramér distance with the multi-step Bellman target distribution generated from a novel Sample-Replacement algorithm denoted SR(λ), which learns the correct value distribution under multiple Bellman operations. Parameterizing a value distribution with Gaussian Mixture Model further improves the efficiency and the performance of the method, which we name GMAC. We empirically show that GMAC captures the correct representation of value distributions and improves the performance of a conventional actor-critic method with low computational cost, in both discrete and continuous action spaces using Arcade Learning Environment (ALE) and PyBullet environment.
APA
Nam, D. W., Kim, Y., & Park, C. Y. (2021). GMAC: A Distributional Perspective on Actor-Critic Framework. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:7927-7936. Available from https://proceedings.mlr.press/v139/nam21a.html.
