Implicit Generative Modeling for Efficient Exploration

Neale Ratzlaff, Qinxun Bai, Li Fuxin, Wei Xu
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:7985-7995, 2020.

Abstract

Efficient exploration remains a challenging problem in reinforcement learning, especially for tasks where rewards from the environment are sparse. In this work, we introduce an exploration approach based on a novel implicit generative modeling algorithm to estimate a Bayesian uncertainty of the agent’s belief of the environment dynamics. Each random draw from our generative model is a neural network that instantiates the dynamics function, so multiple draws approximate the posterior, and the variance in the predictions based on this posterior is used as an intrinsic reward for exploration. We design a training algorithm for our generative model based on amortized Stein Variational Gradient Descent. In experiments, we demonstrate the effectiveness of this exploration algorithm in both pure exploration tasks and a downstream task, comparing it with state-of-the-art intrinsic-reward-based exploration approaches, including two recent approaches based on an ensemble of dynamics models. In challenging exploration tasks, our implicit generative model consistently outperforms competing approaches in terms of data efficiency during exploration.
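
To make the mechanism described in the abstract concrete, the sketch below illustrates the core idea: a generator network maps noise vectors to the weights of small dynamics models, several sampled models predict the next state, and the variance of those predictions serves as an exploration bonus. This is not the authors' code; all names, network sizes, and the number of samples are illustrative assumptions, and the amortized Stein Variational Gradient Descent training of the generator is omitted.

```python
# Minimal sketch of ensemble-variance intrinsic reward via a weight generator.
# Illustrative only; the paper's actual architecture and SVGD training differ.
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM, ACTION_DIM, HIDDEN, Z_DIM = 8, 2, 32, 16

# Sizes of the flat weight vector for a two-layer dynamics MLP:
# (state, action) -> hidden -> next-state prediction.
W1 = HIDDEN * (STATE_DIM + ACTION_DIM)
B1 = HIDDEN
W2 = STATE_DIM * HIDDEN
B2 = STATE_DIM
TOTAL = W1 + B1 + W2 + B2

class WeightGenerator(nn.Module):
    """Maps a noise vector z to one flat parameter vector of a dynamics model."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(Z_DIM, 128), nn.ReLU(),
            nn.Linear(128, TOTAL),
        )

    def forward(self, z):
        return self.net(z)

def dynamics_forward(flat_w, state, action):
    """Runs one sampled dynamics model, given its flat weight vector."""
    x = torch.cat([state, action], dim=-1)
    w1, b1, w2, b2 = torch.split(flat_w, [W1, B1, W2, B2])
    h = F.relu(F.linear(x, w1.view(HIDDEN, STATE_DIM + ACTION_DIM), b1))
    return F.linear(h, w2.view(STATE_DIM, HIDDEN), b2)

def intrinsic_reward(generator, state, action, n_samples=8):
    """Exploration bonus: predictive variance across sampled dynamics models."""
    with torch.no_grad():
        z = torch.randn(n_samples, Z_DIM)
        preds = torch.stack([
            dynamics_forward(generator(z[i]), state, action)
            for i in range(n_samples)
        ])                                      # (n_samples, STATE_DIM)
        return preds.var(dim=0).mean().item()   # scalar bonus

if __name__ == "__main__":
    gen = WeightGenerator()
    s, a = torch.randn(STATE_DIM), torch.randn(ACTION_DIM)
    print("intrinsic reward:", intrinsic_reward(gen, s, a))
```

In a full agent, this bonus would be added to (or, in pure exploration, replace) the environment reward, and the generator would be updated on observed transitions so that its predictive variance shrinks in well-explored regions of the state-action space.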

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-ratzlaff20a,
  title     = {Implicit Generative Modeling for Efficient Exploration},
  author    = {Ratzlaff, Neale and Bai, Qinxun and Fuxin, Li and Xu, Wei},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  pages     = {7985--7995},
  year      = {2020},
  editor    = {III, Hal Daumé and Singh, Aarti},
  volume    = {119},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--18 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v119/ratzlaff20a/ratzlaff20a.pdf},
  url       = {https://proceedings.mlr.press/v119/ratzlaff20a.html},
  abstract  = {Efficient exploration remains a challenging problem in reinforcement learning, especially for those tasks where rewards from environments are sparse. In this work, we introduce an exploration approach based on a novel implicit generative modeling algorithm to estimate a Bayesian uncertainty of the agent’s belief of the environment dynamics. Each random draw from our generative model is a neural network that instantiates the dynamic function, hence multiple draws would approximate the posterior, and the variance in the predictions based on this posterior is used as an intrinsic reward for exploration. We design a training algorithm for our generative model based on the amortized Stein Variational Gradient Descent. In experiments, we demonstrate the effectiveness of this exploration algorithm in both pure exploration tasks and a downstream task, comparing with state-of-the-art intrinsic reward-based exploration approaches, including two recent approaches based on an ensemble of dynamic models. In challenging exploration tasks, our implicit generative model consistently outperforms competing approaches regarding data efficiency in exploration.}
}
Endnote
%0 Conference Paper
%T Implicit Generative Modeling for Efficient Exploration
%A Neale Ratzlaff
%A Qinxun Bai
%A Li Fuxin
%A Wei Xu
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-ratzlaff20a
%I PMLR
%P 7985--7995
%U https://proceedings.mlr.press/v119/ratzlaff20a.html
%V 119
%X Efficient exploration remains a challenging problem in reinforcement learning, especially for those tasks where rewards from environments are sparse. In this work, we introduce an exploration approach based on a novel implicit generative modeling algorithm to estimate a Bayesian uncertainty of the agent’s belief of the environment dynamics. Each random draw from our generative model is a neural network that instantiates the dynamic function, hence multiple draws would approximate the posterior, and the variance in the predictions based on this posterior is used as an intrinsic reward for exploration. We design a training algorithm for our generative model based on the amortized Stein Variational Gradient Descent. In experiments, we demonstrate the effectiveness of this exploration algorithm in both pure exploration tasks and a downstream task, comparing with state-of-the-art intrinsic reward-based exploration approaches, including two recent approaches based on an ensemble of dynamic models. In challenging exploration tasks, our implicit generative model consistently outperforms competing approaches regarding data efficiency in exploration.
APA
Ratzlaff, N., Bai, Q., Fuxin, L. & Xu, W. (2020). Implicit Generative Modeling for Efficient Exploration. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:7985-7995. Available from https://proceedings.mlr.press/v119/ratzlaff20a.html.
