Regret Minimization for Partially Observable Deep Reinforcement Learning

Peter Jin, Kurt Keutzer, Sergey Levine
Proceedings of the 35th International Conference on Machine Learning, PMLR 80:2342-2351, 2018.

Abstract

Deep reinforcement learning algorithms that estimate state and state-action value functions have been shown to be effective in a variety of challenging domains, including learning control strategies from raw image pixels. However, algorithms that estimate state and state-action value functions typically assume a fully observed state and must compensate for partial observations by using finite-length observation histories or recurrent networks. In this work, we propose a new deep reinforcement learning algorithm based on counterfactual regret minimization that iteratively updates an approximation to an advantage-like function and is robust to partially observed state. We demonstrate that this new algorithm can substantially outperform strong baseline methods on several partially observed reinforcement learning tasks: learning first-person 3D navigation in Doom and Minecraft, and acting in the presence of partially observed objects in Doom and Pong.
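
To make the regret-minimization idea concrete, the sketch below illustrates regret matching on a cumulative advantage-like estimate: each action is played with probability proportional to the positive part of its accumulated advantage at a given observation. This is only an illustrative toy, not the paper's deep reinforcement learning algorithm; the tabular NumPy setting, the function name regret_matching_policy, and the example values are assumptions made here for exposition.

    import numpy as np

    def regret_matching_policy(cum_advantage):
        # Regret matching: probability of each action is proportional to the
        # positive part of its cumulative advantage-like value at this observation.
        positive = np.maximum(cum_advantage, 0.0)
        total = positive.sum()
        if total > 0.0:
            return positive / total
        # If no action has positive cumulative advantage, fall back to uniform.
        return np.full(cum_advantage.shape, 1.0 / cum_advantage.size)

    # Hypothetical cumulative advantage estimates for 4 actions at one observation.
    print(regret_matching_policy(np.array([0.3, -0.1, 0.7, 0.0])))  # -> [0.3 0.  0.7 0. ]

In counterfactual regret minimization, a policy of this form is driven by cumulative regrets; the "advantage-like function" in the abstract plays an analogous role.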

Cite this Paper


BibTeX
@InProceedings{pmlr-v80-jin18c,
  title     = {Regret Minimization for Partially Observable Deep Reinforcement Learning},
  author    = {Jin, Peter and Keutzer, Kurt and Levine, Sergey},
  booktitle = {Proceedings of the 35th International Conference on Machine Learning},
  pages     = {2342--2351},
  year      = {2018},
  editor    = {Dy, Jennifer and Krause, Andreas},
  volume    = {80},
  series    = {Proceedings of Machine Learning Research},
  month     = {10--15 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v80/jin18c/jin18c.pdf},
  url       = {https://proceedings.mlr.press/v80/jin18c.html}
}
Endnote
%0 Conference Paper
%T Regret Minimization for Partially Observable Deep Reinforcement Learning
%A Peter Jin
%A Kurt Keutzer
%A Sergey Levine
%B Proceedings of the 35th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Jennifer Dy
%E Andreas Krause
%F pmlr-v80-jin18c
%I PMLR
%P 2342--2351
%U https://proceedings.mlr.press/v80/jin18c.html
%V 80
APA
Jin, P., Keutzer, K., & Levine, S. (2018). Regret Minimization for Partially Observable Deep Reinforcement Learning. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:2342-2351. Available from https://proceedings.mlr.press/v80/jin18c.html.