Hindsight Expectation Maximization for Goal-conditioned Reinforcement Learning

Yunhao Tang; Alp Kucukelbir

Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, PMLR 130:2863-2871, 2021.

Abstract

We propose a graphical model framework for goal-conditioned RL, with an EM algorithm that operates on the lower bound of the RL objective. The E-step provides a natural interpretation of how 'learning in hindsight' techniques, such as HER, handle extremely sparse goal-conditioned rewards. The M-step reduces policy optimization to supervised learning updates, which greatly stabilizes end-to-end training on high-dimensional inputs such as images. We show that the combined algorithm, hEM, significantly outperforms model-free baselines on a wide range of goal-conditioned benchmarks with sparse rewards.
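The E-step/M-step split described in the abstract can be illustrated with a toy tabular sketch. This is not the authors' hEM implementation: it assumes HER-style "future" relabeling as a stand-in for the hindsight E-step, and a count-based maximum-likelihood fit of a categorical policy as the supervised M-step. The names `hindsight_relabel` and `fit_tabular_policy`, the (state, action, achieved goal) step layout, and the 1-D toy data are all hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Tuple

# Hypothetical layout: each step is (state, action, goal achieved after the action).
Step = Tuple[Hashable, Hashable, Hashable]
Sample = Tuple[Hashable, Hashable, Hashable]  # (state, goal, action)


def hindsight_relabel(trajectory: List[Step]) -> List[Sample]:
    """HER-style 'future' relabeling (assumed stand-in for the E-step): pair each
    step with every goal achieved at that step or later in the same trajectory.
    Under a sparse reward that is 1 only when the commanded goal is reached,
    every relabeled sample comes from a trajectory that does reach its goal."""
    samples: List[Sample] = []
    for t, (state, action, _) in enumerate(trajectory):
        for _, _, achieved in trajectory[t:]:
            samples.append((state, achieved, action))
    return samples


def fit_tabular_policy(
    samples: List[Sample],
) -> Dict[Tuple[Hashable, Hashable], Dict[Hashable, float]]:
    """Supervised M-step for a tabular policy: the maximum-likelihood estimate
    of a categorical pi(a | s, g) is the empirical action frequency."""
    counts: Dict[Tuple[Hashable, Hashable], Counter] = defaultdict(Counter)
    for state, goal, action in samples:
        counts[(state, goal)][action] += 1
    policy = {}
    for key, action_counts in counts.items():
        total = sum(action_counts.values())
        policy[key] = {a: c / total for a, c in action_counts.items()}
    return policy


# Toy 1-D example: integer positions serve as both states and goals.
trajectories = [
    [(0, "right", 1), (1, "right", 2)],
    [(0, "stay", 0), (0, "right", 1)],
    [(0, "left", -1)],
]
data = [s for traj in trajectories for s in hindsight_relabel(traj)]
policy = fit_tabular_policy(data)
print(policy[(0, 1)])  # {'right': 0.6666666666666666, 'stay': 0.3333333333333333}
```

No trajectory above was collected with goal 1 in mind, yet relabeling turns every reached position into a successful example, so the supervised fit only ever sees goal-reaching behavior. With function approximation (e.g. a neural policy over images), the counting step would be replaced by a log-likelihood gradient update on the same relabeled samples.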

Cite this Paper

BibTeX

@InProceedings{pmlr-v130-tang21b,
  title = 	 { Hindsight Expectation Maximization for Goal-conditioned Reinforcement Learning },
  author =       {Tang, Yunhao and Kucukelbir, Alp},
  booktitle = 	 {Proceedings of The 24th International Conference on Artificial Intelligence and Statistics},
  pages = 	 {2863--2871},
  year = 	 {2021},
  editor = 	 {Banerjee, Arindam and Fukumizu, Kenji},
  volume = 	 {130},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {13--15 Apr},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v130/tang21b/tang21b.pdf},
  url = 	 {https://proceedings.mlr.press/v130/tang21b.html},
  abstract = 	 {We propose a graphical model framework for goal-conditioned RL, with an EM algorithm that operates on the lower bound of the RL objective. The E-step provides a natural interpretation of how 'learning in hindsight' techniques, such as HER, handle extremely sparse goal-conditioned rewards. The M-step reduces policy optimization to supervised learning updates, which greatly stabilizes end-to-end training on high-dimensional inputs such as images. We show that the combined algorithm, hEM, significantly outperforms model-free baselines on a wide range of goal-conditioned benchmarks with sparse rewards.}
}

Endnote

%0 Conference Paper
%T Hindsight Expectation Maximization for Goal-conditioned Reinforcement Learning
%A Yunhao Tang
%A Alp Kucukelbir
%B Proceedings of The 24th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2021
%E Arindam Banerjee
%E Kenji Fukumizu
%F pmlr-v130-tang21b
%I PMLR
%P 2863-2871
%U https://proceedings.mlr.press/v130/tang21b.html
%V 130
%X We propose a graphical model framework for goal-conditioned RL, with an EM algorithm that operates on the lower bound of the RL objective. The E-step provides a natural interpretation of how 'learning in hindsight' techniques, such as HER, handle extremely sparse goal-conditioned rewards. The M-step reduces policy optimization to supervised learning updates, which greatly stabilizes end-to-end training on high-dimensional inputs such as images. We show that the combined algorithm, hEM, significantly outperforms model-free baselines on a wide range of goal-conditioned benchmarks with sparse rewards.

APA

Tang, Y. & Kucukelbir, A. (2021). Hindsight Expectation Maximization for Goal-conditioned Reinforcement Learning. Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 130:2863-2871. Available from https://proceedings.mlr.press/v130/tang21b.html.
