Deep Coherent Exploration for Continuous Control

Yijie Zhang; Herke Van Hoof

Proceedings of the 38th International Conference on Machine Learning, PMLR 139:12567-12577, 2021.

Abstract

In policy search methods for reinforcement learning (RL), exploration is often performed by injecting noise either in action space at each step independently or in parameter space over each full trajectory. In prior work, it has been shown that with linear policies, a more balanced trade-off between these two exploration strategies is beneficial. However, that method did not scale to policies using deep neural networks. In this paper, we introduce deep coherent exploration, a general and scalable exploration framework for deep RL algorithms for continuous control, that generalizes step-based and trajectory-based exploration. This framework models the last layer parameters of the policy network as latent variables and uses a recursive inference step within the policy update to handle these latent variables in a scalable manner. We find that deep coherent exploration improves the speed and stability of learning of A2C, PPO, and SAC on several continuous control tasks.
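The two extremes named in the abstract, and a trade-off between them, can be illustrated with a small sketch. This is not the paper's algorithm: the AR(1) noise process, the function name `sample_last_layer_noise`, and the parameters `beta` and `sigma` are assumptions chosen only to show how a single knob can move between independent per-step noise and noise held fixed over a whole trajectory, here applied to one hypothetical last-layer weight.

```python
import math
import random


def sample_last_layer_noise(n_steps, beta, sigma=1.0, seed=0):
    """Return one perturbation per time step for a single (hypothetical) weight.

    beta = 0.0  -> a fresh, independent draw at every step
    beta = 1.0  -> one draw held fixed for the whole trajectory
    0 < beta < 1 -> temporally correlated noise in between
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    if n_steps <= 0:
        return []
    rng = random.Random(seed)
    eps = rng.gauss(0.0, sigma)  # initial draw from N(0, sigma^2)
    noise = [eps]
    for _ in range(n_steps - 1):
        # AR(1) update (an assumption of this sketch); the sqrt factor
        # keeps the marginal variance of eps at sigma^2 for every step.
        eps = beta * eps + math.sqrt(1.0 - beta ** 2) * rng.gauss(0.0, sigma)
        noise.append(eps)
    return noise


if __name__ == "__main__":
    for beta in (0.0, 0.9, 1.0):
        print(beta, [round(e, 3) for e in sample_last_layer_noise(5, beta)])
```

In a rollout, each value would be added to the weight before computing that step's action. Setting `beta` strictly between 0 and 1 gives perturbations that change gradually rather than being redrawn or frozen.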

Cite this Paper

BibTeX


@InProceedings{pmlr-v139-zhang21t,
  title = 	 {Deep Coherent Exploration for Continuous Control},
  author =       {Zhang, Yijie and Van Hoof, Herke},
  booktitle = 	 {Proceedings of the 38th International Conference on Machine Learning},
  pages = 	 {12567--12577},
  year = 	 {2021},
  editor = 	 {Meila, Marina and Zhang, Tong},
  volume = 	 {139},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {18--24 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v139/zhang21t/zhang21t.pdf},
  url = 	 {https://proceedings.mlr.press/v139/zhang21t.html},
  abstract = 	 {In policy search methods for reinforcement learning (RL), exploration is often performed by injecting noise either in action space at each step independently or in parameter space over each full trajectory. In prior work, it has been shown that with linear policies, a more balanced trade-off between these two exploration strategies is beneficial. However, that method did not scale to policies using deep neural networks. In this paper, we introduce deep coherent exploration, a general and scalable exploration framework for deep RL algorithms for continuous control, that generalizes step-based and trajectory-based exploration. This framework models the last layer parameters of the policy network as latent variables and uses a recursive inference step within the policy update to handle these latent variables in a scalable manner. We find that deep coherent exploration improves the speed and stability of learning of A2C, PPO, and SAC on several continuous control tasks.}
}

Endnote

%0 Conference Paper
%T Deep Coherent Exploration for Continuous Control
%A Yijie Zhang
%A Herke Van Hoof
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-zhang21t
%I PMLR
%P 12567--12577
%U https://proceedings.mlr.press/v139/zhang21t.html
%V 139
%X In policy search methods for reinforcement learning (RL), exploration is often performed by injecting noise either in action space at each step independently or in parameter space over each full trajectory. In prior work, it has been shown that with linear policies, a more balanced trade-off between these two exploration strategies is beneficial. However, that method did not scale to policies using deep neural networks. In this paper, we introduce deep coherent exploration, a general and scalable exploration framework for deep RL algorithms for continuous control, that generalizes step-based and trajectory-based exploration. This framework models the last layer parameters of the policy network as latent variables and uses a recursive inference step within the policy update to handle these latent variables in a scalable manner. We find that deep coherent exploration improves the speed and stability of learning of A2C, PPO, and SAC on several continuous control tasks.

APA


Zhang, Y. & Van Hoof, H. (2021). Deep Coherent Exploration for Continuous Control. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:12567-12577. Available from https://proceedings.mlr.press/v139/zhang21t.html.
