Online Relative Entropy Policy Search using Reproducing Kernel Hilbert Space Embeddings

Zhitang Chen; Pascal Poupart; Yanhui Geng

Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, PMLR 51:573-581, 2016.

Abstract

Kernel methods have been successfully applied to reinforcement learning problems to address some challenges such as high dimensional and continuous states, value function approximation and state transition probability modeling. In this paper, we develop an online policy search algorithm based on a recent state-of-the-art algorithm REPS-RKHS that uses conditional kernel embeddings. Our online algorithm inherits the advantages of REPS-RKHS, including the ability to learn non-parametric control policies for infinite horizon continuous MDPs with high-dimensional sensory representations. Different from the original REPS-RKHS algorithm which is based on batch learning, the proposed online algorithm updates the model in an online fashion and thus is able to capture and respond to rapid changes in the system dynamics. In addition, the online update operation takes constant time (i.e., independent of the sample size n), which is much more efficient computationally and allows the policy to be continuously revised. Experiments on different domains are conducted and results show that our online algorithm outperforms the original algorithm.

Cite this Paper

BibTeX


@InProceedings{pmlr-v51-chen16a,
  title = 	 {Online Relative Entropy Policy Search using Reproducing Kernel Hilbert Space Embeddings},
  author = 	 {Chen, Zhitang and Poupart, Pascal and Geng, Yanhui},
  booktitle = 	 {Proceedings of the 19th International Conference on Artificial Intelligence and Statistics},
  pages = 	 {573--581},
  year = 	 {2016},
  editor = 	 {Gretton, Arthur and Robert, Christian C.},
  volume = 	 {51},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Cadiz, Spain},
  month = 	 {09--11 May},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v51/chen16a.pdf},
  url = 	 {https://proceedings.mlr.press/v51/chen16a.html},
  abstract = 	 {Kernel methods have been successfully applied to reinforcement learning problems to address some challenges such as high dimensional and continuous states, value function approximation and state transition probability modeling. In this paper, we develop an online policy search algorithm based on a recent state-of-the-art algorithm REPS-RKHS that uses conditional kernel embeddings. Our online algorithm inherits the advantages of REPS-RKHS, including the ability to learn non-parametric control policies for infinite horizon continuous MDPs with high-dimensional sensory representations. Different from the original REPS-RKHS algorithm which is based on batch learning, the proposed online algorithm updates the model in an online fashion and thus is able to capture and respond to rapid changes in the system dynamics. In addition, the online update operation takes constant time (i.e., independent of the sample size n), which is much more efficient computationally and allows the policy to be continuously revised. Experiments on different domains are conducted and results show that our online algorithm outperforms the original algorithm.}
}

Endnote

%0 Conference Paper
%T Online Relative Entropy Policy Search using Reproducing Kernel Hilbert Space Embeddings
%A Zhitang Chen
%A Pascal Poupart
%A Yanhui Geng
%B Proceedings of the 19th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2016
%E Arthur Gretton
%E Christian C. Robert
%F pmlr-v51-chen16a
%I PMLR
%P 573--581
%U https://proceedings.mlr.press/v51/chen16a.html
%V 51
%X Kernel methods have been successfully applied to reinforcement learning problems to address some challenges such as high dimensional and continuous states, value function approximation and state transition probability modeling. In this paper, we develop an online policy search algorithm based on a recent state-of-the-art algorithm REPS-RKHS that uses conditional kernel embeddings. Our online algorithm inherits the advantages of REPS-RKHS, including the ability to learn non-parametric control policies for infinite horizon continuous MDPs with high-dimensional sensory representations. Different from the original REPS-RKHS algorithm which is based on batch learning, the proposed online algorithm updates the model in an online fashion and thus is able to capture and respond to rapid changes in the system dynamics. In addition, the online update operation takes constant time (i.e., independent of the sample size n), which is much more efficient computationally and allows the policy to be continuously revised. Experiments on different domains are conducted and results show that our online algorithm outperforms the original algorithm.

RIS


TY  - CPAPER
TI  - Online Relative Entropy Policy Search using Reproducing Kernel Hilbert Space Embeddings
AU  - Zhitang Chen
AU  - Pascal Poupart
AU  - Yanhui Geng
BT  - Proceedings of the 19th International Conference on Artificial Intelligence and Statistics
DA  - 2016/05/02
ED  - Arthur Gretton
ED  - Christian C. Robert
ID  - pmlr-v51-chen16a
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 51
SP  - 573
EP  - 581
L1  - http://proceedings.mlr.press/v51/chen16a.pdf
UR  - https://proceedings.mlr.press/v51/chen16a.html
AB  - Kernel methods have been successfully applied to reinforcement learning problems to address some challenges such as high dimensional and continuous states, value function approximation and state transition probability modeling. In this paper, we develop an online policy search algorithm based on a recent state-of-the-art algorithm REPS-RKHS that uses conditional kernel embeddings. Our online algorithm inherits the advantages of REPS-RKHS, including the ability to learn non-parametric control policies for infinite horizon continuous MDPs with high-dimensional sensory representations. Different from the original REPS-RKHS algorithm which is based on batch learning, the proposed online algorithm updates the model in an online fashion and thus is able to capture and respond to rapid changes in the system dynamics. In addition, the online update operation takes constant time (i.e., independent of the sample size n), which is much more efficient computationally and allows the policy to be continuously revised. Experiments on different domains are conducted and results show that our online algorithm outperforms the original algorithm.
ER  -

APA


Chen, Z., Poupart, P. & Geng, Y. (2016). Online Relative Entropy Policy Search using Reproducing Kernel Hilbert Space Embeddings. Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 51:573-581. Available from https://proceedings.mlr.press/v51/chen16a.html.
