Online Relative Entropy Policy Search using Reproducing Kernel Hilbert Space Embeddings

Zhitang Chen, Pascal Poupart, Yanhui Geng
Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, PMLR 51:573-581, 2016.

Abstract

Kernel methods have been successfully applied to reinforcement learning to address challenges such as high-dimensional and continuous states, value function approximation, and state transition probability modeling. In this paper, we develop an online policy search algorithm based on a recent state-of-the-art algorithm, REPS-RKHS, which uses conditional kernel embeddings. Our online algorithm inherits the advantages of REPS-RKHS, including the ability to learn non-parametric control policies for infinite-horizon continuous MDPs with high-dimensional sensory representations. Unlike the original REPS-RKHS algorithm, which learns in batch mode, the proposed algorithm updates the model online and can therefore capture and respond to rapid changes in the system dynamics. Moreover, each online update takes constant time (i.e., time independent of the sample size n), which is far more efficient computationally and allows the policy to be revised continuously. Experiments on several domains show that our online algorithm outperforms the original batch algorithm.
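
Illustrative Sketch

To make the abstract's two key ingredients concrete, the sketch below illustrates (i) a conditional kernel embedding of the transition model P(s' | s, a) and (ii) an online update whose cost is constant in the total number of observed transitions n, achieved here by retaining only a fixed budget of m samples. This is a minimal sketch under stated assumptions, not the authors' implementation: the class and function names (OnlineConditionalEmbedding, gaussian_kernel), the Gaussian kernel, the budget size, the regularization constant, and the drop-the-oldest eviction rule are all choices made for illustration.

import numpy as np

def gaussian_kernel(X, Y, bandwidth=1.0):
    """Gram matrix of the Gaussian (RBF) kernel between rows of X and Y."""
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * bandwidth ** 2))

class OnlineConditionalEmbedding:
    """Conditional embedding of P(s' | s, a) over a fixed budget of m transitions."""

    def __init__(self, budget=100, bandwidth=1.0, reg=1e-3):
        self.m, self.bw, self.reg = budget, bandwidth, reg
        self.SA = None   # stored (state, action) pairs, shape (m, d_sa)
        self.Sp = None   # stored next states, shape (m, d_s)

    def update(self, sa, s_next):
        """Add one transition; evict the oldest once the budget is full."""
        sa, s_next = np.atleast_2d(sa), np.atleast_2d(s_next)
        if self.SA is None:
            self.SA, self.Sp = sa, s_next
        else:
            self.SA = np.vstack([self.SA, sa])[-self.m:]
            self.Sp = np.vstack([self.Sp, s_next])[-self.m:]
        # Recompute the regularized Gram inverse. This costs O(m^3) for
        # budget m, so it is constant with respect to n once m is fixed.
        n_cur = len(self.SA)
        K = gaussian_kernel(self.SA, self.SA, self.bw)
        self.K_inv = np.linalg.inv(K + self.reg * n_cur * np.eye(n_cur))

    def embedding_weights(self, sa):
        """Weights beta such that mu_{S'|s,a} = sum_i beta_i k(s'_i, .)."""
        k = gaussian_kernel(np.atleast_2d(sa), self.SA, self.bw).ravel()
        return self.K_inv @ k

# Usage sketch: the weights beta estimate E[f(s') | s, a] for an RKHS
# function f as sum_i beta_i f(s'_i); applying them to the raw stored
# next states gives a crude estimate of the expected next state.
emb = OnlineConditionalEmbedding(budget=50)
emb.update(np.array([0.1, 0.2, -0.3]), np.array([0.15, 0.18]))
beta = emb.embedding_weights(np.array([0.1, 0.2, -0.3]))
expected_next = beta @ emb.Sp

A rank-one block-inverse update could replace the full O(m^3) recomputation in update(), but even as written the per-step cost is independent of n once the budget m is fixed, which is the property the abstract emphasizes.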

Cite this Paper


BibTeX
@InProceedings{pmlr-v51-chen16a,
  title     = {Online Relative Entropy Policy Search using Reproducing Kernel Hilbert Space Embeddings},
  author    = {Chen, Zhitang and Poupart, Pascal and Geng, Yanhui},
  booktitle = {Proceedings of the 19th International Conference on Artificial Intelligence and Statistics},
  pages     = {573--581},
  year      = {2016},
  editor    = {Gretton, Arthur and Robert, Christian C.},
  volume    = {51},
  series    = {Proceedings of Machine Learning Research},
  address   = {Cadiz, Spain},
  month     = {09--11 May},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v51/chen16a.pdf},
  url       = {https://proceedings.mlr.press/v51/chen16a.html},
  abstract  = {Kernel methods have been successfully applied to reinforcement learning to address challenges such as high-dimensional and continuous states, value function approximation, and state transition probability modeling. In this paper, we develop an online policy search algorithm based on a recent state-of-the-art algorithm, REPS-RKHS, which uses conditional kernel embeddings. Our online algorithm inherits the advantages of REPS-RKHS, including the ability to learn non-parametric control policies for infinite-horizon continuous MDPs with high-dimensional sensory representations. Unlike the original REPS-RKHS algorithm, which learns in batch mode, the proposed algorithm updates the model online and can therefore capture and respond to rapid changes in the system dynamics. Moreover, each online update takes constant time (i.e., time independent of the sample size n), which is far more efficient computationally and allows the policy to be revised continuously. Experiments on several domains show that our online algorithm outperforms the original batch algorithm.}
}
APA
Chen, Z., Poupart, P., & Geng, Y. (2016). Online Relative Entropy Policy Search using Reproducing Kernel Hilbert Space Embeddings. Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 51:573-581. Available from https://proceedings.mlr.press/v51/chen16a.html.
