Learning of Non-Parametric Control Policies with High-Dimensional State Features

Herke Van Hoof; Jan Peters; Gerhard Neumann

Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, PMLR 38:995-1003, 2015.

Abstract

Learning complex control policies from high-dimensional sensory input is a challenge for reinforcement learning algorithms. Kernel methods that approximate value functions or transition models can address this problem. Yet, many current approaches rely on unstable greedy maximization. In this paper, we develop a policy search algorithm that integrates robust policy updates and kernel embeddings. Our method can learn non-parametric control policies for infinite horizon continuous MDPs with high-dimensional sensory representations. We show that our method outperforms related approaches, and that our algorithm can learn an underpowered swing-up task directly from high-dimensional image data.

Cite this Paper

BibTeX


@InProceedings{pmlr-v38-vanhoof15,
  title = 	 {{Learning of Non-Parametric Control Policies with High-Dimensional State Features}},
  author = 	 {Van Hoof, Herke and Peters, Jan and Neumann, Gerhard},
  booktitle = 	 {Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics},
  pages = 	 {995--1003},
  year = 	 {2015},
  editor = 	 {Lebanon, Guy and Vishwanathan, S. V. N.},
  volume = 	 {38},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {San Diego, California, USA},
  month = 	 {09--12 May},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v38/vanhoof15.pdf},
  url = 	 {https://proceedings.mlr.press/v38/vanhoof15.html},
  abstract = 	 {Learning complex control policies from high-dimensional sensory input is a challenge for reinforcement learning algorithms. Kernel methods that approximate value functions or transition models can address this problem. Yet, many current approaches rely on unstable greedy maximization. In this paper, we develop a policy search algorithm that integrates robust policy updates and kernel embeddings. Our method can learn non-parametric control policies for infinite horizon continuous MDPs with high-dimensional sensory representations. We show that our method outperforms related approaches, and that our algorithm can learn an underpowered swing-up task directly from high-dimensional image data.}
}

Endnote

%0 Conference Paper
%T Learning of Non-Parametric Control Policies with High-Dimensional State Features
%A Herke Van Hoof
%A Jan Peters
%A Gerhard Neumann
%B Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2015
%E Guy Lebanon
%E S. V. N. Vishwanathan
%F pmlr-v38-vanhoof15
%I PMLR
%P 995--1003
%U https://proceedings.mlr.press/v38/vanhoof15.html
%V 38
%X Learning complex control policies from high-dimensional sensory input is a challenge for reinforcement learning algorithms. Kernel methods that approximate value functions or transition models can address this problem. Yet, many current approaches rely on unstable greedy maximization. In this paper, we develop a policy search algorithm that integrates robust policy updates and kernel embeddings. Our method can learn non-parametric control policies for infinite horizon continuous MDPs with high-dimensional sensory representations. We show that our method outperforms related approaches, and that our algorithm can learn an underpowered swing-up task directly from high-dimensional image data.

RIS


TY  - CPAPER
TI  - Learning of Non-Parametric Control Policies with High-Dimensional State Features
AU  - Herke Van Hoof
AU  - Jan Peters
AU  - Gerhard Neumann
BT  - Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics
DA  - 2015/02/21
ED  - Guy Lebanon
ED  - S. V. N. Vishwanathan
ID  - pmlr-v38-vanhoof15
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 38
SP  - 995
EP  - 1003
L1  - http://proceedings.mlr.press/v38/vanhoof15.pdf
UR  - https://proceedings.mlr.press/v38/vanhoof15.html
AB  - Learning complex control policies from high-dimensional sensory input is a challenge for reinforcement learning algorithms. Kernel methods that approximate value functions or transition models can address this problem. Yet, many current approaches rely on unstable greedy maximization. In this paper, we develop a policy search algorithm that integrates robust policy updates and kernel embeddings. Our method can learn non-parametric control policies for infinite horizon continuous MDPs with high-dimensional sensory representations. We show that our method outperforms related approaches, and that our algorithm can learn an underpowered swing-up task directly from high-dimensional image data.
ER  -

APA


Van Hoof, H., Peters, J., & Neumann, G. (2015). Learning of Non-Parametric Control Policies with High-Dimensional State Features. Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 38:995-1003. Available from https://proceedings.mlr.press/v38/vanhoof15.html.
