Modelling Policies in MDPs in Reproducing Kernel Hilbert Space

Guy Lever; Ronnie Stafford

Modelling Policies in MDPs in Reproducing Kernel Hilbert Space

Guy Lever, Ronnie Stafford

Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, PMLR 38:590-598, 2015.

Abstract

We consider modelling policies for MDPs in (vector-valued) reproducing kernel Hilbert function spaces (RKHS). This enables us to work “non-parametrically” in a rich function class, and provides the ability to learn complex policies. We present a framework for performing gradient-based policy optimization in the RKHS, deriving the functional gradient of the return for our policy, which has a simple form and can be estimated efficiently. The policy representation naturally focuses on the relevant region of state space defined by the policy trajectories, and does not rely on a-priori defined basis points; this can be an advantage in high dimensions where suitable basis points may be difficult to define a-priori. The method is adaptive in the sense that the policy representation will naturally adapt to the complexity of the policy being modelled, which is achieved with standard efficient sparsification tools in an RKHS. We argue that finding a good kernel on states can be easier then remetrizing a high dimensional feature space. We demonstrate the approach on benchmark domains and a simulated quadrocopter navigation task.

Cite this Paper

BibTeX


@InProceedings{pmlr-v38-lever15,
  title = 	 {{Modelling Policies in MDPs in Reproducing Kernel Hilbert Space}},
  author = 	 {Lever, Guy and Stafford, Ronnie},
  booktitle = 	 {Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics},
  pages = 	 {590--598},
  year = 	 {2015},
  editor = 	 {Lebanon, Guy and Vishwanathan, S. V. N.},
  volume = 	 {38},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {San Diego, California, USA},
  month = 	 {09--12 May},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v38/lever15.pdf},
  url = 	 {https://proceedings.mlr.press/v38/lever15.html},
  abstract = 	 {We consider modelling policies for MDPs in (vector-valued) reproducing kernel Hilbert function spaces (RKHS). This enables us to work “non-parametrically” in a rich function class, and provides the ability to learn complex policies. We present a framework for performing gradient-based policy optimization in the RKHS, deriving the functional gradient of the return for our policy, which has a simple form and can be estimated efficiently. The policy representation naturally focuses on the relevant region of state space defined by the policy trajectories, and does not rely on a-priori defined basis points; this can be an advantage in high dimensions where suitable basis points may be difficult to define a-priori. The method is adaptive in the sense that the policy representation will naturally adapt to the complexity of the policy being modelled, which is achieved with standard efficient sparsification tools in an RKHS. We argue that finding a good kernel on states can be easier then remetrizing a high dimensional feature space. We demonstrate the approach on benchmark domains and a simulated quadrocopter navigation task.}
}

Endnote

%0 Conference Paper
%T Modelling Policies in MDPs in Reproducing Kernel Hilbert Space
%A Guy Lever
%A Ronnie Stafford
%B Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2015
%E Guy Lebanon
%E S. V. N. Vishwanathan	
%F pmlr-v38-lever15
%I PMLR
%P 590--598
%U https://proceedings.mlr.press/v38/lever15.html
%V 38
%X We consider modelling policies for MDPs in (vector-valued) reproducing kernel Hilbert function spaces (RKHS). This enables us to work “non-parametrically” in a rich function class, and provides the ability to learn complex policies. We present a framework for performing gradient-based policy optimization in the RKHS, deriving the functional gradient of the return for our policy, which has a simple form and can be estimated efficiently. The policy representation naturally focuses on the relevant region of state space defined by the policy trajectories, and does not rely on a-priori defined basis points; this can be an advantage in high dimensions where suitable basis points may be difficult to define a-priori. The method is adaptive in the sense that the policy representation will naturally adapt to the complexity of the policy being modelled, which is achieved with standard efficient sparsification tools in an RKHS. We argue that finding a good kernel on states can be easier then remetrizing a high dimensional feature space. We demonstrate the approach on benchmark domains and a simulated quadrocopter navigation task.

RIS


TY  - CPAPER
TI  - Modelling Policies in MDPs in Reproducing Kernel Hilbert Space
AU  - Guy Lever
AU  - Ronnie Stafford
BT  - Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics
DA  - 2015/02/21
ED  - Guy Lebanon
ED  - S. V. N. Vishwanathan	
ID  - pmlr-v38-lever15
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 38
SP  - 590
EP  - 598
L1  - http://proceedings.mlr.press/v38/lever15.pdf
UR  - https://proceedings.mlr.press/v38/lever15.html
AB  - We consider modelling policies for MDPs in (vector-valued) reproducing kernel Hilbert function spaces (RKHS). This enables us to work “non-parametrically” in a rich function class, and provides the ability to learn complex policies. We present a framework for performing gradient-based policy optimization in the RKHS, deriving the functional gradient of the return for our policy, which has a simple form and can be estimated efficiently. The policy representation naturally focuses on the relevant region of state space defined by the policy trajectories, and does not rely on a-priori defined basis points; this can be an advantage in high dimensions where suitable basis points may be difficult to define a-priori. The method is adaptive in the sense that the policy representation will naturally adapt to the complexity of the policy being modelled, which is achieved with standard efficient sparsification tools in an RKHS. We argue that finding a good kernel on states can be easier then remetrizing a high dimensional feature space. We demonstrate the approach on benchmark domains and a simulated quadrocopter navigation task.
ER  -

APA


Lever, G. & Stafford, R.. (2015). Modelling Policies in MDPs in Reproducing Kernel Hilbert Space. Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 38:590-598 Available from https://proceedings.mlr.press/v38/lever15.html.

Modelling Policies in MDPs in Reproducing Kernel Hilbert Space

Abstract

Cite this Paper

Related Material