Sample Efficient Reinforcement Learning with Gaussian Processes

Robert Grande; Thomas Walsh; Jonathan How

Proceedings of the 31st International Conference on Machine Learning, PMLR 32(2):1332-1340, 2014.

Abstract

This paper derives sample complexity results for using Gaussian Processes (GPs) in both model-based and model-free reinforcement learning (RL). We show that GPs are KWIK learnable, proving for the first time that a model-based RL approach using GPs, GP-Rmax, is sample efficient (PAC-MDP). However, we then show that previous approaches to model-free RL using GPs take an exponential number of steps to find an optimal policy, and are therefore not sample efficient. The third and main contribution is the introduction of a model-free RL algorithm using GPs, DGPQ, which is sample efficient and, in contrast to model-based algorithms, capable of acting in real time, as demonstrated on a five-dimensional aircraft simulator.

Cite this Paper

BibTeX


@InProceedings{pmlr-v32-grande14,
  title = 	 {Sample Efficient Reinforcement Learning with Gaussian Processes},
  author = 	 {Grande, Robert and Walsh, Thomas and How, Jonathan},
  booktitle = 	 {Proceedings of the 31st International Conference on Machine Learning},
  pages = 	 {1332--1340},
  year = 	 {2014},
  editor = 	 {Xing, Eric P. and Jebara, Tony},
  volume = 	 {32},
  number =       {2},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Beijing, China},
  month = 	 {22--24 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v32/grande14.pdf},
  url = 	 {https://proceedings.mlr.press/v32/grande14.html},
  abstract = 	 {This paper derives sample complexity results for using Gaussian Processes (GPs) in both model-based and model-free reinforcement learning (RL). We show that GPs are KWIK learnable, proving for the first time that a model-based RL approach using GPs, GP-Rmax, is sample efficient (PAC-MDP). However, we then show that previous approaches to model-free RL using GPs take an exponential number of steps to find an optimal policy, and are therefore not sample efficient. The third and main contribution is the introduction of a model-free RL algorithm using GPs, DGPQ, which is sample efficient and, in contrast to model-based algorithms, capable of acting in real time, as demonstrated on a five-dimensional aircraft simulator.}
}

Endnote

%0 Conference Paper
%T Sample Efficient Reinforcement Learning with Gaussian Processes
%A Robert Grande
%A Thomas Walsh
%A Jonathan How
%B Proceedings of the 31st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2014
%E Eric P. Xing
%E Tony Jebara
%F pmlr-v32-grande14
%I PMLR
%P 1332--1340
%U https://proceedings.mlr.press/v32/grande14.html
%V 32
%N 2
%X This paper derives sample complexity results for using Gaussian Processes (GPs) in both model-based and model-free reinforcement learning (RL). We show that GPs are KWIK learnable, proving for the first time that a model-based RL approach using GPs, GP-Rmax, is sample efficient (PAC-MDP). However, we then show that previous approaches to model-free RL using GPs take an exponential number of steps to find an optimal policy, and are therefore not sample efficient. The third and main contribution is the introduction of a model-free RL algorithm using GPs, DGPQ, which is sample efficient and, in contrast to model-based algorithms, capable of acting in real time, as demonstrated on a five-dimensional aircraft simulator.

RIS


TY  - CPAPER
TI  - Sample Efficient Reinforcement Learning with Gaussian Processes
AU  - Robert Grande
AU  - Thomas Walsh
AU  - Jonathan How
BT  - Proceedings of the 31st International Conference on Machine Learning
DA  - 2014/06/18
ED  - Eric P. Xing
ED  - Tony Jebara
ID  - pmlr-v32-grande14
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 32
IS  - 2
SP  - 1332
EP  - 1340
L1  - http://proceedings.mlr.press/v32/grande14.pdf
UR  - https://proceedings.mlr.press/v32/grande14.html
AB  - This paper derives sample complexity results for using Gaussian Processes (GPs) in both model-based and model-free reinforcement learning (RL). We show that GPs are KWIK learnable, proving for the first time that a model-based RL approach using GPs, GP-Rmax, is sample efficient (PAC-MDP). However, we then show that previous approaches to model-free RL using GPs take an exponential number of steps to find an optimal policy, and are therefore not sample efficient. The third and main contribution is the introduction of a model-free RL algorithm using GPs, DGPQ, which is sample efficient and, in contrast to model-based algorithms, capable of acting in real time, as demonstrated on a five-dimensional aircraft simulator.
ER  -

APA


Grande, R., Walsh, T. & How, J. (2014). Sample Efficient Reinforcement Learning with Gaussian Processes. Proceedings of the 31st International Conference on Machine Learning, in Proceedings of Machine Learning Research 32(2):1332-1340. Available from https://proceedings.mlr.press/v32/grande14.html.
