Sample Efficient Reinforcement Learning with Gaussian Processes

Robert Grande, Thomas Walsh, Jonathan How
Proceedings of the 31st International Conference on Machine Learning, PMLR 32(2):1332-1340, 2014.

Abstract

This paper derives sample complexity results for using Gaussian Processes (GPs) in both model-based and model-free reinforcement learning (RL). We show that GPs are KWIK learnable, proving for the first time that a model-based RL approach using GPs, GP-Rmax, is sample efficient (PAC-MDP). However, we then show that previous approaches to model-free RL using GPs take an exponential number of steps to find an optimal policy, and are therefore not sample efficient. The third and main contribution is the introduction of a model-free RL algorithm using GPs, DGPQ, which is sample efficient and, in contrast to model-based algorithms, capable of acting in real time, as demonstrated on a five-dimensional aircraft simulator.
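
The KWIK ("Knows What It Knows") result above hinges on the GP posterior variance serving as a certificate of where the model's predictions can be trusted. Below is a minimal sketch of that idea in Python/NumPy, not the authors' code: the learner answers a query only when its posterior variance falls below a threshold and otherwise returns None for "I don't know". The RBF kernel, noise variance, threshold value, and the names KWIKGP and rbf_kernel are illustrative assumptions rather than details taken from the paper.

# Minimal sketch (assumed, not the authors' code) of a KWIK-style GP learner:
# answer a query only where the posterior variance is small, otherwise say
# "I don't know" (None). Kernel, noise, and threshold are illustrative.
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    # Squared-exponential kernel: k(a, b) = exp(-||a - b||^2 / (2 l^2))
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-d2 / (2.0 * lengthscale**2))

class KWIKGP:
    def __init__(self, noise=0.1, var_threshold=0.05, dim=1):
        self.noise = noise                  # observation-noise variance
        self.var_threshold = var_threshold  # "known" iff posterior var < this
        self.X = np.empty((0, dim))
        self.y = np.empty(0)

    def update(self, x, y):
        # Record one (input, noisy observation) pair.
        self.X = np.vstack([self.X, np.atleast_2d(x)])
        self.y = np.append(self.y, y)

    def predict(self, x):
        # Return the posterior mean, or None as the KWIK "I don't know".
        if len(self.y) == 0:
            return None
        x = np.atleast_2d(x)
        K = rbf_kernel(self.X, self.X) + self.noise * np.eye(len(self.y))
        k = rbf_kernel(self.X, x)                        # cross-covariances
        var = rbf_kernel(x, x)[0, 0] - (k.T @ np.linalg.solve(K, k))[0, 0]
        if var > self.var_threshold:
            return None                                  # too uncertain
        return float((k.T @ np.linalg.solve(K, self.y))[0])

# Usage: after a few repeated observations near x = 0 the point becomes
# "known"; a faraway query is still answered with "I don't know".
gp = KWIKGP()
for _ in range(3):
    gp.update(0.0, 1.0)
print(gp.predict(0.0))   # ~0.97: confident prediction
print(gp.predict(5.0))   # None: "I don't know"

Roughly, a model-based agent in the Rmax family can treat inputs where this test reports "I don't know" optimistically to drive exploration, which is the pattern GP-Rmax follows; DGPQ instead applies the idea model-free, updating its delayed value function only once the GP has become confident.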

Cite this Paper


BibTeX
@InProceedings{pmlr-v32-grande14,
  title     = {Sample Efficient Reinforcement Learning with Gaussian Processes},
  author    = {Grande, Robert and Walsh, Thomas and How, Jonathan},
  booktitle = {Proceedings of the 31st International Conference on Machine Learning},
  pages     = {1332--1340},
  year      = {2014},
  editor    = {Xing, Eric P. and Jebara, Tony},
  volume    = {32},
  number    = {2},
  series    = {Proceedings of Machine Learning Research},
  address   = {Beijing, China},
  month     = {22--24 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v32/grande14.pdf},
  url       = {https://proceedings.mlr.press/v32/grande14.html}
}
APA
Grande, R., Walsh, T., & How, J. (2014). Sample Efficient Reinforcement Learning with Gaussian Processes. Proceedings of the 31st International Conference on Machine Learning, in Proceedings of Machine Learning Research 32(2):1332-1340. Available from https://proceedings.mlr.press/v32/grande14.html.

Related Material

Download PDF: http://proceedings.mlr.press/v32/grande14.pdf