Contextual Decision Processes with low Bellman rank are PAC-Learnable

Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal, John Langford, Robert E. Schapire
Proceedings of the 34th International Conference on Machine Learning, PMLR 70:1704-1713, 2017.

Abstract

This paper studies systematic exploration for reinforcement learning (RL) with rich observations and function approximation. We introduce contextual decision processes (CDPs), which unify most prior RL settings. Our first contribution is a complexity measure, the Bellman rank, that we show enables tractable learning of near-optimal behavior in CDPs and is naturally small for many well-studied RL models. Our second contribution is a new RL algorithm that does systematic exploration to learn near-optimal behavior in CDPs with low Bellman rank. The algorithm requires a number of samples that is polynomial in all relevant parameters but independent of the number of unique contexts. Our approach uses Bellman error minimization with optimistic exploration and provides new insights into efficient exploration for RL with function approximation.
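The exploration strategy the abstract names, Bellman error minimization with optimistic exploration, can be illustrated with a toy version-space sketch. This is not the paper's algorithm (OLIVE) verbatim; the hypothesis representation, `bellman_error` oracle, and tolerance below are illustrative assumptions.

```python
# Illustrative sketch only: elimination over a finite hypothesis class,
# driven by (an oracle for) Bellman error, with optimistic selection.
# Each hypothesis is a dict with a predicted optimal value "value";
# bellman_error(h) stands in for an empirical Bellman-error estimate.

def optimistic_elimination(hypotheses, bellman_error, tol):
    """Pick the most optimistic surviving hypothesis; if its Bellman
    error is small, return it, otherwise eliminate every hypothesis
    whose error exceeds tol and repeat."""
    survivors = list(hypotheses)
    while survivors:
        f = max(survivors, key=lambda h: h["value"])  # optimism
        if bellman_error(f) <= tol:
            return f  # optimistic and (near-)Bellman-consistent
        survivors = [h for h in survivors if bellman_error(h) <= tol]
    return None  # class was misspecified for this tolerance


# Hypothetical usage: two candidates, one over-optimistic but inconsistent.
candidates = [{"value": 1.0, "err": 0.5}, {"value": 0.8, "err": 0.0}]
best = optimistic_elimination(candidates, lambda h: h["err"], tol=0.1)
```

In the paper, a low Bellman rank is what makes each elimination round informative, so the number of rounds (and samples) stays polynomial; the sketch above only shows the control flow of optimistic elimination.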

Cite this Paper


BibTeX
@InProceedings{pmlr-v70-jiang17c,
  title     = {Contextual Decision Processes with low {B}ellman rank are {PAC}-Learnable},
  author    = {Nan Jiang and Akshay Krishnamurthy and Alekh Agarwal and John Langford and Robert E. Schapire},
  booktitle = {Proceedings of the 34th International Conference on Machine Learning},
  pages     = {1704--1713},
  year      = {2017},
  editor    = {Precup, Doina and Teh, Yee Whye},
  volume    = {70},
  series    = {Proceedings of Machine Learning Research},
  month     = {06--11 Aug},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v70/jiang17c/jiang17c.pdf},
  url       = {http://proceedings.mlr.press/v70/jiang17c.html},
  abstract  = {This paper studies systematic exploration for reinforcement learning (RL) with rich observations and function approximation. We introduce contextual decision processes (CDPs), which unify most prior RL settings. Our first contribution is a complexity measure, the Bellman rank, that we show enables tractable learning of near-optimal behavior in CDPs and is naturally small for many well-studied RL models. Our second contribution is a new RL algorithm that does systematic exploration to learn near-optimal behavior in CDPs with low Bellman rank. The algorithm requires a number of samples that is polynomial in all relevant parameters but independent of the number of unique contexts. Our approach uses Bellman error minimization with optimistic exploration and provides new insights into efficient exploration for RL with function approximation.}
}
Endnote
%0 Conference Paper
%T Contextual Decision Processes with low Bellman rank are PAC-Learnable
%A Nan Jiang
%A Akshay Krishnamurthy
%A Alekh Agarwal
%A John Langford
%A Robert E. Schapire
%B Proceedings of the 34th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2017
%E Doina Precup
%E Yee Whye Teh
%F pmlr-v70-jiang17c
%I PMLR
%P 1704--1713
%U http://proceedings.mlr.press/v70/jiang17c.html
%V 70
%X This paper studies systematic exploration for reinforcement learning (RL) with rich observations and function approximation. We introduce contextual decision processes (CDPs), which unify most prior RL settings. Our first contribution is a complexity measure, the Bellman rank, that we show enables tractable learning of near-optimal behavior in CDPs and is naturally small for many well-studied RL models. Our second contribution is a new RL algorithm that does systematic exploration to learn near-optimal behavior in CDPs with low Bellman rank. The algorithm requires a number of samples that is polynomial in all relevant parameters but independent of the number of unique contexts. Our approach uses Bellman error minimization with optimistic exploration and provides new insights into efficient exploration for RL with function approximation.
APA
Jiang, N., Krishnamurthy, A., Agarwal, A., Langford, J., & Schapire, R. E. (2017). Contextual Decision Processes with low Bellman rank are PAC-Learnable. Proceedings of the 34th International Conference on Machine Learning, in Proceedings of Machine Learning Research 70:1704-1713. Available from http://proceedings.mlr.press/v70/jiang17c.html.