Policy Error Bounds for Model-Based Reinforcement Learning with Factored Linear Models

Bernardo Ávila Pires; Csaba Szepesvári

29th Annual Conference on Learning Theory, PMLR 49:121-151, 2016.

Abstract

In this paper we study a model-based approach to calculating approximately optimal policies in Markovian Decision Processes. In particular, we derive novel bounds on the loss of using a policy derived from a factored linear model, a class of models which generalize numerous previous models out of those that come with strong computational guarantees. For the first time in the literature, we derive performance bounds for model-based techniques where the model inaccuracy is measured in weighted norms. Moreover, our bounds show a decreased sensitivity to the discount factor and, unlike similar bounds derived for other approaches, they are insensitive to measure mismatch. Similarly to previous works, our proofs are also based on contraction arguments, but with the main differences that we use carefully constructed norms building on Banach lattices, and the contraction property is only assumed for operators acting on “compressed” spaces, thus weakening previous assumptions, while strengthening previous results.

Cite this Paper

BibTeX


@InProceedings{pmlr-v49-avilapires16,
  title = {Policy Error Bounds for Model-Based Reinforcement Learning with Factored Linear Models},
  author = {Ávila Pires, Bernardo and Szepesvári, Csaba},
  booktitle = {29th Annual Conference on Learning Theory},
  pages = {121--151},
  year = {2016},
  editor = {Feldman, Vitaly and Rakhlin, Alexander and Shamir, Ohad},
  volume = {49},
  series = {Proceedings of Machine Learning Research},
  address = {Columbia University, New York, New York, USA},
  month = {23--26 Jun},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v49/avilapires16.pdf},
  url = {https://proceedings.mlr.press/v49/avilapires16.html},
  abstract = {In this paper we study a model-based approach to calculating approximately optimal policies in Markovian Decision Processes. In particular, we derive novel bounds on the loss of using a policy derived from a factored linear model, a class of models which generalize numerous previous models out of those that come with strong computational guarantees. For the first time in the literature, we derive performance bounds for model-based techniques where the model inaccuracy is measured in weighted norms. Moreover, our bounds show a decreased sensitivity to the discount factor and, unlike similar bounds derived for other approaches, they are insensitive to measure mismatch. Similarly to previous works, our proofs are also based on contraction arguments, but with the main differences that we use carefully constructed norms building on Banach lattices, and the contraction property is only assumed for operators acting on “compressed” spaces, thus weakening previous assumptions, while strengthening previous results.}
}

Endnote

%0 Conference Paper
%T Policy Error Bounds for Model-Based Reinforcement Learning with Factored Linear Models
%A Bernardo Ávila Pires
%A Csaba Szepesvári
%B 29th Annual Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2016
%E Vitaly Feldman
%E Alexander Rakhlin
%E Ohad Shamir
%F pmlr-v49-avilapires16
%I PMLR
%P 121--151
%U https://proceedings.mlr.press/v49/avilapires16.html
%V 49
%X In this paper we study a model-based approach to calculating approximately optimal policies in Markovian Decision Processes. In particular, we derive novel bounds on the loss of using a policy derived from a factored linear model, a class of models which generalize numerous previous models out of those that come with strong computational guarantees. For the first time in the literature, we derive performance bounds for model-based techniques where the model inaccuracy is measured in weighted norms. Moreover, our bounds show a decreased sensitivity to the discount factor and, unlike similar bounds derived for other approaches, they are insensitive to measure mismatch. Similarly to previous works, our proofs are also based on contraction arguments, but with the main differences that we use carefully constructed norms building on Banach lattices, and the contraction property is only assumed for operators acting on “compressed” spaces, thus weakening previous assumptions, while strengthening previous results.

RIS


TY  - CPAPER
TI  - Policy Error Bounds for Model-Based Reinforcement Learning with Factored Linear Models
AU  - Bernardo Ávila Pires
AU  - Csaba Szepesvári
BT  - 29th Annual Conference on Learning Theory
DA  - 2016/06/06
ED  - Vitaly Feldman
ED  - Alexander Rakhlin
ED  - Ohad Shamir
ID  - pmlr-v49-avilapires16
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 49
SP  - 121
EP  - 151
L1  - http://proceedings.mlr.press/v49/avilapires16.pdf
UR  - https://proceedings.mlr.press/v49/avilapires16.html
AB  - In this paper we study a model-based approach to calculating approximately optimal policies in Markovian Decision Processes. In particular, we derive novel bounds on the loss of using a policy derived from a factored linear model, a class of models which generalize numerous previous models out of those that come with strong computational guarantees. For the first time in the literature, we derive performance bounds for model-based techniques where the model inaccuracy is measured in weighted norms. Moreover, our bounds show a decreased sensitivity to the discount factor and, unlike similar bounds derived for other approaches, they are insensitive to measure mismatch. Similarly to previous works, our proofs are also based on contraction arguments, but with the main differences that we use carefully constructed norms building on Banach lattices, and the contraction property is only assumed for operators acting on “compressed” spaces, thus weakening previous assumptions, while strengthening previous results.
ER  -

APA


Ávila Pires, B. & Szepesvári, C. (2016). Policy Error Bounds for Model-Based Reinforcement Learning with Factored Linear Models. 29th Annual Conference on Learning Theory, in Proceedings of Machine Learning Research 49:121-151. Available from https://proceedings.mlr.press/v49/avilapires16.html.
