Finite-sample Analysis of Bellman Residual Minimization

Odalric-Ambrym Maillard; Remi Munos; Alessandro Lazaric; Mohammad Ghavamzadeh

Finite-sample Analysis of Bellman Residual Minimization

Odalric-Ambrym Maillard, Remi Munos, Alessandro Lazaric, Mohammad Ghavamzadeh

Proceedings of 2nd Asian Conference on Machine Learning, PMLR 13:299-314, 2010.

Abstract

We consider the Bellman residual minimization approach for solving discounted Markov decision problems, where we assume that a generative model of the dynamics and rewards is available. At each policy iteration step, an approximation of the value function for the current policy is obtained by minimizing an empirical Bellman residual defined on a set of $n$ states drawn i.i.d. from a distribution $\mu$, the immediate rewards, and the next states sampled from the model. Our main result is a generalization bound for the Bellman residual in linear approximation spaces. In particular, we prove that the empirical Bellman residual approaches the true (quadratic) Bellman residual in $\mu$-norm with a rate of order $O(1/\sqrt{n})$. This result implies that minimizing the empirical residual is indeed a sound approach for the minimization of the true Bellman residual which guarantees a good approximation of the value function for each policy. Finally, we derive performance bounds for the resulting approximate policy iteration algorithm in terms of the number of samples $n$ and a measure of how well the function space is able to approximate the sequence of value functions.

Cite this Paper

BibTeX

@InProceedings{pmlr-v13-maillard10a,
  title = 	 {Finite-sample Analysis of Bellman Residual Minimization},
  author = 	 {Maillard, Odalric-Ambrym and Munos, Remi and Lazaric, Alessandro and Ghavamzadeh, Mohammad},
  booktitle = 	 {Proceedings of 2nd Asian Conference on Machine Learning},
  pages = 	 {299--314},
  year = 	 {2010},
  editor = 	 {Sugiyama, Masashi and Yang, Qiang},
  volume = 	 {13},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Tokyo, Japan},
  month = 	 {08--10 Nov},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v13/maillard10a/maillard10a.pdf},
  url = 	 {https://proceedings.mlr.press/v13/maillard10a.html},
  abstract = 	 {We consider the Bellman residual minimization approach for solving discounted Markov decision problems, where we assume that a generative model of the dynamics and rewards is available. At each policy iteration step, an approximation of the value function for the current policy is obtained by minimizing an empirical Bellman residual defined on a set of $n$ states drawn i.i.d. from a distribution $\mu$, the immediate rewards, and the next states sampled from the model. Our main result is a generalization bound for the Bellman residual in linear approximation spaces. In particular, we prove that the empirical Bellman residual approaches the true (quadratic) Bellman residual in $\mu$-norm with a rate of order $O(1/\sqrt{n})$. This result implies that minimizing the empirical residual is indeed a sound approach for the minimization of the true Bellman residual which guarantees a good approximation of the value function for each policy. Finally, we derive performance bounds for the resulting approximate policy iteration algorithm in terms of the number of samples $n$ and a measure of how well the function space is able to approximate the sequence of value functions.}
}

Endnote

%0 Conference Paper
%T Finite-sample Analysis of Bellman Residual Minimization
%A Odalric-Ambrym Maillard
%A Remi Munos
%A Alessandro Lazaric
%A Mohammad Ghavamzadeh
%B Proceedings of 2nd Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2010
%E Masashi Sugiyama
%E Qiang Yang	
%F pmlr-v13-maillard10a
%I PMLR
%P 299--314
%U https://proceedings.mlr.press/v13/maillard10a.html
%V 13
%X We consider the Bellman residual minimization approach for solving discounted Markov decision problems, where we assume that a generative model of the dynamics and rewards is available. At each policy iteration step, an approximation of the value function for the current policy is obtained by minimizing an empirical Bellman residual defined on a set of $n$ states drawn i.i.d. from a distribution $\mu$, the immediate rewards, and the next states sampled from the model. Our main result is a generalization bound for the Bellman residual in linear approximation spaces. In particular, we prove that the empirical Bellman residual approaches the true (quadratic) Bellman residual in $\mu$-norm with a rate of order $O(1/\sqrt{n})$. This result implies that minimizing the empirical residual is indeed a sound approach for the minimization of the true Bellman residual which guarantees a good approximation of the value function for each policy. Finally, we derive performance bounds for the resulting approximate policy iteration algorithm in terms of the number of samples $n$ and a measure of how well the function space is able to approximate the sequence of value functions.

RIS

TY  - CPAPER
TI  - Finite-sample Analysis of Bellman Residual Minimization
AU  - Odalric-Ambrym Maillard
AU  - Remi Munos
AU  - Alessandro Lazaric
AU  - Mohammad Ghavamzadeh
BT  - Proceedings of 2nd Asian Conference on Machine Learning
DA  - 2010/10/31
ED  - Masashi Sugiyama
ED  - Qiang Yang	
ID  - pmlr-v13-maillard10a
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 13
SP  - 299
EP  - 314
L1  - http://proceedings.mlr.press/v13/maillard10a/maillard10a.pdf
UR  - https://proceedings.mlr.press/v13/maillard10a.html
AB  - We consider the Bellman residual minimization approach for solving discounted Markov decision problems, where we assume that a generative model of the dynamics and rewards is available. At each policy iteration step, an approximation of the value function for the current policy is obtained by minimizing an empirical Bellman residual defined on a set of $n$ states drawn i.i.d. from a distribution $\mu$, the immediate rewards, and the next states sampled from the model. Our main result is a generalization bound for the Bellman residual in linear approximation spaces. In particular, we prove that the empirical Bellman residual approaches the true (quadratic) Bellman residual in $\mu$-norm with a rate of order $O(1/\sqrt{n})$. This result implies that minimizing the empirical residual is indeed a sound approach for the minimization of the true Bellman residual which guarantees a good approximation of the value function for each policy. Finally, we derive performance bounds for the resulting approximate policy iteration algorithm in terms of the number of samples $n$ and a measure of how well the function space is able to approximate the sequence of value functions.
ER  -

APA

Maillard, O., Munos, R., Lazaric, A. & Ghavamzadeh, M.. (2010). Finite-sample Analysis of Bellman Residual Minimization. Proceedings of 2nd Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 13:299-314 Available from https://proceedings.mlr.press/v13/maillard10a.html.

Related Material

Download PDF