Finite-sample Analysis of Bellman Residual Minimization

Odalric-Ambrym Maillard, Remi Munos, Alessandro Lazaric, Mohammad Ghavamzadeh
Proceedings of 2nd Asian Conference on Machine Learning, PMLR 13:299-314, 2010.

Abstract

We consider the Bellman residual minimization approach for solving discounted Markov decision problems, where we assume that a generative model of the dynamics and rewards is available. At each policy iteration step, an approximation of the value function for the current policy is obtained by minimizing an empirical Bellman residual defined on a set of $n$ states drawn i.i.d. from a distribution $\mu$, the immediate rewards, and the next states sampled from the model. Our main result is a generalization bound for the Bellman residual in linear approximation spaces. In particular, we prove that the empirical Bellman residual approaches the true (quadratic) Bellman residual in $\mu$-norm with a rate of order $O(1/\sqrt{n})$. This result implies that minimizing the empirical residual is indeed a sound approach for the minimization of the true Bellman residual which guarantees a good approximation of the value function for each policy. Finally, we derive performance bounds for the resulting approximate policy iteration algorithm in terms of the number of samples $n$ and a measure of how well the function space is able to approximate the sequence of value functions.
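To make the setup concrete, below is a minimal sketch of one policy-evaluation step by empirical Bellman residual minimization in a linear space V_theta(s) = phi(s)^T theta, using a generative model to draw rewards and next states for n states sampled i.i.d. from mu, as described in the abstract. All names (phi, sample_state, generative_model, pi) are illustrative placeholders rather than the paper's notation, and the two independent next-state draws per state are the classical double-sampling device for an unbiased estimate of the quadratic residual; the paper's exact estimator and analysis may differ.

import numpy as np

def bellman_residual_fit(phi, sample_state, generative_model, pi, gamma=0.95, n=1000):
    """Sketch: fit theta minimizing an empirical quadratic Bellman residual.

    phi              : callable, state -> feature vector (np.ndarray of dim d)
    sample_state     : callable, () -> state drawn i.i.d. from mu
    generative_model : callable, (state, action) -> (reward, next_state) sample
    pi               : callable, state -> action of the policy being evaluated
    """
    states = [sample_state() for _ in range(n)]
    d = len(phi(states[0]))
    A = np.zeros((d, d))
    b = np.zeros(d)
    for s in states:
        a = pi(s)
        # two independent draws from the generative model at (s, a);
        # the product of the two residuals is an unbiased estimate of
        # (V_theta(s) - T^pi V_theta(s))^2 at this state
        r1, s1 = generative_model(s, a)
        r2, s2 = generative_model(s, a)
        g1 = phi(s) - gamma * phi(s1)
        g2 = phi(s) - gamma * phi(s2)
        # accumulate the normal equations of the resulting quadratic objective
        A += np.outer(g1, g2)
        b += 0.5 * (r2 * g1 + r1 * g2)
    A = 0.5 * (A + A.T)  # symmetrize before solving
    theta = np.linalg.lstsq(A / n, b / n, rcond=None)[0]
    return theta

In an approximate policy iteration loop, this fit would be recomputed for each successive greedy policy; the paper's performance bounds then combine the O(1/sqrt(n)) generalization term with how well the linear space approximates the sequence of value functions.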

Cite this Paper


BibTeX
@InProceedings{pmlr-v13-maillard10a,
  title     = {Finite-sample Analysis of Bellman Residual Minimization},
  author    = {Maillard, Odalric-Ambrym and Munos, Remi and Lazaric, Alessandro and Ghavamzadeh, Mohammad},
  booktitle = {Proceedings of 2nd Asian Conference on Machine Learning},
  pages     = {299--314},
  year      = {2010},
  editor    = {Sugiyama, Masashi and Yang, Qiang},
  volume    = {13},
  series    = {Proceedings of Machine Learning Research},
  address   = {Tokyo, Japan},
  month     = {08--10 Nov},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v13/maillard10a/maillard10a.pdf},
  url       = {https://proceedings.mlr.press/v13/maillard10a.html},
  abstract  = {We consider the Bellman residual minimization approach for solving discounted Markov decision problems, where we assume that a generative model of the dynamics and rewards is available. At each policy iteration step, an approximation of the value function for the current policy is obtained by minimizing an empirical Bellman residual defined on a set of $n$ states drawn i.i.d. from a distribution $\mu$, the immediate rewards, and the next states sampled from the model. Our main result is a generalization bound for the Bellman residual in linear approximation spaces. In particular, we prove that the empirical Bellman residual approaches the true (quadratic) Bellman residual in $\mu$-norm with a rate of order $O(1/\sqrt{n})$. This result implies that minimizing the empirical residual is indeed a sound approach for the minimization of the true Bellman residual which guarantees a good approximation of the value function for each policy. Finally, we derive performance bounds for the resulting approximate policy iteration algorithm in terms of the number of samples $n$ and a measure of how well the function space is able to approximate the sequence of value functions.}
}
Endnote
%0 Conference Paper
%T Finite-sample Analysis of Bellman Residual Minimization
%A Odalric-Ambrym Maillard
%A Remi Munos
%A Alessandro Lazaric
%A Mohammad Ghavamzadeh
%B Proceedings of 2nd Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2010
%E Masashi Sugiyama
%E Qiang Yang
%F pmlr-v13-maillard10a
%I PMLR
%P 299--314
%U https://proceedings.mlr.press/v13/maillard10a.html
%V 13
%X We consider the Bellman residual minimization approach for solving discounted Markov decision problems, where we assume that a generative model of the dynamics and rewards is available. At each policy iteration step, an approximation of the value function for the current policy is obtained by minimizing an empirical Bellman residual defined on a set of $n$ states drawn i.i.d. from a distribution $\mu$, the immediate rewards, and the next states sampled from the model. Our main result is a generalization bound for the Bellman residual in linear approximation spaces. In particular, we prove that the empirical Bellman residual approaches the true (quadratic) Bellman residual in $\mu$-norm with a rate of order $O(1/\sqrt{n})$. This result implies that minimizing the empirical residual is indeed a sound approach for the minimization of the true Bellman residual which guarantees a good approximation of the value function for each policy. Finally, we derive performance bounds for the resulting approximate policy iteration algorithm in terms of the number of samples $n$ and a measure of how well the function space is able to approximate the sequence of value functions.
RIS
TY - CPAPER
TI - Finite-sample Analysis of Bellman Residual Minimization
AU - Odalric-Ambrym Maillard
AU - Remi Munos
AU - Alessandro Lazaric
AU - Mohammad Ghavamzadeh
BT - Proceedings of 2nd Asian Conference on Machine Learning
DA - 2010/10/31
ED - Masashi Sugiyama
ED - Qiang Yang
ID - pmlr-v13-maillard10a
PB - PMLR
DP - Proceedings of Machine Learning Research
VL - 13
SP - 299
EP - 314
L1 - http://proceedings.mlr.press/v13/maillard10a/maillard10a.pdf
UR - https://proceedings.mlr.press/v13/maillard10a.html
AB - We consider the Bellman residual minimization approach for solving discounted Markov decision problems, where we assume that a generative model of the dynamics and rewards is available. At each policy iteration step, an approximation of the value function for the current policy is obtained by minimizing an empirical Bellman residual defined on a set of $n$ states drawn i.i.d. from a distribution $\mu$, the immediate rewards, and the next states sampled from the model. Our main result is a generalization bound for the Bellman residual in linear approximation spaces. In particular, we prove that the empirical Bellman residual approaches the true (quadratic) Bellman residual in $\mu$-norm with a rate of order $O(1/\sqrt{n})$. This result implies that minimizing the empirical residual is indeed a sound approach for the minimization of the true Bellman residual which guarantees a good approximation of the value function for each policy. Finally, we derive performance bounds for the resulting approximate policy iteration algorithm in terms of the number of samples $n$ and a measure of how well the function space is able to approximate the sequence of value functions.
ER -
APA
Maillard, O., Munos, R., Lazaric, A. & Ghavamzadeh, M. (2010). Finite-sample Analysis of Bellman Residual Minimization. Proceedings of 2nd Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 13:299-314. Available from https://proceedings.mlr.press/v13/maillard10a.html.