Dynamic Policy Programming with Function Approximation

Mohammad Gheshlaghi Azar; Vicenç Gómez; Bert Kappen

Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, PMLR 15:119-127, 2011.

Abstract

In this paper, we consider the problem of planning in infinite-horizon discounted-reward Markov decision problems. We propose a novel iterative method, called dynamic policy programming (DPP), which updates the parametrized policy by a Bellman-like iteration. For the discrete state-action case, we establish sup-norm loss bounds for the performance of the policy induced by DPP and prove that it asymptotically converges to the optimal policy. We then generalize our approach to large-scale (continuous) state-action problems using function approximation. We provide sup-norm performance-loss bounds for approximate DPP and compare them with standard results from approximate dynamic programming (ADP), showing that approximate DPP yields a tighter asymptotic bound than standard ADP methods. We also numerically compare the performance of DPP with that of other ADP and reinforcement learning (RL) methods, and observe that approximate DPP asymptotically outperforms the other methods on the mountain-car problem.

Cite this Paper

BibTeX


@InProceedings{pmlr-v15-azar11a,
  title = 	 {Dynamic Policy Programming with Function Approximation},
  author = 	 {Azar, Mohammad Gheshlaghi and Gómez, Vicenç and Kappen, Bert},
  booktitle = 	 {Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics},
  pages = 	 {119--127},
  year = 	 {2011},
  editor = 	 {Gordon, Geoffrey and Dunson, David and Dudík, Miroslav},
  volume = 	 {15},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Fort Lauderdale, FL, USA},
  month = 	 {11--13 Apr},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v15/azar11a/azar11a.pdf},
  url = 	 {https://proceedings.mlr.press/v15/azar11a.html},
  abstract = 	 {In this paper, we consider the problem of planning in the infinite-horizon discounted-reward Markov decision problems. We propose a novel iterative method, called dynamic policy programming (DPP), which updates the parametrized policy by a Bellman-like iteration. For discrete state-action case, we establish sup-norm loss bounds for the performance of the policy induced by DPP and prove that it asymptotically converges to the optimal policy. Then, we generalize our approach to large-scale (continuous) state-action problems using function approximation technique. We provide sup-norm performance-loss bounds for approximate DPP and compare these bounds with the standard results from approximate dynamic programming (ADP) showing that approximate DPP results in a tighter asymptotic bound than standard ADP methods. We also numerically compare the performance of DPP to other ADP and RL methods. We observe that approximate DPP asymptotically outperforms other methods on the mountain-car problem.}
}

Endnote

%0 Conference Paper
%T Dynamic Policy Programming with Function Approximation
%A Mohammad Gheshlaghi Azar
%A Vicenç Gómez
%A Bert Kappen
%B Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2011
%E Geoffrey Gordon
%E David Dunson
%E Miroslav Dudík
%F pmlr-v15-azar11a
%I PMLR
%P 119--127
%U https://proceedings.mlr.press/v15/azar11a.html
%V 15
%X In this paper, we consider the problem of planning in the infinite-horizon discounted-reward Markov decision problems. We propose a novel iterative method, called dynamic policy programming (DPP), which updates the parametrized policy by a Bellman-like iteration. For discrete state-action case, we establish sup-norm loss bounds for the performance of the policy induced by DPP and prove that it asymptotically converges to the optimal policy. Then, we generalize our approach to large-scale (continuous) state-action problems using function approximation technique. We provide sup-norm performance-loss bounds for approximate DPP and compare these bounds with the standard results from approximate dynamic programming (ADP) showing that approximate DPP results in a tighter asymptotic bound than standard ADP methods. We also numerically compare the performance of DPP to other ADP and RL methods. We observe that approximate DPP asymptotically outperforms other methods on the mountain-car problem.

RIS


TY  - CPAPER
TI  - Dynamic Policy Programming with Function Approximation
AU  - Mohammad Gheshlaghi Azar
AU  - Vicenç Gómez
AU  - Bert Kappen
BT  - Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics
DA  - 2011/06/14
ED  - Geoffrey Gordon
ED  - David Dunson
ED  - Miroslav Dudík
ID  - pmlr-v15-azar11a
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 15
SP  - 119
EP  - 127
L1  - http://proceedings.mlr.press/v15/azar11a/azar11a.pdf
UR  - https://proceedings.mlr.press/v15/azar11a.html
AB  - In this paper, we consider the problem of planning in the infinite-horizon discounted-reward Markov decision problems. We propose a novel iterative method, called dynamic policy programming (DPP), which updates the parametrized policy by a Bellman-like iteration. For discrete state-action case, we establish sup-norm loss bounds for the performance of the policy induced by DPP and prove that it asymptotically converges to the optimal policy. Then, we generalize our approach to large-scale (continuous) state-action problems using function approximation technique. We provide sup-norm performance-loss bounds for approximate DPP and compare these bounds with the standard results from approximate dynamic programming (ADP) showing that approximate DPP results in a tighter asymptotic bound than standard ADP methods. We also numerically compare the performance of DPP to other ADP and RL methods. We observe that approximate DPP asymptotically outperforms other methods on the mountain-car problem.
ER  -

APA


Azar, M.G., Gómez, V. & Kappen, B. (2011). Dynamic Policy Programming with Function Approximation. Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 15:119-127. Available from https://proceedings.mlr.press/v15/azar11a.html.
