Monte-Carlo Algorithms for the Improvement of Finite-State Stochastic Controllers: Application to Bayes-Adaptive Markov Decision Processes

Michael O. Duff
Proceedings of the Eighth International Workshop on Artificial Intelligence and Statistics, PMLR R3:93-97, 2001.

Abstract

We consider the problem of "optimal learning" for Markov decision processes with uncertain transition probabilities. Motivated by the correspondence between these processes and partially-observable Markov decision processes, we adopt policies expressed as finite-state stochastic automata, and we propose policy improvement algorithms that utilize Monte-Carlo techniques for gradient estimation and ascent.

Cite this Paper


BibTeX
@InProceedings{pmlr-vR3-duff01a, title = {Monte-Carlo Algorithms for the Improvement of Finite-State Stochastic Controllers: Application to Bayes-Adaptive Markov Decision Processes}, author = {Duff, Michael O.}, booktitle = {Proceedings of the Eighth International Workshop on Artificial Intelligence and Statistics}, pages = {93--97}, year = {2001}, editor = {Richardson, Thomas S. and Jaakkola, Tommi S.}, volume = {R3}, series = {Proceedings of Machine Learning Research}, month = {04--07 Jan}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/r3/duff01a/duff01a.pdf}, url = {https://proceedings.mlr.press/r3/duff01a.html}, abstract = {We consider the problem of "optimal learning" for Markov decision processes with uncertain transition probabilities. Motivated by the correspondence between these processes and partially-observable Markov decision processes, we adopt policies expressed as finite-state stochastic automata, and we propose policy improvement algorithms that utilize Monte-Carlo techniques for gradient estimation and ascent.}, note = {Reissued by PMLR on 31 March 2021.} }
Endnote
%0 Conference Paper %T Monte-Carlo Algorithms for the Improvement of Finite-State Stochastic Controllers: Application to Bayes-Adaptive Markov Decision Processes %A Michael O. Duff %B Proceedings of the Eighth International Workshop on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 2001 %E Thomas S. Richardson %E Tommi S. Jaakkola %F pmlr-vR3-duff01a %I PMLR %P 93--97 %U https://proceedings.mlr.press/r3/duff01a.html %V R3 %X We consider the problem of "optimal learning" for Markov decision processes with uncertain transition probabilities. Motivated by the correspondence between these processes and partially-observable Markov decision processes, we adopt policies expressed as finite-state stochastic automata, and we propose policy improvement algorithms that utilize Monte-Carlo techniques for gradient estimation and ascent. %Z Reissued by PMLR on 31 March 2021.
APA
Duff, M.O.. (2001). Monte-Carlo Algorithms for the Improvement of Finite-State Stochastic Controllers: Application to Bayes-Adaptive Markov Decision Processes. Proceedings of the Eighth International Workshop on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research R3:93-97 Available from https://proceedings.mlr.press/r3/duff01a.html. Reissued by PMLR on 31 March 2021.

Related Material