Solving Hidden-Mode Markov Decision Problems

Samuel Ping-Man Choi, Nevin Lianwen Zhang, Dit-Yan Yeung
Proceedings of the Eighth International Workshop on Artificial Intelligence and Statistics, PMLR R3:49-56, 2001.

Abstract

Hidden-mode Markov decision processes (HM-MDPs) are a novel mathematical framework for a subclass of nonstationary reinforcement learning problems in which the environment dynamics change over time according to a Markov process. HM-MDPs are a special case of partially observable Markov decision processes (POMDPs), and therefore nonstationary problems of this type can in principle be addressed indirectly via existing POMDP algorithms. However, previous research has shown that such an indirect approach is inefficient compared with a direct HM-MDP approach in terms of model learning time. In this paper, we investigate how to solve HM-MDP problems efficiently by using a direct approach. We exploit the HM-MDP structure and derive an equation for the dynamic programming update. Our equation decomposes the value function into a number of components and, as a result, substantially reduces the amount of computation needed to find optimal policies. Based on the incremental pruning and point-based improvement techniques, a value iteration algorithm is also implemented. Empirical results show that the HM-MDP approach outperforms the POMDP approach by several orders of magnitude in both space requirements and speed.
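To make the abstract's central idea concrete, here is a minimal sketch of the structure being exploited, in our own notation rather than the paper's: modes m with hidden Markov transition probabilities x(m' | m), fully observed states s, per-mode transition probabilities p_m(s' | s, a) and rewards r_m(s, a), and a belief b over modes. The convention that the state transits under the current mode before the mode itself transits is an assumption, not a transcription of the paper's equations. Since only the mode is hidden, the POMDP belief state collapses to a pair (s, b), with the mode belief updated after taking action a in state s and observing s' by

    b'(m') = \eta \sum_{m} x(m' \mid m) \, p_m(s' \mid s, a) \, b(m),

where \eta is a normalizing constant. The dynamic programming update can then be written separately for each observed state:

    V(s, b) = \max_{a} \Big[ \sum_{m} b(m) \, r_m(s, a) + \gamma \sum_{s'} \Big( \sum_{m} b(m) \, p_m(s' \mid s, a) \Big) V(s', b') \Big].

Maintaining one set of value-function components of dimension |M| for each observed state, rather than a single set of dimension |M| x |S| as a generic POMDP solver would, is the kind of decomposition that can account for the reported savings in space and computation.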

Cite this Paper


BibTeX
@InProceedings{pmlr-vR3-choi01a,
  title     = {Solving Hidden-Mode Markov Decision Problems},
  author    = {Choi, Samuel Ping-Man and Zhang, Nevin Lianwen and Yeung, Dit-Yan},
  booktitle = {Proceedings of the Eighth International Workshop on Artificial Intelligence and Statistics},
  pages     = {49--56},
  year      = {2001},
  editor    = {Richardson, Thomas S. and Jaakkola, Tommi S.},
  volume    = {R3},
  series    = {Proceedings of Machine Learning Research},
  month     = {04--07 Jan},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/r3/choi01a/choi01a.pdf},
  url       = {https://proceedings.mlr.press/r3/choi01a.html},
  abstract  = {Hidden-mode Markov decision processes (HM-MDPs) are a novel mathematical framework for a subclass of nonstationary reinforcement learning problems in which the environment dynamics change over time according to a Markov process. HM-MDPs are a special case of partially observable Markov decision processes (POMDPs), and therefore nonstationary problems of this type can in principle be addressed indirectly via existing POMDP algorithms. However, previous research has shown that such an indirect approach is inefficient compared with a direct HM-MDP approach in terms of model learning time. In this paper, we investigate how to solve HM-MDP problems efficiently by using a direct approach. We exploit the HM-MDP structure and derive an equation for the dynamic programming update. Our equation decomposes the value function into a number of components and, as a result, substantially reduces the amount of computation needed to find optimal policies. Based on the incremental pruning and point-based improvement techniques, a value iteration algorithm is also implemented. Empirical results show that the HM-MDP approach outperforms the POMDP approach by several orders of magnitude in both space requirements and speed.},
  note      = {Reissued by PMLR on 31 March 2021.}
}
Endnote
%0 Conference Paper
%T Solving Hidden-Mode Markov Decision Problems
%A Samuel Ping-Man Choi
%A Nevin Lianwen Zhang
%A Dit-Yan Yeung
%B Proceedings of the Eighth International Workshop on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2001
%E Thomas S. Richardson
%E Tommi S. Jaakkola
%F pmlr-vR3-choi01a
%I PMLR
%P 49--56
%U https://proceedings.mlr.press/r3/choi01a.html
%V R3
%X Hidden-mode Markov decision processes (HM-MDPs) are a novel mathematical framework for a subclass of nonstationary reinforcement learning problems in which the environment dynamics change over time according to a Markov process. HM-MDPs are a special case of partially observable Markov decision processes (POMDPs), and therefore nonstationary problems of this type can in principle be addressed indirectly via existing POMDP algorithms. However, previous research has shown that such an indirect approach is inefficient compared with a direct HM-MDP approach in terms of model learning time. In this paper, we investigate how to solve HM-MDP problems efficiently by using a direct approach. We exploit the HM-MDP structure and derive an equation for the dynamic programming update. Our equation decomposes the value function into a number of components and, as a result, substantially reduces the amount of computation needed to find optimal policies. Based on the incremental pruning and point-based improvement techniques, a value iteration algorithm is also implemented. Empirical results show that the HM-MDP approach outperforms the POMDP approach by several orders of magnitude in both space requirements and speed.
%Z Reissued by PMLR on 31 March 2021.
APA
Choi, S.P., Zhang, N.L. & Yeung, D. (2001). Solving Hidden-Mode Markov Decision Problems. Proceedings of the Eighth International Workshop on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research R3:49-56. Available from https://proceedings.mlr.press/r3/choi01a.html. Reissued by PMLR on 31 March 2021.

Related Material

Download PDF: http://proceedings.mlr.press/r3/choi01a/choi01a.pdf