Robust partially observable Markov decision process

Takayuki Osogami

Proceedings of the 32nd International Conference on Machine Learning, PMLR 37:106-115, 2015.

Abstract

We seek to find the robust policy that maximizes the expected cumulative reward for the worst case when a partially observable Markov decision process (POMDP) has uncertain parameters whose values are only known to be in a given region. We prove that the robust value function, which represents the expected cumulative reward that can be obtained with the robust policy, is convex with respect to the belief state. Based on the convexity, we design a value-iteration algorithm for finding the robust policy. We prove that our value iteration converges for an infinite horizon. We also design point-based value iteration for finding the robust policy more efficiently, possibly with approximation. Numerical experiments show that our point-based value iteration can adequately find robust policies.

Cite this Paper

BibTeX


@InProceedings{pmlr-v37-osogami15,
  title = 	 {Robust partially observable Markov decision process},
  author = 	 {Osogami, Takayuki},
  booktitle = 	 {Proceedings of the 32nd International Conference on Machine Learning},
  pages = 	 {106--115},
  year = 	 {2015},
  editor = 	 {Bach, Francis and Blei, David},
  volume = 	 {37},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Lille, France},
  month = 	 {07--09 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v37/osogami15.pdf},
  url = 	 {https://proceedings.mlr.press/v37/osogami15.html},
  abstract = 	 {We seek to find the robust policy that maximizes the expected cumulative reward for the worst case when a partially observable Markov decision process (POMDP) has uncertain parameters whose values are only known to be in a given region. We prove that the robust value function, which represents the expected cumulative reward that can be obtained with the robust policy, is convex with respect to the belief state. Based on the convexity, we design a value-iteration algorithm for finding the robust policy. We prove that our value iteration converges for an infinite horizon. We also design point-based value iteration for finding the robust policy more efficiently, possibly with approximation. Numerical experiments show that our point-based value iteration can adequately find robust policies.}
}

Endnote

%0 Conference Paper
%T Robust partially observable Markov decision process
%A Takayuki Osogami
%B Proceedings of the 32nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2015
%E Francis Bach
%E David Blei
%F pmlr-v37-osogami15
%I PMLR
%P 106--115
%U https://proceedings.mlr.press/v37/osogami15.html
%V 37
%X We seek to find the robust policy that maximizes the expected cumulative reward for the worst case when a partially observable Markov decision process (POMDP) has uncertain parameters whose values are only known to be in a given region. We prove that the robust value function, which represents the expected cumulative reward that can be obtained with the robust policy, is convex with respect to the belief state. Based on the convexity, we design a value-iteration algorithm for finding the robust policy. We prove that our value iteration converges for an infinite horizon. We also design point-based value iteration for finding the robust policy more efficiently, possibly with approximation. Numerical experiments show that our point-based value iteration can adequately find robust policies.

RIS


TY  - CPAPER
TI  - Robust partially observable Markov decision process
AU  - Takayuki Osogami
BT  - Proceedings of the 32nd International Conference on Machine Learning
DA  - 2015/06/01
ED  - Francis Bach
ED  - David Blei
ID  - pmlr-v37-osogami15
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 37
SP  - 106
EP  - 115
L1  - http://proceedings.mlr.press/v37/osogami15.pdf
UR  - https://proceedings.mlr.press/v37/osogami15.html
AB  - We seek to find the robust policy that maximizes the expected cumulative reward for the worst case when a partially observable Markov decision process (POMDP) has uncertain parameters whose values are only known to be in a given region. We prove that the robust value function, which represents the expected cumulative reward that can be obtained with the robust policy, is convex with respect to the belief state. Based on the convexity, we design a value-iteration algorithm for finding the robust policy. We prove that our value iteration converges for an infinite horizon. We also design point-based value iteration for finding the robust policy more efficiently, possibly with approximation. Numerical experiments show that our point-based value iteration can adequately find robust policies.
ER  -

APA


Osogami, T. (2015). Robust partially observable Markov decision process. Proceedings of the 32nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 37:106-115. Available from https://proceedings.mlr.press/v37/osogami15.html.
