On Average Reward Policy Evaluation in Infinite-State Partially Observable Systems

Yuri Grinberg; Doina Precup

Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, PMLR 22:449-457, 2012.

Abstract

We investigate the problem of estimating the average reward of given decision policies in discrete-time controllable dynamical systems with finite action and observation sets, but possibly infinite state space. Unlike in systems with finite state spaces, in infinite-state systems the expected reward for some policies might not exist, so policy evaluation, which is a key step in optimal control methods, might fail. Our main analysis tool is ergodic theory, which allows learning potentially useful quantities from the system without building a model. Our main contribution is three-fold. First, we present several dynamical systems that demonstrate the difficulty of learning in the general case, without making additional assumptions. We state the necessary condition that the underlying system must satisfy to be amenable to learning. Second, we discuss the relationship between this condition and state-of-the-art predictive representations, and we show that there are systems that satisfy the above condition but cannot be modeled by such representations. Third, we establish sufficient conditions for average-reward policy evaluation in this setting.

Cite this Paper

BibTeX


@InProceedings{pmlr-v22-grinberg12,
  title = 	 {On Average Reward Policy Evaluation in Infinite-State Partially Observable Systems},
  author = 	 {Grinberg, Yuri and Precup, Doina},
  booktitle = 	 {Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics},
  pages = 	 {449--457},
  year = 	 {2012},
  editor = 	 {Lawrence, Neil D. and Girolami, Mark},
  volume = 	 {22},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {La Palma, Canary Islands},
  month = 	 {21--23 Apr},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v22/grinberg12/grinberg12.pdf},
  url = 	 {https://proceedings.mlr.press/v22/grinberg12.html},
  abstract = 	 {We investigate the problem of estimating the average reward of given decision policies in discrete-time controllable dynamical systems with finite action and observation sets, but possibly infinite state space. Unlike in systems with finite state spaces, in infinite-state systems the expected reward for some policies might not exist, so policy evaluation, which is a key step in optimal control methods, might fail. Our main analysis tool is ergodic theory, which allows learning potentially useful quantities from the system without building a model. Our main contribution is three-fold. First, we present several dynamical systems that demonstrate the difficulty of learning in the general case, without making additional assumptions. We state the necessary condition that the underlying system must satisfy to be amenable to learning. Second, we discuss the relationship between this condition and state-of-the-art predictive representations, and we show that there are systems that satisfy the above condition but cannot be modeled by such representations. Third, we establish sufficient conditions for average-reward policy evaluation in this setting.}
}

Endnote

%0 Conference Paper
%T On Average Reward Policy Evaluation in Infinite-State Partially Observable Systems
%A Yuri Grinberg
%A Doina Precup
%B Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2012
%E Neil D. Lawrence
%E Mark Girolami
%F pmlr-v22-grinberg12
%I PMLR
%P 449--457
%U https://proceedings.mlr.press/v22/grinberg12.html
%V 22
%X We investigate the problem of estimating the average reward of given decision policies in discrete-time controllable dynamical systems with finite action and observation sets, but possibly infinite state space. Unlike in systems with finite state spaces, in infinite-state systems the expected reward for some policies might not exist, so policy evaluation, which is a key step in optimal control methods, might fail. Our main analysis tool is ergodic theory, which allows learning potentially useful quantities from the system without building a model. Our main contribution is three-fold. First, we present several dynamical systems that demonstrate the difficulty of learning in the general case, without making additional assumptions. We state the necessary condition that the underlying system must satisfy to be amenable to learning. Second, we discuss the relationship between this condition and state-of-the-art predictive representations, and we show that there are systems that satisfy the above condition but cannot be modeled by such representations. Third, we establish sufficient conditions for average-reward policy evaluation in this setting.

RIS


TY  - CPAPER
TI  - On Average Reward Policy Evaluation in Infinite-State Partially Observable Systems
AU  - Yuri Grinberg
AU  - Doina Precup
BT  - Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics
DA  - 2012/04/21
ED  - Neil D. Lawrence
ED  - Mark Girolami
ID  - pmlr-v22-grinberg12
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 22
SP  - 449
EP  - 457
L1  - http://proceedings.mlr.press/v22/grinberg12/grinberg12.pdf
UR  - https://proceedings.mlr.press/v22/grinberg12.html
AB  - We investigate the problem of estimating the average reward of given decision policies in discrete-time controllable dynamical systems with finite action and observation sets, but possibly infinite state space. Unlike in systems with finite state spaces, in infinite-state systems the expected reward for some policies might not exist, so policy evaluation, which is a key step in optimal control methods, might fail. Our main analysis tool is ergodic theory, which allows learning potentially useful quantities from the system without building a model. Our main contribution is three-fold. First, we present several dynamical systems that demonstrate the difficulty of learning in the general case, without making additional assumptions. We state the necessary condition that the underlying system must satisfy to be amenable to learning. Second, we discuss the relationship between this condition and state-of-the-art predictive representations, and we show that there are systems that satisfy the above condition but cannot be modeled by such representations. Third, we establish sufficient conditions for average-reward policy evaluation in this setting.
ER  -

APA


Grinberg, Y. & Precup, D. (2012). On Average Reward Policy Evaluation in Infinite-State Partially Observable Systems. Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 22:449-457. Available from https://proceedings.mlr.press/v22/grinberg12.html.
