On Average Reward Policy Evaluation in Infinite-State Partially Observable Systems

Yuri Grinberg, Doina Precup
; Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, PMLR 22:449-457, 2012.

Abstract

We investigate the problem of estimating the average reward of given decision policies in discrete-time controllable dynamical systems with finite action and observation sets, but possibly infinite state space. Unlike in systems with finite state spaces, in infinite–state systems the expected reward for some policies might not exist, so policy evaluation, which is a key step in optimal control methods, might fail. Our main analysis tool is Ergodic theory, which allows learning potentially useful quantities from the system without building a model. Our main contribution is three-fold. First, we present several dynamical systems that demonstrate the difficulty of learning in the general case, without making additional assumptions. We state the necessary condition that the underlying system must satisfy to be amenable for learning. Second, we discuss the relationship between this condition and state-of-the-art predictive representations, and we show that there are systems that satisfy the above condition but cannot be modeled by such representations. Third, we establish sufficient conditions for average-reward policy evaluation in this setting.

Cite this Paper


BibTeX
@InProceedings{pmlr-v22-grinberg12, title = {On Average Reward Policy Evaluation in Infinite-State Partially Observable Systems}, author = {Yuri Grinberg and Doina Precup}, booktitle = {Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics}, pages = {449--457}, year = {2012}, editor = {Neil D. Lawrence and Mark Girolami}, volume = {22}, series = {Proceedings of Machine Learning Research}, address = {La Palma, Canary Islands}, month = {21--23 Apr}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v22/grinberg12/grinberg12.pdf}, url = {http://proceedings.mlr.press/v22/grinberg12.html}, abstract = {We investigate the problem of estimating the average reward of given decision policies in discrete-time controllable dynamical systems with finite action and observation sets, but possibly infinite state space. Unlike in systems with finite state spaces, in infinite–state systems the expected reward for some policies might not exist, so policy evaluation, which is a key step in optimal control methods, might fail. Our main analysis tool is Ergodic theory, which allows learning potentially useful quantities from the system without building a model. Our main contribution is three-fold. First, we present several dynamical systems that demonstrate the difficulty of learning in the general case, without making additional assumptions. We state the necessary condition that the underlying system must satisfy to be amenable for learning. Second, we discuss the relationship between this condition and state-of-the-art predictive representations, and we show that there are systems that satisfy the above condition but cannot be modeled by such representations. Third, we establish sufficient conditions for average-reward policy evaluation in this setting.} }
Endnote
%0 Conference Paper %T On Average Reward Policy Evaluation in Infinite-State Partially Observable Systems %A Yuri Grinberg %A Doina Precup %B Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 2012 %E Neil D. Lawrence %E Mark Girolami %F pmlr-v22-grinberg12 %I PMLR %J Proceedings of Machine Learning Research %P 449--457 %U http://proceedings.mlr.press %V 22 %W PMLR %X We investigate the problem of estimating the average reward of given decision policies in discrete-time controllable dynamical systems with finite action and observation sets, but possibly infinite state space. Unlike in systems with finite state spaces, in infinite–state systems the expected reward for some policies might not exist, so policy evaluation, which is a key step in optimal control methods, might fail. Our main analysis tool is Ergodic theory, which allows learning potentially useful quantities from the system without building a model. Our main contribution is three-fold. First, we present several dynamical systems that demonstrate the difficulty of learning in the general case, without making additional assumptions. We state the necessary condition that the underlying system must satisfy to be amenable for learning. Second, we discuss the relationship between this condition and state-of-the-art predictive representations, and we show that there are systems that satisfy the above condition but cannot be modeled by such representations. Third, we establish sufficient conditions for average-reward policy evaluation in this setting.
RIS
TY - CPAPER TI - On Average Reward Policy Evaluation in Infinite-State Partially Observable Systems AU - Yuri Grinberg AU - Doina Precup BT - Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics PY - 2012/03/21 DA - 2012/03/21 ED - Neil D. Lawrence ED - Mark Girolami ID - pmlr-v22-grinberg12 PB - PMLR SP - 449 DP - PMLR EP - 457 L1 - http://proceedings.mlr.press/v22/grinberg12/grinberg12.pdf UR - http://proceedings.mlr.press/v22/grinberg12.html AB - We investigate the problem of estimating the average reward of given decision policies in discrete-time controllable dynamical systems with finite action and observation sets, but possibly infinite state space. Unlike in systems with finite state spaces, in infinite–state systems the expected reward for some policies might not exist, so policy evaluation, which is a key step in optimal control methods, might fail. Our main analysis tool is Ergodic theory, which allows learning potentially useful quantities from the system without building a model. Our main contribution is three-fold. First, we present several dynamical systems that demonstrate the difficulty of learning in the general case, without making additional assumptions. We state the necessary condition that the underlying system must satisfy to be amenable for learning. Second, we discuss the relationship between this condition and state-of-the-art predictive representations, and we show that there are systems that satisfy the above condition but cannot be modeled by such representations. Third, we establish sufficient conditions for average-reward policy evaluation in this setting. ER -
APA
Grinberg, Y. & Precup, D.. (2012). On Average Reward Policy Evaluation in Infinite-State Partially Observable Systems. Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, in PMLR 22:449-457

Related Material