Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, PMLR 22:449-457, 2012.
Abstract
We investigate the problem of estimating the average reward of given decision policies in discrete-time controllable dynamical systems with finite action and observation sets, but possibly infinite state space. Unlike in systems with finite state spaces, in infinite-state systems the expected reward for some policies might not exist, so policy evaluation, which is a key step in optimal control methods, might fail. Our main analysis tool is ergodic theory, which allows us to learn potentially useful quantities from the system without building a model. Our main contribution is threefold. First, we present several dynamical systems that demonstrate the difficulty of learning in the general case, without making additional assumptions. We state the necessary condition that the underlying system must satisfy to be amenable to learning. Second, we discuss the relationship between this condition and state-of-the-art predictive representations, and we show that there are systems that satisfy this condition but cannot be modeled by such representations. Third, we establish sufficient conditions for average-reward policy evaluation in this setting.
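For reference, the average-reward criterion discussed in the abstract is standardly defined as the limit of the time-averaged expected reward; the formulation below uses generic notation and is not taken from the paper itself:

\[
\rho^{\pi} \;=\; \lim_{T \to \infty} \frac{1}{T}\, \mathbb{E}^{\pi}\!\left[\sum_{t=1}^{T} r_t\right],
\]

where \(r_t\) is the reward received at time \(t\) while following policy \(\pi\). In finite-state systems this limit exists for stationary policies, but in infinite-state (and partially observable) systems the running averages \(\frac{1}{T}\sum_{t=1}^{T} r_t\) can oscillate indefinitely, so the limit, and hence the quantity to be evaluated, may fail to exist.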
@InProceedings{pmlr-v22-grinberg12,
title = {On Average Reward Policy Evaluation in Infinite-State Partially Observable Systems},
author = {Yuri Grinberg and Doina Precup},
booktitle = {Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics},
pages = {449--457},
year = {2012},
editor = {Neil D. Lawrence and Mark Girolami},
volume = {22},
series = {Proceedings of Machine Learning Research},
address = {La Palma, Canary Islands},
month = {21--23 Apr},
publisher = {PMLR},
pdf = {http://proceedings.mlr.press/v22/grinberg12/grinberg12.pdf},
url = {http://proceedings.mlr.press/v22/grinberg12.html},
abstract = {We investigate the problem of estimating the average reward of given decision policies in discrete-time controllable dynamical systems with finite action and observation sets, but possibly infinite state space. Unlike in systems with finite state spaces, in infinite-state systems the expected reward for some policies might not exist, so policy evaluation, which is a key step in optimal control methods, might fail. Our main analysis tool is ergodic theory, which allows us to learn potentially useful quantities from the system without building a model. Our main contribution is threefold. First, we present several dynamical systems that demonstrate the difficulty of learning in the general case, without making additional assumptions. We state the necessary condition that the underlying system must satisfy to be amenable to learning. Second, we discuss the relationship between this condition and state-of-the-art predictive representations, and we show that there are systems that satisfy this condition but cannot be modeled by such representations. Third, we establish sufficient conditions for average-reward policy evaluation in this setting.}
}
%0 Conference Paper
%T On Average Reward Policy Evaluation in Infinite-State Partially Observable Systems
%A Yuri Grinberg
%A Doina Precup
%B Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics
%C La Palma, Canary Islands
%D 2012
%E Neil D. Lawrence
%E Mark Girolami
%F pmlr-v22-grinberg12
%I PMLR
%J Proceedings of Machine Learning Research
%P 449--457
%U http://proceedings.mlr.press/v22/grinberg12.html
%V 22
%W PMLR
%X We investigate the problem of estimating the average reward of given decision policies in discrete-time controllable dynamical systems with finite action and observation sets, but possibly infinite state space. Unlike in systems with finite state spaces, in infinite-state systems the expected reward for some policies might not exist, so policy evaluation, which is a key step in optimal control methods, might fail. Our main analysis tool is ergodic theory, which allows us to learn potentially useful quantities from the system without building a model. Our main contribution is threefold. First, we present several dynamical systems that demonstrate the difficulty of learning in the general case, without making additional assumptions. We state the necessary condition that the underlying system must satisfy to be amenable to learning. Second, we discuss the relationship between this condition and state-of-the-art predictive representations, and we show that there are systems that satisfy this condition but cannot be modeled by such representations. Third, we establish sufficient conditions for average-reward policy evaluation in this setting.
TY - CPAPER
TI - On Average Reward Policy Evaluation in Infinite-State Partially Observable Systems
AU - Yuri Grinberg
AU - Doina Precup
BT - Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics
PY - 2012/04/21
DA - 2012/04/21
ED - Neil D. Lawrence
ED - Mark Girolami
ID - pmlr-v22-grinberg12
PB - PMLR
SP - 449
DP - PMLR
EP - 457
L1 - http://proceedings.mlr.press/v22/grinberg12/grinberg12.pdf
UR - http://proceedings.mlr.press/v22/grinberg12.html
AB - We investigate the problem of estimating the average reward of given decision policies in discrete-time controllable dynamical systems with finite action and observation sets, but possibly infinite state space. Unlike in systems with finite state spaces, in infinite-state systems the expected reward for some policies might not exist, so policy evaluation, which is a key step in optimal control methods, might fail. Our main analysis tool is ergodic theory, which allows us to learn potentially useful quantities from the system without building a model. Our main contribution is threefold. First, we present several dynamical systems that demonstrate the difficulty of learning in the general case, without making additional assumptions. We state the necessary condition that the underlying system must satisfy to be amenable to learning. Second, we discuss the relationship between this condition and state-of-the-art predictive representations, and we show that there are systems that satisfy this condition but cannot be modeled by such representations. Third, we establish sufficient conditions for average-reward policy evaluation in this setting.
ER -
Grinberg, Y. & Precup, D. (2012). On Average Reward Policy Evaluation in Infinite-State Partially Observable Systems. Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, in PMLR 22:449-457.