On the Occupancy Measure of Non-Markovian Policies in Continuous MDPs

Romain Laroche, Remi Tachet Des Combes
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:18548-18562, 2023.

Abstract

The state-action occupancy measure of a policy is the expected (discounted or undiscounted) number of times a state-action couple is visited in a trajectory. For decades, RL textbooks have reported the occupancy equivalence between Markovian and non-Markovian policies in countable state-action spaces under mild conditions. This equivalence states that the occupancy of any non-Markovian policy can be equivalently obtained by a Markovian policy, i.e. a memoryless probability distribution conditioned only on the current state. While expected, for technical reasons, the extension of this result to continuous state spaces had remained open until now. Our main contribution is to fill this gap and to provide a general measure-theoretic treatment of the problem, permitting, in particular, its extension to continuous MDPs. Furthermore, we show that when the occupancy is infinite, there are non-trivial cases where the result no longer holds.
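For reference, the occupancy measure described above is commonly written as follows in the countable state-action setting (a standard textbook definition; the paper's own notation may differ):

```latex
% Discounted state-action occupancy measure of a policy \pi,
% starting from an initial state distribution \mu
% (standard definition for countable state-action spaces).
\rho^{\pi}_{\mu}(s, a)
  \;=\; \sum_{t=0}^{\infty} \gamma^{t}\,
  \Pr\!\left[\, s_t = s,\; a_t = a \;\middle|\; s_0 \sim \mu,\ \pi \,\right],
\qquad \gamma \in [0, 1].
```

Setting $\gamma = 1$ yields the undiscounted occupancy, which may be infinite; this is precisely the regime in which the paper exhibits cases where the Markovian-equivalence result fails.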

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-laroche23a,
  title     = {On the Occupancy Measure of Non-{M}arkovian Policies in Continuous {MDP}s},
  author    = {Laroche, Romain and Tachet Des Combes, Remi},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {18548--18562},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/laroche23a/laroche23a.pdf},
  url       = {https://proceedings.mlr.press/v202/laroche23a.html}
}
Endnote
%0 Conference Paper
%T On the Occupancy Measure of Non-Markovian Policies in Continuous MDPs
%A Romain Laroche
%A Remi Tachet Des Combes
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-laroche23a
%I PMLR
%P 18548--18562
%U https://proceedings.mlr.press/v202/laroche23a.html
%V 202
APA
Laroche, R. & Tachet Des Combes, R. (2023). On the Occupancy Measure of Non-Markovian Policies in Continuous MDPs. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:18548-18562. Available from https://proceedings.mlr.press/v202/laroche23a.html.