Finite-Memory Near-Optimal Learning for Markov Decision Processes with Long-Run Average Reward

Jan Kretinsky, Fabian Michel, Lukas Michel, Guillermo Perez
Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI), PMLR 124:1149-1158, 2020.

Abstract

We consider learning policies online in Markov decision processes with the long-run average reward (a.k.a. mean payoff). To ensure implementability of the policies, we focus on policies with finite memory. Firstly, we show that near optimality can be achieved almost surely, using an unintuitive gadget we call forgetfulness. Secondly, we extend the approach to a setting with partial knowledge of the system topology, introducing two optimality measures and providing near-optimal algorithms also for these cases.
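For readers unfamiliar with the objective, the long-run average reward (mean payoff) of an infinite run is standardly defined as the limiting average of the collected rewards; the notation below is the textbook definition, not notation taken from the paper itself:

\[
  \mathrm{MP} \;=\; \liminf_{n \to \infty} \frac{1}{n} \sum_{t=0}^{n-1} r_t,
\]

where r_t denotes the reward obtained at step t, and the liminf guards against runs whose running average does not converge.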

Cite this Paper

BibTeX
@InProceedings{pmlr-v124-kretinsky20a,
  title = {Finite-Memory Near-Optimal Learning for Markov Decision Processes with Long-Run Average Reward},
  author = {Kretinsky, Jan and Michel, Fabian and Michel, Lukas and Perez, Guillermo},
  booktitle = {Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI)},
  pages = {1149--1158},
  year = {2020},
  editor = {Peters, Jonas and Sontag, David},
  volume = {124},
  series = {Proceedings of Machine Learning Research},
  month = {03--06 Aug},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v124/kretinsky20a/kretinsky20a.pdf},
  url = {https://proceedings.mlr.press/v124/kretinsky20a.html},
  abstract = {We consider learning policies online in Markov decision processes with the long-run average reward (a.k.a. mean payoff). To ensure implementability of the policies, we focus on policies with finite memory. Firstly, we show that near optimality can be achieved almost surely, using an unintuitive gadget we call forgetfulness. Secondly, we extend the approach to a setting with partial knowledge of the system topology, introducing two optimality measures and providing near-optimal algorithms also for these cases.}
}
Endnote
%0 Conference Paper
%T Finite-Memory Near-Optimal Learning for Markov Decision Processes with Long-Run Average Reward
%A Jan Kretinsky
%A Fabian Michel
%A Lukas Michel
%A Guillermo Perez
%B Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI)
%C Proceedings of Machine Learning Research
%D 2020
%E Jonas Peters
%E David Sontag
%F pmlr-v124-kretinsky20a
%I PMLR
%P 1149--1158
%U https://proceedings.mlr.press/v124/kretinsky20a.html
%V 124
%X We consider learning policies online in Markov decision processes with the long-run average reward (a.k.a. mean payoff). To ensure implementability of the policies, we focus on policies with finite memory. Firstly, we show that near optimality can be achieved almost surely, using an unintuitive gadget we call forgetfulness. Secondly, we extend the approach to a setting with partial knowledge of the system topology, introducing two optimality measures and providing near-optimal algorithms also for these cases.
APA
Kretinsky, J., Michel, F., Michel, L. & Perez, G. (2020). Finite-Memory Near-Optimal Learning for Markov Decision Processes with Long-Run Average Reward. Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI), in Proceedings of Machine Learning Research 124:1149-1158. Available from https://proceedings.mlr.press/v124/kretinsky20a.html.