Variational Regret Bounds for Reinforcement Learning

Ronald Ortner, Pratik Gajane, Peter Auer
Proceedings of The 35th Uncertainty in Artificial Intelligence Conference, PMLR 115:81-90, 2020.

Abstract

We consider undiscounted reinforcement learning in Markov decision processes (MDPs) where both the reward functions and the state-transition probabilities may vary (gradually or abruptly) over time. For this problem setting, we propose an algorithm and provide performance guarantees for the regret evaluated against the optimal non-stationary policy. The upper bound on the regret is given in terms of the total variation in the MDP. This is the first variational regret bound for the general reinforcement learning setting.
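As a point of reference, the quantities the abstract alludes to are commonly defined as follows in the non-stationary RL literature; this is a minimal sketch in standard notation, which may differ from the notation used in the paper itself. Writing M_t = (S, A, r_t, p_t) for the MDP governing step t and \rho^*_t for its optimal average reward, the dynamic regret after T steps and the total variation in rewards and transitions are

R_T = \sum_{t=1}^{T} \rho^*_t \;-\; \sum_{t=1}^{T} r_t(s_t, a_t),
\qquad
V_T^{r} = \sum_{t=1}^{T-1} \max_{s,a} \bigl| r_{t+1}(s,a) - r_t(s,a) \bigr|,
\qquad
V_T^{p} = \sum_{t=1}^{T-1} \max_{s,a} \bigl\| p_{t+1}(\cdot \mid s,a) - p_t(\cdot \mid s,a) \bigr\|_1 .

A variational regret bound is then an upper bound on R_T expressed in terms of the variation measures V_T^{r} and V_T^{p}, rather than in terms of a fixed number of abrupt change points.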

Cite this Paper


BibTeX
@InProceedings{pmlr-v115-ortner20a,
  title     = {Variational Regret Bounds for Reinforcement Learning},
  author    = {Ortner, Ronald and Gajane, Pratik and Auer, Peter},
  booktitle = {Proceedings of The 35th Uncertainty in Artificial Intelligence Conference},
  pages     = {81--90},
  year      = {2020},
  editor    = {Adams, Ryan P. and Gogate, Vibhav},
  volume    = {115},
  series    = {Proceedings of Machine Learning Research},
  month     = {22--25 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v115/ortner20a/ortner20a.pdf},
  url       = {https://proceedings.mlr.press/v115/ortner20a.html},
  abstract  = {We consider undiscounted reinforcement learning in Markov decision processes (MDPs) where \textit{both} the reward functions and the state-transition probabilities may vary (gradually or abruptly) over time. For this problem setting, we propose an algorithm and provide performance guarantees for the regret evaluated against the optimal non-stationary policy. The upper bound on the regret is given in terms of the total variation in the MDP. This is the first variational regret bound for the general reinforcement learning setting.}
}
Endnote
%0 Conference Paper
%T Variational Regret Bounds for Reinforcement Learning
%A Ronald Ortner
%A Pratik Gajane
%A Peter Auer
%B Proceedings of The 35th Uncertainty in Artificial Intelligence Conference
%C Proceedings of Machine Learning Research
%D 2020
%E Ryan P. Adams
%E Vibhav Gogate
%F pmlr-v115-ortner20a
%I PMLR
%P 81--90
%U https://proceedings.mlr.press/v115/ortner20a.html
%V 115
%X We consider undiscounted reinforcement learning in Markov decision processes (MDPs) where both the reward functions and the state-transition probabilities may vary (gradually or abruptly) over time. For this problem setting, we propose an algorithm and provide performance guarantees for the regret evaluated against the optimal non-stationary policy. The upper bound on the regret is given in terms of the total variation in the MDP. This is the first variational regret bound for the general reinforcement learning setting.
APA
Ortner, R., Gajane, P. & Auer, P. (2020). Variational Regret Bounds for Reinforcement Learning. Proceedings of The 35th Uncertainty in Artificial Intelligence Conference, in Proceedings of Machine Learning Research 115:81-90. Available from https://proceedings.mlr.press/v115/ortner20a.html.
