Approximate Relative Value Learning for Average-reward Continuous State MDPs

Hiteshi Sharma, Mehdi Jafarnia-Jahromi, Rahul Jain
Proceedings of The 35th Uncertainty in Artificial Intelligence Conference, PMLR 115:956-964, 2020.

Abstract

In this paper, we propose an approximate relative value learning (ARVL) algorithm for nonparametric MDPs with continuous state space, finite actions, and the average-reward criterion. It is a sampling-based algorithm combined with kernel density estimation and function approximation via nearest neighbors. The theoretical analysis is carried out via a random contraction operator framework and a stochastic dominance argument. To the best of our knowledge, this is the first such algorithm for continuous state space MDPs under the average-reward criterion that has these provable properties and does not require any discretization of the state space. We then evaluate the proposed algorithm numerically on a benchmark problem.
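To make the abstract's high-level description concrete, below is a minimal, purely illustrative sketch of sample-based relative value iteration with nearest-neighbor value approximation on a continuous state space. The dynamics (step), reward, 1-D state space, sample counts, and reference-state choice are all assumptions made for illustration; this is not the paper's ARVL algorithm and omits its kernel density estimation step.

```python
import numpy as np

# Illustrative sketch only: sample-based relative value iteration with a
# nearest-neighbor value approximation. Dynamics, rewards, and parameters
# are hypothetical, not the ARVL specification from the paper.

rng = np.random.default_rng(0)

def step(s, a, noise=0.1):
    """Hypothetical 1-D dynamics: action-dependent drift plus Gaussian noise."""
    return s + (0.5 if a == 1 else -0.5) + noise * rng.standard_normal()

def reward(s, a):
    """Hypothetical reward: penalize distance from the origin."""
    return -abs(s)

states = rng.uniform(-5.0, 5.0, size=200)   # sampled support points in the state space
h = np.zeros_like(states)                   # relative value estimates at the support points
actions = [0, 1]
ref_idx = 0                                 # reference state defining the "relative" offset
n_samples = 20                              # next-state samples per (state, action) pair

def h_at(x):
    """Nearest-neighbor approximation of the relative value function."""
    return h[np.argmin(np.abs(states - x))]

for _ in range(50):
    backup = np.empty_like(h)
    for i, s in enumerate(states):
        q_values = []
        for a in actions:
            next_states = [step(s, a) for _ in range(n_samples)]
            q_values.append(reward(s, a) + np.mean([h_at(x) for x in next_states]))
        backup[i] = max(q_values)
    h = backup - backup[ref_idx]             # subtract the reference value each sweep

def policy(s):
    """Greedy policy induced by the learned relative values."""
    return max(actions, key=lambda a: reward(s, a) +
               np.mean([h_at(step(s, a)) for _ in range(n_samples)]))
```

The sketch uses synchronous sweeps over a fixed set of sampled states and a single-nearest-neighbor regressor; the paper's algorithm instead combines sampling with kernel density estimation and comes with the contraction and stochastic dominance analysis described in the abstract.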

Cite this Paper


BibTeX
@InProceedings{pmlr-v115-sharma20a,
  title     = {Approximate Relative Value Learning for Average-reward Continuous State MDPs},
  author    = {Sharma, Hiteshi and Jafarnia-Jahromi, Mehdi and Jain, Rahul},
  booktitle = {Proceedings of The 35th Uncertainty in Artificial Intelligence Conference},
  pages     = {956--964},
  year      = {2020},
  editor    = {Adams, Ryan P. and Gogate, Vibhav},
  volume    = {115},
  series    = {Proceedings of Machine Learning Research},
  month     = {22--25 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v115/sharma20a/sharma20a.pdf},
  url       = {https://proceedings.mlr.press/v115/sharma20a.html},
  abstract  = {In this paper, we propose an approximate relative value learning (ARVL) algorithm for nonparametric MDPs with continuous state space, finite actions, and the average-reward criterion. It is a sampling-based algorithm combined with kernel density estimation and function approximation via nearest neighbors. The theoretical analysis is carried out via a random contraction operator framework and a stochastic dominance argument. To the best of our knowledge, this is the first such algorithm for continuous state space MDPs under the average-reward criterion that has these provable properties and does not require any discretization of the state space. We then evaluate the proposed algorithm numerically on a benchmark problem.}
}
Endnote
%0 Conference Paper
%T Approximate Relative Value Learning for Average-reward Continuous State MDPs
%A Hiteshi Sharma
%A Mehdi Jafarnia-Jahromi
%A Rahul Jain
%B Proceedings of The 35th Uncertainty in Artificial Intelligence Conference
%C Proceedings of Machine Learning Research
%D 2020
%E Ryan P. Adams
%E Vibhav Gogate
%F pmlr-v115-sharma20a
%I PMLR
%P 956--964
%U https://proceedings.mlr.press/v115/sharma20a.html
%V 115
%X In this paper, we propose an approximate relative value learning (ARVL) algorithm for nonparametric MDPs with continuous state space, finite actions, and the average-reward criterion. It is a sampling-based algorithm combined with kernel density estimation and function approximation via nearest neighbors. The theoretical analysis is carried out via a random contraction operator framework and a stochastic dominance argument. To the best of our knowledge, this is the first such algorithm for continuous state space MDPs under the average-reward criterion that has these provable properties and does not require any discretization of the state space. We then evaluate the proposed algorithm numerically on a benchmark problem.
APA
Sharma, H., Jafarnia-Jahromi, M. & Jain, R. (2020). Approximate Relative Value Learning for Average-reward Continuous State MDPs. Proceedings of The 35th Uncertainty in Artificial Intelligence Conference, in Proceedings of Machine Learning Research 115:956-964. Available from https://proceedings.mlr.press/v115/sharma20a.html.
