Value-Aware Loss Function for Model-based Reinforcement Learning

Amir-Massoud Farahmand, Andre Barreto, Daniel Nikovski
Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, PMLR 54:1486-1494, 2017.

Abstract

We consider the problem of estimating the transition probability kernel to be used by a model-based reinforcement learning (RL) algorithm. We argue that estimating a generative model that minimizes a probabilistic loss, such as the log-loss, is overkill because it does not take into account the underlying structure of the decision problem or the RL algorithm intended to solve it. We introduce a loss function that takes the structure of the value function into account. We provide a finite-sample upper bound for the loss function showing the dependence of the error on the model approximation error, the number of samples, and the complexity of the model space. We also empirically compare the method with the maximum likelihood estimator on a simple problem.
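As a toy illustration of this point (a sketch under stated assumptions, not the paper's experiment or code), the Python snippet below contrasts a log-loss (maximum-likelihood) fit with a value-aware fit inside a deliberately misspecified model class, for a single state-action pair and a single fixed value function V. The true next-state distribution, the value function, and the one-parameter model class are all hypothetical choices made only to make the contrast visible; the paper's loss uses the value-function structure more generally, whereas this sketch fixes one V.

# Toy sketch (illustrative assumptions, not the paper's code): with a misspecified
# model class, a log-loss fit and a value-aware fit pick different models.
import numpy as np

p_true = np.array([0.7, 0.2, 0.1])   # true next-state distribution for one (s, a)
V = np.array([0.0, 10.0, -5.0])      # a fixed value function over the next states

# Misspecified one-parameter model class: the last two states get equal probability.
def model(q):
    return np.array([1.0 - 2.0 * q, q, q])

qs = np.linspace(1e-3, 0.5 - 1e-3, 1000)

# Log-loss fit: minimize KL(p_true || model(q)), i.e. what MLE converges to
# as the sample size grows.
kl = [np.sum(p_true * np.log(p_true / model(q))) for q in qs]
q_mle = qs[int(np.argmin(kl))]

# Value-aware fit: match the expected value of V under the model to its value
# under the true kernel, i.e. minimize ((model(q) - p_true) @ V)^2.
val_err = [((model(q) - p_true) @ V) ** 2 for q in qs]
q_va = qs[int(np.argmin(val_err))]

for name, q in [("log-loss", q_mle), ("value-aware", q_va)]:
    p = model(q)
    print(f"{name:12s} q = {q:.3f}  model = {np.round(p, 3)}  "
          f"value error = {abs((p - p_true) @ V):.3f}")

Under these hypothetical choices the log-loss fit (q ≈ 0.15) matches the probabilities as well as the class allows but leaves a residual error in the expected value of V, while the value-aware fit (q ≈ 0.3) drives that value error to essentially zero despite assigning less accurate probabilities; this is the kind of trade-off a value-aware loss is meant to exploit.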

Cite this Paper


BibTeX
@InProceedings{pmlr-v54-farahmand17a,
  title     = {{Value-Aware Loss Function for Model-based Reinforcement Learning}},
  author    = {Farahmand, Amir-Massoud and Barreto, Andre and Nikovski, Daniel},
  booktitle = {Proceedings of the 20th International Conference on Artificial Intelligence and Statistics},
  pages     = {1486--1494},
  year      = {2017},
  editor    = {Singh, Aarti and Zhu, Jerry},
  volume    = {54},
  series    = {Proceedings of Machine Learning Research},
  month     = {20--22 Apr},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v54/farahmand17a/farahmand17a.pdf},
  url       = {https://proceedings.mlr.press/v54/farahmand17a.html}
}
Endnote
%0 Conference Paper
%T Value-Aware Loss Function for Model-based Reinforcement Learning
%A Amir-Massoud Farahmand
%A Andre Barreto
%A Daniel Nikovski
%B Proceedings of the 20th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2017
%E Aarti Singh
%E Jerry Zhu
%F pmlr-v54-farahmand17a
%I PMLR
%P 1486--1494
%U https://proceedings.mlr.press/v54/farahmand17a.html
%V 54
APA
Farahmand, A., Barreto, A. & Nikovski, D. (2017). Value-Aware Loss Function for Model-based Reinforcement Learning. Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 54:1486-1494. Available from https://proceedings.mlr.press/v54/farahmand17a.html.