Active Model Estimation in Markov Decision Processes

Jean Tarbouriech, Shubhanshu Shekhar, Matteo Pirotta, Mohammad Ghavamzadeh, Alessandro Lazaric
Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI), PMLR 124:1019-1028, 2020.

Abstract

We study the problem of efficient exploration in order to learn an accurate model of an environment, modeled as a Markov decision process (MDP). Efficient exploration in this problem requires the agent to identify the regions in which estimating the model is more difficult and then exploit this knowledge to collect more samples there. In this paper, we formalize this problem, introduce the first algorithm to learn an $\epsilon$-accurate estimate of the dynamics, and provide its sample complexity analysis. While this algorithm enjoys strong guarantees in the large-sample regime, it tends to have a poor performance in early stages of exploration. To address this issue, we propose an algorithm that is based on maximum weighted entropy, a heuristic that stems from common sense and our theoretical analysis. The main idea here is to cover the entire state-action space with the weight proportional to the noise in their transition functions. Using a number of simple domains with heterogeneous noise in their transitions, we show that our heuristic-based algorithm outperforms both our original algorithm and the maximum entropy algorithm in the small sample regime, while achieving similar asymptotic performance as that of the original algorithm.

Cite this Paper


BibTeX
@InProceedings{pmlr-v124-tarbouriech20a, title = {Active Model Estimation in Markov Decision Processes}, author = {Tarbouriech, Jean and Shekhar, Shubhanshu and Pirotta, Matteo and Ghavamzadeh, Mohammad and Lazaric, Alessandro}, booktitle = {Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI)}, pages = {1019--1028}, year = {2020}, editor = {Peters, Jonas and Sontag, David}, volume = {124}, series = {Proceedings of Machine Learning Research}, month = {03--06 Aug}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v124/tarbouriech20a/tarbouriech20a.pdf}, url = {https://proceedings.mlr.press/v124/tarbouriech20a.html}, abstract = {We study the problem of efficient exploration in order to learn an accurate model of an environment, modeled as a Markov decision process (MDP). Efficient exploration in this problem requires the agent to identify the regions in which estimating the model is more difficult and then exploit this knowledge to collect more samples there. In this paper, we formalize this problem, introduce the first algorithm to learn an $\epsilon$-accurate estimate of the dynamics, and provide its sample complexity analysis. While this algorithm enjoys strong guarantees in the large-sample regime, it tends to have a poor performance in early stages of exploration. To address this issue, we propose an algorithm that is based on maximum weighted entropy, a heuristic that stems from common sense and our theoretical analysis. The main idea here is to cover the entire state-action space with the weight proportional to the noise in their transition functions. Using a number of simple domains with heterogeneous noise in their transitions, we show that our heuristic-based algorithm outperforms both our original algorithm and the maximum entropy algorithm in the small sample regime, while achieving similar asymptotic performance as that of the original algorithm. } }
Endnote
%0 Conference Paper %T Active Model Estimation in Markov Decision Processes %A Jean Tarbouriech %A Shubhanshu Shekhar %A Matteo Pirotta %A Mohammad Ghavamzadeh %A Alessandro Lazaric %B Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI) %C Proceedings of Machine Learning Research %D 2020 %E Jonas Peters %E David Sontag %F pmlr-v124-tarbouriech20a %I PMLR %P 1019--1028 %U https://proceedings.mlr.press/v124/tarbouriech20a.html %V 124 %X We study the problem of efficient exploration in order to learn an accurate model of an environment, modeled as a Markov decision process (MDP). Efficient exploration in this problem requires the agent to identify the regions in which estimating the model is more difficult and then exploit this knowledge to collect more samples there. In this paper, we formalize this problem, introduce the first algorithm to learn an $\epsilon$-accurate estimate of the dynamics, and provide its sample complexity analysis. While this algorithm enjoys strong guarantees in the large-sample regime, it tends to have a poor performance in early stages of exploration. To address this issue, we propose an algorithm that is based on maximum weighted entropy, a heuristic that stems from common sense and our theoretical analysis. The main idea here is to cover the entire state-action space with the weight proportional to the noise in their transition functions. Using a number of simple domains with heterogeneous noise in their transitions, we show that our heuristic-based algorithm outperforms both our original algorithm and the maximum entropy algorithm in the small sample regime, while achieving similar asymptotic performance as that of the original algorithm.
APA
Tarbouriech, J., Shekhar, S., Pirotta, M., Ghavamzadeh, M. & Lazaric, A.. (2020). Active Model Estimation in Markov Decision Processes. Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI), in Proceedings of Machine Learning Research 124:1019-1028 Available from https://proceedings.mlr.press/v124/tarbouriech20a.html.

Related Material