Efficient Planning under Partial Observability with Unnormalized Q Functions and Spectral Learning

Tianyu Li; Bogdan Mazoure; Doina Precup; Guillaume Rabusseau

Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:2852-2862, 2020.

Abstract

Learning and planning in partially observable domains is one of the most difficult problems in reinforcement learning. Traditional methods treat these two problems as independent, resulting in a classic two-stage paradigm: first learn the environment dynamics, then compute the optimal policy accordingly. This approach, however, disconnects the reward information from the learning of the environment model and can consequently lead to representations that are sample-inefficient and time-consuming for planning purposes. In this paper, we propose a novel algorithm that incorporates reward information into the representations of the environment to unify these two stages. Our algorithm is closely related to the spectral learning algorithm for predictive state representations and offers appealing theoretical guarantees and time complexity. We empirically show on two domains that our approach is more sample- and time-efficient than classical methods.

Cite this Paper

BibTeX


@InProceedings{pmlr-v108-li20h,
  title     = {Efficient Planning under Partial Observability with Unnormalized Q Functions and Spectral Learning},
  author    = {Li, Tianyu and Mazoure, Bogdan and Precup, Doina and Rabusseau, Guillaume},
  booktitle = {Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics},
  pages     = {2852--2862},
  year      = {2020},
  editor    = {Chiappa, Silvia and Calandra, Roberto},
  volume    = {108},
  series    = {Proceedings of Machine Learning Research},
  month     = {26--28 Aug},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v108/li20h/li20h.pdf},
  url       = {https://proceedings.mlr.press/v108/li20h.html},
  abstract  = {Learning and planning in partially observable domains is one of the most difficult problems in reinforcement learning. Traditional methods treat these two problems as independent, resulting in a classic two-stage paradigm: first learn the environment dynamics, then compute the optimal policy accordingly. This approach, however, disconnects the reward information from the learning of the environment model and can consequently lead to representations that are sample-inefficient and time-consuming for planning purposes. In this paper, we propose a novel algorithm that incorporates reward information into the representations of the environment to unify these two stages. Our algorithm is closely related to the spectral learning algorithm for predictive state representations and offers appealing theoretical guarantees and time complexity. We empirically show on two domains that our approach is more sample- and time-efficient than classical methods.}
}

Endnote

%0 Conference Paper
%T Efficient Planning under Partial Observability with Unnormalized Q Functions and Spectral Learning
%A Tianyu Li
%A Bogdan Mazoure
%A Doina Precup
%A Guillaume Rabusseau
%B Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2020
%E Silvia Chiappa
%E Roberto Calandra
%F pmlr-v108-li20h
%I PMLR
%P 2852--2862
%U https://proceedings.mlr.press/v108/li20h.html
%V 108
%X Learning and planning in partially observable domains is one of the most difficult problems in reinforcement learning. Traditional methods treat these two problems as independent, resulting in a classic two-stage paradigm: first learn the environment dynamics, then compute the optimal policy accordingly. This approach, however, disconnects the reward information from the learning of the environment model and can consequently lead to representations that are sample-inefficient and time-consuming for planning purposes. In this paper, we propose a novel algorithm that incorporates reward information into the representations of the environment to unify these two stages. Our algorithm is closely related to the spectral learning algorithm for predictive state representations and offers appealing theoretical guarantees and time complexity. We empirically show on two domains that our approach is more sample- and time-efficient than classical methods.

APA


Li, T., Mazoure, B., Precup, D. & Rabusseau, G. (2020). Efficient Planning under Partial Observability with Unnormalized Q Functions and Spectral Learning. Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 108:2852-2862. Available from https://proceedings.mlr.press/v108/li20h.html.