Efficient Planning under Partial Observability with Unnormalized Q Functions and Spectral Learning

Tianyu Li, Bogdan Mazoure, Doina Precup, Guillaume Rabusseau
Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:2852-2862, 2020.

Abstract

Learning and planning in partially-observable domains is one of the most difficult problems in reinforcement learning. Traditional methods consider these two problems as independent, resulting in a classic two-stage paradigm: first learn the environment dynamics and then compute the optimal policy accordingly. This approach, however, disconnects the reward information from the learning of the environment model and can consequently lead to representations that are sample inefficient and time consuming for planning purposes. In this paper, we propose a novel algorithm that incorporates reward information into the representations of the environment to unify these two stages. Our algorithm is closely related to the spectral learning algorithm for predictive state representations and offers appealing theoretical guarantees and time complexity. We empirically show on two domains that our approach is more sample and time efficient than classical methods.

Cite this Paper


BibTeX
@InProceedings{pmlr-v108-li20h,
  title = {Efficient Planning under Partial Observability with Unnormalized Q Functions and Spectral Learning},
  author = {Li, Tianyu and Mazoure, Bogdan and Precup, Doina and Rabusseau, Guillaume},
  booktitle = {Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics},
  pages = {2852--2862},
  year = {2020},
  editor = {Silvia Chiappa and Roberto Calandra},
  volume = {108},
  series = {Proceedings of Machine Learning Research},
  address = {Online},
  month = {26--28 Aug},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v108/li20h/li20h.pdf},
  url = {http://proceedings.mlr.press/v108/li20h.html},
  abstract = {Learning and planning in partially-observable domains is one of the most difficult problems in reinforcement learning. Traditional methods consider these two problems as independent, resulting in a classic two-stage paradigm: first learn the environment dynamics and then compute the optimal policy accordingly. This approach, however, disconnects the reward information from the learning of the environment model and can consequently lead to representations that are sample inefficient and time consuming for planning purposes. In this paper, we propose a novel algorithm that incorporates reward information into the representations of the environment to unify these two stages. Our algorithm is closely related to the spectral learning algorithm for predictive state representations and offers appealing theoretical guarantees and time complexity. We empirically show on two domains that our approach is more sample and time efficient than classical methods.}
}
Endnote
%0 Conference Paper
%T Efficient Planning under Partial Observability with Unnormalized Q Functions and Spectral Learning
%A Tianyu Li
%A Bogdan Mazoure
%A Doina Precup
%A Guillaume Rabusseau
%B Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2020
%E Silvia Chiappa
%E Roberto Calandra
%F pmlr-v108-li20h
%I PMLR
%J Proceedings of Machine Learning Research
%P 2852--2862
%U http://proceedings.mlr.press
%V 108
%W PMLR
%X Learning and planning in partially-observable domains is one of the most difficult problems in reinforcement learning. Traditional methods consider these two problems as independent, resulting in a classic two-stage paradigm: first learn the environment dynamics and then compute the optimal policy accordingly. This approach, however, disconnects the reward information from the learning of the environment model and can consequently lead to representations that are sample inefficient and time consuming for planning purposes. In this paper, we propose a novel algorithm that incorporates reward information into the representations of the environment to unify these two stages. Our algorithm is closely related to the spectral learning algorithm for predictive state representations and offers appealing theoretical guarantees and time complexity. We empirically show on two domains that our approach is more sample and time efficient than classical methods.
APA
Li, T., Mazoure, B., Precup, D. & Rabusseau, G. (2020). Efficient Planning under Partial Observability with Unnormalized Q Functions and Spectral Learning. Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, in PMLR 108:2852-2862.

Related Material