Computationally Efficient PAC RL in POMDPs with Latent Determinism and Conditional Embeddings

Masatoshi Uehara, Ayush Sekhari, Jason D. Lee, Nathan Kallus, Wen Sun
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:34615-34641, 2023.

Abstract

We study reinforcement learning with function approximation for large-scale Partially Observable Markov Decision Processes (POMDPs) where the state space and observation space are large or even continuous. In particular, we consider POMDPs with Hilbert space embeddings, in which the features of the latent states and the features of the observations admit a conditional Hilbert space embedding of the observation emission process, and the latent state transitions are deterministic. Under a function approximation setup in which the optimal latent state-action $Q$-function is linear in the state feature and the optimal $Q$-function has a gap between actions, we provide a computationally and statistically efficient algorithm for finding the exact optimal policy. We show that our algorithm’s computational and statistical complexities scale polynomially with the horizon and the intrinsic dimension of the feature on the observation space. Furthermore, we show that both the deterministic latent transitions and the gap assumption are necessary to avoid statistical complexity exponential in the horizon or dimension. Since our guarantee has no explicit dependence on the size of the state and observation spaces, our algorithm provably scales to large-scale POMDPs.
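To make the assumed structure concrete, the following display-math sketch spells out the conditions described above; the symbols $\phi$, $\psi$, $K_h$, $T_h$, $w_{h,a}$, and $\Delta$ are illustrative notation chosen here, not necessarily the paper's own:

$$\mathbb{E}\left[\phi(o_h)\mid s_h = s\right] = K_h\,\psi(s) \quad \text{(conditional Hilbert space embedding of the emission process)},$$
$$s_{h+1} = T_h(s_h, a_h) \quad \text{(deterministic latent transition)},$$
$$Q_h^\star(s,a) = \langle w_{h,a},\, \psi(s)\rangle \quad \text{(optimal latent $Q$-function linear in the state feature)},$$
$$Q_h^\star\bigl(s,\pi_h^\star(s)\bigr) - \max_{a\neq \pi_h^\star(s)} Q_h^\star(s,a) \ge \Delta \quad \text{(gap in actions)}.$$

Under conditions of this form, the stated guarantee scales polynomially with the horizon and the intrinsic dimension of the observation feature $\phi$, with no explicit dependence on the sizes of the state and observation spaces.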

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-uehara23a,
  title     = {Computationally Efficient {PAC} {RL} in {POMDP}s with Latent Determinism and Conditional Embeddings},
  author    = {Uehara, Masatoshi and Sekhari, Ayush and Lee, Jason D. and Kallus, Nathan and Sun, Wen},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {34615--34641},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/uehara23a/uehara23a.pdf},
  url       = {https://proceedings.mlr.press/v202/uehara23a.html},
  abstract  = {We study reinforcement learning with function approximation for large-scale Partially Observable Markov Decision Processes (POMDPs) where the state space and observation space are large or even continuous. Particularly, we consider Hilbert space embeddings of POMDP where the feature of latent states and the feature of observations admit a conditional Hilbert space embedding of the observation emission process, and the latent state transition is deterministic. Under the function approximation setup where the optimal latent state-action $Q$-function is linear in the state feature, and the optimal $Q$-function has a gap in actions, we provide a computationally and statistically efficient algorithm for finding the exact optimal policy. We show our algorithm’s computational and statistical complexities scale polynomially with respect to the horizon and the intrinsic dimension of the feature on the observation space. Furthermore, we show both the deterministic latent transitions and gap assumptions are necessary to avoid statistical complexity exponential in horizon or dimension. Since our guarantee does not have an explicit dependence on the size of the state and observation spaces, our algorithm provably scales to large-scale POMDPs.}
}
Endnote
%0 Conference Paper
%T Computationally Efficient PAC RL in POMDPs with Latent Determinism and Conditional Embeddings
%A Masatoshi Uehara
%A Ayush Sekhari
%A Jason D. Lee
%A Nathan Kallus
%A Wen Sun
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-uehara23a
%I PMLR
%P 34615--34641
%U https://proceedings.mlr.press/v202/uehara23a.html
%V 202
%X We study reinforcement learning with function approximation for large-scale Partially Observable Markov Decision Processes (POMDPs) where the state space and observation space are large or even continuous. Particularly, we consider Hilbert space embeddings of POMDP where the feature of latent states and the feature of observations admit a conditional Hilbert space embedding of the observation emission process, and the latent state transition is deterministic. Under the function approximation setup where the optimal latent state-action $Q$-function is linear in the state feature, and the optimal $Q$-function has a gap in actions, we provide a computationally and statistically efficient algorithm for finding the exact optimal policy. We show our algorithm’s computational and statistical complexities scale polynomially with respect to the horizon and the intrinsic dimension of the feature on the observation space. Furthermore, we show both the deterministic latent transitions and gap assumptions are necessary to avoid statistical complexity exponential in horizon or dimension. Since our guarantee does not have an explicit dependence on the size of the state and observation spaces, our algorithm provably scales to large-scale POMDPs.
APA
Uehara, M., Sekhari, A., Lee, J.D., Kallus, N. & Sun, W. (2023). Computationally Efficient PAC RL in POMDPs with Latent Determinism and Conditional Embeddings. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:34615-34641. Available from https://proceedings.mlr.press/v202/uehara23a.html.