Provably efficient RL with Rich Observations via Latent State Decoding

Simon Du, Akshay Krishnamurthy, Nan Jiang, Alekh Agarwal, Miroslav Dudik, John Langford
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:1665-1674, 2019.

Abstract

We study the exploration problem in episodic MDPs with rich observations generated from a small number of latent states. Under certain identifiability assumptions, we demonstrate how to estimate a mapping from the observations to latent states inductively through a sequence of regression and clustering steps—where previously decoded latent states provide labels for later regression problems—and use it to construct good exploration policies. We provide finite-sample guarantees on the quality of the learned state decoding function and exploration policies, and complement our theory with an empirical evaluation on a class of hard exploration problems. Our method exponentially improves over $Q$-learning with naïve exploration, even when $Q$-learning has cheating access to latent states.
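
To make the inductive decoding idea in the abstract concrete, here is a minimal Python sketch of a regression-then-clustering loop of the kind described, where states decoded at one level supply regression labels for the next. It is an illustrative assumption-laden sketch, not the paper's actual algorithm or interface: the names sample_fn, n_latent, the trivial level-0 decoder, and the choice of ridge regression plus k-means are all placeholders introduced here.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.linear_model import Ridge

    def learn_decoders(sample_fn, horizon, n_actions, n_latent):
        """Inductively learn observation -> latent-state decoders, level by level.

        sample_fn(h, decoders) is assumed to return a list of transitions
        (x_prev, a_prev, x_curr) ending at level h, gathered by exploration
        policies that spread probability over the latent states already
        decoded at level h-1."""
        decoders = [lambda x: 0]  # level 0: a single known start state
        for h in range(1, horizon + 1):
            data = sample_fn(h, decoders)
            X = np.stack([x for (_, _, x) in data])
            # Previously decoded states provide the regression labels:
            # one-hot encoding of (decoded state at level h-1, action taken).
            ids = [decoders[h - 1](xp) * n_actions + a for (xp, a, _) in data]
            Y = np.eye(n_latent * n_actions)[ids]
            f = Ridge(alpha=1e-3).fit(X, Y)
            # Observations emitted by the same latent state should receive
            # (near-)identical predictions, so cluster the predicted vectors.
            km = KMeans(n_clusters=n_latent, n_init=10).fit(f.predict(X))
            decoders.append(
                lambda x, f=f, km=km: int(km.predict(f.predict(x[None]))[0]))
        return decoders

The returned decoders can then be used to build one exploration policy per decoded latent state at each level, which is the role the learned decoding function plays in the abstract's description.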

Cite this Paper


BibTeX
@InProceedings{pmlr-v97-du19b,
  title     = {Provably efficient {RL} with Rich Observations via Latent State Decoding},
  author    = {Du, Simon and Krishnamurthy, Akshay and Jiang, Nan and Agarwal, Alekh and Dudik, Miroslav and Langford, John},
  booktitle = {Proceedings of the 36th International Conference on Machine Learning},
  pages     = {1665--1674},
  year      = {2019},
  editor    = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume    = {97},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--15 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v97/du19b/du19b.pdf},
  url       = {https://proceedings.mlr.press/v97/du19b.html},
  abstract  = {We study the exploration problem in episodic MDPs with rich observations generated from a small number of latent states. Under certain identifiability assumptions, we demonstrate how to estimate a mapping from the observations to latent states inductively through a sequence of regression and clustering steps—where previously decoded latent states provide labels for later regression problems—and use it to construct good exploration policies. We provide finite-sample guarantees on the quality of the learned state decoding function and exploration policies, and complement our theory with an empirical evaluation on a class of hard exploration problems. Our method exponentially improves over $Q$-learning with naïve exploration, even when $Q$-learning has cheating access to latent states.}
}
Endnote
%0 Conference Paper
%T Provably efficient RL with Rich Observations via Latent State Decoding
%A Simon Du
%A Akshay Krishnamurthy
%A Nan Jiang
%A Alekh Agarwal
%A Miroslav Dudik
%A John Langford
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-du19b
%I PMLR
%P 1665--1674
%U https://proceedings.mlr.press/v97/du19b.html
%V 97
%X We study the exploration problem in episodic MDPs with rich observations generated from a small number of latent states. Under certain identifiability assumptions, we demonstrate how to estimate a mapping from the observations to latent states inductively through a sequence of regression and clustering steps—where previously decoded latent states provide labels for later regression problems—and use it to construct good exploration policies. We provide finite-sample guarantees on the quality of the learned state decoding function and exploration policies, and complement our theory with an empirical evaluation on a class of hard exploration problems. Our method exponentially improves over $Q$-learning with naïve exploration, even when $Q$-learning has cheating access to latent states.
APA
Du, S., Krishnamurthy, A., Jiang, N., Agarwal, A., Dudik, M. & Langford, J. (2019). Provably efficient RL with Rich Observations via Latent State Decoding. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:1665-1674. Available from https://proceedings.mlr.press/v97/du19b.html.
