Provably efficient representation selection in Low-rank Markov Decision Processes: from online to offline RL

W. Zhang; J. He; D. Zhou; Q. Gu; A. Zhang

Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, PMLR 216:2488-2497, 2023.

Abstract

The success of deep reinforcement learning (DRL) lies in its ability to learn a representation that is well-suited for the exploration and exploitation task. To understand how the choice of representation can improve the efficiency of reinforcement learning (RL), we study representation selection for a class of low-rank Markov Decision Processes (MDPs) where the transition kernel can be represented in a bilinear form. We propose an efficient algorithm, called ReLEX, for representation learning in both online and offline RL. Specifically, we show that the online version of ReLEX, called ReLEX-UCB, always performs no worse than the state-of-the-art algorithm without representation selection, and achieves a strictly better constant regret if the representation function class has a "coverage" property over the entire state-action space. For the offline counterpart, ReLEX-LCB, we show that the algorithm can find the optimal policy if the representation class can cover the state-action space and achieves gap-dependent sample complexity. This is the first result with constant sample complexity for representation learning in offline RL.

Cite this Paper

BibTeX


@InProceedings{pmlr-v216-zhang23c,
  title = 	 {Provably efficient representation selection in Low-rank {M}arkov Decision Processes: from online to offline {RL}},
  author =       {Zhang, W. and He, J. and Zhou, D. and Gu, Q. and Zhang, A.},
  booktitle = 	 {Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence},
  pages = 	 {2488--2497},
  year = 	 {2023},
  editor = 	 {Evans, Robin J. and Shpitser, Ilya},
  volume = 	 {216},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {31 Jul--04 Aug},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v216/zhang23c/zhang23c.pdf},
  url = 	 {https://proceedings.mlr.press/v216/zhang23c.html},
  abstract = 	 {The success of deep reinforcement learning (DRL) lies in its ability to learn a representation that is well-suited for the exploration and exploitation task. To understand how the choice of representation can improve the efficiency of reinforcement learning (RL), we study representation selection for a class of low-rank Markov Decision Processes (MDPs) where the transition kernel can be represented in a bilinear form. We propose an efficient algorithm, called ReLEX, for representation learning in both online and offline RL. Specifically, we show that the online version of ReLEX, called ReLEX-UCB, always performs no worse than the state-of-the-art algorithm without representation selection, and achieves a strictly better constant regret if the representation function class has a "coverage" property over the entire state-action space. For the offline counterpart, ReLEX-LCB, we show that the algorithm can find the optimal policy if the representation class can cover the state-action space and achieves gap-dependent sample complexity. This is the first result with constant sample complexity for representation learning in offline RL.}
}

Endnote

%0 Conference Paper
%T Provably efficient representation selection in Low-rank Markov Decision Processes: from online to offline RL
%A W. Zhang
%A J. He
%A D. Zhou
%A Q. Gu
%A A. Zhang
%B Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence
%C Proceedings of Machine Learning Research
%D 2023
%E Robin J. Evans
%E Ilya Shpitser
%F pmlr-v216-zhang23c
%I PMLR
%P 2488--2497
%U https://proceedings.mlr.press/v216/zhang23c.html
%V 216
%X The success of deep reinforcement learning (DRL) lies in its ability to learn a representation that is well-suited for the exploration and exploitation task. To understand how the choice of representation can improve the efficiency of reinforcement learning (RL), we study representation selection for a class of low-rank Markov Decision Processes (MDPs) where the transition kernel can be represented in a bilinear form. We propose an efficient algorithm, called ReLEX, for representation learning in both online and offline RL. Specifically, we show that the online version of ReLEX, called ReLEX-UCB, always performs no worse than the state-of-the-art algorithm without representation selection, and achieves a strictly better constant regret if the representation function class has a "coverage" property over the entire state-action space. For the offline counterpart, ReLEX-LCB, we show that the algorithm can find the optimal policy if the representation class can cover the state-action space and achieves gap-dependent sample complexity. This is the first result with constant sample complexity for representation learning in offline RL.

APA


Zhang, W., He, J., Zhou, D., Gu, Q. & Zhang, A. (2023). Provably efficient representation selection in Low-rank Markov Decision Processes: from online to offline RL. Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, in Proceedings of Machine Learning Research 216:2488-2497. Available from https://proceedings.mlr.press/v216/zhang23c.html.
