Exponential Hardness of Reinforcement Learning with Linear Function Approximation

Sihan Liu; Gaurav Mahajan; Daniel Kane; Shachar Lovett; Gellért Weisz; Csaba Szepesvári

Exponential Hardness of Reinforcement Learning with Linear Function Approximation

Sihan Liu, Gaurav Mahajan, Daniel Kane, Shachar Lovett, Gellért Weisz, Csaba Szepesvári

Proceedings of Thirty Sixth Conference on Learning Theory, PMLR 195:1588-1617, 2023.

Abstract

A fundamental question in reinforcement learning theory is: suppose the optimal value functions are linear in given features, can we learn them efficiently? This problem’s counterpart in supervised learning, linear regression, can be solved both statistically and computationally efficiently. Therefore, it was quite surprising when a recent work \cite{kane2022computational} showed a computational-statistical gap for linear reinforcement learning: even though there are polynomial sample-complexity algorithms, unless NP = RP, there are no polynomial time algorithms for this setting.In this work, we build on their result to show a computational lower bound, which is exponential in feature dimension and horizon, for linear reinforcement learning under the Randomized Exponential Time Hypothesis. To prove this we build a round-based game where in each round the learner is searching for an unknown vector in a unit hypercube. The rewards in this game are chosen such that if the learner achieves large reward, then the learner’s actions can be used to simulate solving a variant of 3-SAT, where (a) each variable shows up in a bounded number of clauses (b) if an instance has no solutions then it also has no solutions that satisfy more than (1-

$\epsilon$ )-fraction of clauses. We use standard reductions to show this 3-SAT variant is approximately as hard as 3-SAT. Finally, we also show a lower bound optimized for horizon dependence that almost matches the best known upper bound of

$\exp(\sqrt{H})$ .

Cite this Paper

BibTeX


@InProceedings{pmlr-v195-liu23b,
  title = 	 {Exponential Hardness of Reinforcement Learning with Linear Function Approximation},
  author =       {Liu, Sihan and Mahajan, Gaurav and Kane, Daniel and Lovett, Shachar and Weisz, Gell{\'e}rt and Szepesv{\'a}ri, Csaba},
  booktitle = 	 {Proceedings of Thirty Sixth Conference on Learning Theory},
  pages = 	 {1588--1617},
  year = 	 {2023},
  editor = 	 {Neu, Gergely and Rosasco, Lorenzo},
  volume = 	 {195},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {12--15 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v195/liu23b/liu23b.pdf},
  url = 	 {https://proceedings.mlr.press/v195/liu23b.html},
  abstract = 	 {A fundamental question in reinforcement learning theory is: suppose the optimal value functions are linear in given features, can we learn them efficiently? This problem’s counterpart in supervised learning, linear regression, can be solved both statistically and computationally efficiently. Therefore, it was quite surprising when a recent work \cite{kane2022computational} showed a computational-statistical gap for linear reinforcement learning: even though there are polynomial sample-complexity algorithms, unless NP = RP, there are no polynomial time algorithms for this setting.In this work, we build on their result to show a computational lower bound, which is exponential in feature dimension and horizon, for linear reinforcement learning under the Randomized Exponential Time Hypothesis. To prove this we build a round-based game where in each round the learner is searching for an unknown vector in a unit hypercube. The rewards in this game are chosen such that if the learner achieves large reward, then the learner’s actions can be used to simulate solving a variant of 3-SAT, where (a) each variable shows up in a bounded number of clauses (b) if an instance has no solutions then it also has no solutions that satisfy more than (1-$\epsilon$)-fraction of clauses. We use standard reductions to show this 3-SAT variant is approximately as hard as 3-SAT. Finally, we also show a lower bound optimized for horizon dependence that almost matches the best known upper bound of $\exp(\sqrt{H})$.}
}

Endnote

%0 Conference Paper
%T Exponential Hardness of Reinforcement Learning with Linear Function Approximation
%A Sihan Liu
%A Gaurav Mahajan
%A Daniel Kane
%A Shachar Lovett
%A Gellért Weisz
%A Csaba Szepesvári
%B Proceedings of Thirty Sixth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2023
%E Gergely Neu
%E Lorenzo Rosasco	
%F pmlr-v195-liu23b
%I PMLR
%P 1588--1617
%U https://proceedings.mlr.press/v195/liu23b.html
%V 195
%X A fundamental question in reinforcement learning theory is: suppose the optimal value functions are linear in given features, can we learn them efficiently? This problem’s counterpart in supervised learning, linear regression, can be solved both statistically and computationally efficiently. Therefore, it was quite surprising when a recent work \cite{kane2022computational} showed a computational-statistical gap for linear reinforcement learning: even though there are polynomial sample-complexity algorithms, unless NP = RP, there are no polynomial time algorithms for this setting.In this work, we build on their result to show a computational lower bound, which is exponential in feature dimension and horizon, for linear reinforcement learning under the Randomized Exponential Time Hypothesis. To prove this we build a round-based game where in each round the learner is searching for an unknown vector in a unit hypercube. The rewards in this game are chosen such that if the learner achieves large reward, then the learner’s actions can be used to simulate solving a variant of 3-SAT, where (a) each variable shows up in a bounded number of clauses (b) if an instance has no solutions then it also has no solutions that satisfy more than (1-$\epsilon$)-fraction of clauses. We use standard reductions to show this 3-SAT variant is approximately as hard as 3-SAT. Finally, we also show a lower bound optimized for horizon dependence that almost matches the best known upper bound of $\exp(\sqrt{H})$.

APA


Liu, S., Mahajan, G., Kane, D., Lovett, S., Weisz, G. & Szepesvári, C.. (2023). Exponential Hardness of Reinforcement Learning with Linear Function Approximation. Proceedings of Thirty Sixth Conference on Learning Theory, in Proceedings of Machine Learning Research 195:1588-1617 Available from https://proceedings.mlr.press/v195/liu23b.html.

Related Material

Download PDF