Exponential Lower Bounds for Planning in MDPs With Linearly-Realizable Optimal Action-Value Functions
Proceedings of the 32nd International Conference on Algorithmic Learning Theory, PMLR 132:1237-1264, 2021.
Abstract
We consider the problem of local planning in fixed-horizon and discounted Markov Decision Processes (MDPs) with linear function approximation and a generative model, under the assumption that the optimal action-value function lies in the span of a feature map that is available to the planner. Previous work has left open the question of whether there exist sound planners that need only poly(H,d) queries regardless of the MDP, where H is the horizon and d is the dimensionality of the features. We answer this question in the negative: we show that any sound planner must query at least \min(e^{\Omega(d)}, e^{\Omega(H)}) samples in the fixed-horizon setting and e^{\Omega(d)} samples in the discounted setting. We also show that for any \delta>0, the least-squares value iteration algorithm with \tilde{\mathcal{O}}(H^5 d^{H+1}/\delta^2) queries can compute a \delta-optimal policy in the fixed-horizon setting. We discuss implications and remaining open questions.
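To make the upper-bound statement concrete, the following is a minimal sketch of least-squares value iteration with a generative model: backward induction over the horizon, fitting a linear Q-estimate at each step by regularized least squares on sampled Bellman backups. The toy MDP, random feature map, sample sizes, and all names here are illustrative assumptions, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)

H = 3             # horizon
S, A, d = 4, 2, 3  # states, actions, feature dimension (toy sizes)

# Random toy MDP: transition kernel P[s, a] and deterministic rewards R[s, a].
P = rng.dirichlet(np.ones(S), size=(S, A))
R = rng.uniform(0.0, 1.0, size=(S, A))

# Hypothetical feature map phi(s, a); the realizability assumption is that
# Q* is (approximately) linear in these features.
Phi = rng.normal(size=(S, A, d))

def generative_model(s, a, n):
    """The generative-model oracle: draw n i.i.d. next states from P[s, a]."""
    return rng.choice(S, size=n, p=P[s, a])

def lsvi(n_samples=50, reg=1e-3):
    """Fit theta_h so that phi(s, a)^T theta_h approximates Q_h(s, a)."""
    thetas = [np.zeros(d) for _ in range(H + 1)]  # theta_H = 0 (terminal)
    for h in range(H - 1, -1, -1):
        X, y = [], []
        for s in range(S):
            for a in range(A):
                ns = generative_model(s, a, n_samples)
                # Empirical Bellman backup using the next layer's estimate.
                v_next = np.max(Phi[ns] @ thetas[h + 1], axis=1).mean()
                X.append(Phi[s, a])
                y.append(R[s, a] + v_next)
        X, y = np.array(X), np.array(y)
        # Ridge-regularized least-squares fit of the backed-up values.
        thetas[h] = np.linalg.solve(X.T @ X + reg * np.eye(d), X.T @ y)
    return thetas

thetas = lsvi()
# Greedy policy at step 0 induced by the fitted linear Q-estimate.
policy0 = np.argmax(Phi @ thetas[0], axis=1)
```

This sketch visits every (s, a) pair, whereas the paper's local-planning setting forbids enumerating the state space; the point of the upper bound is that, with only local access, a sample cost scaling like d^{H+1} suffices, matching (up to the gap discussed in the paper) the exponential lower bound.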