Exponential Lower Bounds for Planning in MDPs With Linearly-Realizable Optimal Action-Value Functions
Proceedings of the 32nd International Conference on Algorithmic Learning Theory, PMLR 132:1237-1264, 2021.
Abstract
We consider the problem of local planning in fixed-horizon and discounted Markov Decision Processes (MDPs) with linear function approximation and a generative model, under the assumption that the optimal action-value function lies in the span of a feature map that is available to the planner. Previous work has left open the question of whether there exist sound planners that need only $\mbox{poly}(H,d)$ queries regardless of the MDP, where $H$ is the horizon and $d$ is the dimensionality of the features. We answer this question in the negative: we show that any sound planner must query at least $\min(e^{\Omega(d)},\Omega(2^H))$ samples in the fixed-horizon setting and $e^{\Omega(d)}$ samples in the discounted setting. We also show that for any $\delta>0$, the least-squares value iteration algorithm with $\tilde{\mathcal{O}}(H^5 d^{H+1}/\delta^2)$ queries can compute a $\delta$-optimal policy in the fixed-horizon setting. We discuss implications and remaining open questions.
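For readers unfamiliar with the upper-bound algorithm mentioned above, the following is a minimal sketch of least-squares value iteration (LSVI) with a generative model on a toy finite MDP. The MDP, the feature map `phi`, and the sample size `n_per_pair` are all hypothetical illustrations; the random toy MDP does not satisfy the paper's linear-realizability assumption exactly, so the regression is only approximate, and the sketch carries none of the stated query-complexity guarantees.

```python
# Sketch of least-squares value iteration with a generative model:
# regress Q_h onto features, backward from stage H-1 to 0.
# Toy/hypothetical setup; not the paper's construction.
import numpy as np

rng = np.random.default_rng(0)
S, A, H, d = 20, 4, 5, 6            # toy sizes (hypothetical)

# Toy MDP: random transition kernel P[s, a] and rewards R[s, a].
P = rng.dirichlet(np.ones(S), size=(S, A))
R = rng.uniform(size=(S, A))
phi = rng.normal(size=(S, A, d))    # hypothetical feature map phi(s, a) in R^d

def query(s, a):
    """One generative-model call: a sampled reward and next state from (s, a)."""
    return R[s, a], rng.choice(S, p=P[s, a])

def lsvi(n_per_pair=30, ridge=1e-3):
    """Backward ridge regression of each stage's Q-values onto the features."""
    w = [np.zeros(d) for _ in range(H + 1)]   # w[H] = 0 encodes zero terminal value
    for h in range(H - 1, -1, -1):
        X, y = [], []
        for s in range(S):
            for a in range(A):
                for _ in range(n_per_pair):
                    r, s2 = query(s, a)
                    # Regression target: sampled reward plus greedy next-stage value.
                    target = r + np.max(phi[s2] @ w[h + 1])
                    X.append(phi[s, a])
                    y.append(target)
        X, y = np.asarray(X), np.asarray(y)
        w[h] = np.linalg.solve(X.T @ X + ridge * np.eye(d), X.T @ y)
    return w[:H]

w = lsvi()
# Policy output: act greedily with respect to the fitted linear Q-estimates.
greedy = [int(np.argmax(phi[s] @ w[0])) for s in range(S)]
print("greedy first-stage actions:", greedy[:10])
```

This uses `S * A * n_per_pair` generative-model queries per stage; the paper's point is that, under realizability alone, any sound planner needs a number of queries exponential in $\min(d, H)$, so no choice of sample size here can be both sound and polynomial in general.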