Exponential Lower Bounds for Planning in MDPs With Linearly-Realizable Optimal Action-Value Functions

Gellért Weisz, Philip Amortila, Csaba Szepesvári
Proceedings of the 32nd International Conference on Algorithmic Learning Theory, PMLR 132:1237-1264, 2021.

Abstract

We consider the problem of local planning in fixed-horizon and discounted Markov Decision Processes (MDPs) with linear function approximation and a generative model under the assumption that the optimal action-value function lies in the span of a feature map that is available to the planner. Previous work has left open the question of whether there exist sound planners that need only $\mbox{poly}(H,d)$ queries regardless of the MDP, where $H$ is the horizon and $d$ is the dimensionality of the features. We answer this question in the negative: we show that any sound planner must query at least $\min(e^{\Omega(d)},\Omega(2^H))$ samples in the fized-horizon setting and $e^{\Omega(d)}$ samples in the discounted setting. We also show that for any $\delta>0$, the least-squares value iteration algorithm with $\tilde{\mathcal{O}}(H^5 d^{H+1}/\delta^2)$ queries can compute a $\delta$-optimal policy in the fixed-horizon setting. We discuss implications and remaining open questions.

Cite this Paper


BibTeX
@InProceedings{pmlr-v132-weisz21a, title = {Exponential Lower Bounds for Planning in MDPs With Linearly-Realizable Optimal Action-Value Functions}, author = {Weisz, Gell{\'e}rt and Amortila, Philip and {Sz}epesv{\'a}ri, {Cs}aba}, booktitle = {Proceedings of the 32nd International Conference on Algorithmic Learning Theory}, pages = {1237--1264}, year = {2021}, editor = {Feldman, Vitaly and Ligett, Katrina and Sabato, Sivan}, volume = {132}, series = {Proceedings of Machine Learning Research}, month = {16--19 Mar}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v132/weisz21a/weisz21a.pdf}, url = {https://proceedings.mlr.press/v132/weisz21a.html}, abstract = {We consider the problem of local planning in fixed-horizon and discounted Markov Decision Processes (MDPs) with linear function approximation and a generative model under the assumption that the optimal action-value function lies in the span of a feature map that is available to the planner. Previous work has left open the question of whether there exist sound planners that need only $\mbox{poly}(H,d)$ queries regardless of the MDP, where $H$ is the horizon and $d$ is the dimensionality of the features. We answer this question in the negative: we show that any sound planner must query at least $\min(e^{\Omega(d)},\Omega(2^H))$ samples in the fized-horizon setting and $e^{\Omega(d)}$ samples in the discounted setting. We also show that for any $\delta>0$, the least-squares value iteration algorithm with $\tilde{\mathcal{O}}(H^5 d^{H+1}/\delta^2)$ queries can compute a $\delta$-optimal policy in the fixed-horizon setting. We discuss implications and remaining open questions.} }
Endnote
%0 Conference Paper %T Exponential Lower Bounds for Planning in MDPs With Linearly-Realizable Optimal Action-Value Functions %A Gellért Weisz %A Philip Amortila %A Csaba Szepesvári %B Proceedings of the 32nd International Conference on Algorithmic Learning Theory %C Proceedings of Machine Learning Research %D 2021 %E Vitaly Feldman %E Katrina Ligett %E Sivan Sabato %F pmlr-v132-weisz21a %I PMLR %P 1237--1264 %U https://proceedings.mlr.press/v132/weisz21a.html %V 132 %X We consider the problem of local planning in fixed-horizon and discounted Markov Decision Processes (MDPs) with linear function approximation and a generative model under the assumption that the optimal action-value function lies in the span of a feature map that is available to the planner. Previous work has left open the question of whether there exist sound planners that need only $\mbox{poly}(H,d)$ queries regardless of the MDP, where $H$ is the horizon and $d$ is the dimensionality of the features. We answer this question in the negative: we show that any sound planner must query at least $\min(e^{\Omega(d)},\Omega(2^H))$ samples in the fized-horizon setting and $e^{\Omega(d)}$ samples in the discounted setting. We also show that for any $\delta>0$, the least-squares value iteration algorithm with $\tilde{\mathcal{O}}(H^5 d^{H+1}/\delta^2)$ queries can compute a $\delta$-optimal policy in the fixed-horizon setting. We discuss implications and remaining open questions.
APA
Weisz, G., Amortila, P. & Szepesvári, C.. (2021). Exponential Lower Bounds for Planning in MDPs With Linearly-Realizable Optimal Action-Value Functions. Proceedings of the 32nd International Conference on Algorithmic Learning Theory, in Proceedings of Machine Learning Research 132:1237-1264 Available from https://proceedings.mlr.press/v132/weisz21a.html.

Related Material