Exponential Lower Bounds for Planning in MDPs With Linearly-Realizable Optimal Action-Value Functions

Gellért Weisz, Philip Amortila, Csaba Szepesvári
Proceedings of the 32nd International Conference on Algorithmic Learning Theory, PMLR 132:1237-1264, 2021.

Abstract

We consider the problem of local planning in fixed-horizon and discounted Markov Decision Processes (MDPs) with linear function approximation and a generative model under the assumption that the optimal action-value function lies in the span of a feature map that is available to the planner. Previous work has left open the question of whether there exist sound planners that need only poly(H,d) queries regardless of the MDP, where H is the horizon and d is the dimensionality of the features. We answer this question in the negative: we show that any sound planner must query at least min samples in the fized-horizon setting and e^{\Omega(d)} samples in the discounted setting. We also show that for any \delta>0, the least-squares value iteration algorithm with \tilde{\mathcal{O}}(H^5 d^{H+1}/\delta^2) queries can compute a \delta-optimal policy in the fixed-horizon setting. We discuss implications and remaining open questions.

Cite this Paper


BibTeX
@InProceedings{pmlr-v132-weisz21a, title = {Exponential Lower Bounds for Planning in MDPs With Linearly-Realizable Optimal Action-Value Functions}, author = {Weisz, Gell{\'e}rt and Amortila, Philip and {Sz}epesv{\'a}ri, {Cs}aba}, booktitle = {Proceedings of the 32nd International Conference on Algorithmic Learning Theory}, pages = {1237--1264}, year = {2021}, editor = {Feldman, Vitaly and Ligett, Katrina and Sabato, Sivan}, volume = {132}, series = {Proceedings of Machine Learning Research}, month = {16--19 Mar}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v132/weisz21a/weisz21a.pdf}, url = {https://proceedings.mlr.press/v132/weisz21a.html}, abstract = {We consider the problem of local planning in fixed-horizon and discounted Markov Decision Processes (MDPs) with linear function approximation and a generative model under the assumption that the optimal action-value function lies in the span of a feature map that is available to the planner. Previous work has left open the question of whether there exist sound planners that need only $\mbox{poly}(H,d)$ queries regardless of the MDP, where $H$ is the horizon and $d$ is the dimensionality of the features. We answer this question in the negative: we show that any sound planner must query at least $\min(e^{\Omega(d)},\Omega(2^H))$ samples in the fized-horizon setting and $e^{\Omega(d)}$ samples in the discounted setting. We also show that for any $\delta>0$, the least-squares value iteration algorithm with $\tilde{\mathcal{O}}(H^5 d^{H+1}/\delta^2)$ queries can compute a $\delta$-optimal policy in the fixed-horizon setting. We discuss implications and remaining open questions.} }
Endnote
%0 Conference Paper %T Exponential Lower Bounds for Planning in MDPs With Linearly-Realizable Optimal Action-Value Functions %A Gellért Weisz %A Philip Amortila %A Csaba Szepesvári %B Proceedings of the 32nd International Conference on Algorithmic Learning Theory %C Proceedings of Machine Learning Research %D 2021 %E Vitaly Feldman %E Katrina Ligett %E Sivan Sabato %F pmlr-v132-weisz21a %I PMLR %P 1237--1264 %U https://proceedings.mlr.press/v132/weisz21a.html %V 132 %X We consider the problem of local planning in fixed-horizon and discounted Markov Decision Processes (MDPs) with linear function approximation and a generative model under the assumption that the optimal action-value function lies in the span of a feature map that is available to the planner. Previous work has left open the question of whether there exist sound planners that need only $\mbox{poly}(H,d)$ queries regardless of the MDP, where $H$ is the horizon and $d$ is the dimensionality of the features. We answer this question in the negative: we show that any sound planner must query at least $\min(e^{\Omega(d)},\Omega(2^H))$ samples in the fized-horizon setting and $e^{\Omega(d)}$ samples in the discounted setting. We also show that for any $\delta>0$, the least-squares value iteration algorithm with $\tilde{\mathcal{O}}(H^5 d^{H+1}/\delta^2)$ queries can compute a $\delta$-optimal policy in the fixed-horizon setting. We discuss implications and remaining open questions.
APA
Weisz, G., Amortila, P. & Szepesvári, C.. (2021). Exponential Lower Bounds for Planning in MDPs With Linearly-Realizable Optimal Action-Value Functions. Proceedings of the 32nd International Conference on Algorithmic Learning Theory, in Proceedings of Machine Learning Research 132:1237-1264 Available from https://proceedings.mlr.press/v132/weisz21a.html.

Related Material