Open Problem: The Dependence of Sample Complexity Lower Bounds on Planning Horizon


Nan Jiang, Alekh Agarwal
Proceedings of the 31st Conference On Learning Theory, PMLR 75:3395-3398, 2018.


In reinforcement learning (RL), problems with long planning horizons are perceived as very challenging. Recent advances in PAC RL, however, show that the sample complexity of RL does not depend on the planning horizon except at a superficial level. How can we explain this discrepancy? Noting that the technical assumptions in these upper bounds might have hidden away the challenges of long horizons, we ask the question: \emph{can we prove a lower bound with a horizon dependence when such assumptions are removed?} We also provide a few observations on the desired characteristics of the lower bound construction.
