[edit]
Learning Stochastic Shortest Path with Linear Function Approximation
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:15584-15629, 2022.
Abstract
We study the stochastic shortest path (SSP) problem in reinforcement learning with linear function approximation, where the transition kernel is represented as a linear mixture of unknown models. We call this class of SSP problems as linear mixture SSPs. We propose a novel algorithm with Hoeffding-type confidence sets for learning the linear mixture SSP, which can attain an ˜O(dB1.5⋆K/cmin regret. Here K is the number of episodes, d is the dimension of the feature mapping in the mixture model, B_{\star} bounds the expected cumulative cost of the optimal policy, and c_{\min}>0 is the lower bound of the cost function. Our algorithm also applies to the case when c_{\min} = 0, and an \tilde{\mathcal{O}}(K^{2/3}) regret is guaranteed. To the best of our knowledge, this is the first algorithm with a sublinear regret guarantee for learning linear mixture SSP. Moreover, we design a refined Bernstein-type confidence set and propose an improved algorithm, which provably achieves an \tilde{\mathcal{O}}(d B_{\star}\sqrt{K/c_{\min}}) regret. In complement to the regret upper bounds, we also prove a lower bound of \Omega(dB_{\star} \sqrt{K}). Hence, our improved algorithm matches the lower bound up to a 1/\sqrt{c_{\min}} factor and poly-logarithmic factors, achieving a near-optimal regret guarantee.