Learning Stochastic Shortest Path with Linear Function Approximation

Yifei Min; Jiafan He; Tianhao Wang; Quanquan Gu

Proceedings of the 39th International Conference on Machine Learning, PMLR 162:15584-15629, 2022.

Abstract

We study the stochastic shortest path (SSP) problem in reinforcement learning with linear function approximation, where the transition kernel is represented as a linear mixture of unknown models. We call this class of SSP problems linear mixture SSPs. We propose a novel algorithm with Hoeffding-type confidence sets for learning the linear mixture SSP, which can attain an $\tilde{\mathcal{O}}(d B_{\star}^{1.5}\sqrt{K/c_{\min}})$ regret. Here $K$ is the number of episodes, $d$ is the dimension of the feature mapping in the mixture model, $B_{\star}$ bounds the expected cumulative cost of the optimal policy, and $c_{\min}>0$ is the lower bound of the cost function. Our algorithm also applies to the case $c_{\min} = 0$, for which an $\tilde{\mathcal{O}}(K^{2/3})$ regret is guaranteed. To the best of our knowledge, this is the first algorithm with a sublinear regret guarantee for learning linear mixture SSPs. Moreover, we design a refined Bernstein-type confidence set and propose an improved algorithm, which provably achieves an $\tilde{\mathcal{O}}(d B_{\star}\sqrt{K/c_{\min}})$ regret. To complement these regret upper bounds, we also prove a lower bound of $\Omega(dB_{\star} \sqrt{K})$. Hence, our improved algorithm matches the lower bound up to a $1/\sqrt{c_{\min}}$ factor and poly-logarithmic factors, achieving a near-optimal regret guarantee.
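Comparing the leading terms of the three stated bounds shows the two gaps the abstract refers to. This is a back-of-the-envelope comparison that drops constants and poly-logarithmic factors. It is not a statement taken from the paper's proofs.

```latex
\begin{aligned}
\frac{d B_{\star}^{1.5}\sqrt{K/c_{\min}}}{d B_{\star}\sqrt{K/c_{\min}}} &= \sqrt{B_{\star}}
  && \text{(Hoeffding-type vs.\ Bernstein-type upper bound)} \\
\frac{d B_{\star}\sqrt{K/c_{\min}}}{d B_{\star}\sqrt{K}} &= \frac{1}{\sqrt{c_{\min}}}
  && \text{(Bernstein-type upper bound vs.\ lower bound)}
\end{aligned}
```

For example, with $c_{\min} = 0.01$ the remaining gap between the improved upper bound and the lower bound is a factor of $1/\sqrt{0.01} = 10$.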

Cite this Paper

BibTeX


@InProceedings{pmlr-v162-min22a,
  title = 	 {Learning Stochastic Shortest Path with Linear Function Approximation},
  author =       {Min, Yifei and He, Jiafan and Wang, Tianhao and Gu, Quanquan},
  booktitle = 	 {Proceedings of the 39th International Conference on Machine Learning},
  pages = 	 {15584--15629},
  year = 	 {2022},
  editor = 	 {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume = 	 {162},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {17--23 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v162/min22a/min22a.pdf},
  url = 	 {https://proceedings.mlr.press/v162/min22a.html},
  abstract = 	 {We study the stochastic shortest path (SSP) problem in reinforcement learning with linear function approximation, where the transition kernel is represented as a linear mixture of unknown models. We call this class of SSP problems as linear mixture SSPs. We propose a novel algorithm with Hoeffding-type confidence sets for learning the linear mixture SSP, which can attain an $\tilde{\mathcal{O}}(d B_{\star}^{1.5}\sqrt{K/c_{\min}})$ regret. Here $K$ is the number of episodes, $d$ is the dimension of the feature mapping in the mixture model, $B_{\star}$ bounds the expected cumulative cost of the optimal policy, and $c_{\min}>0$ is the lower bound of the cost function. Our algorithm also applies to the case when $c_{\min} = 0$, and an $\tilde{\mathcal{O}}(K^{2/3})$ regret is guaranteed. To the best of our knowledge, this is the first algorithm with a sublinear regret guarantee for learning linear mixture SSP. Moreover, we design a refined Bernstein-type confidence set and propose an improved algorithm, which provably achieves an $\tilde{\mathcal{O}}(d B_{\star}\sqrt{K/c_{\min}})$ regret. In complement to the regret upper bounds, we also prove a lower bound of $\Omega(dB_{\star} \sqrt{K})$. Hence, our improved algorithm matches the lower bound up to a $1/\sqrt{c_{\min}}$ factor and poly-logarithmic factors, achieving a near-optimal regret guarantee.}
}
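The following is a minimal sketch of citing the entry above from a LaTeX document. It assumes the BibTeX entry has been saved to a file named `refs.bib`, which is a hypothetical file name. It also assumes the `natbib` package with the `plainnat` style. Other bibliography styles work the same way with a plain `\cite`.

```latex
\documentclass{article}
\usepackage{natbib}  % provides \citet (textual) and \citep (parenthetical)
\begin{document}
Regret bounds for linear mixture SSPs are given by \citet{pmlr-v162-min22a}.
% refs.bib is a hypothetical file that contains the BibTeX entry above
\bibliographystyle{plainnat}
\bibliography{refs}
\end{document}
```

The usual build sequence is `pdflatex`, then `bibtex`, then `pdflatex` twice, so that the citation and the reference list resolve.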

Endnote

%0 Conference Paper
%T Learning Stochastic Shortest Path with Linear Function Approximation
%A Yifei Min
%A Jiafan He
%A Tianhao Wang
%A Quanquan Gu
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-min22a
%I PMLR
%P 15584--15629
%U https://proceedings.mlr.press/v162/min22a.html
%V 162
%X We study the stochastic shortest path (SSP) problem in reinforcement learning with linear function approximation, where the transition kernel is represented as a linear mixture of unknown models. We call this class of SSP problems as linear mixture SSPs. We propose a novel algorithm with Hoeffding-type confidence sets for learning the linear mixture SSP, which can attain an $\tilde{\mathcal{O}}(d B_{\star}^{1.5}\sqrt{K/c_{\min}})$ regret. Here $K$ is the number of episodes, $d$ is the dimension of the feature mapping in the mixture model, $B_{\star}$ bounds the expected cumulative cost of the optimal policy, and $c_{\min}>0$ is the lower bound of the cost function. Our algorithm also applies to the case when $c_{\min} = 0$, and an $\tilde{\mathcal{O}}(K^{2/3})$ regret is guaranteed. To the best of our knowledge, this is the first algorithm with a sublinear regret guarantee for learning linear mixture SSP. Moreover, we design a refined Bernstein-type confidence set and propose an improved algorithm, which provably achieves an $\tilde{\mathcal{O}}(d B_{\star}\sqrt{K/c_{\min}})$ regret. In complement to the regret upper bounds, we also prove a lower bound of $\Omega(dB_{\star} \sqrt{K})$. Hence, our improved algorithm matches the lower bound up to a $1/\sqrt{c_{\min}}$ factor and poly-logarithmic factors, achieving a near-optimal regret guarantee.

APA


Min, Y., He, J., Wang, T. & Gu, Q. (2022). Learning Stochastic Shortest Path with Linear Function Approximation. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:15584-15629. Available from https://proceedings.mlr.press/v162/min22a.html.
