Regret Bounds for Stochastic Shortest Path Problems with Linear Function Approximation

Daniel Vial; Advait Parulekar; Sanjay Shakkottai; R Srikant

Regret Bounds for Stochastic Shortest Path Problems with Linear Function Approximation

Daniel Vial, Advait Parulekar, Sanjay Shakkottai, R Srikant

Proceedings of the 39th International Conference on Machine Learning, PMLR 162:22203-22233, 2022.

Abstract

We propose an algorithm that uses linear function approximation (LFA) for stochastic shortest path (SSP). Under minimal assumptions, it obtains sublinear regret, is computationally efficient, and uses stationary policies. To our knowledge, this is the first such algorithm in the LFA literature (for SSP or other formulations). Our algorithm is a special case of a more general one, which achieves regret square root in the number of episodes given access to a computation oracle.

Cite this Paper

BibTeX

@InProceedings{pmlr-v162-vial22a,
  title = 	 {Regret Bounds for Stochastic Shortest Path Problems with Linear Function Approximation},
  author =       {Vial, Daniel and Parulekar, Advait and Shakkottai, Sanjay and Srikant, R},
  booktitle = 	 {Proceedings of the 39th International Conference on Machine Learning},
  pages = 	 {22203--22233},
  year = 	 {2022},
  editor = 	 {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume = 	 {162},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {17--23 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v162/vial22a/vial22a.pdf},
  url = 	 {https://proceedings.mlr.press/v162/vial22a.html},
  abstract = 	 {We propose an algorithm that uses linear function approximation (LFA) for stochastic shortest path (SSP). Under minimal assumptions, it obtains sublinear regret, is computationally efficient, and uses stationary policies. To our knowledge, this is the first such algorithm in the LFA literature (for SSP or other formulations). Our algorithm is a special case of a more general one, which achieves regret square root in the number of episodes given access to a computation oracle.}
}

Endnote

%0 Conference Paper
%T Regret Bounds for Stochastic Shortest Path Problems with Linear Function Approximation
%A Daniel Vial
%A Advait Parulekar
%A Sanjay Shakkottai
%A R Srikant
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato	
%F pmlr-v162-vial22a
%I PMLR
%P 22203--22233
%U https://proceedings.mlr.press/v162/vial22a.html
%V 162
%X We propose an algorithm that uses linear function approximation (LFA) for stochastic shortest path (SSP). Under minimal assumptions, it obtains sublinear regret, is computationally efficient, and uses stationary policies. To our knowledge, this is the first such algorithm in the LFA literature (for SSP or other formulations). Our algorithm is a special case of a more general one, which achieves regret square root in the number of episodes given access to a computation oracle.

APA

Vial, D., Parulekar, A., Shakkottai, S. & Srikant, R.. (2022). Regret Bounds for Stochastic Shortest Path Problems with Linear Function Approximation. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:22203-22233 Available from https://proceedings.mlr.press/v162/vial22a.html.

Related Material

Download PDF