Sample Complexity Bounds for Stochastic Shortest Path with a Generative Model

Jean Tarbouriech, Matteo Pirotta, Michal Valko, Alessandro Lazaric
Proceedings of the 32nd International Conference on Algorithmic Learning Theory, PMLR 132:1157-1178, 2021.

Abstract

We consider the objective of computing an $\epsilon$-optimal policy in a stochastic shortest path (SSP) setting, provided that we can access a generative sampling oracle. We propose two algorithms for this setting and derive PAC bounds on their sample complexity: one for the case of positive costs and the other for the case of non-negative costs under a restricted optimality criterion. While tight sample complexity bounds have been derived for the finite-horizon and discounted MDPs, the SSP problem is a strict generalization of these settings and it poses additional technical challenges due to the fact that no specific time horizon is prescribed and policies may never terminate, i.e., we are possibly facing non-proper policies. As a consequence, we can neither directly apply existing techniques minimizing sample complexity nor rely on a regret-to-PAC conversion leveraging recent regret bounds for SSP. Our analysis instead combines SSP-specific tools and variance reduction techniques to obtain the first sample complexity bounds for this setting.

Cite this Paper


BibTeX
@InProceedings{pmlr-v132-tarbouriech21a, title = {Sample Complexity Bounds for Stochastic Shortest Path with a Generative Model}, author = {Tarbouriech, Jean and Pirotta, Matteo and Valko, Michal and Lazaric, Alessandro}, booktitle = {Proceedings of the 32nd International Conference on Algorithmic Learning Theory}, pages = {1157--1178}, year = {2021}, editor = {Feldman, Vitaly and Ligett, Katrina and Sabato, Sivan}, volume = {132}, series = {Proceedings of Machine Learning Research}, month = {16--19 Mar}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v132/tarbouriech21a/tarbouriech21a.pdf}, url = {https://proceedings.mlr.press/v132/tarbouriech21a.html}, abstract = {We consider the objective of computing an $\epsilon$-optimal policy in a stochastic shortest path (SSP) setting, provided that we can access a generative sampling oracle. We propose two algorithms for this setting and derive PAC bounds on their sample complexity: one for the case of positive costs and the other for the case of non-negative costs under a restricted optimality criterion. While tight sample complexity bounds have been derived for the finite-horizon and discounted MDPs, the SSP problem is a strict generalization of these settings and it poses additional technical challenges due to the fact that no specific time horizon is prescribed and policies may never terminate, i.e., we are possibly facing non-proper policies. As a consequence, we can neither directly apply existing techniques minimizing sample complexity nor rely on a regret-to-PAC conversion leveraging recent regret bounds for SSP. Our analysis instead combines SSP-specific tools and variance reduction techniques to obtain the first sample complexity bounds for this setting.} }
Endnote
%0 Conference Paper %T Sample Complexity Bounds for Stochastic Shortest Path with a Generative Model %A Jean Tarbouriech %A Matteo Pirotta %A Michal Valko %A Alessandro Lazaric %B Proceedings of the 32nd International Conference on Algorithmic Learning Theory %C Proceedings of Machine Learning Research %D 2021 %E Vitaly Feldman %E Katrina Ligett %E Sivan Sabato %F pmlr-v132-tarbouriech21a %I PMLR %P 1157--1178 %U https://proceedings.mlr.press/v132/tarbouriech21a.html %V 132 %X We consider the objective of computing an $\epsilon$-optimal policy in a stochastic shortest path (SSP) setting, provided that we can access a generative sampling oracle. We propose two algorithms for this setting and derive PAC bounds on their sample complexity: one for the case of positive costs and the other for the case of non-negative costs under a restricted optimality criterion. While tight sample complexity bounds have been derived for the finite-horizon and discounted MDPs, the SSP problem is a strict generalization of these settings and it poses additional technical challenges due to the fact that no specific time horizon is prescribed and policies may never terminate, i.e., we are possibly facing non-proper policies. As a consequence, we can neither directly apply existing techniques minimizing sample complexity nor rely on a regret-to-PAC conversion leveraging recent regret bounds for SSP. Our analysis instead combines SSP-specific tools and variance reduction techniques to obtain the first sample complexity bounds for this setting.
APA
Tarbouriech, J., Pirotta, M., Valko, M. & Lazaric, A.. (2021). Sample Complexity Bounds for Stochastic Shortest Path with a Generative Model. Proceedings of the 32nd International Conference on Algorithmic Learning Theory, in Proceedings of Machine Learning Research 132:1157-1178 Available from https://proceedings.mlr.press/v132/tarbouriech21a.html.

Related Material