A Unified Algorithm for Stochastic Path Problems

Christoph Dann; Chen-Yu Wei; Julian Zimmert

A Unified Algorithm for Stochastic Path Problems

Christoph Dann, Chen-Yu Wei, Julian Zimmert

Proceedings of The 34th International Conference on Algorithmic Learning Theory, PMLR 201:510-557, 2023.

Abstract

We study reinforcement learning in stochastic path (SP) problems. The goal in these problems is to maximize the expected sum of rewards until the agent reaches a terminal state. We provide the first regret guarantees in this general problem by analyzing a simple optimistic algorithm. Our regret bound matches the best known results for the well-studied special case of stochastic shortest path (SSP) with all non-positive rewards. For SSP, we present an adaptation procedure for the case when the scale of rewards

$B_\star$ is unknown. We show that there is no price for adaptation, and our regret bound matches that with a known

$B_\star$ . We also provide a scale adaptation procedure for the special case of stochastic longest paths (SLP) where all rewards are non-negative. However, unlike in SSP, we show through a lower bound that there is an unavoidable price for adaptation.

Cite this Paper

BibTeX


@InProceedings{pmlr-v201-dann23b,
  title = 	 {A Unified Algorithm for Stochastic Path Problems},
  author =       {Dann, Christoph and Wei, Chen-Yu and Zimmert, Julian},
  booktitle = 	 {Proceedings of The 34th International Conference on Algorithmic Learning Theory},
  pages = 	 {510--557},
  year = 	 {2023},
  editor = 	 {Agrawal, Shipra and Orabona, Francesco},
  volume = 	 {201},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {20 Feb--23 Feb},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v201/dann23b/dann23b.pdf},
  url = 	 {https://proceedings.mlr.press/v201/dann23b.html},
  abstract = 	 {We study reinforcement learning in stochastic path (SP) problems. The goal in these problems is to maximize the expected sum of rewards until the agent reaches a terminal state. We provide the first regret guarantees in this general problem by analyzing a simple optimistic algorithm. Our regret bound matches the best known results for the well-studied special case of stochastic shortest path (SSP) with all non-positive rewards. For SSP, we present an adaptation procedure for the case when the scale of rewards $B_\star$ is unknown. We show that there is no price for adaptation, and our regret bound matches that with a known $B_\star$. We also provide a scale adaptation procedure for the special case of stochastic longest paths (SLP) where all rewards are non-negative. However, unlike in SSP, we show through a lower bound that there is an unavoidable price for adaptation. }
}

Endnote

%0 Conference Paper
%T A Unified Algorithm for Stochastic Path Problems
%A Christoph Dann
%A Chen-Yu Wei
%A Julian Zimmert
%B Proceedings of The 34th International Conference on Algorithmic Learning Theory
%C Proceedings of Machine Learning Research
%D 2023
%E Shipra Agrawal
%E Francesco Orabona	
%F pmlr-v201-dann23b
%I PMLR
%P 510--557
%U https://proceedings.mlr.press/v201/dann23b.html
%V 201
%X We study reinforcement learning in stochastic path (SP) problems. The goal in these problems is to maximize the expected sum of rewards until the agent reaches a terminal state. We provide the first regret guarantees in this general problem by analyzing a simple optimistic algorithm. Our regret bound matches the best known results for the well-studied special case of stochastic shortest path (SSP) with all non-positive rewards. For SSP, we present an adaptation procedure for the case when the scale of rewards $B_\star$ is unknown. We show that there is no price for adaptation, and our regret bound matches that with a known $B_\star$. We also provide a scale adaptation procedure for the special case of stochastic longest paths (SLP) where all rewards are non-negative. However, unlike in SSP, we show through a lower bound that there is an unavoidable price for adaptation.

APA


Dann, C., Wei, C. & Zimmert, J.. (2023). A Unified Algorithm for Stochastic Path Problems. Proceedings of The 34th International Conference on Algorithmic Learning Theory, in Proceedings of Machine Learning Research 201:510-557 Available from https://proceedings.mlr.press/v201/dann23b.html.

Related Material

Download PDF