Finding the Stochastic Shortest Path with Low Regret: the Adversarial Cost and Unknown Transition Case

Liyu Chen; Haipeng Luo

Proceedings of the 38th International Conference on Machine Learning, PMLR 139:1651-1660, 2021.

Abstract

We make significant progress toward the stochastic shortest path problem with adversarial costs and unknown transition. Specifically, we develop algorithms that achieve $O(\sqrt{S^2ADT_\star K})$ regret for the full-information setting and $O(\sqrt{S^3A^2DT_\star K})$ regret for the bandit feedback setting, where $D$ is the diameter, $T_\star$ is the expected hitting time of the optimal policy, $S$ is the number of states, $A$ is the number of actions, and $K$ is the number of episodes. Our work strictly improves (Rosenberg and Mansour, 2020) in the full information setting, extends (Chen et al., 2020) from known transition to unknown transition, and is also the first to consider the most challenging combination: bandit feedback with adversarial costs and unknown transition. To remedy the gap between our upper bounds and the current best lower bounds constructed via a stochastically oblivious adversary, we also propose algorithms with near-optimal regret for this special case.
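
As a quick check on how the two stated upper bounds relate (a comparison of the bounds only, not a claim about the true regret in either setting), dividing the bandit bound by the full-information bound gives:

$$
\frac{\sqrt{S^3A^2DT_\star K}}{\sqrt{S^2ADT_\star K}} = \sqrt{\frac{S^3A^2}{S^2A}} = \sqrt{SA}.
$$

Under these bounds, moving from full-information to bandit feedback costs an extra factor of $\sqrt{SA}$, while the dependence on $D$, $T_\star$, and $K$ is unchanged.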

Cite this Paper

BibTeX

@InProceedings{pmlr-v139-chen21l,
  title = {Finding the Stochastic Shortest Path with Low Regret: the Adversarial Cost and Unknown Transition Case},
  author = {Chen, Liyu and Luo, Haipeng},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages = {1651--1660},
  year = {2021},
  editor = {Meila, Marina and Zhang, Tong},
  volume = {139},
  series = {Proceedings of Machine Learning Research},
  month = {18--24 Jul},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v139/chen21l/chen21l.pdf},
  url = {https://proceedings.mlr.press/v139/chen21l.html},
  abstract = {We make significant progress toward the stochastic shortest path problem with adversarial costs and unknown transition. Specifically, we develop algorithms that achieve $O(\sqrt{S^2ADT_\star K})$ regret for the full-information setting and $O(\sqrt{S^3A^2DT_\star K})$ regret for the bandit feedback setting, where $D$ is the diameter, $T_\star$ is the expected hitting time of the optimal policy, $S$ is the number of states, $A$ is the number of actions, and $K$ is the number of episodes. Our work strictly improves (Rosenberg and Mansour, 2020) in the full information setting, extends (Chen et al., 2020) from known transition to unknown transition, and is also the first to consider the most challenging combination: bandit feedback with adversarial costs and unknown transition. To remedy the gap between our upper bounds and the current best lower bounds constructed via a stochastically oblivious adversary, we also propose algorithms with near-optimal regret for this special case.}
}

Endnote

%0 Conference Paper
%T Finding the Stochastic Shortest Path with Low Regret: the Adversarial Cost and Unknown Transition Case
%A Liyu Chen
%A Haipeng Luo
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-chen21l
%I PMLR
%P 1651--1660
%U https://proceedings.mlr.press/v139/chen21l.html
%V 139
%X We make significant progress toward the stochastic shortest path problem with adversarial costs and unknown transition. Specifically, we develop algorithms that achieve $O(\sqrt{S^2ADT_\star K})$ regret for the full-information setting and $O(\sqrt{S^3A^2DT_\star K})$ regret for the bandit feedback setting, where $D$ is the diameter, $T_\star$ is the expected hitting time of the optimal policy, $S$ is the number of states, $A$ is the number of actions, and $K$ is the number of episodes. Our work strictly improves (Rosenberg and Mansour, 2020) in the full information setting, extends (Chen et al., 2020) from known transition to unknown transition, and is also the first to consider the most challenging combination: bandit feedback with adversarial costs and unknown transition. To remedy the gap between our upper bounds and the current best lower bounds constructed via a stochastically oblivious adversary, we also propose algorithms with near-optimal regret for this special case.

APA

Chen, L. & Luo, H. (2021). Finding the Stochastic Shortest Path with Low Regret: the Adversarial Cost and Unknown Transition Case. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:1651-1660. Available from https://proceedings.mlr.press/v139/chen21l.html.
