On the uniqueness of solution for the Bellman equation of LTL objectives
Proceedings of the 6th Annual Learning for Dynamics & Control Conference, PMLR 242:428-439, 2024.
Abstract
Surrogate rewards are commonly utilized in planning problems for linear temporal logic (LTL) objectives. In a widely adopted surrogate reward approach, two discount factors are used to ensure that the expected return approximates the satisfaction probability of the LTL objective. The expected return can then be estimated by methods that use Bellman updates, such as reinforcement learning. However, the uniqueness of the solution to the Bellman equation with two discount factors has not been explicitly discussed. We demonstrate with an example that when one of the discount factors is set to one, as allowed in many previous works, the Bellman equation may have multiple solutions, leading to inaccurate evaluation of the expected return. We then propose a condition under which the Bellman equation has the expected return as its unique solution: the solution for every state inside a rejecting bottom strongly connected component (BSCC) must be $0$. We prove that this condition is sufficient by showing that, under it, the solutions for states with discounting can be separated from those for states without discounting.
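
To make the setting concrete, the following is a minimal sketch of the two-discount-factor surrogate reward and its Bellman equation, together with the non-uniqueness phenomenon described above. The notation here is assumed for illustration and may differ from the paper's own: $B$ denotes the accepting set of the product automaton, with discount factor $\gamma_B \in (0,1)$ on accepting states and $\gamma \le 1$ elsewhere, following the common construction in prior work.

$$R(s) = \begin{cases} 1-\gamma_B & \text{if } s \in B, \\ 0 & \text{otherwise,} \end{cases} \qquad \Gamma(s) = \begin{cases} \gamma_B & \text{if } s \in B, \\ \gamma & \text{otherwise,} \end{cases}$$

$$V(s) = R(s) + \Gamma(s) \sum_{s'} P(s' \mid s)\, V(s').$$

If $\gamma = 1$, then every state $s$ in a rejecting BSCC has $R(s) = 0$ and $\Gamma(s) = 1$, so on that component the equation reduces to $V(s) = \sum_{s'} P(s' \mid s)\, V(s')$, which any constant vector $V \equiv c$ on the BSCC satisfies. The expected return corresponds to $c = 0$, which is exactly the condition proposed above; fixing these values to $0$ then allows the solutions for the discounted states to be determined first and propagated to the undiscounted ones, in line with the separation argument described in the abstract.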