Reinforcement Learning from Reachability Specifications: PAC Guarantees with Expected Conditional Distance

Jakub Svoboda, Suguman Bansal, Krishnendu Chatterjee
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:47331-47344, 2024.

Abstract

Reinforcement Learning (RL) from temporal logical specifications is a fundamental problem in sequential decision making. One of the basic and core such specification is the reachability specification that requires a target set to be eventually visited. Despite strong empirical results for RL from such specifications, the theoretical guarantees are bleak, including the impossibility of Probably Approximately Correct (PAC) guarantee for reachability specifications. Given the impossibility result, in this work we consider the problem of RL from reachability specifications along with the information of expected conditional distance (ECD). We present (a) lower bound results which establish the necessity of ECD information for PAC guarantees and (b) an algorithm that establishes PAC-guarantees given the ECD information. To the best of our knowledge, this is the first RL from reachability specifications that does not make any assumptions on the underlying environment to learn policies.

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-svoboda24a, title = {Reinforcement Learning from Reachability Specifications: {PAC} Guarantees with Expected Conditional Distance}, author = {Svoboda, Jakub and Bansal, Suguman and Chatterjee, Krishnendu}, booktitle = {Proceedings of the 41st International Conference on Machine Learning}, pages = {47331--47344}, year = {2024}, editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix}, volume = {235}, series = {Proceedings of Machine Learning Research}, month = {21--27 Jul}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/svoboda24a/svoboda24a.pdf}, url = {https://proceedings.mlr.press/v235/svoboda24a.html}, abstract = {Reinforcement Learning (RL) from temporal logical specifications is a fundamental problem in sequential decision making. One of the basic and core such specification is the reachability specification that requires a target set to be eventually visited. Despite strong empirical results for RL from such specifications, the theoretical guarantees are bleak, including the impossibility of Probably Approximately Correct (PAC) guarantee for reachability specifications. Given the impossibility result, in this work we consider the problem of RL from reachability specifications along with the information of expected conditional distance (ECD). We present (a) lower bound results which establish the necessity of ECD information for PAC guarantees and (b) an algorithm that establishes PAC-guarantees given the ECD information. To the best of our knowledge, this is the first RL from reachability specifications that does not make any assumptions on the underlying environment to learn policies.} }
Endnote
%0 Conference Paper %T Reinforcement Learning from Reachability Specifications: PAC Guarantees with Expected Conditional Distance %A Jakub Svoboda %A Suguman Bansal %A Krishnendu Chatterjee %B Proceedings of the 41st International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2024 %E Ruslan Salakhutdinov %E Zico Kolter %E Katherine Heller %E Adrian Weller %E Nuria Oliver %E Jonathan Scarlett %E Felix Berkenkamp %F pmlr-v235-svoboda24a %I PMLR %P 47331--47344 %U https://proceedings.mlr.press/v235/svoboda24a.html %V 235 %X Reinforcement Learning (RL) from temporal logical specifications is a fundamental problem in sequential decision making. One of the basic and core such specification is the reachability specification that requires a target set to be eventually visited. Despite strong empirical results for RL from such specifications, the theoretical guarantees are bleak, including the impossibility of Probably Approximately Correct (PAC) guarantee for reachability specifications. Given the impossibility result, in this work we consider the problem of RL from reachability specifications along with the information of expected conditional distance (ECD). We present (a) lower bound results which establish the necessity of ECD information for PAC guarantees and (b) an algorithm that establishes PAC-guarantees given the ECD information. To the best of our knowledge, this is the first RL from reachability specifications that does not make any assumptions on the underlying environment to learn policies.
APA
Svoboda, J., Bansal, S. & Chatterjee, K.. (2024). Reinforcement Learning from Reachability Specifications: PAC Guarantees with Expected Conditional Distance. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:47331-47344 Available from https://proceedings.mlr.press/v235/svoboda24a.html.

Related Material