BarrierBench: Evaluating Large Language Models for Safety Verification in Dynamical Systems

Ali Taheri; Alireza Taban; Sadegh Soudjani; Ashutosh Trivedi

BarrierBench: Evaluating Large Language Models for Safety Verification in Dynamical Systems

Ali Taheri, Alireza Taban, Sadegh Soudjani, Ashutosh Trivedi

Proceedings of The 8th Annual Learning for Dynamics and Control Conference, PMLR 331:640-661, 2026.

Abstract

Safety verification of dynamical systems via barrier certificates is central to ensuring correctness in autonomous and other safety-critical applications. Synthesizing such certificates requires discovering mathematical functions that characterize inductive state invariants and provably separate safe and unsafe regions. Existing approaches, however, often struggle to scale computationally, depend on carefully designed templates, and rely on exhaustive or incremental searches over function spaces. They also demand substantial manual ingenuity and mathematical sophistication in constructing the search infrastructure, including selecting template families, choosing appropriate solvers, tuning hyperparameters in data-driven methods, and designing effective sampling procedures. As a result, successful barrier certificate synthesis requires both a deep understanding of dynamical systems and control theory and practical experience with existing synthesis techniques. Much of this expertise has traditionally been transmitted among practitioners through natural language rather than formalized mathematical procedures. This observation raises a natural question: can the linguistic and analogical reasoning that experts use informally be captured and operationalized by large language models (LLMs)? Motivated by this question, we present an *LLM-agentic framework for barrier certificate synthesis* that uses natural language reasoning to propose, refine, and validate candidate certificates. The framework combines *LLM-driven template discovery* with *SMT-based verification* and supports *barrier-controller co-synthesis* for controlled systems, thereby ensuring mathematical compatibility between safety certificates and feedback control laws. To evaluate this capability, we introduce *BarrierBench*, a benchmark of 100 dynamical systems spanning linear, nonlinear, discrete-time, and continuous-time settings, including 68 controlled systems that require barrier-controller co-synthesis. Our experiments assess not only the effectiveness of LLM-guided barrier synthesis but also the value of *retrieval-augmented generation (RAG)* and *agentic coordination strategies* in improving reliability and performance. Across these tasks, the framework achieves over 90% success in generating valid certificates and demonstrates structural diversity, ranging from simple quadratic forms to high-order coupled polynomials. By releasing BarrierBench with the accompanying toolchain, we aim to establish a *community testbed* for advancing the integration of language-based reasoning with formal verification for dynamical systems. The benchmark is publicly available at https://hycodev.com/dataset/barrierbench.

Cite this Paper

BibTeX

@InProceedings{pmlr-v331-taheri26a,
  title = 	 {BarrierBench: Evaluating Large Language Models for Safety Verification in Dynamical Systems},
  author =       {Taheri, Ali and Taban, Alireza and Soudjani, Sadegh and Trivedi, Ashutosh},
  booktitle = 	 {Proceedings of The 8th Annual Learning for Dynamics and Control Conference},
  pages = 	 {640--661},
  year = 	 {2026},
  editor = 	 {Sukhatme, Gaurav and Lindemann, Lars and Tu, Stephen and Wierman, Adam and Atanasov, Nikolay},
  volume = 	 {331},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {17--19 Jun},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v331/main/assets/taheri26a/taheri26a.pdf},
  url = 	 {https://proceedings.mlr.press/v331/taheri26a.html},
  abstract = 	 {Safety verification of dynamical systems via barrier certificates is central to ensuring correctness in autonomous and other safety-critical applications. Synthesizing such certificates requires discovering mathematical functions that characterize inductive state invariants and provably separate safe and unsafe regions. Existing approaches, however, often struggle to scale computationally, depend on carefully designed templates, and rely on exhaustive or incremental searches over function spaces. They also demand substantial manual ingenuity and mathematical sophistication in constructing the search infrastructure, including selecting template families, choosing appropriate solvers, tuning hyperparameters in data-driven methods, and designing effective sampling procedures. As a result, successful barrier certificate synthesis requires both a deep understanding of dynamical systems and control theory and practical experience with existing synthesis techniques. Much of this expertise has traditionally been transmitted among practitioners through natural language rather than formalized mathematical procedures. This observation raises a natural question: can the linguistic and analogical reasoning that experts use informally be captured and operationalized by large language models (LLMs)? Motivated by this question, we present an *LLM-agentic framework for barrier certificate synthesis* that uses natural language reasoning to propose, refine, and validate candidate certificates. The framework combines *LLM-driven template discovery* with *SMT-based verification* and supports *barrier-controller co-synthesis* for controlled systems, thereby ensuring mathematical compatibility between safety certificates and feedback control laws. To evaluate this capability, we introduce *BarrierBench*, a benchmark of 100 dynamical systems spanning linear, nonlinear, discrete-time, and continuous-time settings, including 68 controlled systems that require barrier-controller co-synthesis. Our experiments assess not only the effectiveness of LLM-guided barrier synthesis but also the value of *retrieval-augmented generation (RAG)* and *agentic coordination strategies* in improving reliability and performance. Across these tasks, the framework achieves over 90% success in generating valid certificates and demonstrates structural diversity, ranging from simple quadratic forms to high-order coupled polynomials. By releasing BarrierBench with the accompanying toolchain, we aim to establish a *community testbed* for advancing the integration of language-based reasoning with formal verification for dynamical systems. The benchmark is publicly available at https://hycodev.com/dataset/barrierbench.}
}

Endnote

%0 Conference Paper
%T BarrierBench: Evaluating Large Language Models for Safety Verification in Dynamical Systems
%A Ali Taheri
%A Alireza Taban
%A Sadegh Soudjani
%A Ashutosh Trivedi
%B Proceedings of The 8th Annual Learning for Dynamics and Control Conference
%C Proceedings of Machine Learning Research
%D 2026
%E Gaurav Sukhatme
%E Lars Lindemann
%E Stephen Tu
%E Adam Wierman
%E Nikolay Atanasov	
%F pmlr-v331-taheri26a
%I PMLR
%P 640--661
%U https://proceedings.mlr.press/v331/taheri26a.html
%V 331
%X Safety verification of dynamical systems via barrier certificates is central to ensuring correctness in autonomous and other safety-critical applications. Synthesizing such certificates requires discovering mathematical functions that characterize inductive state invariants and provably separate safe and unsafe regions. Existing approaches, however, often struggle to scale computationally, depend on carefully designed templates, and rely on exhaustive or incremental searches over function spaces. They also demand substantial manual ingenuity and mathematical sophistication in constructing the search infrastructure, including selecting template families, choosing appropriate solvers, tuning hyperparameters in data-driven methods, and designing effective sampling procedures. As a result, successful barrier certificate synthesis requires both a deep understanding of dynamical systems and control theory and practical experience with existing synthesis techniques. Much of this expertise has traditionally been transmitted among practitioners through natural language rather than formalized mathematical procedures. This observation raises a natural question: can the linguistic and analogical reasoning that experts use informally be captured and operationalized by large language models (LLMs)? Motivated by this question, we present an *LLM-agentic framework for barrier certificate synthesis* that uses natural language reasoning to propose, refine, and validate candidate certificates. The framework combines *LLM-driven template discovery* with *SMT-based verification* and supports *barrier-controller co-synthesis* for controlled systems, thereby ensuring mathematical compatibility between safety certificates and feedback control laws. To evaluate this capability, we introduce *BarrierBench*, a benchmark of 100 dynamical systems spanning linear, nonlinear, discrete-time, and continuous-time settings, including 68 controlled systems that require barrier-controller co-synthesis. Our experiments assess not only the effectiveness of LLM-guided barrier synthesis but also the value of *retrieval-augmented generation (RAG)* and *agentic coordination strategies* in improving reliability and performance. Across these tasks, the framework achieves over 90% success in generating valid certificates and demonstrates structural diversity, ranging from simple quadratic forms to high-order coupled polynomials. By releasing BarrierBench with the accompanying toolchain, we aim to establish a *community testbed* for advancing the integration of language-based reasoning with formal verification for dynamical systems. The benchmark is publicly available at https://hycodev.com/dataset/barrierbench.

APA

Taheri, A., Taban, A., Soudjani, S. & Trivedi, A.. (2026). BarrierBench: Evaluating Large Language Models for Safety Verification in Dynamical Systems. Proceedings of The 8th Annual Learning for Dynamics and Control Conference, in Proceedings of Machine Learning Research 331:640-661 Available from https://proceedings.mlr.press/v331/taheri26a.html.

BarrierBench: Evaluating Large Language Models for Safety Verification in Dynamical Systems

Abstract

Cite this Paper

Related Material