Generating Safety-Critical Automotive C-programs using LLMs with Formal Verification

Merlijn Sevenhuijsen, Minal Suresh Patil, Mattias Nyberg, Gustav Ung
Proceedings of The 19th International Conference on Neurosymbolic Learning and Reasoning, PMLR 284:353-378, 2025.

Abstract

We evaluate the feasibility of generating formally verified C code that adheres to both functional and non-functional requirements using Large Language Models (LLMs) for three real industrial, automotive safety-critical software modules. We explore the capabilities of ten LLMs and four prompting techniques — Zero-Shot, Zero-Shot Chain-of-Thought, One-Shot, and One-Shot Chain-of-Thought — to generate C programs for the three modules. Functional correctness of generated programs is assessed through functional verification, and adherence to non-functional requirements is evaluated using an industrial static analyzer, along with human evaluation. The results demonstrate that it is feasible for LLMs to generate functionally correct code, with success rates of 540/800, 59/800, and 46/800 for the three modules. Additionally, the generated programs frequently adhere to the defined non-functional requirements. In the cases where the LLM-generated programs did not adhere to the non-functional requirements, deviations typically involve violations of single-read and single-write access patterns or minimal variable scope constraints. These findings highlight the promise and limitations of using LLMs to generate industrial safety-critical C programs, providing insight into improving automated LLM-based program generation in the automotive safety-critical domain.
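To make the single-read/single-write and minimal-variable-scope constraints mentioned above concrete, the sketch below contrasts a non-adherent and an adherent version of a small monitoring routine. It is an illustrative assumption only: the signal names, threshold values, and module shape are invented for this example and are not taken from the paper or its three industrial modules.

/*
 * Minimal illustrative sketch (not from the paper): contrasts a routine that
 * violates single-read/single-write access with one that adheres to it.
 * Signal names and the millivolt thresholds are hypothetical.
 */
#include <stdbool.h>
#include <stdint.h>

static uint16_t sensor_raw_mv;   /* hypothetical shared input signal  */
static bool overvoltage_flag;    /* hypothetical shared output signal */

/* Non-adherent: the shared input is read twice and the shared output is
 * written twice, so a concurrent update between the two reads can lead to
 * an inconsistent decision. */
void monitor_bad(void)
{
    overvoltage_flag = false;                     /* first write  */
    if (sensor_raw_mv > 5000U) {                  /* first read   */
        if (sensor_raw_mv > 5500U) {              /* second read  */
            overvoltage_flag = true;              /* second write */
        }
    }
}

/* Adherent: one read into a minimally scoped, const local snapshot and one
 * write of the computed result. */
void monitor_good(void)
{
    const uint16_t snapshot_mv = sensor_raw_mv;   /* single read  */
    overvoltage_flag = (snapshot_mv > 5500U);     /* single write */
}

Restricting each shared signal to one read and one write per call keeps the data flow easy for a static analyzer to check and avoids inconsistencies when the signal is updated between accesses.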

Cite this Paper


BibTeX
@InProceedings{pmlr-v284-sevenhuijsen25a,
  title     = {Generating Safety-Critical Automotive C-programs using LLMs with Formal Verification},
  author    = {Sevenhuijsen, Merlijn and Patil, Minal Suresh and Nyberg, Mattias and Ung, Gustav},
  booktitle = {Proceedings of The 19th International Conference on Neurosymbolic Learning and Reasoning},
  pages     = {353--378},
  year      = {2025},
  editor    = {H. Gilpin, Leilani and Giunchiglia, Eleonora and Hitzler, Pascal and van Krieken, Emile},
  volume    = {284},
  series    = {Proceedings of Machine Learning Research},
  month     = {08--10 Sep},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v284/main/assets/sevenhuijsen25a/sevenhuijsen25a.pdf},
  url       = {https://proceedings.mlr.press/v284/sevenhuijsen25a.html}
}
Endnote
%0 Conference Paper
%T Generating Safety-Critical Automotive C-programs using LLMs with Formal Verification
%A Merlijn Sevenhuijsen
%A Minal Suresh Patil
%A Mattias Nyberg
%A Gustav Ung
%B Proceedings of The 19th International Conference on Neurosymbolic Learning and Reasoning
%C Proceedings of Machine Learning Research
%D 2025
%E Leilani H. Gilpin
%E Eleonora Giunchiglia
%E Pascal Hitzler
%E Emile van Krieken
%F pmlr-v284-sevenhuijsen25a
%I PMLR
%P 353--378
%U https://proceedings.mlr.press/v284/sevenhuijsen25a.html
%V 284
APA
Sevenhuijsen, M., Patil, M. S., Nyberg, M. & Ung, G. (2025). Generating Safety-Critical Automotive C-programs using LLMs with Formal Verification. Proceedings of The 19th International Conference on Neurosymbolic Learning and Reasoning, in Proceedings of Machine Learning Research 284:353-378. Available from https://proceedings.mlr.press/v284/sevenhuijsen25a.html.
