Neural Theorem Proving: Generating and Structuring Proofs for Formal Verification

Balaji Rao; William Eiers; Carlo Lipizzi

Proceedings of The 19th International Conference on Neurosymbolic Learning and Reasoning, PMLR 284:814-829, 2025.

Abstract

Formally verifying properties of software code has been a highly desirable goal, and it has become more pressing with the emergence of LLM-generated code. LLMs, in turn, offer an interesting avenue for exploring formal verification and mechanistic interpretability. Although code-specific models have succeeded in generating code in Lean 4 and Isabelle, generalized theorem proving remains far from solved and will serve as a benchmark for reasoning capability in LLMs. In this work, we introduce a framework that generates whole proofs in a formal language for use within systems that exploit built-in tactics and off-the-shelf automated theorem provers. The framework has three components: a module that generates natural-language statements of the code to be verified, an LLM that generates formal proofs for a given statement, and a module that applies heuristics to build the final proof. To train the LLM, we use a two-stage fine-tuning process: supervised fine-tuning (SFT) first teaches the model to generate syntactically correct Isabelle code, and reinforcement learning (RL) then encourages it to generate proofs that a theorem prover verifies. We validate the framework on the miniF2F-test benchmark with the Isabelle proof assistant, and we design a use case that verifies the correctness of AWS S3 bucket access policy code. We also curate a dataset based on the FVEL_ER dataset for future training tasks.
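
The abstract does not specify how the heuristic module assembles the final proof. As a rough illustration only, the Python sketch below shows one common pattern for this kind of step: gaps left as `sorry` in an Isabelle proof skeleton are closed by trying standard Isabelle/HOL proof methods in turn and keeping the first one the checker accepts. The function `fill_proof_gaps`, the tactic list, and the `check` callback (a stand-in for a call to Isabelle) are hypothetical and are not taken from the paper.

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical heuristic: standard Isabelle/HOL proof methods, cheapest first.
DEFAULT_TACTICS: Sequence[str] = ("by simp", "by auto", "by blast", "by arith")


def fill_proof_gaps(
    proof_lines: List[str],
    check: Callable[[List[str]], bool],
    tactics: Sequence[str] = DEFAULT_TACTICS,
) -> Optional[List[str]]:
    """Replace each `sorry` line with the first tactic that `check` accepts.

    `check` is an assumed stand-in for the proof checker: it receives the proof
    prefix ending in the candidate step and returns True if that step is
    accepted. Returns a new list (the input is not modified), or None if some
    gap cannot be closed by any tactic.
    """
    result: List[str] = []
    for line in proof_lines:
        if line.strip() != "sorry":
            result.append(line)
            continue
        # Preserve the indentation of the placeholder line.
        indent = line[: len(line) - len(line.lstrip())]
        for tactic in tactics:
            candidate = result + [indent + tactic]
            if check(candidate):
                result = candidate
                break
        else:
            return None  # no tactic in the list closed this gap
    return result


if __name__ == "__main__":
    skeleton = [
        'lemma "x + 0 = (x::nat)"',
        "proof -",
        '  have "x + 0 = x"',
        "  sorry",
        "  then show ?thesis by simp",
        "qed",
    ]
    # Stub checker for illustration: pretend only `by auto` closes the gap.
    filled = fill_proof_gaps(skeleton, lambda lines: lines[-1].strip() == "by auto")
    print("\n".join(filled) if filled else "could not close all gaps")
```

Because `check` is injected, the routine can be exercised with a stub, as here, or in principle connected to an actual Isabelle checking backend; the order of `tactics` is what encodes the heuristic.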

Cite this Paper

BibTeX

@InProceedings{pmlr-v284-rao25a,
  title = 	 {Neural Theorem Proving: Generating and Structuring Proofs for Formal Verification},
  author =       {Rao, Balaji and Eiers, William and Lipizzi, Carlo},
  booktitle = 	 {Proceedings of The 19th International Conference on Neurosymbolic Learning and Reasoning},
  pages = 	 {814--829},
  year = 	 {2025},
  editor =       {Gilpin, Leilani H. and Giunchiglia, Eleonora and Hitzler, Pascal and van Krieken, Emile},
  volume = 	 {284},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {08--10 Sep},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v284/main/assets/rao25a/rao25a.pdf},
  url = 	 {https://proceedings.mlr.press/v284/rao25a.html},
  abstract = 	 {Formally verifying properties of software code has been a highly desirable task, especially with the emergence of LLM-generated code. In the same vein, they provide an interesting avenue for the exploration of formal verification and mechanistic interpretability. Since the introduction of code-specific models, despite their successes in generating code in Lean4 and Isabelle, the task of generalized theorem proving still remains far from being fully solved and will be a benchmark for reasoning capability in LLMs. In this work, we introduce a framework that generates whole proofs in a formal language to be used within systems that utilize the power of built-in tactics and off-the-shelf automated theorem provers. Our framework includes 3 components: generating natural language statements of the code to be verified, an LLM that generates formal proofs for the given statement, and a module employing heuristics for building the final proof. To train the LLM, we employ a 2-stage fine-tuning process, where we first use SFT-based training to enable the model to generate syntactically correct Isabelle code and then RL-based training that encourages the model to generate proofs verified by a theorem prover. We validate our framework using the miniF2F-test benchmark and the Isabelle proof assistant and design a use case to verify the correctness of the AWS S3 bucket access policy code. We also curate a dataset based on the FVEL\textsubscript{\textnormal{ER}} dataset for future training tasks.}
}

EndNote

%0 Conference Paper
%T Neural Theorem Proving: Generating and Structuring Proofs for Formal Verification
%A Balaji Rao
%A William Eiers
%A Carlo Lipizzi
%B Proceedings of The 19th International Conference on Neurosymbolic Learning and Reasoning
%C Proceedings of Machine Learning Research
%D 2025
%E Leilani H. Gilpin
%E Eleonora Giunchiglia
%E Pascal Hitzler
%E Emile van Krieken
%F pmlr-v284-rao25a
%I PMLR
%P 814--829
%U https://proceedings.mlr.press/v284/rao25a.html
%V 284
%X Formally verifying properties of software code has been a highly desirable task, especially with the emergence of LLM-generated code. In the same vein, they provide an interesting avenue for the exploration of formal verification and mechanistic interpretability. Since the introduction of code-specific models, despite their successes in generating code in Lean4 and Isabelle, the task of generalized theorem proving still remains far from being fully solved and will be a benchmark for reasoning capability in LLMs. In this work, we introduce a framework that generates whole proofs in a formal language to be used within systems that utilize the power of built-in tactics and off-the-shelf automated theorem provers. Our framework includes 3 components: generating natural language statements of the code to be verified, an LLM that generates formal proofs for the given statement, and a module employing heuristics for building the final proof. To train the LLM, we employ a 2-stage fine-tuning process, where we first use SFT-based training to enable the model to generate syntactically correct Isabelle code and then RL-based training that encourages the model to generate proofs verified by a theorem prover. We validate our framework using the miniF2F-test benchmark and the Isabelle proof assistant and design a use case to verify the correctness of the AWS S3 bucket access policy code. We also curate a dataset based on the FVEL\textsubscript{\textnormal{ER}} dataset for future training tasks.

APA

Rao, B., Eiers, W. & Lipizzi, C. (2025). Neural Theorem Proving: Generating and Structuring Proofs for Formal Verification. Proceedings of The 19th International Conference on Neurosymbolic Learning and Reasoning, in Proceedings of Machine Learning Research 284:814-829. Available from https://proceedings.mlr.press/v284/rao25a.html.
