Position: Trustworthy AI Agents Require the Integration of Large Language Models and Formal Methods

Yedi Zhang, Yufan Cai, Xinyue Zuo, Xiaokun Luan, Kailong Wang, Zhe Hou, Yifan Zhang, Zhiyuan Wei, Meng Sun, Jun Sun, Jing Sun, Jin Song Dong
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:82441-82459, 2025.

Abstract

Large Language Models (LLMs) have emerged as a transformative AI paradigm, profoundly influencing broad aspects of daily life. Despite their remarkable performance, LLMs exhibit a fundamental limitation: hallucination—the tendency to produce misleading outputs that appear plausible. This inherent unreliability poses significant risks, particularly in high-stakes domains where trustworthiness is essential. On the other hand, Formal Methods (FMs), which share foundations with symbolic AI, provide mathematically rigorous techniques for modeling, specifying, reasoning, and verifying the correctness of systems. These methods have been widely employed in mission-critical domains such as aerospace, defense, and cybersecurity. However, the broader adoption of FMs remains constrained by significant challenges, including steep learning curves, limited scalability, and difficulties in adapting to the dynamic requirements of daily applications. To build trustworthy AI agents, we argue that the integration of LLMs and FMs is necessary to overcome the limitations of both paradigms. LLMs offer adaptability and human-like reasoning but lack formal guarantees of correctness and reliability. FMs provide rigor but need enhanced accessibility and automation to support broader adoption from LLMs.

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-zhang25ds, title = {Position: Trustworthy {AI} Agents Require the Integration of Large Language Models and Formal Methods}, author = {Zhang, Yedi and Cai, Yufan and Zuo, Xinyue and Luan, Xiaokun and Wang, Kailong and Hou, Zhe and Zhang, Yifan and Wei, Zhiyuan and Sun, Meng and Sun, Jun and Sun, Jing and Dong, Jin Song}, booktitle = {Proceedings of the 42nd International Conference on Machine Learning}, pages = {82441--82459}, year = {2025}, editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry}, volume = {267}, series = {Proceedings of Machine Learning Research}, month = {13--19 Jul}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/zhang25ds/zhang25ds.pdf}, url = {https://proceedings.mlr.press/v267/zhang25ds.html}, abstract = {Large Language Models (LLMs) have emerged as a transformative AI paradigm, profoundly influencing broad aspects of daily life. Despite their remarkable performance, LLMs exhibit a fundamental limitation: hallucination—the tendency to produce misleading outputs that appear plausible. This inherent unreliability poses significant risks, particularly in high-stakes domains where trustworthiness is essential. On the other hand, Formal Methods (FMs), which share foundations with symbolic AI, provide mathematically rigorous techniques for modeling, specifying, reasoning, and verifying the correctness of systems. These methods have been widely employed in mission-critical domains such as aerospace, defense, and cybersecurity. However, the broader adoption of FMs remains constrained by significant challenges, including steep learning curves, limited scalability, and difficulties in adapting to the dynamic requirements of daily applications. To build trustworthy AI agents, we argue that the integration of LLMs and FMs is necessary to overcome the limitations of both paradigms. LLMs offer adaptability and human-like reasoning but lack formal guarantees of correctness and reliability. FMs provide rigor but need enhanced accessibility and automation to support broader adoption from LLMs.} }
Endnote
%0 Conference Paper %T Position: Trustworthy AI Agents Require the Integration of Large Language Models and Formal Methods %A Yedi Zhang %A Yufan Cai %A Xinyue Zuo %A Xiaokun Luan %A Kailong Wang %A Zhe Hou %A Yifan Zhang %A Zhiyuan Wei %A Meng Sun %A Jun Sun %A Jing Sun %A Jin Song Dong %B Proceedings of the 42nd International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2025 %E Aarti Singh %E Maryam Fazel %E Daniel Hsu %E Simon Lacoste-Julien %E Felix Berkenkamp %E Tegan Maharaj %E Kiri Wagstaff %E Jerry Zhu %F pmlr-v267-zhang25ds %I PMLR %P 82441--82459 %U https://proceedings.mlr.press/v267/zhang25ds.html %V 267 %X Large Language Models (LLMs) have emerged as a transformative AI paradigm, profoundly influencing broad aspects of daily life. Despite their remarkable performance, LLMs exhibit a fundamental limitation: hallucination—the tendency to produce misleading outputs that appear plausible. This inherent unreliability poses significant risks, particularly in high-stakes domains where trustworthiness is essential. On the other hand, Formal Methods (FMs), which share foundations with symbolic AI, provide mathematically rigorous techniques for modeling, specifying, reasoning, and verifying the correctness of systems. These methods have been widely employed in mission-critical domains such as aerospace, defense, and cybersecurity. However, the broader adoption of FMs remains constrained by significant challenges, including steep learning curves, limited scalability, and difficulties in adapting to the dynamic requirements of daily applications. To build trustworthy AI agents, we argue that the integration of LLMs and FMs is necessary to overcome the limitations of both paradigms. LLMs offer adaptability and human-like reasoning but lack formal guarantees of correctness and reliability. FMs provide rigor but need enhanced accessibility and automation to support broader adoption from LLMs.
APA
Zhang, Y., Cai, Y., Zuo, X., Luan, X., Wang, K., Hou, Z., Zhang, Y., Wei, Z., Sun, M., Sun, J., Sun, J. & Dong, J.S.. (2025). Position: Trustworthy AI Agents Require the Integration of Large Language Models and Formal Methods. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:82441-82459 Available from https://proceedings.mlr.press/v267/zhang25ds.html.

Related Material