EnIGMA: Interactive Tools Substantially Assist LM Agents in Finding Security Vulnerabilities

Talor Abramovich, Meet Udeshi, Minghao Shao, Kilian Lieret, Haoran Xi, Kimberly Milner, Sofija Jancheska, John Yang, Carlos E Jimenez, Farshad Khorrami, Prashanth Krishnamurthy, Brendan Dolan-Gavitt, Muhammad Shafique, Karthik R Narasimhan, Ramesh Karri, Ofir Press
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:246-355, 2025.

Abstract

Although language model (LM) agents have demonstrated increased performance in multiple domains, including coding and web-browsing, their success in cybersecurity has been limited. We present EnIGMA, an LM agent for autonomously solving Capture The Flag (CTF) challenges. We introduce new tools and interfaces to improve the agent’s ability to find and exploit security vulnerabilities, focusing on interactive terminal programs. These novel Interactive Agent Tools enable LM agents, for the first time, to run interactive utilities, such as a debugger and a server connection tool, which are essential for solving these challenges. Empirical analysis on 390 CTF challenges across four benchmarks demonstrates that these new tools and interfaces substantially improve our agent’s performance, achieving state-of-the-art results on NYU CTF, Intercode-CTF, and CyBench. Finally, we analyze data leakage, developing new methods to quantify it and identifying a new phenomenon we term soliloquizing, where the model self-generates hallucinated observations without interacting with the environment.

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-abramovich25a,
  title = {{E}n{IGMA}: Interactive Tools Substantially Assist {LM} Agents in Finding Security Vulnerabilities},
  author = {Abramovich, Talor and Udeshi, Meet and Shao, Minghao and Lieret, Kilian and Xi, Haoran and Milner, Kimberly and Jancheska, Sofija and Yang, John and Jimenez, Carlos E and Khorrami, Farshad and Krishnamurthy, Prashanth and Dolan-Gavitt, Brendan and Shafique, Muhammad and Narasimhan, Karthik R and Karri, Ramesh and Press, Ofir},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages = {246--355},
  year = {2025},
  editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume = {267},
  series = {Proceedings of Machine Learning Research},
  month = {13--19 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/abramovich25a/abramovich25a.pdf},
  url = {https://proceedings.mlr.press/v267/abramovich25a.html},
  abstract = {Although language model (LM) agents have demonstrated increased performance in multiple domains, including coding and web-browsing, their success in cybersecurity has been limited. We present EnIGMA, an LM agent for autonomously solving Capture The Flag (CTF) challenges. We introduce new tools and interfaces to improve the agent’s ability to find and exploit security vulnerabilities, focusing on interactive terminal programs. These novel Interactive Agent Tools enable LM agents, for the first time, to run interactive utilities, such as a debugger and a server connection tool, which are essential for solving these challenges. Empirical analysis on 390 CTF challenges across four benchmarks demonstrates that these new tools and interfaces substantially improve our agent’s performance, achieving state-of-the-art results on NYU CTF, Intercode-CTF, and CyBench. Finally, we analyze data leakage, developing new methods to quantify it and identifying a new phenomenon we term soliloquizing, where the model self-generates hallucinated observations without interacting with the environment.}
}
Endnote
%0 Conference Paper
%T EnIGMA: Interactive Tools Substantially Assist LM Agents in Finding Security Vulnerabilities
%A Talor Abramovich
%A Meet Udeshi
%A Minghao Shao
%A Kilian Lieret
%A Haoran Xi
%A Kimberly Milner
%A Sofija Jancheska
%A John Yang
%A Carlos E Jimenez
%A Farshad Khorrami
%A Prashanth Krishnamurthy
%A Brendan Dolan-Gavitt
%A Muhammad Shafique
%A Karthik R Narasimhan
%A Ramesh Karri
%A Ofir Press
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-abramovich25a
%I PMLR
%P 246--355
%U https://proceedings.mlr.press/v267/abramovich25a.html
%V 267
%X Although language model (LM) agents have demonstrated increased performance in multiple domains, including coding and web-browsing, their success in cybersecurity has been limited. We present EnIGMA, an LM agent for autonomously solving Capture The Flag (CTF) challenges. We introduce new tools and interfaces to improve the agent’s ability to find and exploit security vulnerabilities, focusing on interactive terminal programs. These novel Interactive Agent Tools enable LM agents, for the first time, to run interactive utilities, such as a debugger and a server connection tool, which are essential for solving these challenges. Empirical analysis on 390 CTF challenges across four benchmarks demonstrates that these new tools and interfaces substantially improve our agent’s performance, achieving state-of-the-art results on NYU CTF, Intercode-CTF, and CyBench. Finally, we analyze data leakage, developing new methods to quantify it and identifying a new phenomenon we term soliloquizing, where the model self-generates hallucinated observations without interacting with the environment.
APA
Abramovich, T., Udeshi, M., Shao, M., Lieret, K., Xi, H., Milner, K., Jancheska, S., Yang, J., Jimenez, C.E., Khorrami, F., Krishnamurthy, P., Dolan-Gavitt, B., Shafique, M., Narasimhan, K.R., Karri, R., & Press, O. (2025). EnIGMA: Interactive Tools Substantially Assist LM Agents in Finding Security Vulnerabilities. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:246-355. Available from https://proceedings.mlr.press/v267/abramovich25a.html.

Related Material