Guided Search Strategies in Non-Serializable Environments with Applications to Software Engineering Agents

Karina Zainullina; Alexander Golubev; Maria Trofimova; Sergei Polezhaev; Ibragim Badertdinov; Daria Litvintseva; Simon Karasik; Filipp Fisin; Sergei Skvortsov; Maksim Nekrashevich; Anton Shevtsov; Boris Yangel

Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:73957-73973, 2025.

Abstract

Large language models (LLMs) have recently achieved remarkable results in complex multi-step tasks, such as mathematical reasoning and agentic software engineering. However, they often struggle to maintain consistent performance across multiple solution attempts. One effective approach to narrowing the gap between average-case and best-case performance is guided test-time search, which explores multiple solution paths to identify the most promising one. Unfortunately, effective search techniques (e.g., MCTS) are often unsuitable for non-serializable RL environments, such as Docker containers, where intermediate environment states cannot be easily saved and restored. We investigate two complementary search strategies applicable to such environments: 1-step lookahead and trajectory selection, both guided by a learned action-value function estimator. On the SWE-bench Verified benchmark, a key testbed for agentic software engineering, we find that these methods double the average success rate of a fine-tuned Qwen-72B model, reaching 40.8%, the new state of the art for open-weights models. Additionally, we show that these techniques transfer to more advanced closed models, yielding similar improvements with GPT-4o.
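The two strategies named above can be illustrated with a short Python sketch. This is not the authors' implementation: the `propose` policy sampler, the `q_value` critic, the string-typed states and actions, and the choice to score a finished trajectory by the Q-value of its final step are all hypothetical assumptions made for illustration. What the sketch does show is why both strategies fit non-serializable environments: lookahead scores several candidate actions with the critic before executing only one of them, and trajectory selection compares complete rollouts after the fact, so neither needs to save or restore an intermediate environment state.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical types (assumptions of this sketch): the agent state is its
# interaction history so far, and an action is the next command it emits.
State = str
Action = str
Step = Tuple[State, Action]


def one_step_lookahead(
    state: State,
    propose: Callable[[State], Action],
    q_value: Callable[[State, Action], float],
    k: int,
) -> Action:
    """Sample k candidate actions and return the one the critic rates highest.

    Only the returned action is executed afterwards, so the environment is
    never snapshotted or rolled back.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    candidates = [propose(state) for _ in range(k)]
    # max() returns the first candidate among ties.
    return max(candidates, key=lambda action: q_value(state, action))


def select_trajectory(
    trajectories: Sequence[List[Step]],
    q_value: Callable[[State, Action], float],
) -> List[Step]:
    """Return the complete trajectory whose final step has the highest value.

    Scoring by the last step's Q-value is an assumption of this sketch,
    not necessarily the aggregation used in the paper.
    """
    if not trajectories:
        raise ValueError("need at least one trajectory")

    def score(trajectory: List[Step]) -> float:
        last_state, last_action = trajectory[-1]
        return q_value(last_state, last_action)

    return max(trajectories, key=score)


if __name__ == "__main__":
    # Toy critic with fixed per-action scores (purely illustrative).
    scores = {"ls": 0.1, "edit foo.py": 0.7, "submit": 0.3}

    def toy_q(state: State, action: Action) -> float:
        return scores[action]

    proposals = iter(["ls", "edit foo.py", "submit"])
    print(one_step_lookahead("start", lambda s: next(proposals), toy_q, k=3))  # edit foo.py
```

In a real agent loop, each trajectory passed to `select_trajectory` would presumably come from an independent run in a fresh environment (for example, a new container); the sketch leaves that orchestration out.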

Cite this Paper

BibTeX

@InProceedings{pmlr-v267-zainullina25a,
  title     = {Guided Search Strategies in Non-Serializable Environments with Applications to Software Engineering Agents},
  author    = {Zainullina, Karina and Golubev, Alexander and Trofimova, Maria and Polezhaev, Sergei and Badertdinov, Ibragim and Litvintseva, Daria and Karasik, Simon and Fisin, Filipp and Skvortsov, Sergei and Nekrashevich, Maksim and Shevtsov, Anton and Yangel, Boris},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {73957--73973},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/zainullina25a/zainullina25a.pdf},
  url       = {https://proceedings.mlr.press/v267/zainullina25a.html},
  abstract  = {Large language models (LLMs) have recently achieved remarkable results in complex multi-step tasks, such as mathematical reasoning and agentic software engineering. However, they often struggle to maintain consistent performance across multiple solution attempts. One effective approach to narrow the gap between average-case and best-case performance is guided test-time search, which explores multiple solution paths to identify the most promising one. Unfortunately, effective search techniques (e.g. MCTS) are often unsuitable for non-serializable RL environments, such as Docker containers, where intermediate environment states cannot be easily saved and restored. We investigate two complementary search strategies applicable to such environments: 1-step lookahead and trajectory selection, both guided by a learned action-value function estimator. On the SWE-bench Verified benchmark, a key testbed for agentic software engineering, we find these methods to double the average success rate of a fine-tuned Qwen-72B model, achieving $40.8$\%, the new state-of-the-art for open-weights models. Additionally, we show that these techniques are transferable to more advanced closed models, yielding similar improvements with GPT-4o.}
}

Endnote

%0 Conference Paper
%T Guided Search Strategies in Non-Serializable Environments with Applications to Software Engineering Agents
%A Karina Zainullina
%A Alexander Golubev
%A Maria Trofimova
%A Sergei Polezhaev
%A Ibragim Badertdinov
%A Daria Litvintseva
%A Simon Karasik
%A Filipp Fisin
%A Sergei Skvortsov
%A Maksim Nekrashevich
%A Anton Shevtsov
%A Boris Yangel
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-zainullina25a
%I PMLR
%P 73957--73973
%U https://proceedings.mlr.press/v267/zainullina25a.html
%V 267
%X Large language models (LLMs) have recently achieved remarkable results in complex multi-step tasks, such as mathematical reasoning and agentic software engineering. However, they often struggle to maintain consistent performance across multiple solution attempts. One effective approach to narrow the gap between average-case and best-case performance is guided test-time search, which explores multiple solution paths to identify the most promising one. Unfortunately, effective search techniques (e.g. MCTS) are often unsuitable for non-serializable RL environments, such as Docker containers, where intermediate environment states cannot be easily saved and restored. We investigate two complementary search strategies applicable to such environments: 1-step lookahead and trajectory selection, both guided by a learned action-value function estimator. On the SWE-bench Verified benchmark, a key testbed for agentic software engineering, we find these methods to double the average success rate of a fine-tuned Qwen-72B model, achieving 40.8%, the new state-of-the-art for open-weights models. Additionally, we show that these techniques are transferable to more advanced closed models, yielding similar improvements with GPT-4o.

APA

Zainullina, K., Golubev, A., Trofimova, M., Polezhaev, S., Badertdinov, I., Litvintseva, D., Karasik, S., Fisin, F., Skvortsov, S., Nekrashevich, M., Shevtsov, A. & Yangel, B. (2025). Guided Search Strategies in Non-Serializable Environments with Applications to Software Engineering Agents. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:73957-73973. Available from https://proceedings.mlr.press/v267/zainullina25a.html.
