R2E: Turning any Github Repository into a Programming Agent Environment

Naman Jain, Manish Shetty, Tianjun Zhang, King Han, Koushik Sen, Ion Stoica
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:21196-21224, 2024.

Abstract

While Large Language Models’ (LLMs) coding capabilities have advanced rapidly, corresponding evaluation benchmarks on real-world programming setups have yet to catch up. Building a scalable and interactive testbed for evaluating general-purpose AI coding agents on real-world code has been challenging, particularly due to the lack of available high-quality test suites. In this paper, we present Repository to Environment (R2E), a framework that can turn any GitHub repository into a test environment to evaluate the performance of code-generating systems, both static and interactive. R2E is powered by a synergistic combination of program analysis and LLMs to construct equivalence test harnesses for any GitHub function. We instantiate our framework to build the first large-scale benchmark, R2E-Eval1, for building realistic environments for AI coding assistants. Our results demonstrate that even when SOTA models cannot generate correct solutions with advanced prompting techniques, they can effectively use environment feedback, highlighting the need to move from static functional coding to an interactive programming paradigm. We hope that our framework (and the instantiated benchmark) can motivate research directions by providing web-scale open-ended coding environments. R2E code is available at https://r2e.dev/
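To make the notion of an "equivalence test harness" concrete, the sketch below shows one plausible shape such a harness could take: a function lifted from a repository serves as a reference oracle, and a generated candidate is checked against it on harness-supplied inputs. This is an illustrative assumption, not R2E's actual implementation; the function names (reference_slugify, candidate_slugify, run_equivalence_harness) and the hand-written inputs are hypothetical.

# Illustrative sketch (not the actual R2E implementation): an equivalence test
# harness treats the original repository function as a reference oracle and
# checks a generated candidate against it on harness-supplied inputs.

def reference_slugify(title: str) -> str:
    # Stand-in for a function extracted from some GitHub repository (hypothetical example).
    return "-".join(title.lower().split())

def candidate_slugify(title: str) -> str:
    # Stand-in for an LLM-generated reimplementation of the same function.
    return "-".join(title.strip().lower().split())

def run_equivalence_harness(reference, candidate, inputs):
    # Run both implementations on every input and collect any output mismatches.
    failures = []
    for args in inputs:
        expected = reference(*args)
        actual = candidate(*args)
        if expected != actual:
            failures.append((args, expected, actual))
    return failures

if __name__ == "__main__":
    # In R2E, harness inputs are derived via program analysis and LLMs;
    # the inputs below are hand-written purely for illustration.
    test_inputs = [("Hello World",), ("  R2E Benchmark  ",), ("one  two three",)]
    mismatches = run_equivalence_harness(reference_slugify, candidate_slugify, test_inputs)
    print("equivalent" if not mismatches else f"mismatches: {mismatches}")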

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-jain24c,
  title     = {{R}2{E}: Turning any Github Repository into a Programming Agent Environment},
  author    = {Jain, Naman and Shetty, Manish and Zhang, Tianjun and Han, King and Sen, Koushik and Stoica, Ion},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {21196--21224},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/jain24c/jain24c.pdf},
  url       = {https://proceedings.mlr.press/v235/jain24c.html},
  abstract  = {While Large Language Models’ (LLMs) coding capabilities have advanced rapidly, corresponding evaluation benchmarks on real-world programming setups have yet to catch up. Building a scalable and interactive testbed for evaluating general-purpose AI coding agents on real-world code has been challenging, particularly due to the lack of available high-quality test suites. In this paper, we present Repository to Environment (R2E), a framework that can turn any GitHub repository into a test environment to evaluate the performance of code-generating systems, both static and interactive. R2E is powered by a synergistic combination of program analysis and LLMs to construct equivalence test harnesses for any GitHub function. We instantiate our framework to build the first large-scale benchmark, R2E-Eval1, for building realistic environments for AI coding assistants. Our results demonstrate that even when SOTA models cannot generate correct solutions with advanced prompting techniques, they can effectively use environment feedback, highlighting the need to move from static functional coding to an interactive programming paradigm. We hope that our framework (and the instantiated benchmark) can motivate research directions by providing web-scale open-ended coding environments. R2E code is available at https://r2e.dev/}
}
Endnote
%0 Conference Paper
%T R2E: Turning any Github Repository into a Programming Agent Environment
%A Naman Jain
%A Manish Shetty
%A Tianjun Zhang
%A King Han
%A Koushik Sen
%A Ion Stoica
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-jain24c
%I PMLR
%P 21196--21224
%U https://proceedings.mlr.press/v235/jain24c.html
%V 235
%X While Large Language Models’ (LLMs) coding capabilities have advanced rapidly, corresponding evaluation benchmarks on real-world programming setups have yet to catch up. Building a scalable and interactive testbed for evaluating general-purpose AI coding agents on real-world code has been challenging, particularly due to the lack of available high-quality test suites. In this paper, we present Repository to Environment (R2E), a framework that can turn any GitHub repository into a test environment to evaluate the performance of code-generating systems, both static and interactive. R2E is powered by a synergistic combination of program analysis and LLMs to construct equivalence test harnesses for any GitHub function. We instantiate our framework to build the first large-scale benchmark, R2E-Eval1, for building realistic environments for AI coding assistants. Our results demonstrate that even when SOTA models cannot generate correct solutions with advanced prompting techniques, they can effectively use environment feedback, highlighting the need to move from static functional coding to an interactive programming paradigm. We hope that our framework (and the instantiated benchmark) can motivate research directions by providing web-scale open-ended coding environments. R2E code is available at https://r2e.dev/
APA
Jain, N., Shetty, M., Zhang, T., Han, K., Sen, K. & Stoica, I. (2024). R2E: Turning any Github Repository into a Programming Agent Environment. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:21196-21224. Available from https://proceedings.mlr.press/v235/jain24c.html.
