ReGAL: Refactoring Programs to Discover Generalizable Abstractions

Elias Stengel-Eskin, Archiki Prasad, Mohit Bansal
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:46605-46624, 2024.

Abstract

While large language models (LLMs) are increasingly being used for program synthesis, they lack the global view needed to develop useful abstractions; they generally predict programs one at a time, often repeating the same functionality. Generating redundant code from scratch is both inefficient and error-prone. To address this, we propose Refactoring for Generalizable Abstraction Learning (ReGAL), a gradient-free method for learning a library of reusable functions via code refactorization, i.e., restructuring code without changing its execution output. ReGAL learns from a small set of existing programs, iteratively verifying and refining its abstractions via execution. We find that the shared function libraries discovered by ReGAL make programs easier to predict across diverse domains. On five datasets – LOGO graphics generation, Date reasoning, TextCraft (a Minecraft-based text-game) MATH, and TabMWP – both open-source and proprietary LLMs improve in accuracy when predicting programs with REGAL functions. For CodeLlama-13B, REGAL results in absolute accuracy increases of 11.5% on LOGO, 26.1% on date understanding, and 8.1% on TextCraft, out-performing GPT-3.5 in two of three domains. Our analysis reveals REGAL’s abstractions encapsulate frequently-used subroutines as well as environment dynamics.

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-stengel-eskin24a, title = {{R}e{GAL}: Refactoring Programs to Discover Generalizable Abstractions}, author = {Stengel-Eskin, Elias and Prasad, Archiki and Bansal, Mohit}, booktitle = {Proceedings of the 41st International Conference on Machine Learning}, pages = {46605--46624}, year = {2024}, editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix}, volume = {235}, series = {Proceedings of Machine Learning Research}, month = {21--27 Jul}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/stengel-eskin24a/stengel-eskin24a.pdf}, url = {https://proceedings.mlr.press/v235/stengel-eskin24a.html}, abstract = {While large language models (LLMs) are increasingly being used for program synthesis, they lack the global view needed to develop useful abstractions; they generally predict programs one at a time, often repeating the same functionality. Generating redundant code from scratch is both inefficient and error-prone. To address this, we propose Refactoring for Generalizable Abstraction Learning (ReGAL), a gradient-free method for learning a library of reusable functions via code refactorization, i.e., restructuring code without changing its execution output. ReGAL learns from a small set of existing programs, iteratively verifying and refining its abstractions via execution. We find that the shared function libraries discovered by ReGAL make programs easier to predict across diverse domains. On five datasets – LOGO graphics generation, Date reasoning, TextCraft (a Minecraft-based text-game) MATH, and TabMWP – both open-source and proprietary LLMs improve in accuracy when predicting programs with REGAL functions. For CodeLlama-13B, REGAL results in absolute accuracy increases of 11.5% on LOGO, 26.1% on date understanding, and 8.1% on TextCraft, out-performing GPT-3.5 in two of three domains. Our analysis reveals REGAL’s abstractions encapsulate frequently-used subroutines as well as environment dynamics.} }
Endnote
%0 Conference Paper %T ReGAL: Refactoring Programs to Discover Generalizable Abstractions %A Elias Stengel-Eskin %A Archiki Prasad %A Mohit Bansal %B Proceedings of the 41st International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2024 %E Ruslan Salakhutdinov %E Zico Kolter %E Katherine Heller %E Adrian Weller %E Nuria Oliver %E Jonathan Scarlett %E Felix Berkenkamp %F pmlr-v235-stengel-eskin24a %I PMLR %P 46605--46624 %U https://proceedings.mlr.press/v235/stengel-eskin24a.html %V 235 %X While large language models (LLMs) are increasingly being used for program synthesis, they lack the global view needed to develop useful abstractions; they generally predict programs one at a time, often repeating the same functionality. Generating redundant code from scratch is both inefficient and error-prone. To address this, we propose Refactoring for Generalizable Abstraction Learning (ReGAL), a gradient-free method for learning a library of reusable functions via code refactorization, i.e., restructuring code without changing its execution output. ReGAL learns from a small set of existing programs, iteratively verifying and refining its abstractions via execution. We find that the shared function libraries discovered by ReGAL make programs easier to predict across diverse domains. On five datasets – LOGO graphics generation, Date reasoning, TextCraft (a Minecraft-based text-game) MATH, and TabMWP – both open-source and proprietary LLMs improve in accuracy when predicting programs with REGAL functions. For CodeLlama-13B, REGAL results in absolute accuracy increases of 11.5% on LOGO, 26.1% on date understanding, and 8.1% on TextCraft, out-performing GPT-3.5 in two of three domains. Our analysis reveals REGAL’s abstractions encapsulate frequently-used subroutines as well as environment dynamics.
APA
Stengel-Eskin, E., Prasad, A. & Bansal, M.. (2024). ReGAL: Refactoring Programs to Discover Generalizable Abstractions. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:46605-46624 Available from https://proceedings.mlr.press/v235/stengel-eskin24a.html.

Related Material