Self-Improving Language Models for Evolutionary Program Synthesis: A Case Study on ARC-AGI

Julien Pourcel, Cédric Colas, Pierre-Yves Oudeyer
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:49659-49688, 2025.

Abstract

Many program synthesis tasks prove too challenging for even state-of-the-art language models to solve in a single attempt. Search-based evolutionary methods offer a promising alternative by exploring solution spaces iteratively, but their effectiveness remains limited by the fixed capabilities of the underlying generative model. We propose SOAR, a method that learns program synthesis by integrating language models into a self-improving evolutionary loop. SOAR alternates between (1) an evolutionary search that uses an LLM to sample and refine candidate solutions, and (2) a hindsight learning phase that converts search attempts into valid problem-solution pairs used to fine-tune the LLM's sampling and refinement capabilities, enabling increasingly effective search in subsequent iterations. On the challenging ARC-AGI benchmark, SOAR achieves significant performance gains across model scales and iterations, leveraging positive transfer between the sampling and refinement fine-tuning tasks. These improvements carry over to test-time adaptation, enabling SOAR to solve 52% of the public test set.
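
To make the alternation described in the abstract concrete, the following minimal Python sketch wires up such a loop. It is a reconstruction from the abstract alone, not the authors' implementation: the function names (sample, refine, execute, finetune), the population sizes, and the specific hindsight-relabeling rule (pairing each program with the task its own outputs define) are all assumptions.

from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Task:
    inputs: list          # e.g. ARC input grids
    outputs: list         # e.g. ARC output grids

def soar(
    tasks: List[Task],
    sample: Callable[[Task], str],                       # LLM: task -> candidate program
    refine: Callable[[Task, str], str],                  # LLM: task + program -> revised program
    execute: Callable[[str, list], list],                # run a program on the task inputs
    finetune: Callable[[List[Tuple[Task, str]]], None],  # update the LLM on (task, program) pairs
    n_outer: int = 3,
    n_samples: int = 64,
    n_refined: int = 16,
) -> None:
    """Outer-loop sketch: alternate LLM-driven search with hindsight fine-tuning."""
    for _ in range(n_outer):
        hindsight_pairs: List[Tuple[Task, str]] = []
        for task in tasks:
            # (1) Evolutionary search: draw initial candidates, then refine a subset.
            population = [sample(task) for _ in range(n_samples)]
            population += [refine(task, p) for p in population[:n_refined]]
            # (2) Hindsight relabeling (one possible rule, assumed here): a program
            # that fails the target task still correctly solves the task defined by
            # its own outputs, so every executable attempt yields a training pair.
            for program in population:
                try:
                    relabeled = Task(task.inputs, execute(program, task.inputs))
                except Exception:
                    continue  # discard programs that crash
                hindsight_pairs.append((relabeled, program))
        # Fine-tune the LLM on the harvested pairs so that both its sampling and
        # its refinement improve before the next round of search.
        finetune(hindsight_pairs)

The point the abstract stresses is that a single fine-tuned model serves both roles, so gains in sampling and refinement can reinforce each other across iterations.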

Cite this Paper

BibTeX
@InProceedings{pmlr-v267-pourcel25a,
  title     = {Self-Improving Language Models for Evolutionary Program Synthesis: A Case Study on {ARC}-{AGI}},
  author    = {Pourcel, Julien and Colas, C\'{e}dric and Oudeyer, Pierre-Yves},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {49659--49688},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/pourcel25a/pourcel25a.pdf},
  url       = {https://proceedings.mlr.press/v267/pourcel25a.html},
  abstract  = {Many program synthesis tasks prove too challenging for even state-of-the-art language models to solve in a single attempt. Search-based evolutionary methods offer a promising alternative by exploring solution spaces iteratively, but their effectiveness remains limited by the fixed capabilities of the underlying generative model. We propose SOAR, a method that learns program synthesis by integrating language models into a self-improving evolutionary loop. SOAR alternates between (1) an evolutionary search that uses an LLM to sample and refine candidate solutions, and (2) a hindsight learning phase that converts search attempts into valid problem-solution pairs used to fine-tune the LLM's sampling and refinement capabilities, enabling increasingly effective search in subsequent iterations. On the challenging ARC-AGI benchmark, SOAR achieves significant performance gains across model scales and iterations, leveraging positive transfer between the sampling and refinement fine-tuning tasks. These improvements carry over to test-time adaptation, enabling SOAR to solve 52% of the public test set.}
}
EndNote
%0 Conference Paper
%T Self-Improving Language Models for Evolutionary Program Synthesis: A Case Study on ARC-AGI
%A Julien Pourcel
%A Cédric Colas
%A Pierre-Yves Oudeyer
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-pourcel25a
%I PMLR
%P 49659--49688
%U https://proceedings.mlr.press/v267/pourcel25a.html
%V 267
%X Many program synthesis tasks prove too challenging for even state-of-the-art language models to solve in a single attempt. Search-based evolutionary methods offer a promising alternative by exploring solution spaces iteratively, but their effectiveness remains limited by the fixed capabilities of the underlying generative model. We propose SOAR, a method that learns program synthesis by integrating language models into a self-improving evolutionary loop. SOAR alternates between (1) an evolutionary search that uses an LLM to sample and refine candidate solutions, and (2) a hindsight learning phase that converts search attempts into valid problem-solution pairs used to fine-tune the LLM's sampling and refinement capabilities, enabling increasingly effective search in subsequent iterations. On the challenging ARC-AGI benchmark, SOAR achieves significant performance gains across model scales and iterations, leveraging positive transfer between the sampling and refinement fine-tuning tasks. These improvements carry over to test-time adaptation, enabling SOAR to solve 52% of the public test set.
APA
Pourcel, J., Colas, C. & Oudeyer, P.-Y. (2025). Self-Improving Language Models for Evolutionary Program Synthesis: A Case Study on ARC-AGI. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:49659-49688. Available from https://proceedings.mlr.press/v267/pourcel25a.html.
