A large-scale benchmark for few-shot program induction and synthesis

Ferran Alet, Javier Lopez-Contreras, James Koppel, Maxwell Nye, Armando Solar-Lezama, Tomas Lozano-Perez, Leslie Kaelbling, Joshua Tenenbaum
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:175-186, 2021.

Abstract

A landmark challenge for AI is to learn flexible, powerful representations from small numbers of examples. On an important class of tasks, hypotheses in the form of programs provide extreme generalization capabilities from surprisingly few examples. However, whereas large, natural image benchmarks for few-shot learning have spurred progress in meta-learning for deep networks, there is no comparably large, natural program-synthesis dataset that can play a similar role. This is because, whereas images are relatively easy to label using internet metadata or non-expert annotators, generating meaningful input-output examples for program induction has proven hard to scale. In this work, we propose a new way of leveraging unit tests and natural inputs for small programs as meaningful input-output examples for each sub-program of the overall program. This allows us to create a large-scale, naturalistic few-shot program-induction benchmark and to propose new challenges in this domain. Our evaluation of multiple program induction and synthesis algorithms points to shortcomings of current methods and suggests multiple avenues for future work.
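To make the data-collection idea concrete, here is a minimal Python sketch of the general recipe, not the paper's actual pipeline; the names record_io, normalize, and count_distinct are invented for illustration. Running a unit test for an overall program automatically yields input-output examples for the sub-programs it calls:

    # Illustrative sketch: harvesting input-output examples for a
    # sub-program by tracing calls to it during unit-test execution.
    import functools

    def record_io(log):
        """Decorator that appends an (args, result) pair to `log` on every call."""
        def decorator(fn):
            @functools.wraps(fn)
            def wrapper(*args):
                result = fn(*args)
                log.append((args, result))
                return result
            return wrapper
        return decorator

    subprogram_examples = []

    @record_io(subprogram_examples)
    def normalize(word):           # sub-program: we want I/O examples for this
        return word.strip().lower()

    def count_distinct(words):     # overall program exercised by the unit test
        return len({normalize(w) for w in words})

    # The "unit test" supplies natural inputs; executing it produces
    # meaningful input-output examples for the sub-program as a side effect.
    assert count_distinct(["Cat", " cat ", "dog"]) == 2

    print(subprogram_examples)
    # [(('Cat',), 'cat'), ((' cat ',), 'cat'), (('dog',), 'dog')]

Applied to many small real-world programs and their test suites, this recipe yields naturalistic few-shot induction tasks without hand-authored examples.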

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-alet21a,
  title     = {A large-scale benchmark for few-shot program induction and synthesis},
  author    = {Alet, Ferran and Lopez-Contreras, Javier and Koppel, James and Nye, Maxwell and Solar-Lezama, Armando and Lozano-Perez, Tomas and Kaelbling, Leslie and Tenenbaum, Joshua},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {175--186},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/alet21a/alet21a.pdf},
  url       = {https://proceedings.mlr.press/v139/alet21a.html}
}
Endnote
%0 Conference Paper
%T A large-scale benchmark for few-shot program induction and synthesis
%A Ferran Alet
%A Javier Lopez-Contreras
%A James Koppel
%A Maxwell Nye
%A Armando Solar-Lezama
%A Tomas Lozano-Perez
%A Leslie Kaelbling
%A Joshua Tenenbaum
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-alet21a
%I PMLR
%P 175--186
%U https://proceedings.mlr.press/v139/alet21a.html
%V 139
APA
Alet, F., Lopez-Contreras, J., Koppel, J., Nye, M., Solar-Lezama, A., Lozano-Perez, T., Kaelbling, L. & Tenenbaum, J. (2021). A large-scale benchmark for few-shot program induction and synthesis. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:175-186. Available from https://proceedings.mlr.press/v139/alet21a.html.