Data Generation for Neural Programming by Example

Judith Clymo, Haik Manukian, Nathanael Fijalkow, Adria Gascon, Brooks Paige
Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:3450-3459, 2020.

Abstract

Programming by example is the problem of synthesizing a program from a small set of input / output pairs. Recent works applying machine learning methods to this task show promise, but are typically reliant on generating synthetic examples for training. A particular challenge lies in generating meaningful sets of inputs and outputs, which well-characterize a given program and accurately demonstrate its behavior. Where examples used for testing are generated by the same method as training data then the performance of a model may be partly reliant on this similarity. In this paper we introduce a novel approach using an SMT solver to synthesize inputs which cover a diverse set of behaviors for a given program. We carry out a case study comparing this method to existing synthetic data generation procedures in the literature, and find that data generated using our approach improves both the discriminatory power of example sets and the ability of trained machine learning models to generalize to unfamiliar data.
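To make the notion of "discriminatory power" concrete, the following is a minimal sketch and not the paper's method. It assumes a hypothetical four-primitive DSL over integer lists and uses brute-force enumeration in place of an SMT solver. The names `PRIMITIVES`, `run`, and `consistent_programs` are illustrative assumptions. The sketch shows how the choice of input affects how many candidate programs remain consistent with an example set.

```python
from itertools import product

# Hypothetical toy DSL over integer lists (an assumption for illustration;
# this is not the DSL or the generation procedure used in the paper).
PRIMITIVES = {
    "reverse": lambda xs: xs[::-1],
    "sort": lambda xs: sorted(xs),
    "double": lambda xs: [2 * x for x in xs],
    "drop_first": lambda xs: xs[1:],
}


def run(program, xs):
    """Apply a program (a tuple of primitive names) left to right."""
    for name in program:
        xs = PRIMITIVES[name](xs)
    return xs


def consistent_programs(examples, max_len=2):
    """Enumerate every program of length <= max_len matching all I/O pairs."""
    found = []
    for n in range(1, max_len + 1):
        for program in product(PRIMITIVES, repeat=n):
            if all(run(program, inp) == out for inp, out in examples):
                found.append(program)
    return found


target = ("sort", "double")

# An already-sorted input never exercises the "sort" step of the target.
weak = [([1, 2, 3], run(target, [1, 2, 3]))]
# An unsorted input exercises both steps.
strong = [([3, 1, 2], run(target, [3, 1, 2]))]

print(len(consistent_programs(weak)), consistent_programs(weak))
print(len(consistent_programs(strong)), consistent_programs(strong))
```

With the already-sorted input, the program `("double",)` stays consistent even though it behaves differently from the target. With the unsorted input, only `("sort", "double")` and the equivalent `("double", "sort")` remain.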

Cite this Paper


BibTeX
@InProceedings{pmlr-v108-clymo20a,
  title = {Data Generation for Neural Programming by Example},
  author = {Clymo, Judith and Manukian, Haik and Fijalkow, Nathanael and Gascon, Adria and Paige, Brooks},
  booktitle = {Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics},
  pages = {3450--3459},
  year = {2020},
  editor = {Chiappa, Silvia and Calandra, Roberto},
  volume = {108},
  series = {Proceedings of Machine Learning Research},
  month = {26--28 Aug},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v108/clymo20a/clymo20a.pdf},
  url = {https://proceedings.mlr.press/v108/clymo20a.html},
  abstract = {Programming by example is the problem of synthesizing a program from a small set of input / output pairs. Recent works applying machine learning methods to this task show promise, but are typically reliant on generating synthetic examples for training. A particular challenge lies in generating meaningful sets of inputs and outputs, which well-characterize a given program and accurately demonstrate its behavior. Where examples used for testing are generated by the same method as training data then the performance of a model may be partly reliant on this similarity. In this paper we introduce a novel approach using an SMT solver to synthesize inputs which cover a diverse set of behaviors for a given program. We carry out a case study comparing this method to existing synthetic data generation procedures in the literature, and find that data generated using our approach improves both the discriminatory power of example sets and the ability of trained machine learning models to generalize to unfamiliar data.}
}
Endnote
%0 Conference Paper
%T Data Generation for Neural Programming by Example
%A Judith Clymo
%A Haik Manukian
%A Nathanael Fijalkow
%A Adria Gascon
%A Brooks Paige
%B Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2020
%E Silvia Chiappa
%E Roberto Calandra
%F pmlr-v108-clymo20a
%I PMLR
%P 3450--3459
%U https://proceedings.mlr.press/v108/clymo20a.html
%V 108
%X Programming by example is the problem of synthesizing a program from a small set of input / output pairs. Recent works applying machine learning methods to this task show promise, but are typically reliant on generating synthetic examples for training. A particular challenge lies in generating meaningful sets of inputs and outputs, which well-characterize a given program and accurately demonstrate its behavior. Where examples used for testing are generated by the same method as training data then the performance of a model may be partly reliant on this similarity. In this paper we introduce a novel approach using an SMT solver to synthesize inputs which cover a diverse set of behaviors for a given program. We carry out a case study comparing this method to existing synthetic data generation procedures in the literature, and find that data generated using our approach improves both the discriminatory power of example sets and the ability of trained machine learning models to generalize to unfamiliar data.
APA
Clymo, J., Manukian, H., Fijalkow, N., Gascon, A. & Paige, B. (2020). Data Generation for Neural Programming by Example. Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 108:3450-3459. Available from https://proceedings.mlr.press/v108/clymo20a.html.
