Structural Language Models of Code

Uri Alon, Roy Sadaka, Omer Levy, Eran Yahav
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:245-256, 2020.

Abstract

We address the problem of any-code completion: generating a missing piece of source code in a given program without any restriction on the vocabulary or structure. We introduce a new approach to any-code completion that leverages the strict syntax of programming languages to model a code snippet as a tree, an approach we call structural language modeling (SLM). SLM estimates the probability of the program’s abstract syntax tree (AST) by decomposing it into a product of conditional probabilities over its nodes. We present a neural model that computes these conditional probabilities by considering all AST paths leading to a target node. Unlike previous techniques that have severely restricted the kinds of expressions that can be generated in this task, our approach can generate arbitrary code in any programming language. Our model significantly outperforms both seq2seq and a variety of structured approaches in generating Java and C# code. Our code, data, and trained models are available at http://github.com/tech-srl/slm-code-generation/. An online demo is available at http://AnyCodeGen.org.
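The abstract rests on two ideas: the chain-rule factorization of an AST's probability over its nodes, log P(A) = sum over t of log P(a_t | a_<t), and the AST paths that lead to a target node. Both can be sketched in a few lines of Python. This is a minimal, illustrative sketch and not the authors' implementation (their code is at the GitHub link above). The `Node` class, the node labels, the depth-first generation order, and the `cond_prob` callback that stands in for the neural model are all hypothetical assumptions made for this example. Each path here runs from an already-known leaf up to the lowest common ancestor and then down to the target node.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical sketch: NOT the authors' SLM implementation.


@dataclass(eq=False)  # eq=False: nodes compare and hash by identity, not by label
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, label: str) -> "Node":
        """Create a child with the given label, attach it, and return it."""
        child = Node(label, parent=self)
        self.children.append(child)
        return child


def ancestors(node: Node) -> List[Node]:
    """Return [node, parent, grandparent, ..., root]."""
    chain: List[Node] = []
    cur: Optional[Node] = node
    while cur is not None:
        chain.append(cur)
        cur = cur.parent
    return chain


def ast_path(src: Node, dst: Node) -> List[str]:
    """Labels on the tree path src -> lowest common ancestor -> dst."""
    up, down = ancestors(src), ancestors(dst)
    lca = next(n for n in up if n in down)      # first shared ancestor
    up_part = up[: up.index(lca) + 1]           # src, ..., lca
    down_part = down[: down.index(lca)][::-1]   # child of lca, ..., dst
    return [n.label for n in up_part + down_part]


def preorder(root: Node) -> List[Node]:
    """Depth-first, left-to-right node order (assumed generation order)."""
    out = [root]
    for child in root.children:
        out.extend(preorder(child))
    return out


def paths_to_target(context_leaves: List[Node], target: Node) -> List[List[str]]:
    """One AST path from each already-known leaf to the target node."""
    return [ast_path(leaf, target) for leaf in context_leaves]


def tree_log_prob(
    nodes: List[Node],
    cond_prob: Callable[[Node, List[Node]], float],
) -> float:
    """log P(A) = sum_t log P(a_t | a_<t).

    cond_prob is a hypothetical stand-in for the neural model; it receives the
    node a_t and the nodes generated before it, and must return a value > 0.
    """
    return sum(math.log(cond_prob(node, nodes[:t])) for t, node in enumerate(nodes))
```

For a toy tree of `x = y + 1`, built as `root = Node("Assign")`, `x = root.add("x")`, `plus = root.add("Plus")`, `y = plus.add("y")`, `one = plus.add("1")`, the call `paths_to_target([x, y], one)` returns `[['x', 'Assign', 'Plus', '1'], ['y', 'Plus', '1']]`. With a constant placeholder `cond_prob` of 0.5, `tree_log_prob(preorder(root), ...)` equals `5 * math.log(0.5)`, because the tree has five nodes. Working in log space keeps the product of many small conditional probabilities from underflowing.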

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-alon20a,
  title = {Structural Language Models of Code},
  author = {Alon, Uri and Sadaka, Roy and Levy, Omer and Yahav, Eran},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  pages = {245--256},
  year = {2020},
  editor = {Daumé III, Hal and Singh, Aarti},
  volume = {119},
  series = {Proceedings of Machine Learning Research},
  month = {13--18 Jul},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v119/alon20a/alon20a.pdf},
  url = {https://proceedings.mlr.press/v119/alon20a.html},
  abstract = {We address the problem of any-code completion - generating a missing piece of source code in a given program without any restriction on the vocabulary or structure. We introduce a new approach to any-code completion that leverages the strict syntax of programming languages to model a code snippet as a tree - structural language modeling (SLM). SLM estimates the probability of the program’s abstract syntax tree (AST) by decomposing it into a product of conditional probabilities over its nodes. We present a neural model that computes these conditional probabilities by considering all AST paths leading to a target node. Unlike previous techniques that have severely restricted the kinds of expressions that can be generated in this task, our approach can generate arbitrary code in any programming language. Our model significantly outperforms both seq2seq and a variety of structured approaches in generating Java and C# code. Our code, data, and trained models are available at http://github.com/tech-srl/slm-code-generation/. An online demo is available at http://AnyCodeGen.org.}
}
Endnote
%0 Conference Paper
%T Structural Language Models of Code
%A Uri Alon
%A Roy Sadaka
%A Omer Levy
%A Eran Yahav
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-alon20a
%I PMLR
%P 245--256
%U https://proceedings.mlr.press/v119/alon20a.html
%V 119
%X We address the problem of any-code completion - generating a missing piece of source code in a given program without any restriction on the vocabulary or structure. We introduce a new approach to any-code completion that leverages the strict syntax of programming languages to model a code snippet as a tree - structural language modeling (SLM). SLM estimates the probability of the program’s abstract syntax tree (AST) by decomposing it into a product of conditional probabilities over its nodes. We present a neural model that computes these conditional probabilities by considering all AST paths leading to a target node. Unlike previous techniques that have severely restricted the kinds of expressions that can be generated in this task, our approach can generate arbitrary code in any programming language. Our model significantly outperforms both seq2seq and a variety of structured approaches in generating Java and C# code. Our code, data, and trained models are available at http://github.com/tech-srl/slm-code-generation/. An online demo is available at http://AnyCodeGen.org.
APA
Alon, U., Sadaka, R., Levy, O. & Yahav, E. (2020). Structural Language Models of Code. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:245-256. Available from https://proceedings.mlr.press/v119/alon20a.html.
