Structured Generative Models of Natural Source Code
Proceedings of the 31st International Conference on Machine Learning, PMLR 32(2):649-657, 2014.
We study the problem of building generative models of natural source code (NSC); that is, source code written and understood by humans. Our primary contribution is to describe a family of generative models for NSC that have two key properties: First, they incorporate both sequential and hierarchical structure. Second, they are capable of integrating closely with a compiler, which allows leveraging compiler logic and abstractions when building structure into the model. We also develop an extension that includes more complex structure, refining how the model generates identifier tokens based on what variables are currently in scope. Our models can be learned efficiently, and we show empirically that including appropriate structure greatly improves the probability of generating test programs.