Structured Generative Models of Natural Source Code

Chris Maddison; Daniel Tarlow

Proceedings of the 31st International Conference on Machine Learning, PMLR 32(2):649-657, 2014.

Abstract

We study the problem of building generative models of natural source code (NSC); that is, source code written and understood by humans. Our primary contribution is to describe a family of generative models for NSC that have two key properties: First, they incorporate both sequential and hierarchical structure. Second, they are capable of integrating closely with a compiler, which allows leveraging compiler logic and abstractions when building structure into the model. We also develop an extension that includes more complex structure, refining how the model generates identifier tokens based on what variables are currently in scope. Our models can be learned efficiently, and we show empirically that including appropriate structure greatly improves the probability of generating test programs.

Cite this Paper

BibTeX


@InProceedings{pmlr-v32-maddison14,
  title = 	 {Structured Generative Models of Natural Source Code},
  author = 	 {Maddison, Chris and Tarlow, Daniel},
  booktitle = 	 {Proceedings of the 31st International Conference on Machine Learning},
  pages = 	 {649--657},
  year = 	 {2014},
  editor = 	 {Xing, Eric P. and Jebara, Tony},
  volume = 	 {32},
  number =       {2},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Beijing, China},
  month = 	 {22--24 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v32/maddison14.pdf},
  url = 	 {https://proceedings.mlr.press/v32/maddison14.html},
  abstract = 	 {We study the problem of building generative models of natural source code (NSC); that is, source code written and understood by humans. Our primary contribution is to describe a family of generative models for NSC that have two key properties: First, they incorporate both sequential and hierarchical structure. Second, they are capable of integrating closely with a compiler, which allows leveraging compiler logic and abstractions when building structure into the model. We also develop an extension that includes more complex structure, refining how the model generates identifier tokens based on what variables are currently in scope.  Our models can be learned efficiently, and we show empirically that including appropriate structure greatly improves the probability of generating test programs.}
}

Endnote

%0 Conference Paper
%T Structured Generative Models of Natural Source Code
%A Chris Maddison
%A Daniel Tarlow
%B Proceedings of the 31st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2014
%E Eric P. Xing
%E Tony Jebara
%F pmlr-v32-maddison14
%I PMLR
%P 649--657
%U https://proceedings.mlr.press/v32/maddison14.html
%V 32
%N 2
%X We study the problem of building generative models of natural source code (NSC); that is, source code written and understood by humans. Our primary contribution is to describe a family of generative models for NSC that have two key properties: First, they incorporate both sequential and hierarchical structure. Second, they are capable of integrating closely with a compiler, which allows leveraging compiler logic and abstractions when building structure into the model. We also develop an extension that includes more complex structure, refining how the model generates identifier tokens based on what variables are currently in scope.  Our models can be learned efficiently, and we show empirically that including appropriate structure greatly improves the probability of generating test programs.

RIS


TY  - CPAPER
TI  - Structured Generative Models of Natural Source Code
AU  - Chris Maddison
AU  - Daniel Tarlow
BT  - Proceedings of the 31st International Conference on Machine Learning
DA  - 2014/06/18
ED  - Eric P. Xing
ED  - Tony Jebara
ID  - pmlr-v32-maddison14
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 32
IS  - 2
SP  - 649
EP  - 657
L1  - http://proceedings.mlr.press/v32/maddison14.pdf
UR  - https://proceedings.mlr.press/v32/maddison14.html
AB  - We study the problem of building generative models of natural source code (NSC); that is, source code written and understood by humans. Our primary contribution is to describe a family of generative models for NSC that have two key properties: First, they incorporate both sequential and hierarchical structure. Second, they are capable of integrating closely with a compiler, which allows leveraging compiler logic and abstractions when building structure into the model. We also develop an extension that includes more complex structure, refining how the model generates identifier tokens based on what variables are currently in scope.  Our models can be learned efficiently, and we show empirically that including appropriate structure greatly improves the probability of generating test programs.
ER  -

APA


Maddison, C. & Tarlow, D. (2014). Structured Generative Models of Natural Source Code. Proceedings of the 31st International Conference on Machine Learning, in Proceedings of Machine Learning Research 32(2):649-657. Available from https://proceedings.mlr.press/v32/maddison14.html.
