Bimodal Modelling of Source Code and Natural Language

Miltos Allamanis, Daniel Tarlow, Andrew Gordon, Yi Wei
Proceedings of the 32nd International Conference on Machine Learning, PMLR 37:2123-2132, 2015.

Abstract

We consider the problem of building probabilistic models that jointly model short natural language utterances and source code snippets. The aim is to bring together recent work on statistical modelling of source code and work on bimodal models of images and natural language. The resulting models are useful for a variety of tasks that involve natural language and source code. We demonstrate their performance on two retrieval tasks: retrieving source code snippets given a natural language query, and retrieving natural language descriptions given a source code query (i.e., source code captioning). The experiments show there to be promise in this direction, and that modelling the structure of source code is helpful towards the retrieval tasks.

Cite this Paper


BibTeX
@InProceedings{pmlr-v37-allamanis15,
  title = {Bimodal Modelling of Source Code and Natural Language},
  author = {Miltos Allamanis and Daniel Tarlow and Andrew Gordon and Yi Wei},
  pages = {2123--2132},
  year = {2015},
  editor = {Francis Bach and David Blei},
  volume = {37},
  series = {Proceedings of Machine Learning Research},
  address = {Lille, France},
  month = {07--09 Jul},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v37/allamanis15.pdf},
  url = {http://proceedings.mlr.press/v37/allamanis15.html},
  abstract = {We consider the problem of building probabilistic models that jointly model short natural language utterances and source code snippets. The aim is to bring together recent work on statistical modelling of source code and work on bimodal models of images and natural language. The resulting models are useful for a variety of tasks that involve natural language and source code. We demonstrate their performance on two retrieval tasks: retrieving source code snippets given a natural language query, and retrieving natural language descriptions given a source code query (i.e., source code captioning). The experiments show there to be promise in this direction, and that modelling the structure of source code is helpful towards the retrieval tasks.}
}
Endnote
%0 Conference Paper
%T Bimodal Modelling of Source Code and Natural Language
%A Miltos Allamanis
%A Daniel Tarlow
%A Andrew Gordon
%A Yi Wei
%B Proceedings of the 32nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2015
%E Francis Bach
%E David Blei
%F pmlr-v37-allamanis15
%I PMLR
%J Proceedings of Machine Learning Research
%P 2123--2132
%U http://proceedings.mlr.press
%V 37
%W PMLR
%X We consider the problem of building probabilistic models that jointly model short natural language utterances and source code snippets. The aim is to bring together recent work on statistical modelling of source code and work on bimodal models of images and natural language. The resulting models are useful for a variety of tasks that involve natural language and source code. We demonstrate their performance on two retrieval tasks: retrieving source code snippets given a natural language query, and retrieving natural language descriptions given a source code query (i.e., source code captioning). The experiments show there to be promise in this direction, and that modelling the structure of source code is helpful towards the retrieval tasks.
RIS
TY  - CPAPER
TI  - Bimodal Modelling of Source Code and Natural Language
AU  - Miltos Allamanis
AU  - Daniel Tarlow
AU  - Andrew Gordon
AU  - Yi Wei
BT  - Proceedings of the 32nd International Conference on Machine Learning
PY  - 2015/06/01
DA  - 2015/06/01
ED  - Francis Bach
ED  - David Blei
ID  - pmlr-v37-allamanis15
PB  - PMLR
SP  - 2123
DP  - PMLR
EP  - 2132
L1  - http://proceedings.mlr.press/v37/allamanis15.pdf
UR  - http://proceedings.mlr.press/v37/allamanis15.html
AB  - We consider the problem of building probabilistic models that jointly model short natural language utterances and source code snippets. The aim is to bring together recent work on statistical modelling of source code and work on bimodal models of images and natural language. The resulting models are useful for a variety of tasks that involve natural language and source code. We demonstrate their performance on two retrieval tasks: retrieving source code snippets given a natural language query, and retrieving natural language descriptions given a source code query (i.e., source code captioning). The experiments show there to be promise in this direction, and that modelling the structure of source code is helpful towards the retrieval tasks.
ER  -
APA
Allamanis, M., Tarlow, D., Gordon, A., & Wei, Y. (2015). Bimodal Modelling of Source Code and Natural Language. Proceedings of the 32nd International Conference on Machine Learning, in PMLR 37:2123-2132.
