TFix: Learning to Fix Coding Errors with a Text-to-Text Transformer

Berkay Berabi, Jingxuan He, Veselin Raychev, Martin Vechev
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:780-791, 2021.

Abstract

The problem of fixing errors in programs has attracted substantial interest over the years. The key challenge for building an effective code-fixing tool is to capture a wide range of errors while maintaining high accuracy. In this paper, we address this challenge and present a new learning-based system, called TFix. TFix works directly on program text and phrases the problem of code fixing as a text-to-text task. In turn, this enables it to leverage a powerful Transformer-based model pre-trained on natural language and fine-tuned to generate code fixes (via a large, high-quality dataset obtained from GitHub commits). TFix is not specific to a particular programming language or class of defects and, in fact, it improves its precision by simultaneously fine-tuning on 52 different error types reported by a popular static analyzer. Our evaluation on a massive dataset of JavaScript programs shows that TFix is practically effective: it synthesizes code that fixes the error in 67 percent of cases and significantly outperforms existing learning-based approaches.
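For illustration, the text-to-text formulation described above can be sketched with a generic pre-trained T5-style model from the HuggingFace Transformers library. The prompt encoding ("fix <error type>: <code>"), the model checkpoint, and the example error below are illustrative assumptions, not the exact setup used by TFix.

    # Minimal sketch of posing code fixing as a text-to-text task with a
    # pre-trained T5-style model (HuggingFace Transformers). The prompt format
    # and checkpoint are assumptions for illustration, not TFix's exact setup.
    from transformers import T5ForConditionalGeneration, T5Tokenizer

    tokenizer = T5Tokenizer.from_pretrained("t5-small")
    model = T5ForConditionalGeneration.from_pretrained("t5-small")

    # A buggy JavaScript line plus the error type reported by a static analyzer.
    error_type = "no-dupe-keys"
    buggy_code = "var obj = { a: 1, a: 2 };"
    source_text = f"fix {error_type}: {buggy_code}"

    # After fine-tuning on (erroneous code, fixed code) pairs mined from GitHub
    # commits, beam search would propose candidate fixes as plain text.
    inputs = tokenizer(source_text, return_tensors="pt", truncation=True)
    outputs = model.generate(**inputs, num_beams=5, max_length=128)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))

With an off-the-shelf checkpoint this will not produce a meaningful fix; the point is only the shape of the task, where both the erroneous program and the proposed fix are plain text.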

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-berabi21a,
  title     = {TFix: Learning to Fix Coding Errors with a Text-to-Text Transformer},
  author    = {Berabi, Berkay and He, Jingxuan and Raychev, Veselin and Vechev, Martin},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {780--791},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/berabi21a/berabi21a.pdf},
  url       = {https://proceedings.mlr.press/v139/berabi21a.html},
  abstract  = {The problem of fixing errors in programs has attracted substantial interest over the years. The key challenge for building an effective code fixing tool is to capture a wide range of errors and meanwhile maintain high accuracy. In this paper, we address this challenge and present a new learning-based system, called TFix. TFix works directly on program text and phrases the problem of code fixing as a text-to-text task. In turn, this enables it to leverage a powerful Transformer based model pre-trained on natural language and fine-tuned to generate code fixes (via a large, high-quality dataset obtained from GitHub commits). TFix is not specific to a particular programming language or class of defects and, in fact, improved its precision by simultaneously fine-tuning on 52 different error types reported by a popular static analyzer. Our evaluation on a massive dataset of JavaScript programs shows that TFix is practically effective: it is able to synthesize code that fixes the error in 67 percent of cases and significantly outperforms existing learning-based approaches.}
}
Endnote
%0 Conference Paper
%T TFix: Learning to Fix Coding Errors with a Text-to-Text Transformer
%A Berkay Berabi
%A Jingxuan He
%A Veselin Raychev
%A Martin Vechev
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-berabi21a
%I PMLR
%P 780--791
%U https://proceedings.mlr.press/v139/berabi21a.html
%V 139
%X The problem of fixing errors in programs has attracted substantial interest over the years. The key challenge for building an effective code fixing tool is to capture a wide range of errors and meanwhile maintain high accuracy. In this paper, we address this challenge and present a new learning-based system, called TFix. TFix works directly on program text and phrases the problem of code fixing as a text-to-text task. In turn, this enables it to leverage a powerful Transformer based model pre-trained on natural language and fine-tuned to generate code fixes (via a large, high-quality dataset obtained from GitHub commits). TFix is not specific to a particular programming language or class of defects and, in fact, improved its precision by simultaneously fine-tuning on 52 different error types reported by a popular static analyzer. Our evaluation on a massive dataset of JavaScript programs shows that TFix is practically effective: it is able to synthesize code that fixes the error in 67 percent of cases and significantly outperforms existing learning-based approaches.
APA
Berabi, B., He, J., Raychev, V. & Vechev, M. (2021). TFix: Learning to Fix Coding Errors with a Text-to-Text Transformer. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:780-791. Available from https://proceedings.mlr.press/v139/berabi21a.html.
