PinNet: Pinpoint Instructive Information for Retrieval Augmented Code-to-Text Generation

Han Fu, Jian Tan, Pinhan Zhang, Feifei Li, Jianling Sun
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:14157-14174, 2024.

Abstract

Automatically generating high-quality code descriptions greatly improves the readability and maintainability of a codebase. Recently, retrieval-augmented code-to-text generation has proven to be an effective solution, achieving state-of-the-art results on various benchmarks. It opens up the potential to leverage large collections of unlabeled code descriptions to further improve generation quality. Despite this promising performance, retrieval-augmented models can be misled by unhelpful retrieved references that contain irrelevant or even misleading information. To address this, we design PinNet, a new framework for code-to-text generation. PinNet relies on a discriminator to measure how well a retrieval matches the semantics of the input code. Remarkably, the hidden representation of the reference before the discriminator's output layer can be leveraged to significantly improve code-to-text generation by modifying the attention weights: it concentrates attention on valuable information and suppresses misleading content. To effectively execute this idea, we also propose a novel contrastive learning method to quantify the semantic similarity between unlabeled references. Through extensive experiments on code summarization and SQL-to-text generation, we demonstrate that the proposed method significantly outperforms all baselines.
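To make the central idea concrete, the sketch below illustrates one simple way a discriminator's relevance score could modulate attention over retrieved-reference tokens. This is a minimal illustration of the general mechanism the abstract describes, not the authors' implementation: the function name, the per-token relevance vector, and the log-bias formulation are all assumptions made here for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def relevance_weighted_attention(query, ref_keys, ref_values, relevance):
    """Cross-attention over retrieved-reference tokens, biased by a
    per-token relevance score in (0, 1] (e.g. from a discriminator).

    A score near 1 leaves a token fully visible; a score near 0
    drives its attention weight toward zero, so misleading retrieved
    content is effectively ignored by the decoder.
    """
    d = query.shape[-1]
    scores = query @ ref_keys.T / np.sqrt(d)       # (q_len, r_len)
    scores = scores + np.log(relevance + 1e-9)     # bias against low-relevance tokens
    weights = softmax(scores, axis=-1)             # rows sum to 1
    return weights @ ref_values
```

Adding `log(relevance)` before the softmax multiplies each token's unnormalized attention by its relevance, which is one standard way to softly gate attention without changing the rest of the architecture.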

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-fu24f,
  title     = {{P}in{N}et: Pinpoint Instructive Information for Retrieval Augmented Code-to-Text Generation},
  author    = {Fu, Han and Tan, Jian and Zhang, Pinhan and Li, Feifei and Sun, Jianling},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {14157--14174},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/fu24f/fu24f.pdf},
  url       = {https://proceedings.mlr.press/v235/fu24f.html},
  abstract  = {Automatically generating high-quality code descriptions greatly improves the readability and maintainability of the codebase. Recently, retrieval augmented code-to-text generation has proven to be an effective solution, which has achieved state-of-the-art results on various benchmarks. It brings out the potential to leverage large unlabeled code descriptions to further improve the generation quality. Despite the promising performance, retrieval-augmented models however suffer from being deluded by inconducive retrieved references, due to irrelevant or even misleading information contained therein. To this end, we design PinNet, a new framework for code-to-text generation. PinNet relies on a discriminator to measure how well the retrievals match the semantics of the input code. Remarkably, the hidden representation of the reference before the output layer of the discriminator can be leveraged to significantly improve the code-to-text generation by modifying the attention weights. It essentially pays high attention to valuable information and eliminates misleadingness. To effectively execute this idea, we also propose a novel contrastive learning method to quantify the semantical similarities between unlabeled references. Using extensive experiments on code summarization and SQL-to-text generation, we demonstrate that the proposed method can significantly outperform all of the baselines.}
}
Endnote
%0 Conference Paper
%T PinNet: Pinpoint Instructive Information for Retrieval Augmented Code-to-Text Generation
%A Han Fu
%A Jian Tan
%A Pinhan Zhang
%A Feifei Li
%A Jianling Sun
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-fu24f
%I PMLR
%P 14157--14174
%U https://proceedings.mlr.press/v235/fu24f.html
%V 235
%X Automatically generating high-quality code descriptions greatly improves the readability and maintainability of the codebase. Recently, retrieval augmented code-to-text generation has proven to be an effective solution, which has achieved state-of-the-art results on various benchmarks. It brings out the potential to leverage large unlabeled code descriptions to further improve the generation quality. Despite the promising performance, retrieval-augmented models however suffer from being deluded by inconducive retrieved references, due to irrelevant or even misleading information contained therein. To this end, we design PinNet, a new framework for code-to-text generation. PinNet relies on a discriminator to measure how well the retrievals match the semantics of the input code. Remarkably, the hidden representation of the reference before the output layer of the discriminator can be leveraged to significantly improve the code-to-text generation by modifying the attention weights. It essentially pays high attention to valuable information and eliminates misleadingness. To effectively execute this idea, we also propose a novel contrastive learning method to quantify the semantical similarities between unlabeled references. Using extensive experiments on code summarization and SQL-to-text generation, we demonstrate that the proposed method can significantly outperform all of the baselines.
APA
Fu, H., Tan, J., Zhang, P., Li, F. & Sun, J. (2024). PinNet: Pinpoint Instructive Information for Retrieval Augmented Code-to-Text Generation. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:14157-14174. Available from https://proceedings.mlr.press/v235/fu24f.html.