Bridging Code-Text Representation Gap using Explanation

Hojae Han, Youngwon Lee, Minsoo Kim, Hwang Seung-won
Proceedings of The 13th Asian Conference on Machine Learning, PMLR 157:1033-1048, 2021.

Abstract

This paper studies Code-Text Representation (CTR) learning, which aims to learn general-purpose representations that support downstream code/text applications such as code search, i.e., finding code matching textual queries. However, state-of-the-art methods do not focus on bridging the gap between the code and text modalities. In this paper, we bridge this gap by providing an intermediate representation, which we view as "explanation." Our contribution is threefold: First, we propose four types of explanation utilization methods for CTR and compare their effectiveness. Second, we show that using explanation as the model input is desirable. Third, we confirm that even automatically generated explanations can lead to a drastic performance gain. To the best of our knowledge, this is the first work to define and categorize code explanation for enhancing code understanding/representation.

Cite this Paper


BibTeX
@InProceedings{pmlr-v157-han21a,
  title     = {Bridging Code-Text Representation Gap using Explanation},
  author    = {Han, Hojae and Lee, Youngwon and Kim, Minsoo and Seung-won, Hwang},
  booktitle = {Proceedings of The 13th Asian Conference on Machine Learning},
  pages     = {1033--1048},
  year      = {2021},
  editor    = {Balasubramanian, Vineeth N. and Tsang, Ivor},
  volume    = {157},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--19 Nov},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v157/han21a/han21a.pdf},
  url       = {https://proceedings.mlr.press/v157/han21a.html},
  abstract  = {This paper studies Code-Text Representation (CTR) learning, aiming to learn general-purpose representations that support downstream code/text applications such as code search, finding code matching textual queries. However, state-of-the-arts do not focus on matching the gap between code/text modalities. In this paper, we complement this gap by providing an intermediate representation, and view it as “explanation.” Our contribution is three fold: First, we propose four types of explanation utilization methods for CTR, and compare their effectiveness. Second, we show that using explanation as the model input is desirable. Third, we confirm that even automatically generated explanation can lead to a drastic performance gain. To the best of our knowledge, this is the first work to define and categorize code explanation, for enhancing code understanding/representation.}
}
Endnote
%0 Conference Paper
%T Bridging Code-Text Representation Gap using Explanation
%A Hojae Han
%A Youngwon Lee
%A Minsoo Kim
%A Hwang Seung-won
%B Proceedings of The 13th Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Vineeth N. Balasubramanian
%E Ivor Tsang
%F pmlr-v157-han21a
%I PMLR
%P 1033--1048
%U https://proceedings.mlr.press/v157/han21a.html
%V 157
%X This paper studies Code-Text Representation (CTR) learning, aiming to learn general-purpose representations that support downstream code/text applications such as code search, finding code matching textual queries. However, state-of-the-arts do not focus on matching the gap between code/text modalities. In this paper, we complement this gap by providing an intermediate representation, and view it as “explanation.” Our contribution is three fold: First, we propose four types of explanation utilization methods for CTR, and compare their effectiveness. Second, we show that using explanation as the model input is desirable. Third, we confirm that even automatically generated explanation can lead to a drastic performance gain. To the best of our knowledge, this is the first work to define and categorize code explanation, for enhancing code understanding/representation.
APA
Han, H., Lee, Y., Kim, M. & Seung-won, H.. (2021). Bridging Code-Text Representation Gap using Explanation. Proceedings of The 13th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 157:1033-1048 Available from https://proceedings.mlr.press/v157/han21a.html.