Bridging Code-Text Representation Gap using Explanation
Proceedings of The 13th Asian Conference on Machine Learning, PMLR 157:1033-1048, 2021.
This paper studies Code-Text Representation (CTR) learning, aiming to learn general-purpose representations that support downstream code/text applications such as code search, finding code matching textual queries. However, state-of-the-arts do not focus on matching the gap between code/text modalities. In this paper, we complement this gap by providing an intermediate representation, and view it as “explanation.” Our contribution is three fold: First, we propose four types of explanation utilization methods for CTR, and compare their effectiveness. Second, we show that using explanation as the model input is desirable. Third, we confirm that even automatically generated explanation can lead to a drastic performance gain. To the best of our knowledge, this is the first work to define and categorize code explanation, for enhancing code understanding/representation.