Can Large Language Models Understand Intermediate Representations in Compilers?

Hailong Jiang, Jianfeng Zhu, Yao Wan, Bo Fang, Hongyu Zhang, Ruoming Jin, Qiang Guan
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:27851-27872, 2025.

Abstract

Intermediate Representations (IRs) play a critical role in compiler design and program analysis, yet their comprehension by Large Language Models (LLMs) remains underexplored. In this paper, we present an explorative empirical study evaluating the capabilities of six state-of-the-art LLMs—GPT-4, GPT-3, DeepSeek, Gemma 2, Llama 3, and Code Llama—in understanding IRs. Specifically, we assess model performance across four core tasks: control flow graph reconstruction, decompilation, code summarization, and execution reasoning. While LLMs exhibit competence in parsing IR syntax and identifying high-level structures, they consistently struggle with instruction-level reasoning, especially in control flow reasoning, loop handling, and dynamic execution. Common failure modes include misinterpreting branching instructions, omitting critical operations, and relying on heuristic reasoning rather than on precise instruction-level logic. Our findings highlight the need for IR-specific enhancements in LLM design. We recommend fine-tuning on structured IR datasets and integrating control-flow-sensitive architectures to improve the models’ effectiveness on IR-related tasks. All the experimental data and source code are publicly available at https://github.com/hjiang13/LLM4IR.
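The abstract lists control flow graph reconstruction as one of the four evaluation tasks. The sketch below shows what a mechanically extracted ground-truth CFG for such a task can look like. It assumes LLVM IR textual syntax, because the abstract does not name the IR dialect. The names `build_cfg` and `SAMPLE_IR` are hypothetical and do not come from the LLM4IR repository. The parser is deliberately minimal: it treats only labeled blocks and `label %x` operands (as used by `br` and `switch`) as edges. It is not the authors' evaluation code.

```python
import re

# Hypothetical helper for illustration only: it understands a small textual
# subset of LLVM IR and is not a complete IR parser.
LABEL_DEF = re.compile(r"^([A-Za-z$._][\w$.-]*|\d+):")       # block label at column 0
LABEL_USE = re.compile(r"label %([A-Za-z$._][\w$.-]*|\d+)")  # br/switch targets


def build_cfg(ir_function: str) -> dict[str, list[str]]:
    """Map each basic-block label to its successor labels, in first-seen order."""
    cfg: dict[str, list[str]] = {}
    current = None
    in_body = False
    for raw in ir_function.splitlines():
        line = raw.split(";", 1)[0].rstrip()  # drop comments such as "; preds = %entry"
        if not line.strip():
            continue
        if line.startswith("define"):
            in_body, current = True, None
            continue
        if line.startswith("}"):
            in_body = False
            continue
        if not in_body:
            continue
        label = LABEL_DEF.match(line)
        if label:
            current = label.group(1)
            cfg.setdefault(current, [])
            continue
        if current is None:  # unlabeled entry block: use a placeholder name
            current = "entry"
            cfg.setdefault(current, [])
        # Only "label %x" operands are edges; phi operands like [ 0, %entry ] are not.
        for target in LABEL_USE.findall(line):
            if target not in cfg[current]:
                cfg[current].append(target)
    return cfg


# Hypothetical sample: a counting loop that sums 0..n-1 (for n >= 1).
SAMPLE_IR = """\
define i32 @sum(i32 %n) {
entry:
  br label %loop

loop:
  %i = phi i32 [ 0, %entry ], [ %next, %loop ]
  %acc = phi i32 [ 0, %entry ], [ %acc.next, %loop ]
  %acc.next = add i32 %acc, %i
  %next = add i32 %i, 1
  %done = icmp sge i32 %next, %n
  br i1 %done, label %exit, label %loop

exit:
  ret i32 %acc.next
}
"""

if __name__ == "__main__":
    print(build_cfg(SAMPLE_IR))
    # {'entry': ['loop'], 'loop': ['exit', 'loop'], 'exit': []}
```

The self-edge `loop -> loop` is the kind of loop structure the abstract reports models handling poorly. A model's reconstructed edge list could be compared against a graph like this one, although the paper's actual scoring procedure is described in the paper itself.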

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-jiang25p,
  title = {Can Large Language Models Understand Intermediate Representations in Compilers?},
  author = {Jiang, Hailong and Zhu, Jianfeng and Wan, Yao and Fang, Bo and Zhang, Hongyu and Jin, Ruoming and Guan, Qiang},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages = {27851--27872},
  year = {2025},
  editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume = {267},
  series = {Proceedings of Machine Learning Research},
  month = {13--19 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/jiang25p/jiang25p.pdf},
  url = {https://proceedings.mlr.press/v267/jiang25p.html},
  abstract = {Intermediate Representations (IRs) play a critical role in compiler design and program analysis, yet their comprehension by Large Language Models (LLMs) remains underexplored. In this paper, we present an explorative empirical study evaluating the capabilities of six state-of-the-art LLMs—GPT-4, GPT-3, DeepSeek, Gemma 2, Llama 3, and Code Llama—in understanding IRs. Specifically, we assess model performance across four core tasks: control flow graph reconstruction, decompilation, code summarization, and execution reasoning. While LLMs exhibit competence in parsing IR syntax and identifying high-level structures, they consistently struggle with instruction-level reasoning, especially in control flow reasoning, loop handling, and dynamic execution. Common failure modes include misinterpreting branching instructions, omitting critical operations, and relying on heuristic reasoning rather than on precise instruction-level logic. Our findings highlight the need for IR-specific enhancements in LLM design. We recommend fine-tuning on structured IR datasets and integrating control-flow-sensitive architectures to improve the models’ effectiveness on IR-related tasks. All the experimental data and source code are publicly available at https://github.com/hjiang13/LLM4IR.}
}
Endnote
%0 Conference Paper
%T Can Large Language Models Understand Intermediate Representations in Compilers?
%A Hailong Jiang
%A Jianfeng Zhu
%A Yao Wan
%A Bo Fang
%A Hongyu Zhang
%A Ruoming Jin
%A Qiang Guan
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-jiang25p
%I PMLR
%P 27851--27872
%U https://proceedings.mlr.press/v267/jiang25p.html
%V 267
%X Intermediate Representations (IRs) play a critical role in compiler design and program analysis, yet their comprehension by Large Language Models (LLMs) remains underexplored. In this paper, we present an explorative empirical study evaluating the capabilities of six state-of-the-art LLMs—GPT-4, GPT-3, DeepSeek, Gemma 2, Llama 3, and Code Llama—in understanding IRs. Specifically, we assess model performance across four core tasks: control flow graph reconstruction, decompilation, code summarization, and execution reasoning. While LLMs exhibit competence in parsing IR syntax and identifying high-level structures, they consistently struggle with instruction-level reasoning, especially in control flow reasoning, loop handling, and dynamic execution. Common failure modes include misinterpreting branching instructions, omitting critical operations, and relying on heuristic reasoning rather than on precise instruction-level logic. Our findings highlight the need for IR-specific enhancements in LLM design. We recommend fine-tuning on structured IR datasets and integrating control-flow-sensitive architectures to improve the models’ effectiveness on IR-related tasks. All the experimental data and source code are publicly available at https://github.com/hjiang13/LLM4IR.
APA
Jiang, H., Zhu, J., Wan, Y., Fang, B., Zhang, H., Jin, R. & Guan, Q. (2025). Can Large Language Models Understand Intermediate Representations in Compilers? Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:27851-27872. Available from https://proceedings.mlr.press/v267/jiang25p.html.
