Do Large Code Models Understand Programming Concepts? Counterfactual Analysis for Code Predicates

Ashish Hooda, Mihai Christodorescu, Miltiadis Allamanis, Aaron Wilson, Kassem Fawaz, Somesh Jha
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:18738-18748, 2024.

Abstract

Large Language Models’ success in text generation has also made them better at code generation and coding tasks. While much work has demonstrated their remarkable performance on tasks such as code completion and editing, it is still unclear why. We help bridge this gap by exploring to what degree auto-regressive models understand the logical constructs of the underlying programs. We propose Counterfactual Analysis for Programming Concept Predicates (CACP) as a counterfactual testing framework to evaluate whether Large Code Models understand programming concepts. With only black-box access to the model, we use CACP to evaluate ten popular Large Code Models for four different programming concepts. Our findings suggest that current models lack understanding of concepts such as data flow and control flow.
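
The abstract's core idea, probing a model's grasp of a programming-concept predicate via counterfactual inputs with only black-box access, can be made concrete with a small sketch. The Python snippet below is a hypothetical illustration, not the authors' CACP implementation: it applies a semantics-preserving variable rename to a completion prompt and checks whether a stubbed model call (query_model, a placeholder for any code-completion API) still satisfies a simple data-flow predicate.

# Hypothetical sketch of a black-box counterfactual probe for a data-flow
# predicate, in the spirit of CACP but not the authors' code.

def query_model(prompt: str) -> str:
    """Placeholder for a black-box Large Code Model call; swap in a real API."""
    # Stubbed completion so the sketch runs end to end.
    return "total)"


def rename_variable(code: str, old: str, new: str) -> str:
    """Toy semantics-preserving rename (a real probe would rewrite the AST)."""
    return code.replace(old, new)


def data_flow_predicate(completion: str, defined_var: str) -> bool:
    """Predicate: the completion reads the variable the prompt defined."""
    return defined_var in completion


original = "total = sum(values)\nprint("                    # expect `total)`
counterfactual = rename_variable(original, "total", "acc")  # expect `acc)`

consistent = (
    data_flow_predicate(query_model(original), "total")
    and data_flow_predicate(query_model(counterfactual), "acc")
)

# A model that tracks data flow should satisfy the predicate on both the
# original and the counterfactual prompt; flipping only on the counterfactual
# suggests reliance on surface form rather than the underlying concept.
print("data-flow predicate holds under counterfactual:", consistent)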

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-hooda24a,
  title     = {Do Large Code Models Understand Programming Concepts? {C}ounterfactual Analysis for Code Predicates},
  author    = {Hooda, Ashish and Christodorescu, Mihai and Allamanis, Miltiadis and Wilson, Aaron and Fawaz, Kassem and Jha, Somesh},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {18738--18748},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/hooda24a/hooda24a.pdf},
  url       = {https://proceedings.mlr.press/v235/hooda24a.html},
  abstract  = {Large Language Models’ success in text generation has also made them better at code generation and coding tasks. While a lot of work has demonstrated their remarkable performance on tasks such as code completion and editing, it is still unclear as to why. We help bridge this gap by exploring to what degree auto-regressive models understand the logical constructs of the underlying programs. We propose Counterfactual Analysis for Programming Concept Predicates (CACP) as a counterfactual testing framework to evaluate whether Large Code Models understand programming concepts. With only black-box access to the model, we use CACP to evaluate ten popular Large Code Models for four different programming concepts. Our findings suggest that current models lack understanding of concepts such as data flow and control flow.}
}
Endnote
%0 Conference Paper
%T Do Large Code Models Understand Programming Concepts? Counterfactual Analysis for Code Predicates
%A Ashish Hooda
%A Mihai Christodorescu
%A Miltiadis Allamanis
%A Aaron Wilson
%A Kassem Fawaz
%A Somesh Jha
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-hooda24a
%I PMLR
%P 18738--18748
%U https://proceedings.mlr.press/v235/hooda24a.html
%V 235
%X Large Language Models’ success in text generation has also made them better at code generation and coding tasks. While a lot of work has demonstrated their remarkable performance on tasks such as code completion and editing, it is still unclear as to why. We help bridge this gap by exploring to what degree auto-regressive models understand the logical constructs of the underlying programs. We propose Counterfactual Analysis for Programming Concept Predicates (CACP) as a counterfactual testing framework to evaluate whether Large Code Models understand programming concepts. With only black-box access to the model, we use CACP to evaluate ten popular Large Code Models for four different programming concepts. Our findings suggest that current models lack understanding of concepts such as data flow and control flow.
APA
Hooda, A., Christodorescu, M., Allamanis, M., Wilson, A., Fawaz, K. & Jha, S. (2024). Do Large Code Models Understand Programming Concepts? Counterfactual Analysis for Code Predicates. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:18738-18748. Available from https://proceedings.mlr.press/v235/hooda24a.html.