GitHub’s Copilot Code Review: Can AI Spot Security Flaws Before You Commit?

Amena Amro, Manar Alalfi
Proceedings of the 39th Canadian Conference on Artificial Intelligence, PMLR 318:825-832, 2026.

Abstract

As software development practices increasingly adopt AI-powered tools, ensuring that such tools can support secure coding has become critical. This study evaluates the effectiveness of GitHub Copilot’s recently introduced code review feature in detecting security vulnerabilities. Using a curated set of labeled vulnerable code samples drawn from diverse open-source projects spanning multiple programming languages and application domains, we systematically assessed Copilot’s ability to identify and provide feedback on common security flaws. Contrary to expectations, our results reveal that Copilot’s code review frequently fails to detect critical vulnerabilities such as SQL injection, cross-site scripting (XSS), and insecure deserialization. Instead, its feedback primarily addresses low-severity issues, such as coding style and typographical errors. These findings expose a significant gap between the perceived capabilities of AI-assisted code review and its actual effectiveness in supporting secure development practices. Our results highlight the continued necessity of dedicated security tools and manual code audits to ensure robust software security.
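SQL injection is one of the vulnerability classes the abstract reports Copilot's review frequently missing. Here is a minimal illustration of what that flaw looks like in code. It is a hypothetical sketch, not a sample from the study's dataset. The table, the function names `find_user_vulnerable` and `find_user_safe`, and the payload are all invented for illustration. It uses only Python's standard `sqlite3` module with an in-memory database.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Build a throwaway in-memory database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a-secret"), ("bob", "b-secret")],
    )
    return conn


def find_user_vulnerable(conn: sqlite3.Connection, name: str) -> list:
    # VULNERABLE (SQL injection, CWE-89): user input is spliced directly
    # into the SQL text, so quotes in `name` can rewrite the query.
    query = f"SELECT name, secret FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str) -> list:
    # SAFE: the ? placeholder binds `name` as a value, never as SQL.
    return conn.execute(
        "SELECT name, secret FROM users WHERE name = ?", (name,)
    ).fetchall()


if __name__ == "__main__":
    conn = make_db()
    payload = "x' OR '1'='1"  # hypothetical attacker-controlled input
    print(find_user_vulnerable(conn, payload))  # returns every row
    print(find_user_safe(conn, payload))        # returns no rows
```

A reviewer, human or AI, is expected to flag the f-string query in `find_user_vulnerable` and recommend the parameterized form. The abstract reports that Copilot's review instead tended to comment on style and typographical issues.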

Cite this Paper


BibTeX
@InProceedings{pmlr-v318-amro26a,
  title     = {{GitHub}’s {Copilot} Code Review: Can {AI} Spot Security Flaws Before You Commit?},
  author    = {Amro, Amena and Alalfi, Manar},
  booktitle = {Proceedings of the 39th Canadian Conference on Artificial Intelligence},
  pages     = {825--832},
  year      = {2026},
  editor    = {Bouzar-Benlabiod, Lydia and Leung, Carson},
  volume    = {318},
  series    = {Proceedings of Machine Learning Research},
  month     = {25--29 May},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v318/main/assets/amro26a/amro26a.pdf},
  url       = {https://proceedings.mlr.press/v318/amro26a.html},
  abstract  = {As software development practices increasingly adopt AI-powered tools, ensuring that such tools can support secure coding has become critical. This study evaluates the effectiveness of GitHub Copilot’s recently introduced code review feature in detecting security vulnerabilities. Using a curated set of labeled vulnerable code samples drawn from diverse open-source projects spanning multiple programming languages and application domains, we systematically assessed Copilot’s ability to identify and provide feedback on common security flaws. Contrary to expectations, our results reveal that Copilot’s code review frequently fails to detect critical vulnerabilities such as SQL injection, cross-site scripting (XSS), and insecure deserialization. Instead, its feedback primarily addresses low-severity issues, such as coding style and typographical errors. These findings expose a significant gap between the perceived capabilities of AI-assisted code review and its actual effectiveness in supporting secure development practices. Our results highlight the continued necessity of dedicated security tools and manual code audits to ensure robust software security.}
}
Endnote
%0 Conference Paper
%T GitHub’s Copilot Code Review: Can AI Spot Security Flaws Before You Commit?
%A Amena Amro
%A Manar Alalfi
%B Proceedings of the 39th Canadian Conference on Artificial Intelligence
%C Proceedings of Machine Learning Research
%D 2026
%E Lydia Bouzar-Benlabiod
%E Carson Leung
%F pmlr-v318-amro26a
%I PMLR
%P 825--832
%U https://proceedings.mlr.press/v318/amro26a.html
%V 318
%X As software development practices increasingly adopt AI-powered tools, ensuring that such tools can support secure coding has become critical. This study evaluates the effectiveness of GitHub Copilot’s recently introduced code review feature in detecting security vulnerabilities. Using a curated set of labeled vulnerable code samples drawn from diverse open-source projects spanning multiple programming languages and application domains, we systematically assessed Copilot’s ability to identify and provide feedback on common security flaws. Contrary to expectations, our results reveal that Copilot’s code review frequently fails to detect critical vulnerabilities such as SQL injection, cross-site scripting (XSS), and insecure deserialization. Instead, its feedback primarily addresses low-severity issues, such as coding style and typographical errors. These findings expose a significant gap between the perceived capabilities of AI-assisted code review and its actual effectiveness in supporting secure development practices. Our results highlight the continued necessity of dedicated security tools and manual code audits to ensure robust software security.
APA
Amro, A., & Alalfi, M. (2026). GitHub’s Copilot Code Review: Can AI Spot Security Flaws Before You Commit? Proceedings of the 39th Canadian Conference on Artificial Intelligence, in Proceedings of Machine Learning Research 318:825-832. Available from https://proceedings.mlr.press/v318/amro26a.html.
