Just-in-Time Defect Prediction Using Cost-Efficient Boosting Models

Elza Jung, MD Asif Khan
Proceedings of the 39th Canadian Conference on Artificial Intelligence, PMLR 318:600-611, 2026.

Abstract

Just-in-time (JIT) software defect prediction (SDP) handles imbalanced commit data, reflecting real-world software where bugs are rare. Higher recall and a model’s ability to predict defects are crucial in such settings. Recently, many JIT-SDP approaches have been proposed, predominantly utilizing deep-learning (DL) models. However, tuned XGBoost, a traditional classifier known for its cost-efficiency, has not been explored. Therefore, we explore how hyperparameter (HP) tuned and SMOTE-rebalanced XGBoost performs on imbalanced datasets, focusing on AUC-ROC and Recall. Our findings indicate that selecting five key features can be as effective as using fourteen features. We further explain how HP tuning and the oversampling method improve XGBoost by 1.19%-6.48% in AUC-ROC and 19.32%-43.70% in Recall. Statistical analysis shows that the final XGBoost model achieves the best average performance among the evaluated baselines, with 0.7442 AUC-ROC, 0.4747 F1-Score, and 0.7099 Recall.

Cite this Paper


BibTeX
@InProceedings{pmlr-v318-jung26a,
  title = {Just-in-Time Defect Prediction Using Cost-Efficient Boosting Models},
  author = {Jung, Elza and Khan, MD Asif},
  booktitle = {Proceedings of the 39th Canadian Conference on Artificial Intelligence},
  pages = {600--611},
  year = {2026},
  editor = {Bouzar-Benlabiod, Lydia and Leung, Carson},
  volume = {318},
  series = {Proceedings of Machine Learning Research},
  month = {25--29 May},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v318/main/assets/jung26a/jung26a.pdf},
  url = {https://proceedings.mlr.press/v318/jung26a.html},
  abstract = {Just-in-time (JIT) software defect prediction (SDP) handles imbalanced commit data, reflecting real-world software where bugs are rare. Higher recall and a model’s ability to predict defects are crucial in such settings. Recently, many JIT-SDP approaches have been proposed, predominantly utilizing deep-learning (DL) models. However, tuned XGBoost, a traditional classifier known for its cost-efficiency, has not been explored. Therefore, we explore how hyperparameter (HP) tuned and SMOTE-rebalanced XGBoost performs on imbalanced datasets, focusing on AUC-ROC and Recall. Our findings indicate that selecting five key features can be as effective as using fourteen features. We further explain how HP tuning and the oversampling method improve XGBoost by 1.19%-6.48% in AUC-ROC and 19.32%-43.70% in Recall. Statistical analysis shows that the final XGBoost model achieves the best average performance among the evaluated baselines, with 0.7442 AUC-ROC, 0.4747 F1-Score, and 0.7099 Recall.}
}
Endnote
%0 Conference Paper
%T Just-in-Time Defect Prediction Using Cost-Efficient Boosting Models
%A Elza Jung
%A MD Asif Khan
%B Proceedings of the 39th Canadian Conference on Artificial Intelligence
%C Proceedings of Machine Learning Research
%D 2026
%E Lydia Bouzar-Benlabiod
%E Carson Leung
%F pmlr-v318-jung26a
%I PMLR
%P 600--611
%U https://proceedings.mlr.press/v318/jung26a.html
%V 318
%X Just-in-time (JIT) software defect prediction (SDP) handles imbalanced commit data, reflecting real-world software where bugs are rare. Higher recall and a model’s ability to predict defects are crucial in such settings. Recently, many JIT-SDP approaches have been proposed, predominantly utilizing deep-learning (DL) models. However, tuned XGBoost, a traditional classifier known for its cost-efficiency, has not been explored. Therefore, we explore how hyperparameter (HP) tuned and SMOTE-rebalanced XGBoost performs on imbalanced datasets, focusing on AUC-ROC and Recall. Our findings indicate that selecting five key features can be as effective as using fourteen features. We further explain how HP tuning and the oversampling method improve XGBoost by 1.19%-6.48% in AUC-ROC and 19.32%-43.70% in Recall. Statistical analysis shows that the final XGBoost model achieves the best average performance among the evaluated baselines, with 0.7442 AUC-ROC, 0.4747 F1-Score, and 0.7099 Recall.
APA
Jung, E., & Khan, M. A. (2026). Just-in-Time Defect Prediction Using Cost-Efficient Boosting Models. Proceedings of the 39th Canadian Conference on Artificial Intelligence, in Proceedings of Machine Learning Research 318:600-611. Available from https://proceedings.mlr.press/v318/jung26a.html.
