Just-in-Time Defect Prediction Using Cost-Efficient Boosting Models
Proceedings of the 39th Canadian Conference on Artificial Intelligence, PMLR 318:600-611, 2026.
Abstract
Just-in-time (JIT) software defect prediction (SDP) must handle imbalanced commit data, reflecting real-world software where bugs are rare. High recall, reflecting a model’s ability to detect actual defects, is crucial in such settings. Recently, many JIT-SDP approaches have been proposed, predominantly utilizing deep-learning (DL) models. However, among traditional classifiers, tuned XGBoost, which is known for its cost-efficiency, has not been explored. Therefore, we explore how hyperparameter (HP) tuned and SMOTE-rebalanced XGBoost performs on imbalanced datasets, focusing on AUC-ROC and Recall. Our findings indicate that selecting five key features can be as effective as using fourteen features. We further explain how HP tuning and the oversampling method improve XGBoost by 1.19%-6.48% in AUC-ROC and 19.32%-43.70% in Recall. Statistical analysis shows that the final XGBoost model achieves the best average performance among the evaluated baselines, with 0.7442 AUC-ROC, 0.4747 F1-Score, and 0.7099 Recall.
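As a rough illustration of the SMOTE rebalancing mentioned in the abstract, the following is a minimal standard-library sketch of the core idea: each synthetic minority sample is placed on the line segment between a minority sample and one of its k nearest minority neighbours. The function name `smote`, the toy two-feature commits, and the parameter values are assumptions made for this example; the abstract does not state which implementation or settings the authors used.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic samples interpolated between minority samples.

    Illustrative sketch only: not the authors' implementation.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Find its k nearest neighbours among the other minority samples.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        # Interpolate at a random position between base and neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Hypothetical toy data: two numeric change-level features per commit.
clean = [[float(x), float(x % 3)] for x in range(20)]
buggy = [[30.0, 1.0], [32.0, 2.0], [31.0, 1.5], [33.0, 0.5]]

# Oversample the defective class until both classes have the same size.
new_buggy = smote(buggy, n_new=len(clean) - len(buggy), k=3)
balanced_buggy = buggy + new_buggy
print(len(clean), len(balanced_buggy))  # prints "20 20"
```

In practice, oversampling of this kind is normally applied to the training split only, so that evaluation metrics such as Recall and AUC-ROC are computed on data with the original class distribution.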