Software Component Prediction for Bug Reports
Proceedings of The Eleventh Asian Conference on Machine Learning, PMLR 101:806-821, 2019.
In a software life cycle, bugs can appear at any time. Assigning bugs to the relevant components or developers is a crucial task in software development, and it is also a difficult, resource-consuming job. First, a complex system has many components, and it is hard to understand their interactions and identify the root cause of a bug. Second, the list of components keeps growing for actively developed products, so it is not easy to keep up with every update. From a machine learning point of view, the task also faces several challenges: 1) the ground truth mixes labels from multiple levels; 2) the data are severely imbalanced; and 3) concept drift occurs, because future bugs are unlikely to follow the same distribution as the historical data. In this paper, we present a machine-learning-based solution to the bug assignment problem. We build component classifiers using a multi-layer neural network on features learned directly from the data. We propose a hierarchical classification framework to address the mixed-label problem and improve prediction accuracy. We also introduce a recency-based sampling procedure to alleviate data imbalance and concept drift. Our solution easily accommodates new data and handles continuous system development and updates.
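The abstract does not spell out how the recency-based sampling works. The sketch below shows one plausible scheme, offered only as an illustration and not as the authors' procedure. It assumes that each report carries a filing day and a component label, and it keeps at most a fixed number of the most recent reports per component. Capping large components reduces class imbalance, and discarding their oldest reports first biases training toward the current distribution, which helps with concept drift. The names `BugReport`, `filed_day`, `recency_cap_sample`, and `max_per_component` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class BugReport:
    # Hypothetical record type; real bug reports would also carry text features.
    report_id: int
    filed_day: int   # day index when the bug was filed (larger = more recent)
    component: str   # ground-truth component label


def recency_cap_sample(reports, max_per_component):
    """Keep at most max_per_component of the newest reports for each component.

    Assumed scheme (illustrative only): large components are truncated to their
    most recent reports, which limits imbalance and drops stale examples first;
    small components are kept whole.
    """
    by_component = defaultdict(list)
    for r in reports:
        by_component[r.component].append(r)

    sample = []
    for items in by_component.values():
        items.sort(key=lambda r: r.filed_day, reverse=True)  # newest first
        sample.extend(items[:max_per_component])
    return sample


if __name__ == "__main__":
    # Six "UI" reports (days 0..5) dominate two "Network" reports.
    reports = [BugReport(i, i, "UI") for i in range(6)]
    reports += [BugReport(10, 1, "Network"), BugReport(11, 3, "Network")]
    kept = recency_cap_sample(reports, max_per_component=3)
    print(sorted((r.component, r.filed_day) for r in kept))
    # prints [('Network', 1), ('Network', 3), ('UI', 3), ('UI', 4), ('UI', 5)]
```

A hard cap is the simplest option. A softer alternative would draw samples with weights that decay with report age, which keeps some older examples while still favouring recent ones.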