Software Component Prediction for Bug Reports

Wei Zhang, Chris Challis
Proceedings of The Eleventh Asian Conference on Machine Learning, PMLR 101:806-821, 2019.

Abstract

In a software life cycle, bugs can occur at any time. Assigning bugs to the relevant components or developers is a crucial task in software development, and it is also a difficult and resource-consuming one. First, a complex system has many components, and it is hard to understand their interactions and identify the root cause. Second, the list of components keeps growing for actively developed products, and it is not easy to keep up with all updates. The task also faces several challenges from the machine learning point of view: 1) the ground truth is mixed with multiple levels of labels; 2) the data are severely imbalanced; and 3) concept drift occurs, as future bugs are unlikely to come from the same distribution as the historical data. In this paper, we present a machine-learning-based solution for the bug assignment problem. We build component classifiers using a multi-layer neural network, based on features learned directly from the data. A hierarchical classification framework is proposed to address the mixed-label problem and improve prediction accuracy. We also introduce a recency-based sampling procedure to alleviate the data imbalance and concept drift problems. Our solution can easily accommodate new data and handle continuous system development and updates.

Cite this Paper


BibTeX
@InProceedings{pmlr-v101-zhang19c,
  title     = {Software Component Prediction for Bug Reports},
  author    = {Zhang, Wei and Challis, Chris},
  booktitle = {Proceedings of The Eleventh Asian Conference on Machine Learning},
  pages     = {806--821},
  year      = {2019},
  editor    = {Lee, Wee Sun and Suzuki, Taiji},
  volume    = {101},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--19 Nov},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v101/zhang19c/zhang19c.pdf},
  url       = {https://proceedings.mlr.press/v101/zhang19c.html},
  abstract  = {In a software life cycle, bugs could happen at any time. Assigning bugs to relevant components/developers is a crucial task for software development. It is also a tough and resource consuming job. First, there are many components in a complex system and it is hard to understand their interactions and identify the root cause. Second, the list of components keeps growing for actively developed products and it is not easy to catch all updates. This task also faces several challenges from the machine learning point of view: 1) the ground truth is mixed with multiple levels of labels; 2) the data are severely imbalanced; 3) concept drift as future bugs are unlikely to come from the same distribution as the historical data. In this paper, we present a machine learning based solution for the bug assignment problem. We build component classifiers using a multi-layer Neural Network, based on features that were learned from data directly. A hierarchical classification framework is proposed to address the mixed label problem and improve the prediction accuracy. We also introduce a recency based sampling procedure to alleviate the data imbalance and concept drift problem. Our solution can easily accommodate new data and handle continuous system development/update.}
}
Endnote
%0 Conference Paper
%T Software Component Prediction for Bug Reports
%A Wei Zhang
%A Chris Challis
%B Proceedings of The Eleventh Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Wee Sun Lee
%E Taiji Suzuki
%F pmlr-v101-zhang19c
%I PMLR
%P 806--821
%U https://proceedings.mlr.press/v101/zhang19c.html
%V 101
%X In a software life cycle, bugs could happen at any time. Assigning bugs to relevant components/developers is a crucial task for software development. It is also a tough and resource consuming job. First, there are many components in a complex system and it is hard to understand their interactions and identify the root cause. Second, the list of components keeps growing for actively developed products and it is not easy to catch all updates. This task also faces several challenges from the machine learning point of view: 1) the ground truth is mixed with multiple levels of labels; 2) the data are severely imbalanced; 3) concept drift as future bugs are unlikely to come from the same distribution as the historical data. In this paper, we present a machine learning based solution for the bug assignment problem. We build component classifiers using a multi-layer Neural Network, based on features that were learned from data directly. A hierarchical classification framework is proposed to address the mixed label problem and improve the prediction accuracy. We also introduce a recency based sampling procedure to alleviate the data imbalance and concept drift problem. Our solution can easily accommodate new data and handle continuous system development/update.
APA
Zhang, W. & Challis, C. (2019). Software Component Prediction for Bug Reports. Proceedings of The Eleventh Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 101:806-821. Available from https://proceedings.mlr.press/v101/zhang19c.html.

Related Material