Using Credal C4.5 for Calibrated Label Ranking in Multi-Label Classification
Proceedings of the Twelfth International Symposium on Imprecise Probability: Theories and Applications, PMLR 147:220-228, 2021.
Abstract
The Multi-Label Classification (MLC) task aims to predict the set of labels that correspond to an instance. It differs from traditional classification, which assumes that each instance is associated with a single value of a class variable. Within MLC, the Calibrated Label Ranking algorithm (CLR) considers a binary classification problem for each pair of labels to determine a label ranking for a given instance, thereby exploiting correlations between pairs of labels. Moreover, CLR mitigates the class imbalance problem that frequently appears in MLC, which arises because usually very few instances are associated with a given label. Solving the binary classification problems requires a traditional classification algorithm. The C4.5 algorithm, based on Decision Trees, has been widely employed in this domain. In this work, we show that the Credal C4.5 method, a recently proposed version of C4.5 that uses imprecise probabilities, is more suitable than C4.5 for solving the binary classification problems in CLR. An exhaustive experimental analysis carried out in this research shows that Credal C4.5 performs better than C4.5 when both algorithms are employed in CLR, with the improvement becoming more notable as the amount of noise in the labels increases.
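To make the pairwise decomposition concrete, the following is a minimal Python sketch of CLR as it is commonly formulated: one binary problem per pair of labels, plus one problem per label against an artificial calibration label that separates relevant from irrelevant labels. Several things here are assumptions for illustration and are not taken from the paper: the names `fit_clr`, `predict_clr`, `train_1nn` and `VIRTUAL` are hypothetical, a 1-nearest-neighbour learner stands in for C4.5 or Credal C4.5, and a label is counted as relevant only when it receives strictly more votes than the calibration label.

```python
from itertools import combinations

VIRTUAL = -1  # hypothetical index for the artificial calibration label


def train_1nn(X, y):
    """Stand-in binary learner (1-nearest neighbour).

    Assumption: this replaces C4.5 / Credal C4.5 purely for illustration.
    """
    data = list(zip(X, y))

    def predict(x):
        nearest = min(
            data, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x))
        )
        return nearest[1]

    return predict


def fit_clr(X, Y, n_labels, train=train_1nn):
    """Train the pairwise models; Y is a list of sets of relevant labels."""
    models = {}
    # One model per label pair, trained only on instances where exactly
    # one of the two labels is relevant (target 1 means "i preferred").
    for i, j in combinations(range(n_labels), 2):
        rows = [(x, int(i in y)) for x, y in zip(X, Y) if (i in y) != (j in y)]
        if rows:
            xs, ts = zip(*rows)
            models[(i, j)] = train(xs, ts)
    # One model per label against the calibration label, on all instances.
    for i in range(n_labels):
        models[(i, VIRTUAL)] = train(X, [int(i in y) for y in Y])
    return models


def predict_clr(models, x, n_labels):
    """Return (ranking of labels by votes, set of labels deemed relevant)."""
    votes = {k: 0 for k in list(range(n_labels)) + [VIRTUAL]}
    for (i, j), model in models.items():
        votes[i if model(x) == 1 else j] += 1
    ranking = sorted(range(n_labels), key=lambda k: -votes[k])
    relevant = {k for k in range(n_labels) if votes[k] > votes[VIRTUAL]}
    return ranking, relevant


# Toy data: label 0 follows feature 0, label 1 follows feature 1,
# label 2 is relevant only when both features are 1.
X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
Y = [set(), {1}, {0}, {0, 1, 2}]
models = fit_clr(X, Y, n_labels=3)
print(predict_clr(models, (0.9, 0.1), n_labels=3))  # ([0, 1, 2], {0})
```

Because each pairwise model sees only the instances on which its two labels disagree, the base learner can be swapped without touching the voting logic, which is the point at which the paper compares C4.5 with Credal C4.5.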