PD-Sparse : A Primal and Dual Sparse Approach to Extreme Multiclass and Multilabel Classification
Proceedings of The 33rd International Conference on Machine Learning, PMLR 48:3069-3077, 2016.
We consider Multiclass and Multilabel classification with extremely large number of classes, of which only few are labeled to each instance. In such setting, standard methods that have training, prediction cost linear to the number of classes become intractable. State-of-the-art methods thus aim to reduce the complexity by exploiting correlation between labels under assumption that the similarity between labels can be captured by structures such as low-rank matrix or balanced tree. However, as the diversity of labels increases in the feature space, structural assumption can be easily violated, which leads to degrade in the testing performance. In this work, we show that a margin-maximizing loss with l1 penalty, in case of Extreme Classification, yields extremely sparse solution both in primal and in dual without sacrificing the expressive power of predictor. We thus propose a Fully-Corrective Block-Coordinate Frank-Wolfe (FC-BCFW) algorithm that exploits both primal and dual sparsity to achieve a complexity sublinear to the number of primal and dual variables. A bi-stochastic search method is proposed to further improve the efficiency. In our experiments on both Multiclass and Multilabel problems, the proposed method achieves significant higher accuracy than existing approaches of Extreme Classification with very competitive training and prediction time.