A Network Perspective on Stratification of Multi-Label Data

Piotr Szymański, Tomasz Kajdanowicz
Proceedings of the First International Workshop on Learning with Imbalanced Domains: Theory and Applications, PMLR 74:22-35, 2017.

Abstract

We present a new approach to stratifying multi-label data for classification purposes based on the iterative stratification approach proposed by Sechidis et. al. in an ECML PKDD 2011 paper. Our method extends the iterative approach to take into account second-order relationships between labels. Obtained results are evaluated using statistical properties of obtained strata as presented by Sechidis. We also propose new statistical measures relevant to second-order quality: label pairs distribution, the percentage of label pairs without positive evidence in folds and label pair - fold pairs that have no positive evidence for the label pair. We verify the impact of new methods on classification performance of Binary Relevance, Label Powerset and a fast greedy community detection based label space partitioning classifier. The proposed approach lowers the variance of classification quality, improves label pair oriented measures and example distribution while maintaining a competitive quality in label-oriented measures. We also witness an increase in stability of network characteristics.

Cite this Paper


BibTeX
@InProceedings{pmlr-v74-szymański17a, title = {A Network Perspective on Stratification of Multi-Label Data}, author = {Szymański, Piotr and Kajdanowicz, Tomasz}, booktitle = {Proceedings of the First International Workshop on Learning with Imbalanced Domains: Theory and Applications}, pages = {22--35}, year = {2017}, editor = {Luís Torgo, Paula Branco and Moniz, Nuno}, volume = {74}, series = {Proceedings of Machine Learning Research}, month = {22 Sep}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v74/szymański17a/szymański17a.pdf}, url = {https://proceedings.mlr.press/v74/szyma%C5%84ski17a.html}, abstract = {We present a new approach to stratifying multi-label data for classification purposes based on the iterative stratification approach proposed by Sechidis et. al. in an ECML PKDD 2011 paper. Our method extends the iterative approach to take into account second-order relationships between labels. Obtained results are evaluated using statistical properties of obtained strata as presented by Sechidis. We also propose new statistical measures relevant to second-order quality: label pairs distribution, the percentage of label pairs without positive evidence in folds and label pair - fold pairs that have no positive evidence for the label pair. We verify the impact of new methods on classification performance of Binary Relevance, Label Powerset and a fast greedy community detection based label space partitioning classifier. The proposed approach lowers the variance of classification quality, improves label pair oriented measures and example distribution while maintaining a competitive quality in label-oriented measures. We also witness an increase in stability of network characteristics.} }
Endnote
%0 Conference Paper %T A Network Perspective on Stratification of Multi-Label Data %A Piotr Szymański %A Tomasz Kajdanowicz %B Proceedings of the First International Workshop on Learning with Imbalanced Domains: Theory and Applications %C Proceedings of Machine Learning Research %D 2017 %E Paula Branco Luís Torgo %E Nuno Moniz %F pmlr-v74-szymański17a %I PMLR %P 22--35 %U https://proceedings.mlr.press/v74/szyma%C5%84ski17a.html %V 74 %X We present a new approach to stratifying multi-label data for classification purposes based on the iterative stratification approach proposed by Sechidis et. al. in an ECML PKDD 2011 paper. Our method extends the iterative approach to take into account second-order relationships between labels. Obtained results are evaluated using statistical properties of obtained strata as presented by Sechidis. We also propose new statistical measures relevant to second-order quality: label pairs distribution, the percentage of label pairs without positive evidence in folds and label pair - fold pairs that have no positive evidence for the label pair. We verify the impact of new methods on classification performance of Binary Relevance, Label Powerset and a fast greedy community detection based label space partitioning classifier. The proposed approach lowers the variance of classification quality, improves label pair oriented measures and example distribution while maintaining a competitive quality in label-oriented measures. We also witness an increase in stability of network characteristics.
APA
Szymański, P. & Kajdanowicz, T.. (2017). A Network Perspective on Stratification of Multi-Label Data. Proceedings of the First International Workshop on Learning with Imbalanced Domains: Theory and Applications, in Proceedings of Machine Learning Research 74:22-35 Available from https://proceedings.mlr.press/v74/szyma%C5%84ski17a.html.

Related Material