Extreme Multi-label Classification from Aggregated Labels

Yanyao Shen, Hsiang-Fu Yu, Sujay Sanghavi, Inderjit Dhillon
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:8752-8762, 2020.

Abstract

Extreme multi-label classification (XMC) is the problem of finding the relevant labels for an input, from a very large universe of possible labels. We consider XMC in the setting where labels are available only for groups of samples - but not for individual ones. Current XMC approaches are not built for such multi-instance multi-label (MIML) training data, and MIML approaches do not scale to XMC sizes. We develop a new and scalable algorithm to impute individual-sample labels from the group labels; this can be paired with any existing XMC method to solve the aggregated label problem. We characterize the statistical properties of our algorithm under mild assumptions, and provide a new end-to-end framework for MIML as an extension. Experiments on both aggregated label XMC and MIML tasks show the advantages over existing approaches.

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-shen20f,
  title     = {Extreme Multi-label Classification from Aggregated Labels},
  author    = {Shen, Yanyao and Yu, Hsiang-Fu and Sanghavi, Sujay and Dhillon, Inderjit},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  pages     = {8752--8762},
  year      = {2020},
  editor    = {Daumé III, Hal and Singh, Aarti},
  volume    = {119},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--18 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v119/shen20f/shen20f.pdf},
  url       = {https://proceedings.mlr.press/v119/shen20f.html},
  abstract  = {Extreme multi-label classification (XMC) is the problem of finding the relevant labels for an input, from a very large universe of possible labels. We consider XMC in the setting where labels are available only for groups of samples - but not for individual ones. Current XMC approaches are not built for such multi-instance multi-label (MIML) training data, and MIML approaches do not scale to XMC sizes. We develop a new and scalable algorithm to impute individual-sample labels from the group labels; this can be paired with any existing XMC method to solve the aggregated label problem. We characterize the statistical properties of our algorithm under mild assumptions, and provide a new end-to-end framework for MIML as an extension. Experiments on both aggregated label XMC and MIML tasks show the advantages over existing approaches.}
}
Endnote
%0 Conference Paper
%T Extreme Multi-label Classification from Aggregated Labels
%A Yanyao Shen
%A Hsiang-Fu Yu
%A Sujay Sanghavi
%A Inderjit Dhillon
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-shen20f
%I PMLR
%P 8752--8762
%U https://proceedings.mlr.press/v119/shen20f.html
%V 119
%X Extreme multi-label classification (XMC) is the problem of finding the relevant labels for an input, from a very large universe of possible labels. We consider XMC in the setting where labels are available only for groups of samples - but not for individual ones. Current XMC approaches are not built for such multi-instance multi-label (MIML) training data, and MIML approaches do not scale to XMC sizes. We develop a new and scalable algorithm to impute individual-sample labels from the group labels; this can be paired with any existing XMC method to solve the aggregated label problem. We characterize the statistical properties of our algorithm under mild assumptions, and provide a new end-to-end framework for MIML as an extension. Experiments on both aggregated label XMC and MIML tasks show the advantages over existing approaches.
APA
Shen, Y., Yu, H., Sanghavi, S. & Dhillon, I. (2020). Extreme Multi-label Classification from Aggregated Labels. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:8752-8762. Available from https://proceedings.mlr.press/v119/shen20f.html.

Related Material