Towards Understanding Generalization of Macro-AUC in Multi-label Learning

Guoqiang Wu; Chongxuan Li; Yilong Yin

Proceedings of the 40th International Conference on Machine Learning, PMLR 202:37540-37570, 2023.

Abstract

Macro-AUC, the arithmetic mean of the class-wise AUCs in multi-label learning, is commonly used in practice. However, its theoretical understanding is still largely lacking. Toward addressing this gap, we characterize the generalization properties of various learning algorithms based on their corresponding surrogate losses with respect to Macro-AUC. We theoretically identify a critical property of the dataset that affects the generalization bounds: the label-wise class imbalance. Our imbalance-aware error bounds show that the widely used univariate loss-based algorithm is more sensitive to label-wise class imbalance than the proposed pairwise and reweighted loss-based ones, which may explain its worse performance. Moreover, empirical results on various datasets corroborate our theoretical findings. Technically, to establish these results, we propose a new (and more general) McDiarmid-type concentration inequality, which may be of independent interest.
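The definition of Macro-AUC and the three kinds of surrogate losses named in the abstract can be made concrete with a short sketch. The Python below is illustrative only and rests on assumptions that are not taken from the paper: labels are encoded so that `1` marks a relevant label; the logistic function is used as the base surrogate; and the per-label normalizations chosen for the univariate, pairwise and reweighted losses are our reading of the abstract, not the paper's exact definitions or constants. All function names are hypothetical, and `min_minority_fraction` is a hypothetical imbalance measure that is not necessarily the quantity appearing in the paper's bounds.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]  # scores F[i][j]: instance i, label j
Labels = List[List[int]]    # Y[i][j] == 1 means label j is relevant to instance i (assumed encoding)


def label_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC of a single label: the fraction of (positive, negative) pairs whose
    positive is scored higher, with ties counted as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y != 1]
    if not pos or not neg:
        raise ValueError("AUC is undefined for a label with no positives or no negatives")
    hits = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return hits / (len(pos) * len(neg))


def macro_auc(F: Matrix, Y: Labels) -> float:
    """Macro-AUC: arithmetic mean of the label-wise AUCs."""
    c = len(Y[0])
    return sum(label_auc([r[j] for r in F], [r[j] for r in Y]) for j in range(c)) / c


def min_minority_fraction(Y: Labels) -> float:
    """Hypothetical imbalance measure (illustrative only): the smallest
    minority-class fraction over all labels; small values mean strong imbalance."""
    n, c = len(Y), len(Y[0])
    fractions = []
    for j in range(c):
        n_pos = sum(1 for r in Y if r[j] == 1)
        fractions.append(min(n_pos, n - n_pos) / n)
    return min(fractions)


def logistic(t: float) -> float:
    """Logistic surrogate log(1 + exp(-t)), computed without overflow."""
    if t >= 0:
        return math.log1p(math.exp(-t))
    return -t + math.log1p(math.exp(t))


def _split(F: Matrix, Y: Labels, j: int) -> Tuple[List[float], List[float]]:
    """Scores of label j on its positive and on its negative instances."""
    pos = [f[j] for f, y in zip(F, Y) if y[j] == 1]
    neg = [f[j] for f, y in zip(F, Y) if y[j] != 1]
    return pos, neg


def univariate_loss(F: Matrix, Y: Labels) -> float:
    """Assumed univariate form: every (instance, label) entry has weight 1 / (n * c)."""
    n, c = len(F), len(F[0])
    total = sum(logistic((1.0 if Y[i][j] == 1 else -1.0) * F[i][j])
                for i in range(n) for j in range(c))
    return total / (n * c)


def pairwise_loss(F: Matrix, Y: Labels) -> float:
    """Assumed pairwise form: per label, average the surrogate over all
    (positive, negative) score differences, then average over labels."""
    c = len(F[0])
    total = 0.0
    for j in range(c):
        pos, neg = _split(F, Y, j)
        total += sum(logistic(p - q) for p in pos for q in neg) / (len(pos) * len(neg))
    return total / c


def reweighted_loss(F: Matrix, Y: Labels) -> float:
    """Assumed reweighted form: per label, average positives and negatives
    separately and add the two averages, then average over labels."""
    c = len(F[0])
    total = 0.0
    for j in range(c):
        pos, neg = _split(F, Y, j)
        total += sum(logistic(p) for p in pos) / len(pos)
        total += sum(logistic(-q) for q in neg) / len(neg)
    return total / c


# Toy example: 3 instances, 2 labels.
F = [[0.9, 0.5], [0.4, 0.8], [0.1, 0.3]]
Y = [[1, 0], [0, 1], [0, 1]]
print(macro_auc(F, Y))  # → 0.75 (label AUCs are 1.0 and 0.5)
print(min_minority_fraction(Y))  # one third: each label has a single minority instance
```

In this sketch the univariate loss weights every entry by 1/(nc), so when a label has very few positives, their terms carry little total weight. The pairwise and reweighted versions instead normalize by each label's own positive and negative counts. This is offered as one intuition for why label-wise imbalance matters, not as the paper's argument.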

Cite this Paper

BibTeX


@InProceedings{pmlr-v202-wu23l,
  title     = {Towards Understanding Generalization of Macro-{AUC} in Multi-label Learning},
  author    = {Wu, Guoqiang and Li, Chongxuan and Yin, Yilong},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {37540--37570},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/wu23l/wu23l.pdf},
  url       = {https://proceedings.mlr.press/v202/wu23l.html},
  abstract  = {Macro-AUC is the arithmetic mean of the class-wise AUCs in multi-label learning and is commonly used in practice. However, its theoretical understanding is far lacking. Toward solving it, we characterize the generalization properties of various learning algorithms based on the corresponding surrogate losses w.r.t. Macro-AUC. We theoretically identify a critical factor of the dataset affecting the generalization bounds: the label-wise class imbalance. Our results on the imbalance-aware error bounds show that the widely-used univariate loss-based algorithm is more sensitive to the label-wise class imbalance than the proposed pairwise and reweighted loss-based ones, which probably implies its worse performance. Moreover, empirical results on various datasets corroborate our theory findings. To establish it, technically, we propose a new (and more general) McDiarmid-type concentration inequality, which may be of independent interest.}
}

Endnote

%0 Conference Paper
%T Towards Understanding Generalization of Macro-AUC in Multi-label Learning
%A Guoqiang Wu
%A Chongxuan Li
%A Yilong Yin
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-wu23l
%I PMLR
%P 37540--37570
%U https://proceedings.mlr.press/v202/wu23l.html
%V 202
%X Macro-AUC is the arithmetic mean of the class-wise AUCs in multi-label learning and is commonly used in practice. However, its theoretical understanding is far lacking. Toward solving it, we characterize the generalization properties of various learning algorithms based on the corresponding surrogate losses w.r.t. Macro-AUC. We theoretically identify a critical factor of the dataset affecting the generalization bounds: the label-wise class imbalance. Our results on the imbalance-aware error bounds show that the widely-used univariate loss-based algorithm is more sensitive to the label-wise class imbalance than the proposed pairwise and reweighted loss-based ones, which probably implies its worse performance. Moreover, empirical results on various datasets corroborate our theory findings. To establish it, technically, we propose a new (and more general) McDiarmid-type concentration inequality, which may be of independent interest.

APA


Wu, G., Li, C. & Yin, Y. (2023). Towards Understanding Generalization of Macro-AUC in Multi-label Learning. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:37540-37570. Available from https://proceedings.mlr.press/v202/wu23l.html.
