Understanding Aggregations of Proper Learners in Multiclass Classification

Julian Asilis, Mikael Møller Høgsgaard, Grigoris Velegkas
Proceedings of The 36th International Conference on Algorithmic Learning Theory, PMLR 272:89-111, 2025.

Abstract

Multiclass learnability is known to exhibit a properness barrier: there are learnable classes which cannot be learned by any proper learner. Binary classification faces no such barrier for learnability, but a similar one for optimal learning, which can in general only be achieved by improper learners. Fortunately, recent advances in binary classification have demonstrated that this requirement can be satisfied using aggregations of proper learners, some of which are strikingly simple. This raises a natural question: to what extent can simple aggregations of proper learners overcome the properness barrier in multiclass classification? We give a positive answer to this question for classes which have finite Graph dimension, $d_G$. Namely, we demonstrate that the optimal binary learners of Hanneke, Larsen, and Aden-Ali et al. (appropriately generalized to the multiclass setting) achieve sample complexity $O\left(\frac{d_G + \ln(1/\delta)}{\varepsilon}\right)$. This forms a strict improvement upon the sample complexity of ERM. We complement this with a lower bound demonstrating that for certain classes of Graph dimension $d_G$, majorities of ERM learners require $\Omega\left(\frac{d_G + \ln(1/\delta)}{\varepsilon}\right)$ samples. Furthermore, we show that a single ERM requires $\Omega\left(\frac{d_G \ln(1/\varepsilon) + \ln(1/\delta)}{\varepsilon}\right)$ samples on such classes, exceeding the lower bound of Daniely et al. (2015) by a factor of $\ln(1/\varepsilon)$. For multiclass learning in full generality — i.e., for classes of finite DS dimension but possibly infinite Graph dimension — we give a strong refutation to these learning strategies, by exhibiting a learnable class which cannot be learned to constant error by any aggregation of a finite number of proper learners.
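
The aggregation studied in the positive result is, in essence, a plurality vote over proper (ERM) learners trained on subsamples of the data. The Python sketch below is only a rough illustration: the toy hypothesis class, the erm helper, and the uniform subsampling scheme are assumptions made for the example, not the constructions of Hanneke, Larsen, or Aden-Ali et al. It shows how a majority-of-ERMs predictor is formed and why it is generally improper, since the aggregated predictor need not belong to the class.

# A minimal, hedged sketch (not the paper's construction): aggregate several
# proper ERM learners by a plurality vote over their predictions. The toy
# hypothesis class, data distribution, and subsampling scheme are illustrative
# assumptions chosen only to make the idea concrete and runnable.
import random
from collections import Counter
from itertools import permutations

K = 3  # number of labels

def make_hypothesis(perm):
    # Multiclass "bucket" rule on [0, 1): split into K buckets, label bucket i with perm[i].
    def h(x):
        bucket = min(int(x * K), K - 1)
        return perm[bucket]
    return h

HYPOTHESIS_CLASS = [make_hypothesis(p) for p in permutations(range(K))]

def erm(sample):
    # Proper learner: return a hypothesis from the class with minimal empirical error.
    return min(HYPOTHESIS_CLASS, key=lambda h: sum(h(x) != y for x, y in sample))

def majority_of_erms(sample, num_voters=5, rng=random.Random(0)):
    # Train ERM on independent subsamples and aggregate by plurality vote.
    # The aggregated predictor is generally improper: it need not lie in the class.
    voters = []
    for _ in range(num_voters):
        subsample = [rng.choice(sample) for _ in range(max(1, len(sample) // 2))]
        voters.append(erm(subsample))
    def aggregate(x):
        return Counter(h(x) for h in voters).most_common(1)[0][0]
    return aggregate

# Toy usage: labels follow the identity permutation, with 5% label noise.
rng = random.Random(1)
target = make_hypothesis(tuple(range(K)))
train = [(x, target(x) if rng.random() > 0.05 else rng.randrange(K))
         for x in (rng.random() for _ in range(200))]
predictor = majority_of_erms(train)
test_error = sum(predictor(x) != target(x) for x in (rng.random() for _ in range(100))) / 100
print(f"test error: {test_error:.2f}")

The optimal learners referenced in the abstract use more careful subsampling and voting schemes than the uniform resampling above; the sketch is meant only to convey the structure of the strategy, namely several proper learners combined by a single vote.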

Cite this Paper


BibTeX
@InProceedings{pmlr-v272-asilis25a,
  title     = {Understanding Aggregations of Proper Learners in Multiclass Classification},
  author    = {Asilis, Julian and H{\o}gsgaard, Mikael M{\o}ller and Velegkas, Grigoris},
  booktitle = {Proceedings of The 36th International Conference on Algorithmic Learning Theory},
  pages     = {89--111},
  year      = {2025},
  editor    = {Kamath, Gautam and Loh, Po-Ling},
  volume    = {272},
  series    = {Proceedings of Machine Learning Research},
  month     = {24--27 Feb},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v272/main/assets/asilis25a/asilis25a.pdf},
  url       = {https://proceedings.mlr.press/v272/asilis25a.html},
  abstract  = {Multiclass learnability is known to exhibit a properness barrier: there are learnable classes which cannot be learned by any proper learner. Binary classification faces no such barrier for learnability, but a similar one for optimal learning, which can in general only be achieved by improper learners. Fortunately, recent advances in binary classification have demonstrated that this requirement can be satisfied using aggregations of proper learners, some of which are strikingly simple. This raises a natural question: to what extent can simple aggregations of proper learners overcome the properness barrier in multiclass classification? We give a positive answer to this question for classes which have finite Graph dimension, $d_G$. Namely, we demonstrate that the optimal binary learners of Hanneke, Larsen, and Aden-Ali et al. (appropriately generalized to the multiclass setting) achieve sample complexity $O\left( \frac{d_G + \ln(1 / \delta)}{\varepsilon}\right)$. This forms a strict improvement upon the sample complexity of ERM. We complement this with a lower bound demonstrating that for certain classes of Graph dimension $d_G$, majorities of ERM learners require $\Omega \left( \frac{d_G + \ln(1 / \delta)}{\varepsilon}\right)$ samples. Furthermore, we show that a single ERM requires $\Omega \left(\frac{d_G \ln(1 / \varepsilon) + \ln(1 / \delta)}{\varepsilon}\right)$ samples on such classes, exceeding the lower bound of Daniel et al. (2015) by a factor of $\ln(1 / \varepsilon)$. For multiclass learning in full generality — i.e., for classes of finite DS dimension but possibly infinite Graph dimension — we give a strong refutation to these learning strategies, by exhibiting a learnable class which cannot be learned to constant error by any aggregation of a finite number of proper learners.}
}
Endnote
%0 Conference Paper
%T Understanding Aggregations of Proper Learners in Multiclass Classification
%A Julian Asilis
%A Mikael Møller Høgsgaard
%A Grigoris Velegkas
%B Proceedings of The 36th International Conference on Algorithmic Learning Theory
%C Proceedings of Machine Learning Research
%D 2025
%E Gautam Kamath
%E Po-Ling Loh
%F pmlr-v272-asilis25a
%I PMLR
%P 89--111
%U https://proceedings.mlr.press/v272/asilis25a.html
%V 272
%X Multiclass learnability is known to exhibit a properness barrier: there are learnable classes which cannot be learned by any proper learner. Binary classification faces no such barrier for learnability, but a similar one for optimal learning, which can in general only be achieved by improper learners. Fortunately, recent advances in binary classification have demonstrated that this requirement can be satisfied using aggregations of proper learners, some of which are strikingly simple. This raises a natural question: to what extent can simple aggregations of proper learners overcome the properness barrier in multiclass classification? We give a positive answer to this question for classes which have finite Graph dimension, $d_G$. Namely, we demonstrate that the optimal binary learners of Hanneke, Larsen, and Aden-Ali et al. (appropriately generalized to the multiclass setting) achieve sample complexity $O\left( \frac{d_G + \ln(1 / \delta)}{\varepsilon}\right)$. This forms a strict improvement upon the sample complexity of ERM. We complement this with a lower bound demonstrating that for certain classes of Graph dimension $d_G$, majorities of ERM learners require $\Omega \left( \frac{d_G + \ln(1 / \delta)}{\varepsilon}\right)$ samples. Furthermore, we show that a single ERM requires $\Omega \left(\frac{d_G \ln(1 / \varepsilon) + \ln(1 / \delta)}{\varepsilon}\right)$ samples on such classes, exceeding the lower bound of Daniel et al. (2015) by a factor of $\ln(1 / \varepsilon)$. For multiclass learning in full generality — i.e., for classes of finite DS dimension but possibly infinite Graph dimension — we give a strong refutation to these learning strategies, by exhibiting a learnable class which cannot be learned to constant error by any aggregation of a finite number of proper learners.
APA
Asilis, J., Høgsgaard, M.M. & Velegkas, G. (2025). Understanding Aggregations of Proper Learners in Multiclass Classification. Proceedings of The 36th International Conference on Algorithmic Learning Theory, in Proceedings of Machine Learning Research 272:89-111. Available from https://proceedings.mlr.press/v272/asilis25a.html.
