The sample complexity of multi-distribution learning

Binghui Peng

The sample complexity of multi-distribution learning

Binghui Peng

Proceedings of Thirty Seventh Conference on Learning Theory, PMLR 247:4185-4204, 2024.

Abstract

Multi-distribution learning generalizes the classic PAC learning to handle data coming from multiple distributions. Given a set of

$k$ data distributions and a hypothesis class of VC dimension

$d$ , the goal is to learn a hypothesis that minimizes the maximum population loss over

$k$ distributions, up to

$\epsilon$ additive error. In this paper, we settle the sample complexity of multi-distribution learning by giving an algorithm of sample complexity

$\widetilde{O}((d+k)\epsilon^{-2}) \cdot (k/\epsilon)^{o(1)}$ . This matches the lower bound up to sub-polynomial factor and resolves the COLT 2023 open problem of Awasthi, Haghtalab and Zhao.

Cite this Paper

BibTeX


@InProceedings{pmlr-v247-peng24b,
  title = 	 {The sample complexity of multi-distribution learning},
  author =       {Peng, Binghui},
  booktitle = 	 {Proceedings of Thirty Seventh Conference on Learning Theory},
  pages = 	 {4185--4204},
  year = 	 {2024},
  editor = 	 {Agrawal, Shipra and Roth, Aaron},
  volume = 	 {247},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {30 Jun--03 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v247/peng24b/peng24b.pdf},
  url = 	 {https://proceedings.mlr.press/v247/peng24b.html},
  abstract = 	 { Multi-distribution learning generalizes the classic PAC learning to handle data coming from multiple distributions. Given a set of $k$ data distributions and a hypothesis class of VC dimension $d$, the goal is to learn a hypothesis that minimizes the maximum population loss over $k$ distributions, up to $\epsilon$ additive error. In this paper, we settle the sample complexity of multi-distribution learning by giving an algorithm of sample complexity $\widetilde{O}((d+k)\epsilon^{-2}) \cdot (k/\epsilon)^{o(1)}$. This matches the lower bound up to sub-polynomial factor and resolves the COLT 2023 open problem of Awasthi, Haghtalab and Zhao.}
}

Endnote

%0 Conference Paper
%T The sample complexity of multi-distribution learning
%A Binghui Peng
%B Proceedings of Thirty Seventh Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2024
%E Shipra Agrawal
%E Aaron Roth	
%F pmlr-v247-peng24b
%I PMLR
%P 4185--4204
%U https://proceedings.mlr.press/v247/peng24b.html
%V 247
%X  Multi-distribution learning generalizes the classic PAC learning to handle data coming from multiple distributions. Given a set of $k$ data distributions and a hypothesis class of VC dimension $d$, the goal is to learn a hypothesis that minimizes the maximum population loss over $k$ distributions, up to $\epsilon$ additive error. In this paper, we settle the sample complexity of multi-distribution learning by giving an algorithm of sample complexity $\widetilde{O}((d+k)\epsilon^{-2}) \cdot (k/\epsilon)^{o(1)}$. This matches the lower bound up to sub-polynomial factor and resolves the COLT 2023 open problem of Awasthi, Haghtalab and Zhao.

APA


Peng, B.. (2024). The sample complexity of multi-distribution learning. Proceedings of Thirty Seventh Conference on Learning Theory, in Proceedings of Machine Learning Research 247:4185-4204 Available from https://proceedings.mlr.press/v247/peng24b.html.

Related Material

Download PDF