A Direct Sum Result for the Information Complexity of Learning

Ido Nachum; Jonathan Shafer; Amir Yehudayoff

A Direct Sum Result for the Information Complexity of Learning

Ido Nachum, Jonathan Shafer, Amir Yehudayoff

Proceedings of the 31st Conference On Learning Theory, PMLR 75:1547-1568, 2018.

Abstract

How many bits of information are required to PAC learn a class of hypotheses of VC dimension $d$? The mathematical setting we follow is that of Bassily et al., where the value of interest is the mutual information $\mathrm{I}(S;A(S))$ between the input sample $S$ and the hypothesis outputted by the learning algorithm $A$. We introduce a class of functions of VC dimension $d$ over the domain $\mathcal{X}$ with information complexity at least $\Omega \left(d\log \log \frac{|\mathcal{X}|}{d}\right)$ bits for any consistent and proper algorithm (deterministic or random). Bassily et al. proved a similar (but quantitatively weaker) result for the case $d=1$. The above result is in fact a special case of a more general phenomenon we explore. We define the notion of {\em information complexity} of a given class of functions $\cH$. Intuitively, it is the minimum amount of information that an algorithm for $\mathcal{X}$ must retain about its input to ensure consistency and properness. We prove a direct sum result for information complexity in this context; roughly speaking, the information complexity sums when combining several classes.

Cite this Paper

BibTeX


@InProceedings{pmlr-v75-nachum18a,
  title = 	 {A Direct Sum Result for the Information Complexity of Learning},
  author =       {Nachum, Ido and Shafer, Jonathan and Yehudayoff, Amir},
  booktitle = 	 {Proceedings of the 31st  Conference On Learning Theory},
  pages = 	 {1547--1568},
  year = 	 {2018},
  editor = 	 {Bubeck, Sébastien and Perchet, Vianney and Rigollet, Philippe},
  volume = 	 {75},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {06--09 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v75/nachum18a/nachum18a.pdf},
  url = 	 {https://proceedings.mlr.press/v75/nachum18a.html},
  abstract = 	 { How many bits of information are required to PAC learn a class of hypotheses of VC dimension $d$? The mathematical setting we follow is that of Bassily et al., where the value of interest is the mutual information $\mathrm{I}(S;A(S))$ between the input sample $S$ and the hypothesis outputted by the learning algorithm $A$. We introduce a class of functions of VC dimension $d$ over the domain $\mathcal{X}$ with information complexity at least $\Omega \left(d\log \log \frac{|\mathcal{X}|}{d}\right)$ bits for any consistent and proper algorithm (deterministic or random). Bassily et al. proved a similar (but quantitatively weaker) result for the case $d=1$. The above result is in fact a special case of a more general phenomenon we explore. We define the notion of {\em information complexity} of a given class of functions $\cH$. Intuitively, it is the minimum amount of information that an algorithm for $\mathcal{X}$ must retain about its input to ensure consistency and properness. We prove a direct sum result for information complexity in this context; roughly speaking, the information complexity sums when combining several classes. }
}

Endnote

%0 Conference Paper
%T A Direct Sum Result for the Information Complexity of Learning
%A Ido Nachum
%A Jonathan Shafer
%A Amir Yehudayoff
%B Proceedings of the 31st  Conference On Learning Theory
%C Proceedings of Machine Learning Research
%D 2018
%E Sébastien Bubeck
%E Vianney Perchet
%E Philippe Rigollet	
%F pmlr-v75-nachum18a
%I PMLR
%P 1547--1568
%U https://proceedings.mlr.press/v75/nachum18a.html
%V 75
%X  How many bits of information are required to PAC learn a class of hypotheses of VC dimension $d$? The mathematical setting we follow is that of Bassily et al., where the value of interest is the mutual information $\mathrm{I}(S;A(S))$ between the input sample $S$ and the hypothesis outputted by the learning algorithm $A$. We introduce a class of functions of VC dimension $d$ over the domain $\mathcal{X}$ with information complexity at least $\Omega \left(d\log \log \frac{|\mathcal{X}|}{d}\right)$ bits for any consistent and proper algorithm (deterministic or random). Bassily et al. proved a similar (but quantitatively weaker) result for the case $d=1$. The above result is in fact a special case of a more general phenomenon we explore. We define the notion of {\em information complexity} of a given class of functions $\cH$. Intuitively, it is the minimum amount of information that an algorithm for $\mathcal{X}$ must retain about its input to ensure consistency and properness. We prove a direct sum result for information complexity in this context; roughly speaking, the information complexity sums when combining several classes.

APA


Nachum, I., Shafer, J. & Yehudayoff, A.. (2018). A Direct Sum Result for the Information Complexity of Learning. Proceedings of the 31st  Conference On Learning Theory, in Proceedings of Machine Learning Research 75:1547-1568 Available from https://proceedings.mlr.press/v75/nachum18a.html.

Related Material

Download PDF