Learners that Use Little Information

Raef Bassily, Shay Moran, Ido Nachum, Jonathan Shafer, Amir Yehudayoff
Proceedings of Algorithmic Learning Theory, PMLR 83:25-55, 2018.

Abstract

We study learning algorithms that are restricted to using a small amount of information from their input sample. We introduce a category of learning algorithms we term $d$-bit information learners, which are algorithms whose output conveys at most $d$ bits of information of their input. A central theme in this work is that such algorithms generalize. We focus on the learning capacity of these algorithms, and prove sample complexity bounds with tight dependencies on the confidence and error parameters. We also observe connections with well studied notions such as sample compression schemes, Occam’s razor, PAC-Bayes and differential privacy. We discuss an approach that allows us to prove upper bounds on the amount of information that algorithms reveal about their inputs, and also provide a lower bound by showing a simple concept class for which every (possibly randomized) empirical risk minimizer must reveal a lot of information. On the other hand, we show that in the distribution-dependent setting every VC class has empirical risk minimizers that do not reveal a lot of information.
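To make the abstract's terminology concrete, the following is a minimal LaTeX sketch (not quoted from the paper) of the notion it refers to: a possibly randomized learner $A$ that maps an i.i.d. sample $S$ of size $m$ to a hypothesis is a $d$-bit information learner if the mutual information between its input and its output is at most $d$ bits. The second display shows a generalization bound of the flavor such a constraint yields for a loss bounded in $[0,1]$; its constants are illustrative only, and the high-probability bounds proved in the paper are stated differently.

% Sketch only; the symbols A, S, m, L_D, L_S are introduced here for illustration.
\[
  I\bigl(S;\,A(S)\bigr) \;\le\; d
  \qquad \text{(mutual information measured in bits).}
\]
% Illustrative expected-generalization bound for a [0,1]-bounded loss,
% where L_D is the population loss and L_S the empirical loss on the sample S:
\[
  \Bigl|\,\mathbb{E}\bigl[L_{\mathcal{D}}(A(S)) - L_{S}(A(S))\bigr]\Bigr|
  \;\le\; \sqrt{\frac{d\,\ln 2}{2m}} .
\]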

Cite this Paper


BibTeX
@InProceedings{pmlr-v83-bassily18a,
  title     = {Learners that Use Little Information},
  author    = {Bassily, Raef and Moran, Shay and Nachum, Ido and Shafer, Jonathan and Yehudayoff, Amir},
  booktitle = {Proceedings of Algorithmic Learning Theory},
  pages     = {25--55},
  year      = {2018},
  editor    = {Janoos, Firdaus and Mohri, Mehryar and Sridharan, Karthik},
  volume    = {83},
  series    = {Proceedings of Machine Learning Research},
  month     = {07--09 Apr},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v83/bassily18a/bassily18a.pdf},
  url       = {https://proceedings.mlr.press/v83/bassily18a.html},
  abstract  = {We study learning algorithms that are restricted to using a small amount of information from their input sample. We introduce a category of learning algorithms we term {\em $d$-bit information learners}, which are algorithms whose output conveys at most $d$ bits of information of their input. A central theme in this work is that such algorithms generalize. We focus on the learning capacity of these algorithms, and prove sample complexity bounds with tight dependencies on the confidence and error parameters. We also observe connections with well studied notions such as sample compression schemes, Occam’s razor, PAC-Bayes and differential privacy. We discuss an approach that allows us to prove upper bounds on the amount of information that algorithms reveal about their inputs, and also provide a lower bound by showing a simple concept class for which every (possibly randomized) empirical risk minimizer must reveal a lot of information. On the other hand, we show that in the distribution-dependent setting every VC class has empirical risk minimizers that do not reveal a lot of information.}
}
Endnote
%0 Conference Paper
%T Learners that Use Little Information
%A Raef Bassily
%A Shay Moran
%A Ido Nachum
%A Jonathan Shafer
%A Amir Yehudayoff
%B Proceedings of Algorithmic Learning Theory
%C Proceedings of Machine Learning Research
%D 2018
%E Firdaus Janoos
%E Mehryar Mohri
%E Karthik Sridharan
%F pmlr-v83-bassily18a
%I PMLR
%P 25--55
%U https://proceedings.mlr.press/v83/bassily18a.html
%V 83
%X We study learning algorithms that are restricted to using a small amount of information from their input sample. We introduce a category of learning algorithms we term {\em $d$-bit information learners}, which are algorithms whose output conveys at most $d$ bits of information of their input. A central theme in this work is that such algorithms generalize. We focus on the learning capacity of these algorithms, and prove sample complexity bounds with tight dependencies on the confidence and error parameters. We also observe connections with well studied notions such as sample compression schemes, Occam’s razor, PAC-Bayes and differential privacy. We discuss an approach that allows us to prove upper bounds on the amount of information that algorithms reveal about their inputs, and also provide a lower bound by showing a simple concept class for which every (possibly randomized) empirical risk minimizer must reveal a lot of information. On the other hand, we show that in the distribution-dependent setting every VC class has empirical risk minimizers that do not reveal a lot of information.
APA
Bassily, R., Moran, S., Nachum, I., Shafer, J. & Yehudayoff, A. (2018). Learners that Use Little Information. Proceedings of Algorithmic Learning Theory, in Proceedings of Machine Learning Research 83:25-55. Available from https://proceedings.mlr.press/v83/bassily18a.html.
