Rademacher Observations, Private Data, and Boosting

Richard Nock, Giorgio Patrini, Arik Friedman
Proceedings of the 32nd International Conference on Machine Learning, PMLR 37:948-956, 2015.

Abstract

The minimization of the logistic loss is a popular approach to batch supervised learning. Our paper starts from the surprising observation that, when fitting linear classifiers, the minimization of the logistic loss is equivalent to the minimization of an exponential rado-loss computed (i) over transformed data that we call Rademacher observations (rados), and (ii) over the same classifier as the one of the logistic loss. Thus, a classifier learnt from rados can be directly used to classify observations. We provide a learning algorithm over rados with boosting-compliant convergence rates on the logistic loss (computed over examples). Experiments on domains with up to millions of examples, backed up by theoretical arguments, display that learning over a small set of random rados can challenge the state of the art that learns over the complete set of examples. We show that rados comply with various privacy requirements that make them good candidates for machine learning in a privacy framework. We give several algebraic, geometric and computational hardness results on reconstructing examples from rados. We also show how it is possible to craft, and efficiently learn from, rados in a differential privacy framework. Tests reveal that learning from differentially private rados brings non-trivial privacy vs accuracy tradeoffs.
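To make the equivalence in the abstract concrete, here is a brief sketch reconstructed from the paper's definitions; the notation (examples $(x_i, y_i) \in \mathbb{R}^d \times \{-1, 1\}$ for $i = 1, \dots, m$, linear classifier $\theta \in \mathbb{R}^d$) is ours, and the paper remains the authoritative statement. Given a sign vector $\sigma \in \{-1, 1\}^m$, the corresponding Rademacher observation (rado) is

$$\pi_\sigma \doteq \frac{1}{2} \sum_{i=1}^{m} (\sigma_i + y_i) \, x_i,$$

so example $i$ contributes $y_i x_i$ when $\sigma_i = y_i$ and vanishes otherwise. Because the sum over all $2^m$ sign vectors factorizes, $\sum_{\sigma \in \{-1,1\}^m} \exp(-\theta^\top \pi_\sigma) = \prod_{i=1}^{m} \left(1 + \exp(-y_i \theta^\top x_i)\right)$, which yields

$$\frac{1}{m} \sum_{i=1}^{m} \log\left(1 + \exp(-y_i \theta^\top x_i)\right) \;=\; \frac{1}{m} \log \sum_{\sigma \in \{-1,1\}^m} \exp(-\theta^\top \pi_\sigma),$$

i.e., the logistic loss of $\theta$ over the examples is a fixed monotone transform of the exponential rado-loss of the same $\theta$, so minimizing one minimizes the other. The identity can be checked numerically on a toy problem; the following sketch assumes NumPy, and every name in it is illustrative rather than taken from the paper's code.

import itertools
import numpy as np

rng = np.random.default_rng(0)
m, d = 6, 3                                # keep m small: we enumerate all 2^m rados
X = rng.normal(size=(m, d))                # toy examples x_1, ..., x_m
y = rng.choice([-1.0, 1.0], size=m)        # labels in {-1, +1}
theta = rng.normal(size=d)                 # an arbitrary linear classifier

# Logistic loss of theta over the m examples.
log_loss = np.mean(np.log1p(np.exp(-y * (X @ theta))))

# Exponential rado-loss of the same theta over all 2^m Rademacher observations.
rado_sum = 0.0
for sigma in itertools.product([-1.0, 1.0], repeat=m):
    pi = 0.5 * ((np.asarray(sigma) + y) @ X)   # rado: (1/2) sum_i (sigma_i + y_i) x_i
    rado_sum += np.exp(-theta @ pi)

# The two losses agree exactly (up to floating-point error).
assert np.isclose(log_loss, np.log(rado_sum) / m)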

Cite this Paper


BibTeX
@InProceedings{pmlr-v37-nock15,
  title     = {Rademacher Observations, Private Data, and Boosting},
  author    = {Nock, Richard and Patrini, Giorgio and Friedman, Arik},
  booktitle = {Proceedings of the 32nd International Conference on Machine Learning},
  pages     = {948--956},
  year      = {2015},
  editor    = {Bach, Francis and Blei, David},
  volume    = {37},
  series    = {Proceedings of Machine Learning Research},
  address   = {Lille, France},
  month     = {07--09 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v37/nock15.pdf},
  url       = {https://proceedings.mlr.press/v37/nock15.html},
  abstract  = {The minimization of the logistic loss is a popular approach to batch supervised learning. Our paper starts from the surprising observation that, when fitting linear classifiers, the minimization of the logistic loss is \textit{equivalent} to the minimization of an exponential \textit{rado}-loss computed (i) over transformed data that we call Rademacher observations (rados), and (ii) over the \textit{same} classifier as the one of the logistic loss. Thus, a classifier learnt from rados can be \textit{directly} used to classify \textit{observations}. We provide a learning algorithm over rados with boosting-compliant convergence rates on the \textit{logistic} loss (computed over examples). Experiments on domains with up to millions of examples, backed up by theoretical arguments, display that learning over a small set of random rados can challenge the state of the art that learns over the \textit{complete} set of examples. We show that rados comply with various privacy requirements that make them good candidates for machine learning in a privacy framework. We give several algebraic, geometric and computational hardness results on reconstructing examples from rados. We also show how it is possible to craft, and efficiently learn from, rados in a differential privacy framework. Tests reveal that learning from differentially private rados brings non-trivial privacy vs accuracy tradeoffs.}
}
Endnote
%0 Conference Paper
%T Rademacher Observations, Private Data, and Boosting
%A Richard Nock
%A Giorgio Patrini
%A Arik Friedman
%B Proceedings of the 32nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2015
%E Francis Bach
%E David Blei
%F pmlr-v37-nock15
%I PMLR
%P 948--956
%U https://proceedings.mlr.press/v37/nock15.html
%V 37
%X The minimization of the logistic loss is a popular approach to batch supervised learning. Our paper starts from the surprising observation that, when fitting linear classifiers, the minimization of the logistic loss is equivalent to the minimization of an exponential rado-loss computed (i) over transformed data that we call Rademacher observations (rados), and (ii) over the same classifier as the one of the logistic loss. Thus, a classifier learnt from rados can be directly used to classify observations. We provide a learning algorithm over rados with boosting-compliant convergence rates on the logistic loss (computed over examples). Experiments on domains with up to millions of examples, backed up by theoretical arguments, display that learning over a small set of random rados can challenge the state of the art that learns over the complete set of examples. We show that rados comply with various privacy requirements that make them good candidates for machine learning in a privacy framework. We give several algebraic, geometric and computational hardness results on reconstructing examples from rados. We also show how it is possible to craft, and efficiently learn from, rados in a differential privacy framework. Tests reveal that learning from differentially private rados brings non-trivial privacy vs accuracy tradeoffs.
RIS
TY - CPAPER
TI - Rademacher Observations, Private Data, and Boosting
AU - Richard Nock
AU - Giorgio Patrini
AU - Arik Friedman
BT - Proceedings of the 32nd International Conference on Machine Learning
DA - 2015/06/01
ED - Francis Bach
ED - David Blei
ID - pmlr-v37-nock15
PB - PMLR
DP - Proceedings of Machine Learning Research
VL - 37
SP - 948
EP - 956
L1 - http://proceedings.mlr.press/v37/nock15.pdf
UR - https://proceedings.mlr.press/v37/nock15.html
AB - The minimization of the logistic loss is a popular approach to batch supervised learning. Our paper starts from the surprising observation that, when fitting linear classifiers, the minimization of the logistic loss is equivalent to the minimization of an exponential rado-loss computed (i) over transformed data that we call Rademacher observations (rados), and (ii) over the same classifier as the one of the logistic loss. Thus, a classifier learnt from rados can be directly used to classify observations. We provide a learning algorithm over rados with boosting-compliant convergence rates on the logistic loss (computed over examples). Experiments on domains with up to millions of examples, backed up by theoretical arguments, display that learning over a small set of random rados can challenge the state of the art that learns over the complete set of examples. We show that rados comply with various privacy requirements that make them good candidates for machine learning in a privacy framework. We give several algebraic, geometric and computational hardness results on reconstructing examples from rados. We also show how it is possible to craft, and efficiently learn from, rados in a differential privacy framework. Tests reveal that learning from differentially private rados brings non-trivial privacy vs accuracy tradeoffs.
ER -
APA
Nock, R., Patrini, G. & Friedman, A. (2015). Rademacher Observations, Private Data, and Boosting. Proceedings of the 32nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 37:948-956. Available from https://proceedings.mlr.press/v37/nock15.html.
