Learning privately from multiparty data

Jihun Hamm; Yingjun Cao; Mikhail Belkin

Proceedings of The 33rd International Conference on Machine Learning, PMLR 48:555-563, 2016.

Abstract

Learning a classifier from private data distributed across multiple parties is an important problem that has many potential applications. How can we build an accurate and differentially private global classifier by combining locally-trained classifiers from different parties, without access to any party’s private data? We propose to transfer the “knowledge” of the local classifier ensemble by first creating labeled data from auxiliary unlabeled data, and then train a global differentially private classifier. We show that majority voting is too sensitive and therefore propose a new risk weighted by class probabilities estimated from the ensemble. Relative to a non-private solution, our private solution has a generalization error bounded by O(ε^-2 M^-2). This allows strong privacy without performance loss when the number of participating parties M is large, such as in crowdsensing applications. We demonstrate the performance of our framework with realistic tasks of activity recognition, network intrusion detection, and malicious URL detection.
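
The abstract's claim that majority voting is too sensitive can be illustrated with a small sketch. This is not the authors' implementation: it assumes binary labels in {+1, -1}, uses the logistic loss as a stand-in for whatever loss the paper uses, and all function names (`vote_fraction`, `majority_label`, `weighted_loss`) are hypothetical. It shows only that changing one party's vote can flip a majority label outright, while the fraction of votes used as a class-probability weight moves by at most 1/M. The differentially private training step is omitted.

```python
import math
from typing import List


def vote_fraction(votes: List[int]) -> float:
    """Fraction of parties voting +1 on one auxiliary sample (votes are +1 or -1)."""
    return sum(1 for v in votes if v == 1) / len(votes)


def majority_label(votes: List[int]) -> int:
    """Hard label by majority vote; ties go to +1 (an arbitrary choice for this sketch)."""
    return 1 if vote_fraction(votes) >= 0.5 else -1


def logistic_loss(score: float, label: int) -> float:
    """Logistic loss of a real-valued classifier score against a +1/-1 label."""
    return math.log1p(math.exp(-label * score))


def weighted_loss(score: float, alpha: float) -> float:
    """Loss weighted by the estimated probability alpha that the label is +1."""
    return alpha * logistic_loss(score, 1) + (1 - alpha) * logistic_loss(score, -1)


# Five hypothetical parties label one auxiliary sample; then one party changes its vote.
votes = [1, 1, 1, -1, -1]
flipped = [1, 1, -1, -1, -1]
score = 0.3  # hypothetical output of the global classifier on this sample

# Change in the per-sample loss when labels come from majority voting.
hard_change = abs(
    logistic_loss(score, majority_label(votes))
    - logistic_loss(score, majority_label(flipped))
)

# Change in the per-sample loss when labels are replaced by vote-fraction weights.
soft_change = abs(
    weighted_loss(score, vote_fraction(votes))
    - weighted_loss(score, vote_fraction(flipped))
)

print("majority labels:", majority_label(votes), majority_label(flipped))
print("vote fractions:", vote_fraction(votes), vote_fraction(flipped))
print("loss change (majority vote):", hard_change)
print("loss change (weighted):", soft_change)
```

In this toy case the weighted loss changes less than the majority-vote loss when a single party changes its vote, and the gap widens as the number of parties M grows. This is consistent with the abstract's point that more parties allow stronger privacy, although the sketch does not reproduce the paper's bound.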

Cite this Paper

BibTeX


@InProceedings{pmlr-v48-hamm16,
  title = 	 {Learning privately from multiparty data},
  author = 	 {Hamm, Jihun and Cao, Yingjun and Belkin, Mikhail},
  booktitle = 	 {Proceedings of The 33rd International Conference on Machine Learning},
  pages = 	 {555--563},
  year = 	 {2016},
  editor = 	 {Balcan, Maria Florina and Weinberger, Kilian Q.},
  volume = 	 {48},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {New York, New York, USA},
  month = 	 {20--22 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v48/hamm16.pdf},
  url = 	 {https://proceedings.mlr.press/v48/hamm16.html},
  abstract = 	 {Learning a classifier from private data distributed across multiple parties is an important problem that has many potential applications. How can we build an accurate and differentially private global classifier by combining locally-trained classifiers from different parties, without access to any party’s private data? We propose to transfer the “knowledge” of the local classifier ensemble by first creating labeled data from auxiliary unlabeled data, and then train a global differentially private classifier. We show that majority voting is too sensitive and therefore propose a new risk weighted by class probabilities estimated from the ensemble. Relative to a non-private solution, our private solution has a generalization error bounded by O(ε^-2 M^-2). This allows strong privacy without performance loss when the number of participating parties M is large, such as in crowdsensing applications. We demonstrate the performance of our framework with realistic tasks of activity recognition, network intrusion detection, and malicious URL detection.}
}

Endnote

%0 Conference Paper
%T Learning privately from multiparty data
%A Jihun Hamm
%A Yingjun Cao
%A Mikhail Belkin
%B Proceedings of The 33rd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2016
%E Maria Florina Balcan
%E Kilian Q. Weinberger
%F pmlr-v48-hamm16
%I PMLR
%P 555--563
%U https://proceedings.mlr.press/v48/hamm16.html
%V 48
%X Learning a classifier from private data distributed across multiple parties is an important problem that has many potential applications. How can we build an accurate and differentially private global classifier by combining locally-trained classifiers from different parties, without access to any party’s private data? We propose to transfer the “knowledge” of the local classifier ensemble by first creating labeled data from auxiliary unlabeled data, and then train a global differentially private classifier. We show that majority voting is too sensitive and therefore propose a new risk weighted by class probabilities estimated from the ensemble. Relative to a non-private solution, our private solution has a generalization error bounded by O(ε^-2 M^-2). This allows strong privacy without performance loss when the number of participating parties M is large, such as in crowdsensing applications. We demonstrate the performance of our framework with realistic tasks of activity recognition, network intrusion detection, and malicious URL detection.

RIS


TY  - CPAPER
TI  - Learning privately from multiparty data
AU  - Jihun Hamm
AU  - Yingjun Cao
AU  - Mikhail Belkin
BT  - Proceedings of The 33rd International Conference on Machine Learning
DA  - 2016/06/11
ED  - Maria Florina Balcan
ED  - Kilian Q. Weinberger
ID  - pmlr-v48-hamm16
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 48
SP  - 555
EP  - 563
L1  - http://proceedings.mlr.press/v48/hamm16.pdf
UR  - https://proceedings.mlr.press/v48/hamm16.html
AB  - Learning a classifier from private data distributed across multiple parties is an important problem that has many potential applications. How can we build an accurate and differentially private global classifier by combining locally-trained classifiers from different parties, without access to any party’s private data? We propose to transfer the “knowledge” of the local classifier ensemble by first creating labeled data from auxiliary unlabeled data, and then train a global differentially private classifier. We show that majority voting is too sensitive and therefore propose a new risk weighted by class probabilities estimated from the ensemble. Relative to a non-private solution, our private solution has a generalization error bounded by O(ε^-2 M^-2). This allows strong privacy without performance loss when the number of participating parties M is large, such as in crowdsensing applications. We demonstrate the performance of our framework with realistic tasks of activity recognition, network intrusion detection, and malicious URL detection.
ER  -

APA


Hamm, J., Cao, Y. & Belkin, M. (2016). Learning privately from multiparty data. Proceedings of The 33rd International Conference on Machine Learning, in Proceedings of Machine Learning Research 48:555-563. Available from https://proceedings.mlr.press/v48/hamm16.html.
