Learning from Data with Heterogeneous Noise using SGD

Shuang Song; Kamalika Chaudhuri; Anand Sarwate

Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, PMLR 38:894-902, 2015.

Abstract

We consider learning from data of variable quality that may be obtained from different heterogeneous sources. Addressing learning from heterogeneous data in its full generality is a challenging problem. In this paper, we adopt instead a model in which data is observed through heterogeneous noise, where the noise level reflects the quality of the data source. We study how to use stochastic gradient algorithms to learn in this model. Our study is motivated by two concrete examples where this problem arises naturally: learning with local differential privacy based on data from multiple sources with different privacy requirements, and learning from data with labels of variable quality. The main contribution of this paper is to identify how heterogeneous noise impacts performance. We show that given two datasets with heterogeneous noise, the order in which to use them in standard SGD depends on the learning rate. We propose a method for changing the learning rate as a function of the heterogeneity, and prove new regret bounds for our method in two cases of interest. Finally, we evaluate the performance of our algorithm on real data.

Cite this Paper

BibTeX


@InProceedings{pmlr-v38-song15,
  title = 	 {{Learning from Data with Heterogeneous Noise using SGD}},
  author = 	 {Song, Shuang and Chaudhuri, Kamalika and Sarwate, Anand},
  booktitle = 	 {Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics},
  pages = 	 {894--902},
  year = 	 {2015},
  editor = 	 {Lebanon, Guy and Vishwanathan, S. V. N.},
  volume = 	 {38},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {San Diego, California, USA},
  month = 	 {09--12 May},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v38/song15.pdf},
  url = 	 {https://proceedings.mlr.press/v38/song15.html},
  abstract = 	 {We consider learning from data of variable quality that may be obtained from different heterogeneous sources. Addressing learning from heterogeneous data in its full generality is a challenging problem. In this paper, we adopt instead a model in which data is observed through heterogeneous noise, where the noise level reflects the quality of the data source. We study how to use stochastic gradient algorithms to learn in this model. Our study is motivated by two concrete examples where this problem arises naturally: learning with local differential privacy based on data from multiple sources with different privacy requirements, and learning from data with labels of variable quality.  The main contribution of this paper is to identify how heterogeneous noise impacts performance. We show that given two datasets with heterogeneous noise, the order in which to use them in standard SGD depends on the learning rate. We propose a method for changing the learning rate as a function of the heterogeneity, and prove new regret bounds for our method in two cases of interest. Finally, we evaluate the performance of our algorithm on real data.}
}

Endnote

%0 Conference Paper
%T Learning from Data with Heterogeneous Noise using SGD
%A Shuang Song
%A Kamalika Chaudhuri
%A Anand Sarwate
%B Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2015
%E Guy Lebanon
%E S. V. N. Vishwanathan
%F pmlr-v38-song15
%I PMLR
%P 894--902
%U https://proceedings.mlr.press/v38/song15.html
%V 38
%X We consider learning from data of variable quality that may be obtained from different heterogeneous sources. Addressing learning from heterogeneous data in its full generality is a challenging problem. In this paper, we adopt instead a model in which data is observed through heterogeneous noise, where the noise level reflects the quality of the data source. We study how to use stochastic gradient algorithms to learn in this model. Our study is motivated by two concrete examples where this problem arises naturally: learning with local differential privacy based on data from multiple sources with different privacy requirements, and learning from data with labels of variable quality.  The main contribution of this paper is to identify how heterogeneous noise impacts performance. We show that given two datasets with heterogeneous noise, the order in which to use them in standard SGD depends on the learning rate. We propose a method for changing the learning rate as a function of the heterogeneity, and prove new regret bounds for our method in two cases of interest. Finally, we evaluate the performance of our algorithm on real data.

RIS


TY  - CPAPER
TI  - Learning from Data with Heterogeneous Noise using SGD
AU  - Shuang Song
AU  - Kamalika Chaudhuri
AU  - Anand Sarwate
BT  - Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics
DA  - 2015/02/21
ED  - Guy Lebanon
ED  - S. V. N. Vishwanathan
ID  - pmlr-v38-song15
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 38
SP  - 894
EP  - 902
L1  - http://proceedings.mlr.press/v38/song15.pdf
UR  - https://proceedings.mlr.press/v38/song15.html
AB  - We consider learning from data of variable quality that may be obtained from different heterogeneous sources. Addressing learning from heterogeneous data in its full generality is a challenging problem. In this paper, we adopt instead a model in which data is observed through heterogeneous noise, where the noise level reflects the quality of the data source. We study how to use stochastic gradient algorithms to learn in this model. Our study is motivated by two concrete examples where this problem arises naturally: learning with local differential privacy based on data from multiple sources with different privacy requirements, and learning from data with labels of variable quality.  The main contribution of this paper is to identify how heterogeneous noise impacts performance. We show that given two datasets with heterogeneous noise, the order in which to use them in standard SGD depends on the learning rate. We propose a method for changing the learning rate as a function of the heterogeneity, and prove new regret bounds for our method in two cases of interest. Finally, we evaluate the performance of our algorithm on real data.
ER  -

APA


Song, S., Chaudhuri, K., & Sarwate, A. (2015). Learning from Data with Heterogeneous Noise using SGD. Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 38:894-902. Available from https://proceedings.mlr.press/v38/song15.html.
