Private Convex Empirical Risk Minimization and High-dimensional Regression

Daniel Kifer; Adam Smith; Abhradeep Thakurta

Private Convex Empirical Risk Minimization and High-dimensional Regression

Daniel Kifer, Adam Smith, Abhradeep Thakurta

Proceedings of the 25th Annual Conference on Learning Theory, PMLR 23:25.1-25.40, 2012.

Abstract

We consider \emphdifferentially private algorithms for convex empirical risk minimization (ERM). Differential privacy (Dwork et al., 2006b) is a recently introduced notion of privacy which guarantees that an algorithm’s output does not depend on the data of any individual in the dataset. This is crucial in fields that handle sensitive data, such as genomics, collaborative filtering, and economics. Our motivation is the design of private algorithms for sparse learning problems, in which one aims to find solutions (e.g., regression parameters) with few non-zero coefficients. To this end: (a) We significantly extend the analysis of the “objective perturbation” algorithm of Chaudhuri et al. (2011) for convex ERM problems. We show that their method can be modified to use less noise (be more accurate), and to apply to problems with hard constraints and non-differentiable regularizers. We also give a tighter, data-dependent analysis of the additional error introduced by their method. A key tool in our analysis is a new nontrivial limit theorem for differential privacy which is of independent interest: if a sequence of differentially private algorithms converges, in a \emphweak sense, then the limit algorithm is also differentially private. In particular, our methods give the best known algorithms for differentially private linear regression. These methods work in settings where the number of parameters p is less than the number of samples n. (b) We give the first two private algorithms for \emphsparse regression problems in high-dimensional settings, where p is much larger than n. We analyze their performance for linear regression: under standard assumptions on the data, our algorithms have vanishing empirical risk for n = poly(s, \log p) when there exists a good regression vector with s nonzero coefficients. Our algorithms demonstrate that randomized algorithms for sparse regression problems can be both stable and accurate - a combination which is impossible for deterministic algorithms.

Cite this Paper

BibTeX

@InProceedings{pmlr-v23-kifer12,
  title = 	 {Private Convex Empirical Risk Minimization and High-dimensional Regression},
  author = 	 {Kifer, Daniel and Smith, Adam and Thakurta, Abhradeep},
  booktitle = 	 {Proceedings of the 25th Annual Conference on Learning Theory},
  pages = 	 {25.1--25.40},
  year = 	 {2012},
  editor = 	 {Mannor, Shie and Srebro, Nathan and Williamson, Robert C.},
  volume = 	 {23},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Edinburgh, Scotland},
  month = 	 {25--27 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v23/kifer12/kifer12.pdf},
  url = 	 {https://proceedings.mlr.press/v23/kifer12.html},
  abstract = 	 {We consider \emphdifferentially private algorithms for convex empirical risk minimization (ERM). Differential privacy (Dwork et al., 2006b) is a recently introduced notion of privacy which guarantees that an algorithm’s output does not depend on the data of any individual in the dataset. This is crucial in fields that handle sensitive data, such as genomics, collaborative filtering, and economics. Our motivation is the design of private algorithms for sparse learning problems, in which one aims to find solutions (e.g., regression parameters) with few non-zero coefficients. To this end: (a) We significantly extend the analysis of the “objective perturbation” algorithm of Chaudhuri et al. (2011) for convex ERM problems. We show that their method can be modified to use less noise (be more accurate), and to apply to problems with hard constraints and non-differentiable regularizers. We also give a tighter, data-dependent analysis of the additional error introduced by their method. A key tool in our analysis is a new nontrivial limit theorem for differential privacy which is of independent interest: if a sequence of differentially private algorithms converges, in a \emphweak sense, then the limit algorithm is also differentially private. In particular, our methods give the best known algorithms for differentially private linear regression. These methods work in settings where the number of parameters p is less than the number of samples n. (b) We give the first two private algorithms for \emphsparse regression problems in high-dimensional settings, where p is much larger than n. We analyze their performance for linear regression: under standard assumptions on the data, our algorithms have vanishing empirical risk for n = poly(s, \log p) when there exists a good regression vector with s nonzero coefficients. Our algorithms demonstrate that randomized algorithms for sparse regression problems can be both stable and accurate - a combination which is impossible for deterministic algorithms.}
}

Endnote

%0 Conference Paper
%T Private Convex Empirical Risk Minimization and High-dimensional Regression
%A Daniel Kifer
%A Adam Smith
%A Abhradeep Thakurta
%B Proceedings of the 25th Annual Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2012
%E Shie Mannor
%E Nathan Srebro
%E Robert C. Williamson	
%F pmlr-v23-kifer12
%I PMLR
%P 25.1--25.40
%U https://proceedings.mlr.press/v23/kifer12.html
%V 23
%X We consider \emphdifferentially private algorithms for convex empirical risk minimization (ERM). Differential privacy (Dwork et al., 2006b) is a recently introduced notion of privacy which guarantees that an algorithm’s output does not depend on the data of any individual in the dataset. This is crucial in fields that handle sensitive data, such as genomics, collaborative filtering, and economics. Our motivation is the design of private algorithms for sparse learning problems, in which one aims to find solutions (e.g., regression parameters) with few non-zero coefficients. To this end: (a) We significantly extend the analysis of the “objective perturbation” algorithm of Chaudhuri et al. (2011) for convex ERM problems. We show that their method can be modified to use less noise (be more accurate), and to apply to problems with hard constraints and non-differentiable regularizers. We also give a tighter, data-dependent analysis of the additional error introduced by their method. A key tool in our analysis is a new nontrivial limit theorem for differential privacy which is of independent interest: if a sequence of differentially private algorithms converges, in a \emphweak sense, then the limit algorithm is also differentially private. In particular, our methods give the best known algorithms for differentially private linear regression. These methods work in settings where the number of parameters p is less than the number of samples n. (b) We give the first two private algorithms for \emphsparse regression problems in high-dimensional settings, where p is much larger than n. We analyze their performance for linear regression: under standard assumptions on the data, our algorithms have vanishing empirical risk for n = poly(s, \log p) when there exists a good regression vector with s nonzero coefficients. Our algorithms demonstrate that randomized algorithms for sparse regression problems can be both stable and accurate - a combination which is impossible for deterministic algorithms.

RIS

TY  - CPAPER
TI  - Private Convex Empirical Risk Minimization and High-dimensional Regression
AU  - Daniel Kifer
AU  - Adam Smith
AU  - Abhradeep Thakurta
BT  - Proceedings of the 25th Annual Conference on Learning Theory
DA  - 2012/06/16
ED  - Shie Mannor
ED  - Nathan Srebro
ED  - Robert C. Williamson	
ID  - pmlr-v23-kifer12
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 23
SP  - 25.1
EP  - 25.40
L1  - http://proceedings.mlr.press/v23/kifer12/kifer12.pdf
UR  - https://proceedings.mlr.press/v23/kifer12.html
AB  - We consider \emphdifferentially private algorithms for convex empirical risk minimization (ERM). Differential privacy (Dwork et al., 2006b) is a recently introduced notion of privacy which guarantees that an algorithm’s output does not depend on the data of any individual in the dataset. This is crucial in fields that handle sensitive data, such as genomics, collaborative filtering, and economics. Our motivation is the design of private algorithms for sparse learning problems, in which one aims to find solutions (e.g., regression parameters) with few non-zero coefficients. To this end: (a) We significantly extend the analysis of the “objective perturbation” algorithm of Chaudhuri et al. (2011) for convex ERM problems. We show that their method can be modified to use less noise (be more accurate), and to apply to problems with hard constraints and non-differentiable regularizers. We also give a tighter, data-dependent analysis of the additional error introduced by their method. A key tool in our analysis is a new nontrivial limit theorem for differential privacy which is of independent interest: if a sequence of differentially private algorithms converges, in a \emphweak sense, then the limit algorithm is also differentially private. In particular, our methods give the best known algorithms for differentially private linear regression. These methods work in settings where the number of parameters p is less than the number of samples n. (b) We give the first two private algorithms for \emphsparse regression problems in high-dimensional settings, where p is much larger than n. We analyze their performance for linear regression: under standard assumptions on the data, our algorithms have vanishing empirical risk for n = poly(s, \log p) when there exists a good regression vector with s nonzero coefficients. Our algorithms demonstrate that randomized algorithms for sparse regression problems can be both stable and accurate - a combination which is impossible for deterministic algorithms.
ER  -

APA

Kifer, D., Smith, A. & Thakurta, A.. (2012). Private Convex Empirical Risk Minimization and High-dimensional Regression. Proceedings of the 25th Annual Conference on Learning Theory, in Proceedings of Machine Learning Research 23:25.1-25.40 Available from https://proceedings.mlr.press/v23/kifer12.html.

Related Material

Download PDF