Differentially Private Ordinary Least Squares
Proceedings of the 34th International Conference on Machine Learning, PMLR 70:3105-3114, 2017.
Abstract
Linear regression is one of the most prevalent techniques in machine learning; however, it is also common to use linear regression for its explanatory capabilities rather than label prediction. Ordinary Least Squares (OLS) is often used in statistics to establish a correlation between an attribute (e.g. gender) and a label (e.g. income) in the presence of other (potentially correlated) features. OLS assumes a particular model that randomly generates the data, and derives t-values, which represent the likelihood of each real value to be the true correlation. Using t-values, OLS can release a confidence interval, which is an interval on the reals that is likely to contain the true correlation; when this interval does not intersect the origin, we can reject the null hypothesis, as it is likely that the true correlation is non-zero. Our work aims at achieving similar guarantees on data under differentially private estimators. First, we show that for well-spread data, the Gaussian Johnson-Lindenstrauss Transform (JLT) gives a very good approximation of t-values; secondly, when JLT approximates Ridge regression (linear regression with $l_2$-regularization) we derive, under certain conditions, confidence intervals using the projected data; lastly, we derive, under different conditions, confidence intervals for the “Analyze Gauss” algorithm (Dwork et al., 2014).
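The quantities named in the abstract can be illustrated with a minimal NumPy sketch. It is not the paper's algorithm and gives no privacy guarantee by itself: the synthetic data, the sizes `n`, `p`, `r`, and the helper names `ols_t_values`, `confidence_interval`, and `gaussian_jlt` are all assumptions made for illustration. The confidence interval also uses a normal quantile as a large-sample stand-in for the exact t-distribution quantile.

```python
import numpy as np
from statistics import NormalDist


def ols_t_values(X, y):
    """Return OLS coefficients, their standard errors, and t-values."""
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)          # unbiased noise-variance estimate
    se = np.sqrt(sigma2 * np.diag(XtX_inv))   # standard error of each coefficient
    return beta, se, beta / se


def confidence_interval(beta_j, se_j, level=0.95):
    """Two-sided interval; normal quantile approximates the t quantile (assumption)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return beta_j - z * se_j, beta_j + z * se_j


def gaussian_jlt(X, y, r, rng):
    """Project the n rows of (X, y) down to r rows with an i.i.d. N(0, 1) matrix."""
    R = rng.standard_normal((r, X.shape[0]))
    return R @ X, R @ y


# Hypothetical synthetic data: y = X @ true_beta + Gaussian noise.
rng = np.random.default_rng(0)
n, p, r = 5000, 3, 1000
X = rng.standard_normal((n, p))
true_beta = np.array([0.5, 0.0, -1.0])
y = X @ true_beta + rng.standard_normal(n)

# OLS on the original data.
beta, se, t = ols_t_values(X, y)
lo, hi = confidence_interval(beta[0], se[0])

# OLS on the JLT-projected data (same estimator, r rows instead of n).
RX, Ry = gaussian_jlt(X, y, r, rng)
beta_p, se_p, t_p = ols_t_values(RX, Ry)

print("t-values (original): ", t)
print("t-values (projected):", t_p)
print("interval for coefficient 0:", (lo, hi))
```

In this sketch the projected t-values come out smaller in magnitude than the original ones, roughly by a factor of $\sqrt{(r-p)/(n-p)}$, because the projected regression has only `r` rows. Rejecting the null hypothesis for coefficient `j` corresponds to its interval excluding zero.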