(Generalized) Linear Regression on Microaggregated Data – From Nuisance Parameter Optimization to Partial Identification
Proceedings of the Tenth International Symposium on Imprecise Probability: Theories and Applications, PMLR 62:157-168, 2017.
Abstract
Protecting sensitive microdata before they are published or passed on is crucial: a trade-off between sufficient disclosure control and analyzability needs to be found. This paper presents a starting point for evaluating the effect of k-anonymity microaggregation on (generalized) linear regression. Taking a rigorous imprecision perspective, microaggregated data are understood as inducing a set X of potentially true data. Based on this representation, two conceptually different approaches to deriving estimates from the ideal likelihood are discussed. The first picks a single element of X, for instance by naively treating the microaggregated data as the true ones, or by a maximax approach that treats the elements of X as nuisance parameters to be optimized. The second seeks, in the spirit of Partial Identification, the set of all maximum likelihood estimators compatible with the elements of X, thus creating cautious estimators. As the simulation study corroborates, the sets of estimators obtained by the latter approach are still precise enough to be practically relevant.
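To make the contrast between the naive single-element estimate and the set-valued estimate concrete, the following is a minimal illustrative sketch, not the paper's procedure. It assumes a simple linear model with one covariate, that only the covariate is microaggregated (sorted into groups of at least k values, each replaced by its group mean), and that the set X consists of all data with the same group means whose values lie within a hypothetical bound `radius` of those means. The function names and the Monte Carlo search are assumptions made for illustration; random sampling yields only an inner approximation of the true range of estimates.

```python
import numpy as np

def microaggregate(x, k):
    """Sort x, form groups of at least k neighbours, replace values by group means."""
    n = len(x)
    order = np.argsort(x)
    groups = np.empty(n, dtype=int)
    # The last group absorbs any remainder so every group has at least k members.
    groups[order] = np.minimum(np.arange(n) // k, max(n // k - 1, 0))
    means = np.array([x[groups == g].mean() for g in np.unique(groups)])
    return means[groups], groups

def ols_slope(x, y):
    """Least-squares slope of y on x (the ML estimate under Gaussian errors)."""
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))

def sample_compatible(x_agg, groups, radius, rng):
    """Draw one candidate data set with the same group means as x_agg (assumed form of X)."""
    noise = rng.uniform(-radius, radius, size=len(x_agg))
    for g in np.unique(groups):
        mask = groups == g
        noise[mask] -= noise[mask].mean()  # centre the noise so group means are preserved
    return x_agg + noise

def slope_range(x_agg, groups, y, radius, n_draws=2000, seed=0):
    """Inner approximation of the set of slope estimates over sampled elements of X."""
    rng = np.random.default_rng(seed)
    slopes = [ols_slope(x_agg, y)]  # the microaggregated data are themselves in X
    for _ in range(n_draws):
        slopes.append(ols_slope(sample_compatible(x_agg, groups, radius, rng), y))
    return min(slopes), max(slopes)

# Synthetic example data (hypothetical, for illustration only).
rng = np.random.default_rng(1)
x = rng.normal(size=30)
y = 2.0 * x + rng.normal(scale=0.5, size=30)

x_agg, groups = microaggregate(x, k=3)
naive = ols_slope(x_agg, y)                       # single element of X: aggregated data as true
lo, hi = slope_range(x_agg, groups, y, radius=0.3)
print(f"naive slope: {naive:.3f}, sampled range: [{lo:.3f}, {hi:.3f}]")
```

The naive estimate corresponds to the first approach (picking one element of X), while the interval `[lo, hi]` mimics the second, cautious approach. An exact treatment would optimize over X rather than sample from it.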