(Generalized) Linear Regression on Microaggregated Data – From Nuisance Parameter Optimization to Partial Identification
Proceedings of the Tenth International Symposium on Imprecise Probability: Theories and Applications, PMLR 62:157-168, 2017.
Abstract
Protecting sensitive microdata before publishing them or passing them on is crucial: a trade-off between sufficient disclosure control and analyzability needs to be found. This paper presents a starting point for evaluating the effect of using $k$-anonymity microaggregated data in (generalized) linear regression. Taking a rigorous imprecision perspective, microaggregated data are understood as inducing a set $X$ of potentially true data. Based on this representation, two conceptually different approaches to deriving estimators from the ideal likelihood are discussed. The first picks a single element of $X$, for instance by naively treating the microaggregated data as the true data or by introducing a maximax approach that takes the elements of $X$ as nuisance parameters to be optimized. The second seeks, in the spirit of Partial Identification, the set of all maximum likelihood estimators compatible with the elements of $X$, thus creating cautious estimators. As the simulation study corroborates, the sets of estimators obtained with the latter approach are still precise enough to be practically relevant.
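The contrast between the two approaches can be illustrated with a minimal sketch; it is not the paper's algorithm. It assumes a Gaussian linear model with one regressor (so the maximum likelihood slope equals the least-squares slope), assumes only the regressor is microaggregated, and assumes $X$ contains every data set that reproduces the published group means. The helper names (`microaggregate`, `sample_compatible`, `ols_slope`) and the perturbation scale `0.3` are hypothetical. Because the candidates are drawn at random, the resulting interval only approximates the set of compatible estimators from the inside.

```python
import numpy as np

def microaggregate(x, k):
    """Sort x, form groups of k consecutive values, replace each value by its group mean.

    Assumes len(x) is divisible by k, so every group has exactly k members.
    """
    order = np.argsort(x)
    groups = [order[i:i + k] for i in range(0, len(x), k)]
    x_agg = np.empty(len(x), dtype=float)
    for g in groups:
        x_agg[g] = x[g].mean()
    return x_agg, groups

def ols_slope(x, y):
    """Least-squares slope, which is the ML slope under Gaussian errors."""
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))

def sample_compatible(x_agg, groups, rng, scale):
    """Draw one candidate data set whose group means match the published ones.

    The within-group noise is centred, so each group mean is preserved.
    The noise scale is an illustrative assumption, not part of the source.
    """
    cand = x_agg.copy()
    for g in groups:
        noise = rng.normal(0.0, scale, size=len(g))
        cand[g] += noise - noise.mean()
    return cand

rng = np.random.default_rng(0)
n, k = 60, 3
x_true = rng.normal(size=n)
y = 1.0 + 2.0 * x_true + rng.normal(scale=0.5, size=n)

x_agg, groups = microaggregate(x_true, k)

# Approach 1 (naive variant): treat the microaggregated values as the true data.
naive = ols_slope(x_agg, y)

# Approach 2 (sampled approximation): collect slopes over candidate elements of X.
slopes = [naive] + [
    ols_slope(sample_compatible(x_agg, groups, rng, 0.3), y) for _ in range(500)
]
lo, hi = min(slopes), max(slopes)

print(f"naive slope: {naive:.3f}")
print(f"sampled range of compatible slopes: [{lo:.3f}, {hi:.3f}]")
```

The naive estimate is a single point, whereas the second approach reports a range; a narrow range would indicate that the anonymized data remain informative about the slope.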