(Generalized) Linear Regression on Microaggregated Data – From Nuisance Parameter Optimization to Partial Identification

Paul Fink, Thomas Augustin
Proceedings of the Tenth International Symposium on Imprecise Probability: Theories and Applications, PMLR 62:157-168, 2017.

Abstract

Protecting sensitive microdata before publishing them or passing them on is crucial: a trade-off between sufficient disclosure control and analyzability needs to be found. This paper presents a starting point for evaluating the effect of k-anonymity microaggregated data in (generalized) linear regression. Taking a rigorous imprecision perspective, microaggregated data are understood as inducing a set X of potentially true data. Based on this representation, two conceptually different approaches to deriving estimators from the ideal likelihood are discussed. The first picks a single element of X, for instance by naively treating the microaggregated data as the true data, or by introducing a maximax approach that takes the elements of X as nuisance parameters to be optimized. The second seeks, in the spirit of Partial Identification, the set of all maximum likelihood estimators compatible with the elements of X, thus creating cautious estimators. As the simulation study corroborates, the sets of estimators obtained with the latter approach are still precise enough to be practically relevant.
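
The contrast between the naive single-element estimator and the set-valued estimator can be illustrated with a small sketch. It is not the authors' procedure. It assumes that only one covariate is microaggregated (sorted values replaced by group means), that the response is observed exactly, and that each true value lies within a hypothetical half-width `radius` of its published group mean. The set of compatible slope estimates is only approximated from the inside by random sampling, and all function names are illustrative.

```python
import random


def microaggregate(x, k):
    """Replace each value by the mean of its group of (at least) k sorted neighbours."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    n_groups = max(n // k, 1)
    x_agg, groups = [0.0] * n, []
    for g in range(n_groups):
        lo = g * k
        hi = (g + 1) * k if g < n_groups - 1 else n  # last group absorbs the remainder
        idx = order[lo:hi]
        mean = sum(x[i] for i in idx) / len(idx)
        for i in idx:
            x_agg[i] = mean
        groups.append(idx)
    return x_agg, groups


def ols_slope(x, y):
    """Least-squares slope of y on x (the Gaussian ML estimate of the slope)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def draw_candidate(x_agg, groups, radius, rng):
    """Draw one data set with the published group means (ASSUMPTION: box of half-width radius)."""
    cand = list(x_agg)
    for idx in groups:
        dev = [rng.uniform(-1.0, 1.0) for _ in idx]
        centre = sum(dev) / len(dev)
        dev = [d - centre for d in dev]  # deviations now sum to zero within the group
        largest = max(abs(d) for d in dev)
        if largest == 0.0:
            continue  # singleton group: nothing to perturb
        scale = rng.uniform(0.0, radius) / largest  # keeps every deviation within radius
        for i, d in zip(idx, dev):
            cand[i] = x_agg[i] + scale * d
    return cand


def slope_range(x_agg, y, groups, radius, n_draws=2000, seed=0):
    """Sampled inner approximation of the set of slopes compatible with the assumed X."""
    rng = random.Random(seed)
    slopes = [ols_slope(x_agg, y)]  # the naive estimate is itself an element of the set
    slopes += [ols_slope(draw_candidate(x_agg, groups, radius, rng), y)
               for _ in range(n_draws)]
    return min(slopes), max(slopes)


# Synthetic illustration only; these numbers are not from the paper.
rng = random.Random(1)
x_true = [rng.uniform(0.0, 10.0) for _ in range(30)]
y = [1.0 + 2.0 * xi + rng.gauss(0.0, 1.0) for xi in x_true]
x_agg, groups = microaggregate(x_true, k=3)
naive = ols_slope(x_agg, y)
lower, upper = slope_range(x_agg, y, groups, radius=1.0)
print(f"naive slope: {naive:.3f}, sampled range: [{lower:.3f}, {upper:.3f}]")
```

Because the range is obtained by sampling, it can only understate the true extent of the set. An exact treatment would optimize the estimator over X directly, and would also need to respect whatever ordering constraints the aggregation scheme implies, which this sketch ignores.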

Cite this Paper


BibTeX
@InProceedings{pmlr-v62-fink17a,
  title = {({G}eneralized) Linear Regression on Microaggregated Data – From Nuisance Parameter Optimization to Partial Identification},
  author = {Fink, Paul and Augustin, Thomas},
  booktitle = {Proceedings of the Tenth International Symposium on Imprecise Probability: Theories and Applications},
  pages = {157--168},
  year = {2017},
  editor = {Antonucci, Alessandro and Corani, Giorgio and Couso, Inés and Destercke, Sébastien},
  volume = {62},
  series = {Proceedings of Machine Learning Research},
  month = {10--14 Jul},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v62/fink17a/fink17a.pdf},
  url = {https://proceedings.mlr.press/v62/fink17a.html},
  abstract = {Protecting sensitive micro data prior to publishing or passing the data itself on is a crucial aspect: A trade-off between sufficient disclosure control and analyzability needs to be found. This paper presents a starting point to evaluate the effect of $k$-anonymity microaggregated data in (generalized) linear regression. Taking a rigorous imprecision perspective, microaggregated data are understood inducing a set $X$ of potentially true data. Based on this representation two conceptually different approaches deriving estimations from the ideal likelihood are discussed. The first one picks a single element of $X$, for instance by naively treating the microaggregated data as true ones or by introducing a maximax approach taking the elements of $X$ as nuisance parameters to be optimized. The second one seeks, in the spirit of Partial Identification, the set of all maximum likelihood estimators compatible with the elements of $X$, thus creating cautious estimators. As the simulation study corroborates, the obtained sets of estimators of the latter approach are still precise enough to be practically relevant.}
}
Endnote
%0 Conference Paper
%T (Generalized) Linear Regression on Microaggregated Data – From Nuisance Parameter Optimization to Partial Identification
%A Paul Fink
%A Thomas Augustin
%B Proceedings of the Tenth International Symposium on Imprecise Probability: Theories and Applications
%C Proceedings of Machine Learning Research
%D 2017
%E Alessandro Antonucci
%E Giorgio Corani
%E Inés Couso
%E Sébastien Destercke
%F pmlr-v62-fink17a
%I PMLR
%P 157--168
%U https://proceedings.mlr.press/v62/fink17a.html
%V 62
%X Protecting sensitive micro data prior to publishing or passing the data itself on is a crucial aspect: A trade-off between sufficient disclosure control and analyzability needs to be found. This paper presents a starting point to evaluate the effect of $k$-anonymity microaggregated data in (generalized) linear regression. Taking a rigorous imprecision perspective, microaggregated data are understood inducing a set $X$ of potentially true data. Based on this representation two conceptually different approaches deriving estimations from the ideal likelihood are discussed. The first one picks a single element of $X$, for instance by naively treating the microaggregated data as true ones or by introducing a maximax approach taking the elements of $X$ as nuisance parameters to be optimized. The second one seeks, in the spirit of Partial Identification, the set of all maximum likelihood estimators compatible with the elements of $X$, thus creating cautious estimators. As the simulation study corroborates, the obtained sets of estimators of the latter approach are still precise enough to be practically relevant.
APA
Fink, P. & Augustin, T. (2017). (Generalized) Linear Regression on Microaggregated Data – From Nuisance Parameter Optimization to Partial Identification. Proceedings of the Tenth International Symposium on Imprecise Probability: Theories and Applications, in Proceedings of Machine Learning Research 62:157-168. Available from https://proceedings.mlr.press/v62/fink17a.html.
