(Generalized) Linear Regression on Microaggregated Data – From Nuisance Parameter Optimization to Partial Identification

Paul Fink, Thomas Augustin
Proceedings of the Tenth International Symposium on Imprecise Probability: Theories and Applications, PMLR 62:157-168, 2017.

Abstract

Protecting sensitive microdata before publishing them or passing them on is crucial: a trade-off between sufficient disclosure control and analyzability needs to be found. This paper presents a starting point for evaluating the effect of $k$-anonymity microaggregation on (generalized) linear regression. Taking a rigorous imprecision perspective, microaggregated data are understood as inducing a set $X$ of potentially true data. Based on this representation, two conceptually different approaches to deriving estimates from the ideal likelihood are discussed. The first picks a single element of $X$, for instance by naively treating the microaggregated data as the true data, or by introducing a maximax approach that treats the elements of $X$ as nuisance parameters to be optimized. The second seeks, in the spirit of Partial Identification, the set of all maximum likelihood estimators compatible with the elements of $X$, thus yielding cautious estimators. As the simulation study corroborates, the sets of estimators obtained by the latter approach are still precise enough to be practically relevant.
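The ingredients named in the abstract can be illustrated with a small, self-contained Python sketch. It is not the authors' method or code, and it rests on several assumptions:

- The microaggregation step is a simplified univariate fixed-size grouping: sort the values, group k consecutive ones, and release the group means.
- Only the covariate x is assumed to be microaggregated.
- The set X is a hypothetical stand-in. In it, every true value lies within an assumed `half_width` of its released group mean, and each group mean is preserved.

Random sampling from X gives only an inner approximation of the set of estimates. The exact partial identification set would require optimizing over all of X. Under Gaussian errors, least squares coincides with maximum likelihood for the regression coefficients, and the variance can be profiled out. The "maximax" pick here is therefore the sampled element whose fit has the smallest residual sum of squares. All function and parameter names (`microaggregate`, `ols_fit`, `draw_compatible`, `explore`, `half_width`) are hypothetical.

```python
import random
from statistics import mean


def microaggregate(values, k):
    """Simplified (hypothetical) univariate fixed-size microaggregation:
    sort, form consecutive groups of k, replace each value by its group mean.
    Leftover values join the last group, so every group has >= k members."""
    n = len(values)
    if k < 1 or n < 2 * k:
        raise ValueError("need k >= 1 and at least two groups (n >= 2k)")
    order = sorted(range(n), key=lambda i: values[i])
    n_groups = n // k
    groups = [order[g * k:(g + 1) * k] for g in range(n_groups)]
    groups[-1].extend(order[n_groups * k:])  # leftovers join the last group
    released = [0.0] * n
    for grp in groups:
        m = mean(values[i] for i in grp)
        for i in grp:
            released[i] = m
    return released, groups


def ols_fit(x, y):
    """Least-squares fit of y = a + b*x (the Gaussian ML estimate of a, b).
    Returns (a, b, rss)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return a, b, rss


def draw_compatible(released, groups, half_width, rng):
    """Draw one candidate covariate vector from the hypothetical set X:
    zero-sum perturbations per group keep each group mean equal to the
    released mean, and each value stays within +/- half_width of it."""
    x = list(released)
    for grp in groups:
        u = [rng.uniform(-half_width / 2, half_width / 2) for _ in grp]
        u_bar = mean(u)
        for i, ui in zip(grp, u):
            x[i] = released[i] + (ui - u_bar)  # |ui - u_bar| <= half_width
    return x


def explore(released, y, groups, half_width, n_draws, rng):
    """Monte Carlo inner approximation over X. Returns (min slope, max slope,
    maximax slope). The naive element (released data taken as true) is
    included. 'Maximax' is the sampled element with the smallest RSS, i.e. the
    largest profiled Gaussian likelihood."""
    candidates = [list(released)] + [
        draw_compatible(released, groups, half_width, rng) for _ in range(n_draws)
    ]
    fits = [ols_fit(x, y) for x in candidates]
    slopes = [b for _, b, _ in fits]
    maximax_slope = min(fits, key=lambda f: f[2])[1]
    return min(slopes), max(slopes), maximax_slope


# Usage on simulated data (illustrative only).
rng = random.Random(0)
x_true = [rng.uniform(0, 10) for _ in range(60)]
y = [1.0 + 2.0 * xi + rng.gauss(0, 1) for xi in x_true]
released, groups = microaggregate(x_true, k=3)
naive_slope = ols_fit(released, y)[1]
lo, hi, maximax_slope = explore(released, y, groups, half_width=1.0,
                                n_draws=500, rng=rng)
print(f"original {ols_fit(x_true, y)[1]:.3f}  naive {naive_slope:.3f}  "
      f"maximax {maximax_slope:.3f}  sampled range [{lo:.3f}, {hi:.3f}]")
```

Widening `half_width` enlarges the hypothetical X and therefore tends to widen the sampled slope range. The naive estimate always lies inside that range, because the released data themselves belong to X.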

Cite this Paper


BibTeX
@InProceedings{pmlr-v62-fink17a,
  title = {({G}eneralized) Linear Regression on Microaggregated Data – From Nuisance Parameter Optimization to Partial Identification},
  author = {Fink, Paul and Augustin, Thomas},
  booktitle = {Proceedings of the Tenth International Symposium on Imprecise Probability: Theories and Applications},
  pages = {157--168},
  year = {2017},
  editor = {Antonucci, Alessandro and Corani, Giorgio and Couso, Inés and Destercke, Sébastien},
  volume = {62},
  series = {Proceedings of Machine Learning Research},
  month = {10--14 Jul},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v62/fink17a/fink17a.pdf},
  url = {https://proceedings.mlr.press/v62/fink17a.html},
  abstract = {Protecting sensitive micro data prior to publishing or passing the data itself on is a crucial aspect: A trade-off between sufficient disclosure control and analyzability needs to be found. This paper presents a starting point to evaluate the effect of $k$-anonymity microaggregated data in (generalized) linear regression. Taking a rigorous imprecision perspective, microaggregated data are understood inducing a set $X$ of potentially true data. Based on this representation two conceptually different approaches deriving estimations from the ideal likelihood are discussed. The first one picks a single element of $X$, for instance by naively treating the microaggregated data as true ones or by introducing a maximax approach taking the elements of $X$ as nuisance parameters to be optimized. The second one seeks, in the spirit of Partial Identification, the set of all maximum likelihood estimators compatible with the elements of $X$, thus creating cautious estimators. As the simulation study corroborates, the obtained sets of estimators of the latter approach are still precise enough to be practically relevant.}
}
Endnote
%0 Conference Paper
%T (Generalized) Linear Regression on Microaggregated Data – From Nuisance Parameter Optimization to Partial Identification
%A Paul Fink
%A Thomas Augustin
%B Proceedings of the Tenth International Symposium on Imprecise Probability: Theories and Applications
%C Proceedings of Machine Learning Research
%D 2017
%E Alessandro Antonucci
%E Giorgio Corani
%E Inés Couso
%E Sébastien Destercke
%F pmlr-v62-fink17a
%I PMLR
%P 157--168
%U https://proceedings.mlr.press/v62/fink17a.html
%V 62
%X Protecting sensitive micro data prior to publishing or passing the data itself on is a crucial aspect: A trade-off between sufficient disclosure control and analyzability needs to be found. This paper presents a starting point to evaluate the effect of $k$-anonymity microaggregated data in (generalized) linear regression. Taking a rigorous imprecision perspective, microaggregated data are understood inducing a set $X$ of potentially true data. Based on this representation two conceptually different approaches deriving estimations from the ideal likelihood are discussed. The first one picks a single element of $X$, for instance by naively treating the microaggregated data as true ones or by introducing a maximax approach taking the elements of $X$ as nuisance parameters to be optimized. The second one seeks, in the spirit of Partial Identification, the set of all maximum likelihood estimators compatible with the elements of $X$, thus creating cautious estimators. As the simulation study corroborates, the obtained sets of estimators of the latter approach are still precise enough to be practically relevant.
APA
Fink, P., & Augustin, T. (2017). (Generalized) Linear Regression on Microaggregated Data – From Nuisance Parameter Optimization to Partial Identification. Proceedings of the Tenth International Symposium on Imprecise Probability: Theories and Applications, in Proceedings of Machine Learning Research 62:157-168. Available from https://proceedings.mlr.press/v62/fink17a.html.
