(Generalized) Linear Regression on Microaggregated Data – From Nuisance Parameter Optimization to Partial Identification

Paul Fink; Thomas Augustin

Proceedings of the Tenth International Symposium on Imprecise Probability: Theories and Applications, PMLR 62:157-168, 2017.

Abstract

Protecting sensitive micro data prior to publishing or passing the data itself on is a crucial aspect: A trade-off between sufficient disclosure control and analyzability needs to be found. This paper presents a starting point to evaluate the effect of $k$-anonymity microaggregated data in (generalized) linear regression. Taking a rigorous imprecision perspective, microaggregated data are understood inducing a set $X$ of potentially true data. Based on this representation two conceptually different approaches deriving estimations from the ideal likelihood are discussed. The first one picks a single element of $X$, for instance by naively treating the microaggregated data as true ones or by introducing a maximax approach taking the elements of $X$ as nuisance parameters to be optimized. The second one seeks, in the spirit of Partial Identification, the set of all maximum likelihood estimators compatible with the elements of $X$, thus creating cautious estimators. As the simulation study corroborates, the obtained sets of estimators of the latter approach are still precise enough to be practically relevant.
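The contrast between the naive estimate and a set of estimates can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the choice to microaggregate only a single covariate by sorted groups of size k, the bounded `spread` around each published group mean, and the random sampling of compatible datasets are all assumptions made for illustration. Sampling only yields an inner approximation of the set of estimators, not the exact bounds sought by a partial identification analysis.

```python
import random
import statistics


def microaggregate(values, k):
    """Sort values, form consecutive groups of size k, and replace each
    value by its group mean. A trailing group smaller than k is merged
    into the previous one so every group has at least k members."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [order[i:i + k] for i in range(0, len(order), k)]
    if len(groups) > 1 and len(groups[-1]) < k:
        groups[-2].extend(groups.pop())
    aggregated = [0.0] * len(values)
    for group in groups:
        mean = statistics.fmean(values[i] for i in group)
        for i in group:
            aggregated[i] = mean
    return aggregated, groups


def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def compatible_candidate(aggregated, groups, spread, rng):
    """Draw one hypothetical dataset whose group means equal the
    published means (an assumed stand-in for one element of X)."""
    candidate = list(aggregated)
    for group in groups:
        noise = [rng.uniform(-spread, spread) for _ in group]
        centre = statistics.fmean(noise)
        for i, e in zip(group, noise):
            # Centring the noise keeps the group mean unchanged.
            candidate[i] = aggregated[i] + (e - centre)
    return candidate


rng = random.Random(0)
x_true = [rng.gauss(0, 1) for _ in range(60)]
y = [1.0 + 2.0 * x + rng.gauss(0, 0.5) for x in x_true]

x_agg, groups = microaggregate(x_true, k=3)

# Naive approach: treat the microaggregated covariate as the true one.
naive = ols_slope(x_agg, y)

# Set-valued view: slopes obtained from sampled compatible datasets.
slopes = [
    ols_slope(compatible_candidate(x_agg, groups, 0.5, rng), y)
    for _ in range(200)
]
print("naive slope:", naive)
print("sampled slope range:", min(slopes), max(slopes))
```

The width of the sampled range depends directly on the assumed `spread`; a principled analysis would derive the admissible set from what is actually known about the aggregation procedure rather than from an ad hoc bound.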

Cite this Paper

BibTeX


@InProceedings{pmlr-v62-fink17a,
  title = 	 {({G}eneralized) Linear Regression on Microaggregated Data – From Nuisance Parameter Optimization to Partial Identification},
  author = 	 {Fink, Paul and Augustin, Thomas},
  booktitle = 	 {Proceedings of the Tenth International Symposium on Imprecise Probability: Theories and Applications},
  pages = 	 {157--168},
  year = 	 {2017},
  editor = 	 {Antonucci, Alessandro and Corani, Giorgio and Couso, Inés and Destercke, Sébastien},
  volume = 	 {62},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {10--14 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v62/fink17a/fink17a.pdf},
  url = 	 {https://proceedings.mlr.press/v62/fink17a.html},
  abstract = 	 {Protecting sensitive micro data prior to publishing or passing the data itself on is a crucial aspect: A trade-off between sufficient disclosure control and analyzability needs to be found. This paper presents a starting point to evaluate the effect of $k$-anonymity microaggregated data in (generalized) linear regression. Taking a rigorous imprecision perspective, microaggregated data are understood inducing a set $X$ of potentially true data. Based on this representation two conceptually different approaches deriving estimations from the ideal likelihood are discussed. The first one picks a single element of $X$, for instance by naively treating the microaggregated data as true ones or by introducing a maximax approach taking the elements of $X$ as nuisance parameters to be optimized. The second one seeks, in the spirit of Partial Identification, the set of all maximum likelihood estimators compatible with the elements of $X$, thus creating cautious estimators. As the simulation study corroborates, the obtained sets of estimators of the latter approach are still precise enough to be practically relevant.}
}

Endnote

%0 Conference Paper
%T (Generalized) Linear Regression on Microaggregated Data – From Nuisance Parameter Optimization to Partial Identification
%A Paul Fink
%A Thomas Augustin
%B Proceedings of the Tenth International Symposium on Imprecise Probability: Theories and Applications
%C Proceedings of Machine Learning Research
%D 2017
%E Alessandro Antonucci
%E Giorgio Corani
%E Inés Couso
%E Sébastien Destercke
%F pmlr-v62-fink17a
%I PMLR
%P 157--168
%U https://proceedings.mlr.press/v62/fink17a.html
%V 62
%X Protecting sensitive micro data prior to publishing or passing the data itself on is a crucial aspect: A trade-off between sufficient disclosure control and analyzability needs to be found. This paper presents a starting point to evaluate the effect of $k$-anonymity microaggregated data in (generalized) linear regression. Taking a rigorous imprecision perspective, microaggregated data are understood inducing a set $X$ of potentially true data. Based on this representation two conceptually different approaches deriving estimations from the ideal likelihood are discussed. The first one picks a single element of $X$, for instance by naively treating the microaggregated data as true ones or by introducing a maximax approach taking the elements of $X$ as nuisance parameters to be optimized. The second one seeks, in the spirit of Partial Identification, the set of all maximum likelihood estimators compatible with the elements of $X$, thus creating cautious estimators. As the simulation study corroborates, the obtained sets of estimators of the latter approach are still precise enough to be practically relevant.

APA


Fink, P. & Augustin, T. (2017). (Generalized) Linear Regression on Microaggregated Data – From Nuisance Parameter Optimization to Partial Identification. Proceedings of the Tenth International Symposium on Imprecise Probability: Theories and Applications, in Proceedings of Machine Learning Research 62:157-168. Available from https://proceedings.mlr.press/v62/fink17a.html.
