Statistical Matching of Discrete Data by Bayesian Networks

Eva Endres; Thomas Augustin

Proceedings of the Eighth International Conference on Probabilistic Graphical Models, PMLR 52:159-170, 2016.

Abstract

Statistical matching (also known as data fusion, data merging, or data integration) is the umbrella term for a collection of methods that serve to combine different data sources. The objective is to obtain joint information about variables that have not been collected jointly in one survey, but in two (or more) surveys with disjoint sets of observation units. Besides the variables specific to each data file, it is indispensable to have common variables that are observed in both data sets and on the basis of which the matching can be performed. Several existing statistical matching approaches are based on the assumption of conditional independence of the specific variables given the common variables. Relying on the well-known fact that d-separation is related to conditional independence for a probability distribution which factorizes along a directed acyclic graph, we suggest using probabilistic graphical models as a powerful tool for statistical matching. In this paper, we describe and discuss first attempts at statistical matching of discrete data by Bayesian networks. By way of example, the approach is applied to data collected within the scope of the German General Social Survey.
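
The conditional independence assumption mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' Bayesian-network procedure; it only shows the simplest case, with one common variable X, one specific variable Y observed in file A, and one specific variable Z observed in file B. Under the assumption, the joint distribution factorizes as P(x, y, z) = P(x) P(y | x) P(z | x), which corresponds to the directed acyclic graph Y ← X → Z, where Y and Z are d-separated given X. The toy data, the function names `conditional` and `match_under_cia`, and the use of plain relative frequencies are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy data (not from the paper):
# file A observes (x, y) pairs, file B observes (x, z) pairs.
file_a = [(0, "a"), (0, "a"), (0, "b"), (1, "b")]
file_b = [(0, "u"), (0, "v"), (1, "v"), (1, "v")]


def conditional(pairs):
    """Relative-frequency estimate of P(second | first) from (first, second) pairs."""
    joint = Counter(pairs)
    marginal = Counter(first for first, _ in pairs)
    return {
        (first, second): count / marginal[first]
        for (first, second), count in joint.items()
    }


def match_under_cia(file_a, file_b):
    """Estimate P(x, y, z) as P(x) * P(y | x) * P(z | x).

    P(x) is estimated from the pooled common variable, P(y | x) from
    file A only, and P(z | x) from file B only.
    """
    pooled_x = [x for x, _ in file_a] + [x for x, _ in file_b]
    p_x = {x: count / len(pooled_x) for x, count in Counter(pooled_x).items()}
    p_y_given_x = conditional(file_a)
    p_z_given_x = conditional(file_b)

    joint = {}
    for (x_a, y), p_y in p_y_given_x.items():
        for (x_b, z), p_z in p_z_given_x.items():
            # Combine only terms that condition on the same value of x.
            if x_a == x_b:
                joint[(x_a, y, z)] = p_x[x_a] * p_y * p_z
    return joint


joint = match_under_cia(file_a, file_b)
for cell, probability in sorted(joint.items()):
    print(cell, round(probability, 4))
```

The estimate says nothing about whether Y and Z really are conditionally independent given X; that cannot be checked from the two files alone, because Y and Z are never observed together.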

Cite this Paper

BibTeX

@InProceedings{pmlr-v52-endres16,
  title = 	 {Statistical Matching of Discrete Data by {B}ayesian Networks},
  author = 	 {Endres, Eva and Augustin, Thomas},
  booktitle = 	 {Proceedings of the Eighth International Conference on Probabilistic Graphical Models},
  pages = 	 {159--170},
  year = 	 {2016},
  editor = 	 {Antonucci, Alessandro and Corani, Giorgio and Campos, Cassio Polpo},
  volume = 	 {52},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Lugano, Switzerland},
  month = 	 {06--09 Sep},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v52/endres16.pdf},
  url = 	 {https://proceedings.mlr.press/v52/endres16.html},
  abstract = 	 {Statistical matching (also known as data fusion, data merging, or data integration) is the umbrella term for a collection of methods which serve to combine different data sources. The objective is to obtain joint information about variables which have not jointly been collected in one survey, but on two (or more) surveys with disjoint sets of observation units. Besides specific variables for the different data files, it is indispensable to have common variables which are observed in both data sets and on basis of which the matching can be performed. Several existing statistical matching approaches are based on the assumption of conditional independence of the specific variables given the common variables. Relying on the well-known fact that d-separation is related to conditional independence for a probability distribution which factorizes along a directed acyclic graph, we suggest to use probabilistic graphical models as a powerful tool for statistical matching. In this paper, we describe and discuss first attempts for statistical matching of discrete data by Bayesian networks. The approach is exemplarily applied to data collected within the scope of the German General Social Survey.}
}

Endnote

%0 Conference Paper
%T Statistical Matching of Discrete Data by Bayesian Networks
%A Eva Endres
%A Thomas Augustin
%B Proceedings of the Eighth International Conference on Probabilistic Graphical Models
%C Proceedings of Machine Learning Research
%D 2016
%E Alessandro Antonucci
%E Giorgio Corani
%E Cassio Polpo Campos
%F pmlr-v52-endres16
%I PMLR
%P 159--170
%U https://proceedings.mlr.press/v52/endres16.html
%V 52
%X Statistical matching (also known as data fusion, data merging, or data integration) is the umbrella term for a collection of methods which serve to combine different data sources. The objective is to obtain joint information about variables which have not jointly been collected in one survey, but on two (or more) surveys with disjoint sets of observation units. Besides specific variables for the different data files, it is indispensable to have common variables which are observed in both data sets and on basis of which the matching can be performed. Several existing statistical matching approaches are based on the assumption of conditional independence of the specific variables given the common variables. Relying on the well-known fact that d-separation is related to conditional independence for a probability distribution which factorizes along a directed acyclic graph, we suggest to use probabilistic graphical models as a powerful tool for statistical matching. In this paper, we describe and discuss first attempts for statistical matching of discrete data by Bayesian networks. The approach is exemplarily applied to data collected within the scope of the German General Social Survey.

RIS

TY  - CPAPER
TI  - Statistical Matching of Discrete Data by Bayesian Networks
AU  - Eva Endres
AU  - Thomas Augustin
BT  - Proceedings of the Eighth International Conference on Probabilistic Graphical Models
DA  - 2016/08/15
ED  - Alessandro Antonucci
ED  - Giorgio Corani
ED  - Cassio Polpo Campos
ID  - pmlr-v52-endres16
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 52
SP  - 159
EP  - 170
L1  - http://proceedings.mlr.press/v52/endres16.pdf
UR  - https://proceedings.mlr.press/v52/endres16.html
AB  - Statistical matching (also known as data fusion, data merging, or data integration) is the umbrella term for a collection of methods which serve to combine different data sources. The objective is to obtain joint information about variables which have not jointly been collected in one survey, but on two (or more) surveys with disjoint sets of observation units. Besides specific variables for the different data files, it is indispensable to have common variables which are observed in both data sets and on basis of which the matching can be performed. Several existing statistical matching approaches are based on the assumption of conditional independence of the specific variables given the common variables. Relying on the well-known fact that d-separation is related to conditional independence for a probability distribution which factorizes along a directed acyclic graph, we suggest to use probabilistic graphical models as a powerful tool for statistical matching. In this paper, we describe and discuss first attempts for statistical matching of discrete data by Bayesian networks. The approach is exemplarily applied to data collected within the scope of the German General Social Survey.
ER  -

APA

Endres, E. & Augustin, T. (2016). Statistical Matching of Discrete Data by Bayesian Networks. Proceedings of the Eighth International Conference on Probabilistic Graphical Models, in Proceedings of Machine Learning Research 52:159-170. Available from https://proceedings.mlr.press/v52/endres16.html.
