Learning equivalence classes of acyclic models with latent and selection variables from multiple datasets with overlapping variables

Robert Tillman; Peter Spirtes

Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, PMLR 15:3-15, 2011.

Abstract

While there has been considerable research in learning probabilistic graphical models from data for predictive and causal inference, almost all existing algorithms assume a single dataset of i.i.d. observations for all variables. For many applications, it may be impossible or impractical to obtain such datasets, but multiple datasets of i.i.d. observations for different subsets of these variables may be available. Tillman et al. (2009) showed how directed graphical models learned from such datasets can be integrated to construct an equivalence class of structures over all variables. While their procedure is correct, it assumes that the structures integrated do not entail contradictory conditional independences and dependences for variables in their intersections. While this assumption is reasonable asymptotically, it rarely holds in practice with finite samples due to the frequency of statistical errors. We propose a new correct procedure for learning such equivalence classes directly from the multiple datasets which avoids this problem and is thus more practically useful. Empirical results indicate our method is not only more accurate, but also faster and requires less memory.

Cite this Paper

BibTeX


@InProceedings{pmlr-v15-tillman11a,
  title = 	 {Learning equivalence classes of acyclic models with latent and selection variables from multiple datasets with overlapping variables},
  author = 	 {Tillman, Robert and Spirtes, Peter},
  booktitle = 	 {Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics},
  pages = 	 {3--15},
  year = 	 {2011},
  editor = 	 {Gordon, Geoffrey and Dunson, David and Dudík, Miroslav},
  volume = 	 {15},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Fort Lauderdale, FL, USA},
  month = 	 {11--13 Apr},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v15/tillman11a/tillman11a.pdf},
  url = 	 {https://proceedings.mlr.press/v15/tillman11a.html},
  abstract = 	 {While there has been considerable research in learning probabilistic graphical models from data for predictive and causal inference, almost all existing algorithms assume a single dataset of i.i.d. observations for all variables. For many applications, it may be impossible or impractical to obtain such datasets, but multiple datasets of i.i.d. observations for different subsets of these variables may be available. Tillman et al. (2009) showed how directed graphical models learned from such datasets can be integrated to construct an equivalence class of structures over all variables. While their procedure is correct, it assumes that the structures integrated do not entail contradictory conditional independences and dependences for variables in their intersections. While this assumption is reasonable asymptotically, it rarely holds in practice with finite samples due to the frequency of statistical errors. We propose a new correct procedure for learning such equivalence classes directly from the multiple datasets which avoids this problem and is thus more practically useful. Empirical results indicate our method is not only more accurate, but also faster and requires less memory.}
}

Endnote

%0 Conference Paper
%T Learning equivalence classes of acyclic models with latent and selection variables from multiple datasets with overlapping variables
%A Robert Tillman
%A Peter Spirtes
%B Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2011
%E Geoffrey Gordon
%E David Dunson
%E Miroslav Dudík
%F pmlr-v15-tillman11a
%I PMLR
%P 3--15
%U https://proceedings.mlr.press/v15/tillman11a.html
%V 15
%X While there has been considerable research in learning probabilistic graphical models from data for predictive and causal inference, almost all existing algorithms assume a single dataset of i.i.d. observations for all variables. For many applications, it may be impossible or impractical to obtain such datasets, but multiple datasets of i.i.d. observations for different subsets of these variables may be available. Tillman et al. (2009) showed how directed graphical models learned from such datasets can be integrated to construct an equivalence class of structures over all variables. While their procedure is correct, it assumes that the structures integrated do not entail contradictory conditional independences and dependences for variables in their intersections. While this assumption is reasonable asymptotically, it rarely holds in practice with finite samples due to the frequency of statistical errors. We propose a new correct procedure for learning such equivalence classes directly from the multiple datasets which avoids this problem and is thus more practically useful. Empirical results indicate our method is not only more accurate, but also faster and requires less memory.

RIS


TY  - CPAPER
TI  - Learning equivalence classes of acyclic models with latent and selection variables from multiple datasets with overlapping variables
AU  - Robert Tillman
AU  - Peter Spirtes
BT  - Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics
DA  - 2011/06/14
ED  - Geoffrey Gordon
ED  - David Dunson
ED  - Miroslav Dudík
ID  - pmlr-v15-tillman11a
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 15
SP  - 3
EP  - 15
L1  - http://proceedings.mlr.press/v15/tillman11a/tillman11a.pdf
UR  - https://proceedings.mlr.press/v15/tillman11a.html
AB  - While there has been considerable research in learning probabilistic graphical models from data for predictive and causal inference, almost all existing algorithms assume a single dataset of i.i.d. observations for all variables. For many applications, it may be impossible or impractical to obtain such datasets, but multiple datasets of i.i.d. observations for different subsets of these variables may be available. Tillman et al. (2009) showed how directed graphical models learned from such datasets can be integrated to construct an equivalence class of structures over all variables. While their procedure is correct, it assumes that the structures integrated do not entail contradictory conditional independences and dependences for variables in their intersections. While this assumption is reasonable asymptotically, it rarely holds in practice with finite samples due to the frequency of statistical errors. We propose a new correct procedure for learning such equivalence classes directly from the multiple datasets which avoids this problem and is thus more practically useful. Empirical results indicate our method is not only more accurate, but also faster and requires less memory.
ER  -

APA


Tillman, R. & Spirtes, P. (2011). Learning equivalence classes of acyclic models with latent and selection variables from multiple datasets with overlapping variables. Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 15:3-15. Available from https://proceedings.mlr.press/v15/tillman11a.html.
