Proceedings of Machine Learning Research
Proceedings of Workshop on Causality: Objectives and Assessment at NIPS 2008
Held in Whistler, Canada on 12 December 2008
Published as Volume 6 by the Proceedings of Machine Learning Research on 18 February 2010.
Volume Edited by:
Isabelle Guyon
Dominik Janzing
Bernhard Schölkopf
Series Editors:
Neil D. Lawrence
http://proceedings.mlr.press/v6/
Discover Local Causal Network around a Target to a Given Depth
For a given target node T and a given depth k ≥ 1, we propose an algorithm for discovering a local causal network around the target T to depth k. In our algorithm, we find parents, children and some descendants (PCD) of nodes stepwise away from the target T until no edge within the depth-k local network can be oriented further. Our algorithm extends the PCD-by-PCD algorithm for prediction with intervention presented in Yin et al. (2008). Our algorithm can construct a local network to depth k, has a more efficient stop rule and finds PCDs along some but not all paths starting from the target.
Thu, 18 Feb 2010 00:00:00 +0000
http://proceedings.mlr.press/v6/zhou10a.html
Reverse Engineering of Asynchronous Boolean Networks via Minimum Explanatory Set and Maximum Likelihood
In this paper, we propose an approach for reconstructing asynchronous Boolean networks from observed data. We find the causal relationships in Boolean networks using an asynchronous evolution approach. In our approach, we first find a minimum explanatory set for a node to reduce the complexity of candidate Boolean functions, and then we choose a Boolean function for the node based on maximum likelihood. This approach was motivated by the SIGNET task of the Causality Challenge #2: Pot-Luck (Jenkins, 2009). Besides the SIGNET data set, we also applied our approach to two other data sets for evaluation: one generated by Professor Isabelle Guyon and the other generated by ourselves from the signal transduction network of abscisic acid in guard cells.
Thu, 18 Feb 2010 00:00:00 +0000
http://proceedings.mlr.press/v6/zheng10a.html
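To make the notion of asynchronous evolution in the abstract above concrete, here is a minimal sketch in which one randomly chosen node is updated per step while all other nodes keep their values. The three-node network, the rule functions and the helper names `async_step` and `simulate` are hypothetical illustrations, not the authors' reconstruction method or the SIGNET rules.

```python
import random


def async_step(state, rules, rng):
    """Update one randomly chosen node with its Boolean rule; return a new state."""
    node = rng.choice(sorted(rules))       # pick a single node to update
    new_state = dict(state)                # all other nodes keep their values
    new_state[node] = rules[node](state)   # rule reads the current state
    return new_state


def simulate(state, rules, steps, seed=0):
    """Return the list of states visited over `steps` asynchronous updates."""
    rng = random.Random(seed)
    trajectory = [dict(state)]
    for _ in range(steps):
        state = async_step(state, rules, rng)
        trajectory.append(state)
    return trajectory


# Hypothetical three-node network, used only for illustration.
rules = {
    "A": lambda s: s["A"],                 # input node holds its value
    "B": lambda s: s["A"] and not s["C"],  # B is activated by A, inhibited by C
    "C": lambda s: s["B"],                 # C copies B
}

trajectory = simulate({"A": True, "B": False, "C": False}, rules, steps=10)
```

Because only one node changes per step, consecutive states in `trajectory` differ in at most one variable, which is the property a reconstruction method for asynchronous networks can exploit.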
Distinguishing Causes from Effects using Nonlinear Acyclic Causal Models
Distinguishing causes from effects is an important problem in many areas. In this paper, we propose a very general but well defined nonlinear acyclic causal model, namely, the post-nonlinear acyclic causal model with inner additive noise, to tackle this problem. In this model, each observed variable is generated by a nonlinear function of its parents, with additive noise, followed by a nonlinear distortion. The nonlinearity in the second stage takes into account the effect of sensor distortions, which are usually encountered in practice. In the two-variable case, if all the nonlinearities involved in the model are invertible, by relating the proposed model to the post-nonlinear independent component analysis (ICA) problem, we give the conditions under which the causal relation can be uniquely found. We present a two-step method, which is constrained nonlinear ICA followed by statistical independence tests, to distinguish the cause from the effect in the two-variable case. We apply this method to solve the problem “CauseEffectPairs” in the Pot-luck challenge, and successfully distinguish causes from effects.
Thu, 18 Feb 2010 00:00:00 +0000
http://proceedings.mlr.press/v6/zhang10a.html
Learning Causal Models That Make Correct Manipulation Predictions With Time Series Data
One of the fundamental purposes of causal models is using them to predict the effects of manipulating various components of a system. It has been argued by Dash (2005, 2003) that the Do operator will fail when applied to an equilibrium model, unless the underlying dynamic system obeys what he calls Equilibration-Manipulation Commutability (EMC). Unfortunately, this fact renders most existing causal discovery algorithms unreliable for reasoning about manipulations. Motivated by this caveat, in this paper we present a novel approach to causal discovery of dynamic models from time series. The approach uses a representation of dynamic causal models motivated by Iwasaki and Simon (1994), which asserts that all “causation across time” occurs because a variable’s derivative has been affected instantaneously. We present an algorithm that exploits this representation within a constraint-based learning framework by numerically calculating derivatives and learning instantaneous relationships. We argue that due to numerical errors in higher order derivatives, care must be taken when learning causal structure, but we show that the Iwasaki-Simon representation reduces the search space considerably, allowing us to forgo calculating many high-order derivatives. In order for our algorithm to discover the dynamic model, it is necessary that the time-scale of the data is much finer than any temporal process of the system. Finally, we show that our approach can correctly recover the structure of a fairly complex dynamic system, and can predict the effect of manipulations accurately when a manipulation does not cause an instability.
To our knowledge, this is the first causal discovery algorithm that has demonstrated that it can correctly predict the effects of manipulations for a system that does not obey the EMC condition.
Thu, 18 Feb 2010 00:00:00 +0000
http://proceedings.mlr.press/v6/voortman10a.html
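The remark in the abstract above about numerical errors in higher order derivatives can be illustrated with a small sketch. It assumes central finite differences, which is one common scheme; the abstract does not say which scheme the authors use, and the signal, step size and perturbation below are made up for illustration.

```python
def central_difference(values, dt):
    """Estimate the first derivative at interior sample points."""
    return [(values[i + 1] - values[i - 1]) / (2.0 * dt)
            for i in range(1, len(values) - 1)]


dt = 0.1
times = [i * dt for i in range(11)]
clean = [t * t for t in times]   # x(t) = t^2, so dx/dt = 2t
noisy = list(clean)
noisy[5] += 1e-3                 # a single small measurement error

first_clean = central_difference(clean, dt)
first_noisy = central_difference(noisy, dt)
# Second derivative: difference the first-derivative estimates again.
second_clean = central_difference(first_clean, dt)
second_noisy = central_difference(first_noisy, dt)

# Largest deviation caused by the single perturbed sample.
first_error = max(abs(a - b) for a, b in zip(first_noisy, first_clean))
second_error = max(abs(a - b) for a, b in zip(second_noisy, second_clean))
```

Each differencing pass divides the measurement error by a multiple of `dt`, so `second_error` comes out larger than `first_error` whenever the sampling step is small. This is why skipping unnecessary high-order derivatives helps.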
When causality matters for prediction: investigating the practical tradeoffs
Recent evaluations have indicated that in practice, general methods for prediction which do not account for changes in the conditional distribution of a target variable given feature values in some cases outperform causal discovery based methods for prediction which can account for such changes. We investigate some possibilities which may explain these findings. We give theoretical conditions, which are confirmed experimentally, for when particular manipulations of variables should not affect predictions for a target. We then consider the tradeoff between errors related to causality, i.e. not accounting for changes in a distribution after variables are manipulated, and errors resulting from sample bias, overfitting, and assuming specific parametric forms that do not fit the data, which most existing causal discovery based methods are particularly prone to making.
Thu, 18 Feb 2010 00:00:00 +0000
http://proceedings.mlr.press/v6/tillman10a.html
TIED: An Artificially Simulated Dataset with Multiple Markov Boundaries
We present an artificially simulated dataset (TIED) constructed so that there are many minimal sets of variables with maximal predictivity (i.e., Markov boundaries) and likewise many sets of variables that are statistically indistinguishable from the set of direct causes and direct effects of the response variable. This dataset was used in the Potluck Causality Challenge to determine all statistically indistinguishable sets of direct causes and direct effects and all Markov boundaries of the response variable and also to predict the response variable in the independent test data. We also present baseline results of application of several algorithms to this dataset.
Thu, 18 Feb 2010 00:00:00 +0000
http://proceedings.mlr.press/v6/statnikov10a.html
The Use of Bernoulli Mixture Models for Identifying Corners of a Hypercube and Extracting Boolean Rules From Data
This paper describes the use of Bernoulli mixture models for extracting Boolean rules from data. Bernoulli mixtures identify high data density areas on the corners of a hypercube. One corner represents a conjunction of literals in a Boolean clause, and the set of all identified corners of the hypercube indicates disjuncts of clauses to form a rule. Furthermore, class labels can be used to select features or variables, in the individual conjuncts, that are relevant to the target variable. This method was applied to the SIGNET dataset of the causality workbench challenge. The dataset is derived from a biological signaling network with 21 time steps and 43 random Boolean variables. Results indicate that Bernoulli mixtures are quite effective at extracting Boolean rules from data.
Thu, 18 Feb 2010 00:00:00 +0000
http://proceedings.mlr.press/v6/saeed10a.html
Causal Inference
This paper reviews a theory of causal inference based on the Structural Causal Model (SCM) described in Pearl (2000a). The theory unifies the graphical, potential-outcome (Neyman-Rubin), decision analytical, and structural equation approaches to causation, and provides both a mathematical foundation and a friendly calculus for the analysis of causes and counterfactuals. In particular, the paper establishes a methodology for inferring (from a combination of data and assumptions) the answers to three types of causal queries: (1) queries about the effect of potential interventions, (2) queries about counterfactuals, and (3) queries about the direct (or indirect) effect of one event on another.
Thu, 18 Feb 2010 00:00:00 +0000
http://proceedings.mlr.press/v6/pearl10a.html
Comparison of Granger Causality and Phase Slope Index
We recently proposed a new measure, termed Phase Slope Index (PSI), which estimates the causal direction of interactions robustly with respect to instantaneous mixtures of independent sources with arbitrary spectral content. We compared this method to Granger Causality for linear systems containing spatially and temporally mixed noise and found that, in contrast to PSI, the latter was not able to properly distinguish truly interacting systems from mixed noise. Here, we extend this analysis with respect to two aspects: a) we analyze Granger causality and PSI also for non-mixed noise, and b) we analyze PSI for nonlinear interactions. We found a) that Granger causality, in contrast to PSI, fails also for non-mixed noise if the memory-time of the sender of information is long compared to the transmission time of the information, and b) that PSI, being a linear method, eventually misses nonlinear interactions but is unlikely to give false positive results.
Thu, 18 Feb 2010 00:00:00 +0000
http://proceedings.mlr.press/v6/nolte10a.html
Fast Committee-Based Structure Learning
Current methods for causal structure learning tend to be computationally intensive or intractable for large datasets. Some recent approaches have sped up the process by first making hard decisions about the set of parents and children for each variable, in order to break large-scale problems into sets of tractable local neighbourhoods. We use this principle in order to apply a structure learning committee for orienting edges between variables. We find that a combination of weak structure learners can be effective in recovering causal dependencies. Though such a formulation would be intractable for large problems at the global level, we show that it can run quickly when processing local neighbourhoods in turn. Experimental results show that this localized, committee-based approach has advantages over standard causal discovery algorithms both in terms of speed and accuracy.
Thu, 18 Feb 2010 00:00:00 +0000
http://proceedings.mlr.press/v6/mwebaze10a.html
Distinguishing between cause and effect
We describe eight data sets that together formed the CauseEffectPairs task in the Causality Challenge #2: Pot-Luck competition. Each set consists of a sample of a pair of statistically dependent random variables. One variable is known to cause the other one, but this information was hidden from the participants; the task was to identify which of the two variables was the cause and which one the effect, based upon the observed sample. The data sets were chosen such that we expect common agreement on the ground truth. Even though part of the statistical dependences may also be due to hidden common causes, common sense tells us that there is a significant cause-effect relation between the two variables in each pair. We also present baseline results using three different causal inference methods.
Thu, 18 Feb 2010 00:00:00 +0000
http://proceedings.mlr.press/v6/mooij10a.html
Causality Challenge: Benchmarking relevant signal components for effective monitoring and process control
A complex modern manufacturing process is normally under consistent surveillance via the monitoring of signals/variables collected from sensors. However, not all of these signals are equally valuable in a specific monitoring system. The measured signals contain a combination of useful information, irrelevant information, and noise. It is often the case that useful information is buried in the latter two. Engineers typically have a much larger number of signals than are actually required. If we consider each type of signal as a feature, then feature selection may be used to identify the most predictive signals. Once these signals have been identified, causal relevance may then be investigated to try to identify the causal features. The process engineers may then use these signals to ensure a small scrap rate further downstream in the process, increase the throughput and reduce the per-unit production costs. Working in partnership with industry, we aim to address this complex problem as part of their process control engineering in the context of wafer fabrication production and enhance current business improvement techniques with the application of causal feature selection as an intelligent systems technique.
Thu, 18 Feb 2010 00:00:00 +0000
http://proceedings.mlr.press/v6/mccann10a.html
Bayesian Algorithms for Causal Data Mining
We present two Bayesian algorithms, CD-B and CD-H, for discovering unconfounded cause and effect relationships from observational data without assuming causal sufficiency, which precludes hidden common causes for the observed variables. The CD-B algorithm first estimates the Markov blanket of a node X using a Bayesian greedy search method and then applies Bayesian scoring methods to discriminate the parents and children of X. Using the set of parents and set of children, CD-B constructs a global Bayesian network and outputs the causal effects of a node X based on the identification of Y arcs. Recall that if a node X has two parent nodes A and B and a child node C such that there is no arc between A and B, and neither A nor B is a parent of C, then the arc from X to C is called a Y arc. The CD-H algorithm uses the MMPC algorithm to estimate the union of parents and children of a target node X. The subsequent steps are similar to those of CD-B. We evaluated the CD-B and CD-H algorithms empirically based on simulated data from four different Bayesian networks. We also present comparative results based on the identification of Y structures and Y arcs from the output of the PC, MMHC and FCI algorithms. The results appear promising for mining causal relationships that are unconfounded by hidden variables from observational data.
Thu, 18 Feb 2010 00:00:00 +0000
http://proceedings.mlr.press/v6/mani10a.html
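The Y arc definition recalled in the abstract above can be checked mechanically on a known graph. The following is a minimal sketch, assuming the graph is given as a collection of (parent, child) pairs; the function name `find_y_arcs` and the example graph are hypothetical, and this is not the CD-B or CD-H algorithm, which must first learn the structure from data.

```python
from itertools import combinations


def find_y_arcs(edges):
    """Return the set of arcs (X, C) that satisfy the Y arc definition."""
    edges = set(edges)
    parents, children = {}, {}
    for parent, child in edges:
        parents.setdefault(child, set()).add(parent)
        children.setdefault(parent, set()).add(child)

    y_arcs = set()
    for x, x_parents in parents.items():
        for a, b in combinations(x_parents, 2):
            # A and B must not be joined by an arc in either direction.
            if (a, b) in edges or (b, a) in edges:
                continue
            for c in children.get(x, ()):
                # Neither A nor B may be a parent of C.
                if (a, c) in edges or (b, c) in edges:
                    continue
                y_arcs.add((x, c))
    return y_arcs


# Hypothetical example: A -> X <- B and X -> C form a Y structure.
example = [("A", "X"), ("B", "X"), ("X", "C")]
y_arcs = find_y_arcs(example)
```

Adding an arc between A and B, or from A to C, to `example` removes the Y arc, since either addition violates one of the two conditions in the definition.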
Inference of Graphical Causal Models: Representing the Meaningful Information of Probability Distributions
This paper studies the feasibility and interpretation of learning the causal structure from observational data with the principles behind the Kolmogorov Minimal Sufficient Statistic (KMSS). The KMSS provides a generic solution to inductive inference. It states that we should seek the minimal model that captures all regularities of the data. The conditional independencies following from the system’s causal structure are the regularities incorporated in a graphical causal model. The meaningful information provided by a Bayesian network corresponds to the decomposition of the description of the system into Conditional Probability Distributions (CPDs). The decomposition is described by the Directed Acyclic Graph (DAG). For a causal interpretation of the DAG, the decomposition should imply modularity of the CPDs. The CPDs should match up with independent parts of reality that can be changed independently. We argue that if the shortest description of the joint distribution is given by separate descriptions of the conditional distributions for each variable given its effects, the decomposition given by the DAG should be considered as the top-ranked causal hypothesis. Even when the causal interpretation is faulty, it serves as a reference model. Modularity becomes, however, implausible if the concatenation of the description of some CPDs is compressible. Then there might be a kind of meta-mechanism governing some of the mechanisms, or a single mechanism responsible for setting the state of multiple variables.
Thu, 18 Feb 2010 00:00:00 +0000
http://proceedings.mlr.press/v6/lemeire10a.html
SIGNET: Boolean Rule Determination for Abscisic Acid Signaling
This paper describes the SIGNET dataset generated for the Causality Challenge. Cellular signaling pathways are among the most elusive types of networks to access experimentally due to the lack of methods for determining the state of a signaling network in an intact living cell. Boolean network models are currently being used for the modeling of signaling networks due to their compact formulation and ability to adequately represent network dynamics without the need for chemical kinetics. The problem posed in the SIGNET challenge is to determine the set of Boolean rules that describe the interactions of nodes within a plant signaling network, given a set of 300 Boolean pseudodynamic simulations of the true rules. The two solution methods that were presented revealed that the problem can be solved to greater than 99% accuracy.
Thu, 18 Feb 2010 00:00:00 +0000
http://proceedings.mlr.press/v6/jenkins10a.html
Structure Learning in Causal Cyclic Networks
Cyclic graphical models are unnecessary for accurate representation of joint probability distributions, but are often indispensable when a causal representation of variable relationships is desired. For variables with a cyclic causal dependence structure, DAGs are guaranteed not to recover the correct causal structure, and therefore may yield false predictions about the outcomes of perturbations (and even inference). In this paper, we introduce an approach to generalize Bayesian Network structure learning to structures with cyclic dependence. We introduce a structure learning algorithm, prove its performance given reasonable assumptions, and use simulated data to compare its results to the results of standard Bayesian network structure learning. We then propose a modified, heuristic algorithm with more modest data requirements, and test its performance on a real-life dataset from molecular biology, containing causal, cyclic dependencies.
Thu, 18 Feb 2010 00:00:00 +0000
http://proceedings.mlr.press/v6/itani10a.html
Sparse Causal Discovery in Multivariate Time Series
Our goal is to estimate causal interactions in multivariate time series. Using vector autoregressive (VAR) models, these can be defined based on non-vanishing coefficients belonging to respective time-lagged instances. As in most cases a parsimonious causality structure is assumed, a promising approach to causal discovery consists in fitting VAR models with an additional sparsity-promoting regularization. Along this line we here propose that sparsity should be enforced for the subgroups of coefficients that belong to each pair of time series, as the absence of a causal relation requires the coefficients for all time-lags to become jointly zero. Such behavior can be achieved by means of l_{1,2}-norm regularized regression, for which an efficient active set solver has been proposed recently. Our method is shown to outperform standard methods in recovering simulated causality graphs. The results are on par with a second novel approach which uses multiple statistical testing.
Thu, 18 Feb 2010 00:00:00 +0000
http://proceedings.mlr.press/v6/haufe10a.html
Causality: Objectives and Assessment
The NIPS 2008 workshop on causality provided a forum for researchers from different horizons to share their views on causal modeling and address the difficult question of assessing causal models. There has been a vivid debate on properly separating the notion of causality from particular models such as graphical models, which have been dominating the field in the past few years. Part of the workshop was dedicated to discussing the results of a challenge, which offered a wide variety of applications of causal modeling. We have regrouped in these proceedings the best papers presented. Most lectures were videotaped or recorded. All information regarding the challenge and the lectures is available at http://www.clopinet.com/isabelle/Projects/NIPS2008/. This introduction provides a synthesis of the findings and a gentle introduction to causality topics, which are the object of active research.
Thu, 18 Feb 2010 00:00:00 +0000
http://proceedings.mlr.press/v6/guyon10a.html
Causal Discovery as a Game
This paper presents a game theoretic approach to causal discovery. The problem of causal discovery is framed as a game of the Scientist against Nature, in which Nature attempts to hide its secrets for as long as possible, and the Scientist makes her best effort at discovery while minimizing cost. This approach provides a very general framework for the assessment of different search procedures and a principled way of modeling the effect of choices between different experiments.
Thu, 18 Feb 2010 00:00:00 +0000
http://proceedings.mlr.press/v6/eberhardt10a.html
Causal learning without DAGs
Causal learning methods are often evaluated in terms of their ability to discover a true underlying directed acyclic graph (DAG) structure. However, in general the true structure is unknown and may not be a DAG structure. We therefore consider evaluating causal learning methods in terms of predicting the effects of interventions on unseen test data. Given this task, we show that there exist a variety of approaches to modeling causality, generalizing DAG-based methods. Our experiments on synthetic and biological data indicate that some non-DAG models perform as well as or better than DAG-based methods at causal prediction tasks.
Thu, 18 Feb 2010 00:00:00 +0000
http://proceedings.mlr.press/v6/duvenaud10a.html
Beware of the DAG!
Directed acyclic graph (DAG) models are popular tools for describing causal relationships and for guiding attempts to learn them from data. They appear to supply a means of extracting causal conclusions from probabilistic conditional independence properties inferred from purely observational data. I take a critical look at this enterprise, and suggest that it is in need of more, and more explicit, methodological and philosophical justification than it typically receives. In particular, I argue for the value of a clean separation between formal causal language and intuitive causal assumptions.
Thu, 18 Feb 2010 00:00:00 +0000
http://proceedings.mlr.press/v6/dawid10a.html