Proceedings of Machine Learning Research
Proceedings of the Second Conference on Causal Learning and Reasoning
Held in Amazon Development Center, Tübingen, Germany on 11-14 April 2023
Published as Volume 213 by the Proceedings of Machine Learning Research on 10 August 2023.
Volume Edited by:
Mihaela van der Schaar
Cheng Zhang
Dominik Janzing
Series Editors:
Neil D. Lawrence
https://proceedings.mlr.press/v213/
Thu, 10 Aug 2023 10:10:35 +0000
Factual Observation Based Heterogeneity Learning for Counterfactual Prediction
Extant causal methods exclusively exploit the heterogeneity based on the observed covariates for heterogeneous outcome prediction. Even with today's big data, the collected covariates may not contain all confounders. When some confounders are absent, the methods can suffer from confounding bias and missing heterogeneity. To address these two issues, we propose to leverage the factual observation in the observational data to recover the latent confounders. Since the learned confounder representation exploits the heterogeneity of latent confounders, it leads to finer-grained heterogeneous outcome prediction, which is closer to the individual level than prediction conditional on only covariates. Specifically, we propose a novel Factual Observation based Heterogeneity Learning (FOHL) algorithm with an encoder for confounder representation learning and a decoder for outcome prediction. Theoretical analysis reveals the validity of recovering confounders from factual observations to make the heterogeneous prediction closer to the individual level. Furthermore, experimental results demonstrate that our FOHL method can outperform the existing baselines.
Thu, 10 Aug 2023 00:00:00 +0000
https://proceedings.mlr.press/v213/zou23a.html
Causal Inference under Interference and Model Uncertainty
Algorithms that take data as input commonly assume that variables in the input dataset are Independent and Identically Distributed (IID). However, IID may be violated in many real world datasets that are generated by processes in which units/samples interact with one another. Typical examples include contagion that may be related to infectious diseases in public health, economic crisis in finance and risky behavior in social science. Handling non-IID data (without making additional assumptions) requires access to the true data generating process and the exact interaction patterns among units/samples, which may not be easily available. This work focuses on a specific type of interaction among samples, namely interference (i.e. some units’ treatments affect other units’ outcomes), in situations where there exists uncertainty regarding interaction patterns. The main contributions include modeling uncertain interaction using linear graphical causal models, quantifying bias when IID is incorrectly assumed, presenting a procedure to remove such bias and deriving bounds for average causal effects.
Thu, 10 Aug 2023 00:00:00 +0000
https://proceedings.mlr.press/v213/zhang23a.html
Jointly Learning Consistent Causal Abstractions Over Multiple Interventional Distributions
An abstraction can be used to relate two structural causal models representing the same system at different levels of resolution. Learning abstractions which guarantee consistency with respect to interventional distributions would allow one to jointly reason about evidence across multiple levels of granularity while respecting the underlying cause-effect relationships. In this paper, we introduce a first framework for causal abstraction learning between SCMs based on the formalization of abstraction recently proposed by Rischel (2020). Based on that, we propose a differentiable programming solution that jointly solves a number of combinatorial sub-problems, and we study its performance and benefits against independent and sequential approaches on synthetic settings and on a challenging real-world problem related to electric vehicle battery manufacturing.
Thu, 10 Aug 2023 00:00:00 +0000
https://proceedings.mlr.press/v213/zennaro23a.html
Non-parametric identifiability and sensitivity analysis of synthetic control models
Quantifying cause and effect relationships is an important problem in many domains, from medicine to economics. The gold standard solution to this problem is to conduct a randomised controlled trial. However, in many situations such trials cannot be performed. In the absence of such trials, many methods have been devised to quantify the causal impact of an intervention from observational data given certain assumptions. One widely used method is the synthetic control model. While identifiability of the causal estimand in such models has been obtained from a range of assumptions, it is widely and implicitly assumed that the underlying assumptions are satisfied for all time periods both pre- and post-intervention. This is a strong assumption, as synthetic control models can only be learned in the pre-intervention period. In this paper we address this challenge, and prove identifiability can be obtained without the need for this assumption, by showing it follows from the principle of invariant causal mechanisms. Moreover, for the first time, we formulate and study synthetic control models in Pearl’s structural causal model framework. Importantly, we provide a general framework for sensitivity analysis of synthetic control causal inference to violations of the assumptions underlying non-parametric identifiability. We end by providing an empirical demonstration of our sensitivity analysis framework on simulated and real data.
Thu, 10 Aug 2023 00:00:00 +0000
https://proceedings.mlr.press/v213/zeitler23a.html
Directed Graphical Models and Causal Discovery for Zero-Inflated Data
With advances in technology, gene expression measurements from single cells can be used to gain refined insights into regulatory relationships among genes. Directed graphical models are well-suited to explore such (cause-effect) relationships. However, statistical analyses of single cell data are complicated by the fact that the data often show zero-inflated expression patterns. To address this challenge, we propose directed graphical models that are based on Hurdle conditional distributions parametrized in terms of polynomials in parent variables and their $0/1$ indicators of being zero or nonzero. While directed graphs for Gaussian models are only identifiable up to an equivalence class in general, we show that, under a natural and weak assumption, the exact directed acyclic graph of our zero-inflated models can be identified. We propose methods for graph recovery, apply our model to real single-cell gene expression data on T helper cells, and show simulated experiments that validate the identifiability and graph estimation methods in practice.
Thu, 10 Aug 2023 00:00:00 +0000
https://proceedings.mlr.press/v213/yu23a.html
On the Interventional Kullback-Leibler Divergence
Modern machine learning approaches excel in static settings where a large amount of i.i.d. training data are available for a given task. In a dynamic environment though, an intelligent agent needs to be able to transfer knowledge and re-use learned components across domains. It has been argued that this may be possible through causal models, aiming to mirror the modularity of the real world in terms of independent causal mechanisms. However, the true causal structure underlying a given set of data is generally not identifiable, so it is desirable to have means to quantify differences between models (e.g., between the ground truth and an estimate), on both the observational and interventional level. In the present work, we introduce the Interventional Kullback-Leibler (IKL) divergence to quantify both structural and distributional differences between models based on a finite set of multi-environment distributions generated by interventions from the ground truth. Since we generally cannot quantify all differences between causal models for every finite set of interventional distributions, we propose a sufficient condition on the intervention targets to identify subsets of observed variables on which the models provably agree or disagree.
Thu, 10 Aug 2023 00:00:00 +0000
https://proceedings.mlr.press/v213/wildberger23a.html
Leveraging Causal Graphs for Blocking in Randomized Experiments
Randomized experiments are often performed to study the causal effects of interest. Blocking is a technique to precisely estimate the causal effects when the experimental material is not homogeneous. It involves stratifying the available experimental material based on the covariates causing non-homogeneity and then randomizing the treatment within those strata (known as blocks). This eliminates the unwanted effect of the covariates on the causal effects of interest. We investigate the problem of finding a stable set of covariates to be used to form blocks, that minimizes the variance of the causal effect estimates. Using the underlying causal graph, we provide an efficient algorithm to obtain such a set for a general semi-Markovian causal model.
Thu, 10 Aug 2023 00:00:00 +0000
https://proceedings.mlr.press/v213/umrawal23a.html
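The blocking step described in the abstract above (stratify on covariates, then randomize within each stratum) can be illustrated with a minimal sketch. It shows only generic within-block randomization, not the paper's algorithm for choosing the blocking covariates; the function name `block_randomize`, the unit ids, and the "young"/"old" block labels are hypothetical.

```python
import random
from collections import defaultdict


def block_randomize(units, block_of, seed=0):
    """Assign treatment (1) or control (0) at random within each block.

    units    -- list of hashable unit ids (hypothetical)
    block_of -- dict mapping unit id -> block label (the covariate stratum)
    """
    rng = random.Random(seed)

    # Group units by their block label.
    blocks = defaultdict(list)
    for u in units:
        blocks[block_of[u]].append(u)

    # Within each block, shuffle and treat the first half.
    assignment = {}
    for members in blocks.values():
        shuffled = members[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        for i, u in enumerate(shuffled):
            assignment[u] = 1 if i < half else 0
    return assignment


# Illustrative data: eight units, two blocks of four.
units = list(range(8))
block_of = {u: ("young" if u < 4 else "old") for u in units}
assignment = block_randomize(units, block_of)
```

Because randomization happens separately inside each block, every block ends up with the same number of treated and control units (up to rounding), so the blocking covariate cannot be imbalanced across arms.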
Unsupervised Object Learning via Common Fate
Learning generative object models from unlabelled videos is a long-standing problem and is required for causal scene modeling. We decompose this problem into three easier subtasks, and provide candidate solutions for each of them. Inspired by the Common Fate Principle of Gestalt Psychology, we first extract (noisy) masks of moving objects via unsupervised motion segmentation. Second, generative models are trained on the masks of the background and the moving objects, respectively. Third, background and foreground models are combined in a conditional “dead leaves” scene model to sample novel scene configurations where occlusions and depth layering arise naturally. To evaluate the individual stages, we introduce the FISHBOWL dataset positioned between complex real-world scenes and common object-centric benchmarks of simplistic objects. We show that our approach learns generative models that generalize beyond occlusions present in the input videos and represents scenes in a modular fashion, allowing generation of plausible scenes outside the training distribution by permitting, for instance, object numbers or densities not observed during training.
Thu, 10 Aug 2023 00:00:00 +0000
https://proceedings.mlr.press/v213/tangemann23a.html
Sample-Specific Root Causal Inference with Latent Variables
Root causal analysis seeks to identify the set of initial perturbations that induce an unwanted outcome. In prior work, we defined sample-specific root causes of disease using exogenous error terms that predict a diagnosis in a structural equation model. We rigorously quantified predictivity using Shapley values. However, the associated algorithms for inferring root causes assume no latent confounding. We relax this assumption by permitting confounding among the predictors. We then introduce a corresponding procedure called Extract Errors with Latents (EEL) for recovering the error terms up to contamination by other error terms lying on certain paths under the linear non-Gaussian acyclic model. EEL also identifies the smallest sets of dependent errors for fast computation of the Shapley values. The algorithm bypasses the hard problem of estimating the underlying causal graph in both cases. Experiments highlight the superior accuracy and robustness of EEL relative to its predecessors.
Thu, 10 Aug 2023 00:00:00 +0000
https://proceedings.mlr.press/v213/strobl23b.html
Generalizing Clinical Trials with Convex Hulls
Randomized clinical trials eliminate confounding but impose strict exclusion criteria that limit recruitment to a subset of the population. Observational datasets are more inclusive but suffer from confounding – often providing overly optimistic estimates of treatment response over time due to partially optimized physician prescribing patterns. We therefore assume that the unconfounded treatment response lies somewhere in between the observational estimate before and the observational estimate after treatment assignment. This assumption allows us to extrapolate results from exclusive trials to the broader population by analyzing observational and trial data simultaneously using an algorithm called Optimum in Convex Hulls (OCH). OCH represents the treatment effect either in terms of convex hulls of conditional expectations or convex hulls (also known as mixtures) of conditional densities. The algorithm first learns the component expectations or densities using the observational data and then learns the linear mixing coefficients using trial data in order to approximate the true treatment effect; theory importantly explains why this linear combination should hold. OCH estimates the treatment effect in terms of both expectations and densities with state of the art accuracy.
Thu, 10 Aug 2023 00:00:00 +0000
https://proceedings.mlr.press/v213/strobl23a.html
Causal Learning through Deliberate Undersampling
Domain scientists interested in causal mechanisms are usually limited by the frequency at which they can collect the measurements of social, physical, or biological systems. A common and plausible assumption is that higher measurement frequencies are the only way to gain more informative data about the underlying dynamical causal structure. This assumption is a strong driver for designing new, faster instruments, but such instruments might not be feasible or even possible. In this paper, we show that this assumption is incorrect: there are situations in which we can gain additional information about the causal structure by measuring more slowly than our current instruments. We present an algorithm that uses graphs at multiple measurement timescales to infer underlying causal structure, and show that inclusion of structures at slower timescales can nonetheless reduce the size of the equivalence class of possible causal structures. We provide simulation data about the probability of cases in which deliberate undersampling yields a gain, as well as the size of this gain.
Thu, 10 Aug 2023 00:00:00 +0000
https://proceedings.mlr.press/v213/solovyeva23a.html
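To make the idea of "graphs at multiple measurement timescales" in the abstract above more concrete, the following sketch computes the directed edges one would see when measuring every u-th time step. It assumes a convention often used for undersampled time-series graphs, namely that a directed edge at rate u corresponds to a directed path of length exactly u in the original graph; it omits the bidirected edges that unobserved intermediate steps can induce, and it is not the paper's inference algorithm. The function name `undersampled_directed_edges` is hypothetical.

```python
import numpy as np


def undersampled_directed_edges(adj, u):
    """Directed edges of the graph observed at every u-th time step.

    adj -- square 0/1 matrix; adj[i][j] = 1 means i at time t causes j at t+1
    u   -- undersampling rate (u >= 1; u = 1 returns the original edges)

    Assumption: i -> j at rate u iff a directed path of length exactly u
    leads from i to j in the original graph. Bidirected edges are omitted.
    """
    a = np.asarray(adj, dtype=int)
    reach = np.eye(len(a), dtype=int)
    for _ in range(u):
        # One more step along the original edges, kept as a 0/1 matrix.
        reach = ((reach @ a) > 0).astype(int)
    return reach.astype(bool)


# Illustrative chain 0 -> 1 -> 2 measured at half the original rate.
chain = [[0, 1, 0],
         [0, 0, 1],
         [0, 0, 0]]
g2 = undersampled_directed_edges(chain, 2)
```

In this chain the only length-2 path is 0 -> 1 -> 2, so at rate 2 the sole directed edge is 0 -> 2, which differs from the rate-1 graph and therefore carries different structural information.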
Influence-Aware Attention for Multivariate Temporal Point Processes
Identifying the subset of events that influence events of interest from continuous time datasets is of great interest in various applications. Existing methods however often fail to produce accurate and interpretable results in a time-efficient manner. In this paper, we propose a neural model – Influence-Aware Attention for Multivariate Temporal Point Processes (IAA-MTPPs) – which leverages the powerful attention mechanism in transformers to capture temporal dynamics between event types, which is different from existing instance-to-instance attentions, using variational inference while maintaining interpretability. Given event sequences and a prior influence matrix, IAA-MTPP efficiently learns an approximate posterior by an Attention-to-Influence mechanism, and subsequently models the conditional likelihood of the sequences given a sampled influence through an Influence-to-Attention formulation. Both steps are completed efficiently inside a B-block multi-head self-attention layer, thus our end-to-end training with parallelizable transformer architecture enables faster training compared to sequential models such as RNNs. We demonstrate strong empirical performance compared to existing baselines on multiple synthetic and real benchmarks, including qualitative analysis for an application in decentralized finance.
Thu, 10 Aug 2023 00:00:00 +0000
https://proceedings.mlr.press/v213/shou23a.html
A Meta-Reinforcement Learning Algorithm for Causal Discovery
Uncovering the underlying causal structure of a phenomenon, domain or environment is of great scientific interest, not least because of the inferences that can be derived from such structures. Unfortunately though, given an environment, identifying its causal structure poses significant challenges. Amongst those are the need for costly interventions and the size of the space of possible structures that has to be searched. In this work, we propose a meta-reinforcement learning setup that addresses these challenges by learning a causal discovery algorithm, called Meta-Causal Discovery, or MCD. We model this algorithm as a policy that is trained on a set of environments with known causal structures to perform budgeted interventions. Simultaneously, the policy learns to maintain an estimate of the environment’s causal structure. The learned policy can then be used as a causal discovery algorithm to estimate the structure of environments in a matter of milliseconds. At test time, our algorithm performs well even in environments that induce previously unseen causal structures. We empirically show that MCD estimates good graphs compared to SOTA approaches on toy environments and thus constitutes a proof-of-concept of learning causal discovery algorithms.
Thu, 10 Aug 2023 00:00:00 +0000
https://proceedings.mlr.press/v213/sauter23a.html
Factorization of the Partial Covariance in Singly-Connected Path Diagrams
We extend path analysis by showing that, for a singly-connected path diagram, the partial covariance of two random variables factorizes over the nodes and edges in the path between the variables. This result allows us to determine the contribution of each node and edge to the partial covariance. It also allows us to show that Simpson’s paradox cannot occur in singly-connected path diagrams.
Thu, 10 Aug 2023 00:00:00 +0000
https://proceedings.mlr.press/v213/pena23a.html
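As background for the abstract above, the classical path-analysis fact that it extends can be checked numerically: in a linear chain X -> Y -> Z, the marginal covariance of X and Z is the product of the edge coefficients times the variance of X. The sketch below covers only this classical marginal case, not the paper's partial-covariance factorization; the coefficients `a`, `b` and the noise variances are illustrative assumptions.

```python
import numpy as np

# Linear structural equations for the chain X -> Y -> Z:
#   X = e_x,  Y = a * X + e_y,  Z = b * Y + e_z
a, b = 0.8, -1.5                 # illustrative edge coefficients
var_x, var_y, var_z = 2.0, 1.0, 0.5  # illustrative noise variances

# B[j, i] holds the coefficient of the edge i -> j (order: X, Y, Z).
B = np.array([[0.0, 0.0, 0.0],
              [a,   0.0, 0.0],
              [0.0, b,   0.0]])
omega = np.diag([var_x, var_y, var_z])  # covariance of the noise terms

# Implied covariance of (X, Y, Z): (I - B)^{-1} Omega (I - B)^{-T}.
inv = np.linalg.inv(np.eye(3) - B)
sigma = inv @ omega @ inv.T

cov_xz = sigma[0, 2]
path_product = a * b * var_x  # edge coefficients along the path times Var(X)
```

Here `cov_xz` and `path_product` agree, which is the single-path case of Wright's path-tracing rule for covariances.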
Stochastic Causal Programming for Bounding Treatment Effects
Causal effect estimation is important for many tasks in the natural and social sciences. We design algorithms for the continuous partial identification problem: bounding the effects of multivariate, continuous treatments when unmeasured confounding makes identification impossible. Specifically, we cast causal effects as objective functions within a constrained optimization problem, and minimize/maximize these functions to obtain bounds. We combine flexible learning algorithms with Monte Carlo methods to implement a family of solutions under the name of stochastic causal programming. In particular, we show how the generic framework can be efficiently formulated in settings where auxiliary variables are clustered into pre-treatment and post-treatment sets, where no fine-grained causal graph can be easily specified. In these settings, we can avoid the need for fully specifying the distribution family of hidden common causes. Monte Carlo computation is also much simplified, leading to algorithms which are more computationally stable against alternatives.
Thu, 10 Aug 2023 00:00:00 +0000
https://proceedings.mlr.press/v213/padh23a.html
Local Dependence Graphs for Discrete Time Processes
Local dependence graphs for discrete time processes encapsulate information concerning the dependence relationships between the past of the multidimensional process and its present state and as such can represent feedback loops. Even in the discrete time setting some natural questions relating the conditional (in)dependence statements in the stochastic process to separation properties of the underlying local dependence graph are scattered throughout the literature. We provide a unifying view and fill in certain gaps. In this paper we examine graphical characteristics for two kinds of conditional independences: those occurring in Markov chains under the stationary regime and independences between the past of one subprocess and the future of another given the past of the third subprocess.
Thu, 10 Aug 2023 00:00:00 +0000
https://proceedings.mlr.press/v213/niemiro23a.html
Scalable Causal Discovery with Score Matching
This paper demonstrates how to discover the whole causal graph from the second derivative of the log-likelihood in observational non-linear additive Gaussian noise models. Leveraging scalable machine learning approaches to approximate the score function $\nabla \log p(\mathbf{X})$, we extend the work of Rolland et al. (2022) that only recovers the topological order from the score and requires an expensive pruning step removing spurious edges among those admitted by the ordering. Our analysis leads to DAS (acronym for Discovery At Scale), a practical algorithm that reduces the complexity of the pruning by a factor proportional to the graph size. In practice, DAS achieves competitive accuracy with current state-of-the-art while being over an order of magnitude faster. Overall, our approach enables principled and scalable causal discovery, significantly lowering the compute bar.
Thu, 10 Aug 2023 00:00:00 +0000
https://proceedings.mlr.press/v213/montagna23b.html
Causal Discovery with Score Matching on Additive Models with Arbitrary Noise
Causal discovery methods are intrinsically constrained by the set of assumptions needed to ensure structure identifiability. Moreover, additional restrictions are often imposed in order to simplify the inference task: this is the case for the Gaussian noise assumption on additive non-linear models, which is common to many causal discovery approaches. In this paper we show the shortcomings of inference under this hypothesis, analyzing the risk of edge inversion under violation of Gaussianity of the noise terms. Then, we propose a novel method for inferring the topological ordering of the variables in the causal graph, from data generated according to an additive non-linear model with a generic noise distribution. This leads to NoGAM (Not only Gaussian Additive noise Models), a causal discovery algorithm with a minimal set of assumptions and state of the art performance, experimentally benchmarked on synthetic data.
Thu, 10 Aug 2023 00:00:00 +0000
https://proceedings.mlr.press/v213/montagna23a.html
Instrumental Processes Using Integrated Covariances
Instrumental variable methods are often used for parameter estimation in the presence of confounding. They can also be applied in stochastic processes. Instrumental variable analysis exploits moment equations to obtain estimators for causal parameters. We show that in stochastic processes one can find such moment equations using an integrated covariance matrix. This provides new instrumental variable methods, instrumental variable methods in a class of continuous-time processes as well as a unified treatment of discrete- and continuous-time processes.
Thu, 10 Aug 2023 00:00:00 +0000
https://proceedings.mlr.press/v213/mogensen23a.html
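The "moment equations" mentioned in the abstract above can be illustrated in the classical scalar, non-temporal setting: with instrument Z, treatment X, and outcome Y, the causal coefficient solves Cov(Z, Y) = beta * Cov(Z, X). The sketch below simulates this textbook case only; it does not implement the integrated-covariance estimator for stochastic processes, and the data-generating coefficients are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
beta_true = 2.0  # illustrative causal effect of X on Y

# Simulated linear model with a hidden confounder H of X and Y.
z = rng.normal(size=n)                  # instrument: affects Y only through X
h = rng.normal(size=n)                  # unobserved confounder
x = z + h + rng.normal(size=n)
y = beta_true * x + 3.0 * h + rng.normal(size=n)

# Naive regression slope Cov(X, Y) / Var(X) is biased by H.
beta_ols = np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# IV moment equation Cov(Z, Y) = beta * Cov(Z, X), solved for beta.
beta_iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]
```

Under this simulation the regression slope is pulled away from `beta_true` by the confounder, while the instrumental variable estimate stays close to it, because Z is independent of H by construction.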
Causal Abstraction with Soft Interventions
Causal abstraction provides a theory describing how several causal models can represent the same system at different levels of detail. Existing theoretical proposals limit the analysis of abstract models to "hard" interventions fixing causal variables to be constant values. In this work, we extend causal abstraction to "soft" interventions, which assign possibly non-constant functions to variables without adding new causal connections. Specifically, (i) we generalize $\tau$-abstraction from Beckers and Halpern (2019) to soft interventions, (ii) we propose a further definition of soft abstraction to ensure a unique map $\omega$ between soft interventions, and (iii) we prove that our constructive definition of soft abstraction guarantees the intervention map $\omega$ has a specific and necessary explicit form.
Thu, 10 Aug 2023 00:00:00 +0000
https://proceedings.mlr.press/v213/massidda23a.html
Practical Algorithms for Orientations of Partially Directed Graphical Models
In observational studies, the true causal model is typically unknown and needs to be estimated from available observational and limited experimental data. In such cases, the learned causal model is commonly represented as a partially directed acyclic graph (PDAG), which contains both directed and undirected edges indicating uncertainty of causal relations between random variables. The main focus of this paper is on the maximal orientation task, which, for a given PDAG, aims to orient the undirected edges maximally such that the resulting graph represents the same Markov equivalent DAGs as the input PDAG. This task is a subroutine used frequently in causal discovery, e.g., as the final step of the celebrated PC algorithm. Utilizing connections to the problem of finding a consistent DAG extension of a PDAG, we derive faster algorithms for computing the maximal orientation by proposing two novel approaches for extending PDAGs, both constructed with an emphasis on simplicity and practical effectiveness.
Thu, 10 Aug 2023 00:00:00 +0000
https://proceedings.mlr.press/v213/luttermann23a.html
Learning Causal Representations of Single Cells via Sparse Mechanism Shift Modeling
Latent variable models such as the Variational Auto-Encoder (VAE) have become a go-to tool for analyzing biological data, especially in the field of single-cell genomics. One remaining challenge is the interpretability of latent variables as biological processes that define a cell’s identity. Outside of biological applications, this problem is commonly referred to as learning disentangled representations. Although several disentanglement-promoting variants of the VAE were introduced, and applied to single-cell genomics data, this task has been shown to be infeasible from independent and identically distributed measurements, without additional structure. Instead, recent methods propose to leverage non-stationary data, as well as the sparse mechanism shift assumption in order to learn disentangled representations with a causal semantic. Here, we extend the application of these methodological advances to the analysis of single-cell genomics data with genetic or chemical perturbations. More precisely, we propose a deep generative model of single-cell gene expression data for which each perturbation is treated as a stochastic intervention targeting an unknown, but sparse, subset of latent variables. We benchmark these methods on simulated single-cell data to evaluate their performance at latent units recovery, causal target identification and out-of-domain generalization. Finally, we apply those approaches to two real-world large-scale gene perturbation data sets and find that models that exploit the sparse mechanism shift hypothesis surpass contemporary methods on a transfer learning task. We implement our new model and benchmarks using the scvi-tools library, and release it as open-source software at https://github.com/Genentech/sVAE.
Thu, 10 Aug 2023 00:00:00 +0000
https://proceedings.mlr.press/v213/lopez23a.html
Causal Triplet: An Open Challenge for Intervention-centric Causal Representation Learning
Recent years have seen a surge of interest in learning high-level causal representations from low-level image pairs under interventions. Yet, existing efforts are largely limited to simple synthetic settings that are far away from real-world problems. In this paper, we present CausalTriplet, a causal representation learning benchmark featuring not only visually more complex scenes, but also two crucial desiderata commonly overlooked in previous works: (i) an actionable counterfactual setting, where only certain (object-level) variables allow for counterfactual observations whereas others do not; (ii) an interventional downstream task with an emphasis on out-of-distribution robustness from the independent causal mechanisms principle. Through extensive experiments, we find that models built with the knowledge of disentangled or object-centric representations significantly outperform their distributed counterparts. However, recent causal representation learning methods still struggle to identify such latent structures, indicating substantial challenges and opportunities in CausalTriplet. Our code and datasets will be available at https://sites.google.com/view/causaltriplet.
Thu, 10 Aug 2023 00:00:00 +0000
https://proceedings.mlr.press/v213/liu23a.html
Backtracking Counterfactuals
Counterfactual reasoning, that is, envisioning hypothetical scenarios, or possible worlds, where some circumstances are different from what (f)actually occurred (counter-to-fact), is ubiquitous in human cognition. Conventionally, counterfactually-altered circumstances have been treated as “small miracles” that locally violate the laws of nature while sharing the same initial conditions. In Pearl’s structural causal model (SCM) framework this is made mathematically rigorous via interventions that modify the causal laws while the values of exogenous variables are shared. In recent years, however, this purely interventionist account of counterfactuals has increasingly come under scrutiny from both philosophers and psychologists. Instead, they suggest a backtracking account of counterfactuals, according to which the causal laws remain unchanged in the counterfactual world; differences to the factual world are instead “backtracked” to altered initial conditions (exogenous variables). In the present work, we explore and formalise this alternative mode of counterfactual reasoning within the SCM framework. Despite ample evidence that humans backtrack, the present work constitutes, to the best of our knowledge, the first general account and algorithmisation of backtracking counterfactuals. We discuss our backtracking semantics in the context of related literature and draw connections to recent developments in explainable artificial intelligence (XAI).
Thu, 10 Aug 2023 00:00:00 +0000
https://proceedings.mlr.press/v213/kugelgen23a.html
Image-based Treatment Effect Heterogeneity
Randomized controlled trials (RCTs) are considered the gold standard for estimating the Average Treatment Effect (ATE) of interventions. One important use of RCTs is to study the causes of global poverty – a subject explicitly cited in the 2019 Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel awarded to Duflo, Banerjee, and Kremer “for their experimental approach to alleviating global poverty.” Because the ATE is a population summary, researchers often want to better understand how the treatment effect varies across different populations by conditioning on tabular variables such as age and ethnicity that were measured during the RCT data collection. Although such variables carry substantive importance, they are often observed only near the time of the experiment: exclusive use of such variables may fail to capture historical, geographical, or neighborhood-specific contributors to effect variation. In global poverty research, when the geographical location of the experiment units is approximately known, satellite imagery can provide a window into such historical and geographical factors important for understanding heterogeneity. However, there is no causal inference method that specifically enables applied researchers to analyze Conditional Average Treatment Effects (CATEs) from images. In this paper, we develop a deep probabilistic modeling framework that identifies clusters of images with similar treatment effect distributions, enabling researchers to analyze treatment effect variation by image. Our interpretable image CATE model also emphasizes an image sensitivity factor that quantifies the importance of image segments in contributing to the mean effect cluster prediction. We compare the proposed methods against alternatives in simulation; additionally, we show how the model works in an actual RCT, estimating the effects of an anti-poverty intervention in northern Uganda and obtaining a posterior predictive distribution over treatment effects for the rest of the country where no experimental data was collected. We make code for all modeling strategies available in an open-source software package and discuss their applicability in other domains (such as the biomedical sciences) where image data are also prevalent.
Thu, 10 Aug 2023 00:00:00 +0000
https://proceedings.mlr.press/v213/jerzak23a.html
On Discovery of Local Independence over Continuous Variables via Neural Contextual Decomposition
Conditional independence provides a way to understand causal relationships among the variables of interest. An underlying system may exhibit more fine-grained causal relationships, especially between a variable and its parents, which will be called the local independence relationships. One of the most widely studied local relationships is Context-Specific Independence (CSI), which holds in a specific assignment of conditioned variables. However, its applicability is often limited since it does not allow continuous variables: data conditioned on a specific value of a continuous variable contains few instances, if any, making it infeasible to test independence. In this work, we define and characterize the local independence relationship that holds in a specific set of joint assignments of parental variables, which we call context-set specific independence (CSSI). We then provide a canonical representation of CSSI and prove its fundamental properties. Based on our theoretical findings, we cast the problem of discovering multiple CSSI relationships in a system as finding a partition of the joint outcome space. Finally, we propose a novel method, coined neural contextual decomposition (NCD), which learns such a partition by requiring each set to induce CSSI via modeling a conditional distribution. We empirically demonstrate that the proposed method successfully discovers the ground truth local independence relationships in both a synthetic dataset and a complex system reflecting real-world physical dynamics.
Thu, 10 Aug 2023 00:00:00 +0000
https://proceedings.mlr.press/v213/hwang23a.html
An Algorithm and Complexity Results for Causal Unit Selection
The unit selection problem aims to identify objects, called units, that are most likely to exhibit a desired mode of behavior when subjected to stimuli (e.g., customers who are about to churn but would change their mind if encouraged). Unit selection with counterfactual objective functions was introduced relatively recently, with existing work focusing on bounding a specific class of objective functions, called the benefit functions, based on observational and interventional data, assuming a fully specified model is not available to evaluate these functions. We complement this line of work by proposing the first exact algorithm for finding optimal units given a broad class of causal objective functions and a fully specified structural causal model (SCM). We show that unit selection under this class of objective functions is $\mbox{NP}^{\mbox{PP}}$-complete but is NP-complete when unit variables correspond to all exogenous variables in the SCM. We also provide treewidth-based complexity bounds on our proposed algorithm while relating it to a well-known algorithm for Maximum a Posteriori (MAP) inference.
Thu, 10 Aug 2023 00:00:00 +0000
https://proceedings.mlr.press/v213/huang23a.html
Evaluating Temporal Observation-Based Causal Discovery Techniques Applied to Road Driver Behaviour
Autonomous robots are required to reason about the behaviour of dynamic agents in their environment. The creation of models to describe these relationships is typically accomplished through the application of causal discovery techniques. However, as it stands, observational causal discovery techniques struggle to adequately cope with conditions such as causal sparsity and non-stationarity typically seen during online usage in autonomous agent domains. Meanwhile, interventional techniques are not always feasible due to domain restrictions. In order to better explore the issues facing observational techniques and promote further discussion of these topics, we carry out a benchmark across 10 contemporary observational temporal causal discovery methods in the domain of autonomous driving. By evaluating these methods upon causal scenes drawn from real-world datasets in addition to those generated synthetically, we highlight where improvements need to be made in order to facilitate the application of causal discovery techniques to the aforementioned use-cases. Finally, we discuss potential directions for future work that could help better tackle the difficulties currently experienced by state-of-the-art techniques.
Thu, 10 Aug 2023 00:00:00 +0000
https://proceedings.mlr.press/v213/howard23a.html
Local Causal Discovery for Estimating Causal Effects
Even when the causal graph underlying our data is unknown, we can use observational data to narrow down the possible values that an average treatment effect (ATE) can take by (1) identifying the graph up to a Markov equivalence class; and (2) estimating that ATE for each graph in the class. While the PC algorithm can identify this class under strong faithfulness assumptions, it can be computationally prohibitive. Fortunately, only the local graph structure around the treatment is required to identify the set of possible ATE values, a fact exploited by local discovery algorithms to improve computational efficiency. In this paper, we introduce Local Discovery using Eager Collider Checks (LDECC), a new local causal discovery algorithm that leverages unshielded colliders to orient the treatment’s parents differently from existing methods. We show that there exist graphs where LDECC exponentially outperforms existing local discovery algorithms and vice versa. Moreover, we show that LDECC and existing algorithms rely on different faithfulness assumptions, leveraging this insight to weaken the assumptions for identifying the set of possible ATE values.
Thu, 10 Aug 2023 00:00:00 +0000
https://proceedings.mlr.press/v213/gupta23b.html
Can Active Sampling Reduce Causal Confusion in Offline Reinforcement Learning?
Causal confusion is a phenomenon where an agent learns a policy that reflects imperfect spurious correlations in the data. Such a policy may falsely appear to be optimal during training if most of the training data contain such spurious correlations. This phenomenon is particularly pronounced in domains such as robotics, with potentially large gaps between the open- and closed-loop performance of an agent. In such settings, causally confused models may appear to perform well according to open-loop metrics during training but fail catastrophically when deployed in the real world. In this paper, we study causal confusion in offline reinforcement learning. We investigate whether selectively sampling appropriate points from a dataset of demonstrations may enable offline reinforcement learning agents to disambiguate the underlying causal mechanisms of the environment, alleviate causal confusion in offline reinforcement learning, and produce a safer model for deployment. To answer this question, we consider a set of tailored offline reinforcement learning datasets that exhibit causal ambiguity and assess the ability of active sampling techniques to reduce causal confusion at evaluation. We provide empirical evidence that uniform and active sampling techniques are able to consistently reduce causal confusion as training progresses and that active sampling is able to do so significantly more efficiently than uniform sampling.
Thu, 10 Aug 2023 00:00:00 +0000
https://proceedings.mlr.press/v213/gupta23a.html
Causal Inference Despite Limited Global Confounding via Mixture Models
A Bayesian Network is a directed acyclic graph (DAG) on a set of $n$ random variables (the vertices); a Bayesian Network Distribution (BND) is a probability distribution on the random variables that is Markovian on the graph. A finite $k$-mixture of such models is graphically represented by a larger graph which has an additional “hidden” (or “latent”) random variable $U$, ranging in $\{1,\ldots,k\}$, and a directed edge from $U$ to every other vertex. Models of this type are fundamental to causal inference, where $U$ models an unobserved confounding effect of multiple populations, obscuring the causal relationships in the observable DAG. By solving the mixture problem and recovering the joint probability distribution with $U$, traditionally unidentifiable causal relationships become identifiable. Using a reduction to the more well-studied “product” case on empty graphs, we give the first algorithm to learn mixtures of non-empty DAGs.
Thu, 10 Aug 2023 00:00:00 +0000
https://proceedings.mlr.press/v213/gordon23a.html
Estimating long-term causal effects from short-term experiments and long-term observational data with unobserved confounding
Understanding and quantifying cause and effect relationships is an important problem in many domains. The generally agreed standard solution to this problem is to perform a randomised controlled trial. However, even when randomised controlled trials can be performed, they usually have relatively short durations due to cost considerations. This makes learning long-term causal effects a very challenging task in practice, since the long-term outcome is only observed after a long delay. In this paper, we study the identification and estimation of long-term treatment effects when both experimental and observational data are available. Previous work provided an estimation strategy to determine long-term causal effects from such data regimes. However, this strategy only works if one assumes there are no unobserved confounders in the observational data. In this paper, we specifically address the challenging case where unmeasured confounders are present in the observational data. Our long-term causal effect estimator is obtained by combining regression residuals with short-term experimental outcomes in a specific manner to create an instrumental variable, which is then used to quantify the long-term causal effect through instrumental variable regression. We prove this estimator is unbiased, and analytically study its variance. Finally, we empirically test our approach on synthetic data, as well as real data from the International Stroke Trial. Relevant source code and documentation have been made freely available in our online repository (https://github.com/vangoffrier/UnConfounding).
Thu, 10 Aug 2023 00:00:00 +0000
https://proceedings.mlr.press/v213/goffrier23a.html
Causal Discovery for Non-stationary Non-linear Time Series Data Using Just-In-Time Modeling
Causal discovery from multivariate continuous time-series data is becoming more important as the amount of IoT data to analyze increases. However, it is not easy to identify the causal structure from such data using conventional linear causal discovery methods, because of non-stationary characteristics of the data such as distribution shifts and the non-linearity of the system dynamics. The application of non-linear causal discovery methods is also generally limited, and there are still some problems such as their computational complexity, interpretability, and robustness to non-stationarity. To address these challenges, we propose a new causal discovery method, JIT-LiNGAM, based on the Linear Non-Gaussian Acyclic Model (LiNGAM) and the Just-In-Time (JIT) framework, which is also called Lazy-Learning or Model-On-Demand. Our method estimates a local linear structural causal model from neighboring samples of the past data every time a new input sample is given. By approximating an inherently globally non-linear model with local linear models, we can benefit from high detection performance of causal relationships for non-linear and non-stationary data, improved interpretability of causal effects through linear expressions, and reduced computational complexity. We formulate this algorithm based on Taylor’s theorem, and show effective neighbor selection algorithms by a simple experiment. The results of numerical experiments using artificial data with non-linearity and non-stationarity demonstrate the effectiveness of our method compared to representative methods for such data, under some general evaluation metrics.
Thu, 10 Aug 2023 00:00:00 +0000
https://proceedings.mlr.press/v213/fujiwara23a.html
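The just-in-time idea in the abstract above (fit a local linear model from neighbors of each new query) can be illustrated with a minimal sketch. This is not JIT-LiNGAM: it only shows neighbor selection followed by a one-dimensional least-squares fit, and the function name `jit_local_linear`, the data, and the choice of `k` are hypothetical.

```python
# Minimal sketch of just-in-time (lazy) local linear modeling in one dimension.
# Hypothetical illustration only; it does not perform causal discovery.

def jit_local_linear(xs, ys, query, k):
    """Fit y ~ slope * x + intercept on the k stored samples nearest to query."""
    # Select the k past samples whose inputs are closest to the query point.
    neighbors = sorted(zip(xs, ys), key=lambda p: abs(p[0] - query))[:k]
    n = len(neighbors)
    mean_x = sum(x for x, _ in neighbors) / n
    mean_y = sum(y for _, y in neighbors) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in neighbors)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in neighbors)
    # Ordinary least-squares slope; fall back to 0 if all neighbors coincide.
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Globally non-linear system: y = x^2 sampled on [-3, 3].
xs = [i / 10 for i in range(-30, 31)]
ys = [x ** 2 for x in xs]

# Local fit around x = 1 recovers the local derivative (about 2).
local_slope, _ = jit_local_linear(xs, ys, query=1.0, k=5)
# A single global linear fit sees no linear trend at all (slope about 0).
global_slope, _ = jit_local_linear(xs, ys, query=1.0, k=len(xs))
print(local_slope, global_slope)
```

The contrast between the two slopes is the point of the local approach: a first-order Taylor expansion around the query captures a dependence that a single global linear model averages away.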
Distinguishing Cause from Effect on Categorical Data: The Uniform Channel Model
Distinguishing cause from effect using observations of a pair of random variables is a core problem in causal discovery. Most approaches proposed for this task, namely additive noise models (ANM), are only adequate for quantitative data. We propose a criterion to address the cause-effect problem with categorical variables (living in sets with no meaningful order), inspired by seeing a conditional probability mass function (pmf) as a discrete memoryless channel. We select as the most likely causal direction the one in which the conditional pmf is closer to a uniform channel (UC). The rationale is that, in a UC, as in an ANM, the conditional entropy (of the effect given the cause) is independent of the cause distribution, in agreement with the principle of independence of cause and mechanism. Our approach, which we call the uniform channel model (UCM), thus extends the ANM rationale to categorical variables. To assess how close a conditional pmf (estimated from data) is to a UC, we use statistical testing, supported by a closed-form estimate of a UC channel. On the theoretical front, we prove identifiability of the UCM and show its equivalence with a structural causal model with a low-cardinality exogenous variable. Finally, the proposed method compares favorably with recent state-of-the-art alternatives in experiments on synthetic, benchmark, and real data.
Thu, 10 Aug 2023 00:00:00 +0000
https://proceedings.mlr.press/v213/figueiredo23a.html
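The rationale stated in the abstract above, that in a uniform channel the conditional entropy of the effect given the cause does not depend on the cause distribution, can be checked numerically. The sketch below assumes that a uniform channel is one whose rows (the conditional pmfs) are permutations of a single probability vector; that reading, the example matrix, and the function names are assumptions made for illustration.

```python
import math

def entropy(pmf):
    """Shannon entropy in bits of a probability mass function."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)

def conditional_entropy(cause_pmf, channel):
    """H(effect | cause) = sum over x of p(x) * H(row x of the channel)."""
    return sum(px * entropy(row) for px, row in zip(cause_pmf, channel))

# Assumed uniform channel: every row is a permutation of (0.7, 0.2, 0.1).
channel = [
    [0.7, 0.2, 0.1],
    [0.1, 0.7, 0.2],
    [0.2, 0.1, 0.7],
]

# Two very different cause distributions (hypothetical values).
h_uniform_cause = conditional_entropy([1 / 3, 1 / 3, 1 / 3], channel)
h_skewed_cause = conditional_entropy([0.8, 0.15, 0.05], channel)
print(h_uniform_cause, h_skewed_cause)
```

Because every row has the same entropy, the weighted average is that common value whatever the weights are, which is why both calls agree up to floating-point rounding.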
Beyond the Markov Equivalence Class: Extending Causal Discovery under Latent Confounding
In this work, we show how to combine two popular paradigms for causal discovery from observational data in the presence of latent confounders in order to arrive at a much more informative causal model. Building on the seminal constraint-based causal discovery algorithm, FCI, we exploit the power of direct cause-effect pair identification to uncover new relationships, which can subsequently be propagated to find even more causal links in the rest of the model. This idea has been explored before, but until now always under the assumption of no latent confounders. Using our new causal direction criterion (CDC), we can finally drop this limitation. We derive inference rules for orienting additional cause-effect relations and show how to minimize the number of tests during the CDC search. In our experimental evaluations over a range of simulated data sets, the resulting FCI-CDC algorithm increases recall by 5% to 10% compared to vanilla FCI, without loss in precision.
Thu, 10 Aug 2023 00:00:00 +0000
https://proceedings.mlr.press/v213/diepen23a.html
Branch-Price-and-Cut for Causal Discovery
We show how to extend the integer programming (IP) approach to score-based causal discovery by including pricing. Pricing allows the addition of new IP variables during solving, rather than requiring them all to be present initially. The dual values of acyclicity constraints allow this addition to be done in a principled way. We have extended the GOBNILP algorithm to effect a branch-price-and-cut method for DAG learning. Empirical results show that implementing a delayed pricing approach can be beneficial. The current pricing algorithm in GOBNILP is slow, so further work on fast pricing is required.
Thu, 10 Aug 2023 00:00:00 +0000
https://proceedings.mlr.press/v213/cussens23a.html
Enhancing Causal Discovery from Robot Sensor Data in Dynamic Scenarios
Identifying the main features and learning the causal relationships of a dynamic system from time series of sensor data are key problems in many real-world robot applications. In this paper, we propose an extension of a state-of-the-art causal discovery method, PCMCI, embedding an additional feature-selection module based on transfer entropy. Starting from a prefixed set of variables, the new algorithm reconstructs the causal model of the observed system by considering only its main features and neglecting those deemed unnecessary for understanding the evolution of the system. We first validate the method on a toy problem and on synthetic brain-network data, for which the ground-truth models are available, and then on a real-world robotics scenario using a large-scale time-series dataset of human trajectories. The experiments demonstrate that our solution outperforms the previous state-of-the-art technique in terms of accuracy and computational efficiency, allowing better and faster causal discovery of meaningful models from robot sensor data.
Thu, 10 Aug 2023 00:00:00 +0000
https://proceedings.mlr.press/v213/castri23a.html
Causal Models with Constraints
Causal models have proven extremely useful in offering formal representations of causal relationships between a set of variables. Yet in many situations, there are non-causal relationships among variables. For example, we may want variables $LDL$, $HDL$, and $TOT$ that represent the level of low-density lipoprotein cholesterol, the level of high-density lipoprotein cholesterol, and total cholesterol level, with the relation $LDL+HDL = TOT$. This cannot be done in standard causal models, because we can intervene simultaneously on all three variables. The goal of this paper is to extend standard causal models to allow for constraints on settings of variables. Although the extension is relatively straightforward, to make it useful we have to define a new intervention operation that disconnects a variable from a causal equation. We give examples showing the usefulness of this extension, and provide a sound and complete axiomatization for causal models with constraints.
Thu, 10 Aug 2023 00:00:00 +0000
https://proceedings.mlr.press/v213/beckers23a.html
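The cholesterol example in the abstract above can be made concrete with a small sketch of why unrestricted simultaneous interventions clash with the relation $LDL+HDL = TOT$. This is not the paper's formalism or its new intervention operation; the function names and numeric values are hypothetical, and value ranges (such as non-negativity) are ignored.

```python
# Hypothetical sketch: checking interventions against the constraint LDL + HDL = TOT.

VARIABLES = ("LDL", "HDL", "TOT")

def satisfies_constraint(setting, tol=1e-9):
    """True if a full assignment of the three variables respects LDL + HDL = TOT."""
    return abs(setting["LDL"] + setting["HDL"] - setting["TOT"]) <= tol

def allowed_intervention(intervention, tol=1e-9):
    """True if the values fixed by an intervention are compatible with the constraint.

    Fixing all three variables is only allowed when the constraint holds.
    Fixing fewer variables leaves at least one free variable that can absorb
    the difference (assumption: no range restrictions on the variables).
    """
    if all(name in intervention for name in VARIABLES):
        return satisfies_constraint(intervention, tol)
    return True

consistent = allowed_intervention({"LDL": 100, "HDL": 50, "TOT": 150})
inconsistent = allowed_intervention({"LDL": 100, "HDL": 50, "TOT": 200})
partial = allowed_intervention({"LDL": 100})
print(consistent, inconsistent, partial)
```

A standard causal model would accept the second intervention, since every variable can be set independently; a model with constraints has to rule it out.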
Learning Conditional Granger Causal Temporal Networks
Granger-causality derived from observational time series data is used in many real-world applications where timely interventions are infeasible. However, discovering Granger-causal links in large temporal networks with a large number of nodes and time-lags can lead to millions of time-lagged model parameters, which requires us to make sparsity and overlap assumptions. In this paper, we propose to learn time-lagged model parameters with the objective of improving recall of links, while learning to defer predictions when the overlap assumption is violated over observed time series. By learning such conditional time-lagged models, we demonstrate a 25% increase in the area under the precision-recall curve for discovering Granger-causal links combined with an 18-25% improvement in forecasting accuracy across three popular and diverse datasets from different disciplines (DREAM3 gene expression, MoCAP human motion recognition and New York Times news-based stock price prediction) with correspondingly large temporal networks, over several baseline models including Multivariate Autoregression, Neural Granger Causality, Graph Neural Networks and Graph Attention models. The observed improvement in Granger-causal link discovery is significant and can potentially further improve prediction accuracy and modeling efficiency in downstream real-world applications leveraging these popular datasets.
Thu, 10 Aug 2023 00:00:00 +0000
https://proceedings.mlr.press/v213/balashankar23a.html