Abstracts

Invited Talks

Causal modeling: An application in HIV vaccine design

David Heckerman

Since the early 1980s, researchers have been trying to create a vaccine that protects against infection by HIV. To date, they have been unsuccessful. I will discuss some of the reasons for this, and show how causal modeling is helping to forge a path toward what we hope will be an effective vaccine.

Slides


From dynamical systems to causal models

Joris Mooij

Ordinary differential equations (ODEs) are a very popular and extremely useful mathematical modeling tool in many applied sciences (e.g., biology, chemistry, and physics). ODEs are usually not thought of as causal models. On the other hand, structural equation models (SEMs), a different mathematical modeling framework mostly applied in the social and economic sciences, are usually interpreted causally. In this talk, I will show that these apparently different modeling frameworks are actually quite closely related. The main result is that under certain conditions, equilibrium distributions of ODE systems can be mapped directly onto structural equation models, preserving the right semantics under interventions. This result sheds more light on the nature of SEMs, in particular in cases where causal feedback is present. It also shows that SEMs can be used as an alternative to ODEs when time series data are absent.

Slides
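As a toy illustration of this mapping (ours, not taken from the talk), consider a stable pair of linear ODEs:

    \dot{x}_1 = -x_1 + c, \qquad \dot{x}_2 = -x_2 + b\,x_1 .

Setting the derivatives to zero gives the equilibrium equations $x_1 = c$ and $x_2 = b\,x_1$, i.e., a structural equation model with the single edge $X_1 \to X_2$. Clamping $x_2$ at a fixed value alters only the second ODE and leaves the equilibrium of $x_1$ untouched, which is exactly the semantics of $do(X_2)$ in the corresponding SEM.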

Accepted for Presentation

Identifiability of binary directed graphical models with hidden variables

Elizabeth Allman, John Rhodes, Elena Stanghellini, and Marco Valtorta

Whether parameters of a DAG model with hidden variables can be identified is a difficult question. Here we give algebraic arguments establishing identifiability for two special DAG models with certain restrictions on the size of the finite state spaces of all variables. These results can be used to shed light on many other models. As an illustration, we address identifiability for all binary DAG models with at most five nodes and a single hidden variable parental to all observable ones.

Paper, Slides
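A back-of-the-envelope count (ours, not the paper's argument) suggests why identifiability is at least plausible in the five-node case: with a single binary hidden variable $H$ parental to five binary observables, the model has

    1 + 5 \times 2 = 11

free parameters (one for $P(H=1)$ and two per observable for $P(X_i = 1 \mid H)$), while the observed distribution over $\{0,1\}^5$ has $2^5 - 1 = 31$ degrees of freedom. Dimension counting alone cannot establish identifiability, however, which is why algebraic arguments are needed.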


Discovering Cyclic Causal Models with Latent Variables: A General SAT-Based Procedure

Antti Hyttinen, Patrik O. Hoyer, Frederick Eberhardt, Matti Järvisalo

We present a very general approach to learning the structure of causal models based on d-separation constraints, obtained from any given set of overlapping passive observational or experimental data sets. The procedure allows for both directed cycles (feedback loops) and the presence of latent variables. Our approach is based on a logical representation of causal pathways, which permits the integration of quite general background knowledge, and inference is performed using a Boolean satisfiability (SAT) solver. The procedure is complete in that it exhausts the available information on whether any given edge can be determined to be present or absent, and returns "unknown" otherwise. Many existing constraint-based causal discovery algorithms can be seen as special cases, tailored to circumstances in which one or more restricting assumptions apply. Simulations illustrate the effect of these assumptions on discovery and how the present algorithm scales.

Paper, Slides
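A minimal sketch of the SAT-encoding idea (ours; the paper's encoding of causal pathways and its constraint set are considerably richer): give each candidate directed edge a Boolean variable, translate (in)dependence findings into clauses, and decide whether an edge is forced present or absent by clamping its variable both ways. The two findings below are made up for illustration, and the translation shown is a gross simplification of real d-separation constraints. Uses the python-sat package.

    # pip install python-sat
    from pysat.solvers import Glucose3

    nodes = ["x", "y", "z"]
    var = {}  # one Boolean variable id per candidate directed edge
    for a in nodes:
        for b in nodes:
            if a != b:
                var[(a, b)] = len(var) + 1

    clauses = []
    # Hypothetical finding: x _||_ y (marginally), encoded as "no direct
    # edge between x and y in either direction".
    clauses.append([-var[("x", "y")]])
    clauses.append([-var[("y", "x")]])
    # Hypothetical finding: x and z are dependent in every conditioning
    # set, encoded (crudely) as "some edge on the x-z pair is present".
    clauses.append([var[("x", "z")], var[("z", "x")]])

    def status(edge):
        """Is `edge` forced present, forced absent, or unknown?"""
        with Glucose3(bootstrap_with=clauses) as s:
            can_be_present = s.solve(assumptions=[var[edge]])
        with Glucose3(bootstrap_with=clauses) as s:
            can_be_absent = s.solve(assumptions=[-var[edge]])
        if can_be_present and not can_be_absent:
            return "present"
        if can_be_absent and not can_be_present:
            return "absent"
        return "unknown"

    for e in [("x", "y"), ("x", "z")]:
        print(e, status(e))

Here the x-y edge comes out "absent" while the x-z edge is "unknown" (the constraints force an edge on the pair but not its direction), mirroring the completeness behavior described in the abstract.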


A finite population test of the sharp null hypothesis for Compliers

Wen Wei Loh and Thomas Richardson

In a randomized experiment with non-compliance, testing whether treatment exposure X has an effect on the final response Y is often of scientific interest. We propose a finite-population permutation-based test of the null hypothesis that X has no effect on Y within Compliers. Our method builds on the tests for principal stratum direct effects described by Nolen and Hudgens (2011).

Paper
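The finite-population permutation machinery itself is simple; below is a generic sketch for a fully observed stratum (ours — the paper's contribution lies in making this work for the Complier stratum, whose membership is only partially observed, a difficulty this sketch ignores).

    import numpy as np

    rng = np.random.default_rng(0)

    def permutation_test(z, y, n_perm=10000):
        """Test the sharp null that assignment z has no effect on y.

        Under the sharp null, each unit's y is fixed regardless of
        assignment, so the observed difference in means can be compared
        to its distribution over re-randomized assignments.
        """
        observed = y[z == 1].mean() - y[z == 0].mean()
        count = 0
        for _ in range(n_perm):
            zp = rng.permutation(z)
            stat = y[zp == 1].mean() - y[zp == 0].mean()
            count += abs(stat) >= abs(observed)
        return count / n_perm

    z = np.array([1, 1, 1, 0, 0, 0])
    y = np.array([5.1, 4.8, 5.5, 3.9, 4.1, 3.7])
    print(permutation_test(z, y))  # permutation p-value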


Reasoning about Independence in Probabilistic Models of Relational Data

Marc Maier, Katerina Marazopoulou, and David Jensen

The rules of d-separation provide a theoretical and algorithmic framework for deriving conditional independence facts from model structure. However, this theory only applies to Bayesian networks. Many real-world systems are characterized by interacting heterogeneous entities and probabilistic dependencies that cross the boundaries of entities. Consequently, researchers have developed extensions to Bayesian networks that can represent these relational dependencies. We show that the theory of d-separation inaccurately infers conditional independence when applied directly to the structure of probabilistic models of relational data. We introduce relational d-separation, a theory for deriving conditional independence facts from relational models, and we provide a new representation, the abstract ground graph, that enables a sound, complete, and computationally efficient method for answering d-separation queries about relational models.

Paper, Slides
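For reference, the classical criterion that the paper shows cannot be applied naively to relational structure can be implemented for ordinary DAGs via the moralized ancestral graph; a self-contained sketch (ours):

    from collections import deque

    def d_separated(parents, xs, ys, zs):
        """Classical d-separation for a DAG given as {node: parent set}.

        xs and ys are d-separated given zs iff they are disconnected in
        the moralization of the ancestral subgraph of xs | ys | zs,
        after deleting zs.
        """
        # 1. Restrict to ancestors of the query nodes.
        relevant, stack = set(), list(xs | ys | zs)
        while stack:
            v = stack.pop()
            if v not in relevant:
                relevant.add(v)
                stack.extend(parents.get(v, ()))
        # 2. Moralize: connect co-parents, drop edge directions.
        adj = {v: set() for v in relevant}
        for v in relevant:
            ps = [p for p in parents.get(v, ()) if p in relevant]
            for p in ps:
                adj[v].add(p); adj[p].add(v)
            for i in range(len(ps)):
                for j in range(i + 1, len(ps)):
                    adj[ps[i]].add(ps[j]); adj[ps[j]].add(ps[i])
        # 3. Delete zs and search for a path from xs to ys.
        seen, queue = set(xs), deque(xs)
        while queue:
            v = queue.popleft()
            if v in ys:
                return False
            for w in adj[v] - zs - seen:
                seen.add(w); queue.append(w)
        return True

    # Chain x -> z -> y: x and y are d-separated given {z} but not marginally.
    parents = {"x": set(), "z": {"x"}, "y": {"z"}}
    print(d_separated(parents, {"x"}, {"y"}, {"z"}))  # True
    print(d_separated(parents, {"x"}, {"y"}, set()))  # False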


A Sound and Complete Algorithm for Learning Causal Models from Relational Data

Marc Maier, Katerina Marazopoulou, David Arbour, and David Jensen

The PC algorithm learns maximally oriented causal Bayesian networks. However, there is no equivalent complete algorithm for learning the structure of relational models, an expressive generalization of Bayesian networks. Recent developments in the theory and representation of relational models support lifted reasoning about conditional independence. This enables a powerful constraint for orienting bivariate dependencies and forms the basis of a new structure learning algorithm. We develop the relational causal discovery (RCD) algorithm that learns causal relational models. We prove that RCD is sound and complete, and we present empirical results that demonstrate its effectiveness.

Paper, Slides


Maximum Likelihood estimation of structural nested logistic model with an instrumental variable

Roland Matsouaka and Eric Tchetgen Tchetgen

Methodology for drawing inferences about the parameters of an additive or multiplicative structural nested mean model (SNMM) in the presence of unmeasured confounding, via G-estimation, is well developed in the causal inference literature. Unfortunately, no similar semiparametric approach exists for the logistic SNMM for binary outcomes. To address this challenge, estimating-equation approaches have recently been proposed as possible solutions, but they either rely heavily on "uncongenial" modeling assumptions or require numerically solving an integral equation for each observation at each step of solving the estimating equation. These serious drawbacks have impeded widespread use of the methods. Here, we present an alternative parametrization of the likelihood function corresponding to a logistic SNMM that circumvents the computational complexity of existing methods while ensuring a congenial parametrization of the model. We use the likelihood approach to estimate the causal effect of a binary, discrete, or continuous exposure, and we provide a goodness-of-fit test for evaluating the parametric assumptions made by the model. Our method can easily be implemented in most standard statistical software, and it is illustrated via a simulation study and a data application.

Paper, Slides
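For concreteness, one common form of a logistic SNMM (our paraphrase of the standard setup, with covariates $L$; not a quotation from the paper) models the effect of removing exposure on the logit scale:

    \operatorname{logit}\Pr(Y = 1 \mid L, X = x) \;-\; \operatorname{logit}\Pr(Y_0 = 1 \mid L, X = x) \;=\; \gamma(x, L; \psi),

where $Y_0$ is the potential outcome with exposure removed, $\gamma(0, L; \psi) = 0$, and the simplest choice is $\gamma(x, L; \psi) = \psi x$. As the abstract explains, this logit-scale contrast, unlike its additive and multiplicative counterparts, does not admit a simple semiparametric G-estimator, which is what motivates the likelihood-based parametrization.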


Single World Intervention Graphs: A Primer

Thomas Richardson and James Robins

We present a simple graphical theory unifying causal directed acyclic graphs (DAGs) and potential (aka counterfactual) outcomes via a node-splitting transformation. We introduce a new graph, the Single-World Intervention Graph (SWIG). The SWIG encodes the counterfactual independences associated with a specific hypothetical intervention on the set of treatment variables. The nodes on the SWIG are the corresponding counterfactual random variables. We illustrate the theory with a number of examples. Our graphical theory of SWIGs may be used to infer the counterfactual independence relations that hold among the variables on a SWIG under the NPSEM-IE model of Pearl (2000, 2009). Furthermore, in the absence of hidden variables, the joint distribution of the counterfactuals is identified; the identifying formula is the extended g-computation formula introduced in Robins (2004). As an illustration of the benefit of reasoning with SWIGs, we use SWIGs to correct an error regarding Example 11.3.3 presented in Pearl (2009).

Paper
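A small worked example of node-splitting (standard SWIG material, condensed by us): take the DAG with edges $Z \to X$, $Z \to Y$ and $X \to Y$. The SWIG $\mathcal{G}(x)$ splits the treatment node into a random half $X$, which keeps the incoming edge $Z \to X$, and a fixed half $x$, which takes over the outgoing edge, now $x \to Y(x)$; descendants of $x$ are relabeled with their counterfactual versions. Reading d-separation off $\mathcal{G}(x)$ gives

    Y(x) \perp X \mid Z,

the ignorability condition that yields the adjustment formula $P(Y(x) = y) = \sum_z P(y \mid x, z)\, P(z)$.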


Accepted for Posters

Scoring and Searching over Bayesian Networks with Informative, Causal and Associative Priors

Giorgos Borboudakis and Ioannis Tsamardinos

A significant theoretical advantage of search-and-score methods for learning Bayesian Networks is that they can accept informative prior beliefs for each possible network, thus complementing the data. In this paper, a method is presented for assigning priors based on beliefs on the presence or absence of certain paths in the true network. Such beliefs correspond to knowledge about the possible causal and associative relations between pairs of variables. This type of knowledge naturally arises from prior experimental and observational data, among other sources. In addition, a novel search operator is proposed to take advantage of such prior knowledge. Experiments show that using path beliefs improves the learning of the skeleton, as well as the edge directions, in the network.

Paper
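One simple way to turn a path belief into a network prior (our sketch; the paper's construction for combining possibly inconsistent causal and associative beliefs is more careful): score each candidate DAG by how well its directed paths match the stated beliefs. The variable names and belief values below are made up for illustration.

    import math
    import networkx as nx

    # Each belief: (source, target, probability that a directed path exists).
    beliefs = [("smoking", "cancer", 0.9), ("cancer", "smoking", 0.05)]

    def log_prior(dag, beliefs):
        """Log-prior of a DAG under independent path beliefs."""
        total = 0.0
        for a, b, p in beliefs:
            present = (dag.has_node(a) and dag.has_node(b)
                       and nx.has_path(dag, a, b))
            total += math.log(p if present else 1.0 - p)
        return total

    g1 = nx.DiGraph([("smoking", "tar"), ("tar", "cancer")])
    g2 = nx.DiGraph([("cancer", "smoking")])
    print(log_prior(g1, beliefs), log_prior(g2, beliefs))

In a search-and-score learner, this log-prior term would simply be added to each candidate network's data score.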


Learning Sparse Causal Models is not NP-hard

Tom Claasen, Joris Mooij and Tom Heskes

This paper shows that causal model discovery is not an NP-hard problem, in the sense that for sparse graphs with node degree bounded by k, the sound and complete causal model can be obtained in a worst case on the order of N^(2(k+2)) independence tests, even when latent variables and selection bias may be present. We present a modification of the well-known FCI algorithm that implements the method for an independence oracle, and suggest further improvements for versions working on sample (real-world) data. The result does not contradict any known hardness results, and it does not solve an NP-hard problem: it simply shows that sparse causal discovery may be complicated, but is not as hard as learning minimal Bayesian networks.

Paper


Why am I stuck? Causal Logic Models for Token-Level Causal Reasoning

Denver Dash, Mark Voortman and Martijn de Jongh

We present a new approach to token-level causal reasoning that we call Causal Logic Models (CLMs). A CLM is a first-order representation that allows one to produce token-level causal explanations or predictions in domains that are too vast for a complete causal model to be constructed. CLMs produce explanations/predictions as a dynamic sequence of mechanisms (SoMs) that chain together to propagate causal influence through time. We argue that explanations/predictions of this form are more fundamental than explanations that involve likely states of variables in conjunction with a complete causal model. We compare this approach to the causal explanations of Halpern and Pearl [2005], and show that even on relatively simple real-world physical systems, their method of generating explanations can quickly become intractable. We argue that the SoMs approach is qualitatively closer to the human causal reasoning process, and that for many real problems in AI, such as diagnosing why a robot is stuck, CLMs provide more tractable and informative explanations.

Paper


Bayesian Learning in Bayesian Networks of Moderate Size by Efficient Sampling

Ru He and Jin Tian

We study the Bayesian model averaging approach to learning Bayesian network structures (DAGs) from data. We develop new algorithms, including the first algorithm that is able to efficiently sample DAGs according to the exact structure posterior. The DAG samples can then be used to construct estimators for the posterior of any feature. Our estimators have several good properties; for example, unlike the existing MCMC-based algorithms, quality guarantees can be provided for our estimators when assuming the order-modular prior. We empirically show that our algorithms considerably outperform previous state-of-the-art methods.

Paper
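The feature estimator built from the DAG samples is the usual Monte Carlo average of Bayesian model averaging (our transcription; the exact-posterior sampler itself is the paper's contribution): given samples $G_1, \dots, G_T$ from $P(G \mid D)$ and a feature $f$ (e.g., the indicator that a particular edge is present),

    \hat{P}(f \mid D) \;=\; \frac{1}{T} \sum_{t=1}^{T} f(G_t).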


Sparse Nested Markov Models with Log-linear Parameters

Ilya Shpitser, Robin Evans, Thomas Richardson and James Robins

Hidden variables are ubiquitous in practical data analysis, and therefore modeling marginal densities and doing inference with the resulting models is an important problem in statistics, machine learning, and causal inference. Recently, a new type of graphical model, called the nested Markov model, was developed which captures equality constraints found in marginals of directed acyclic graph (DAG) models. Some of these constraints, such as the so-called 'Verma constraint', strictly generalize conditional independence. To make modeling and inference with nested Markov models practical, it is necessary to limit the number of parameters in the model, while still correctly capturing the constraints in the marginal of a DAG model. Placing such limits is similar in spirit to sparsity methods for undirected graphical models and regression models. In this paper, we give a log-linear parameterization which allows sparse modeling with nested Markov models. We illustrate the advantages of this parameterization with a simulation study.

Paper
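The classic example of such a constraint (Verma and Pearl, 1990; our recap, not taken from the paper): in the DAG $X_1 \to X_2 \to X_3 \to X_4$ with a hidden variable $U$ pointing into both $X_2$ and $X_4$, the observed margin satisfies, in addition to its ordinary conditional independences, the constraint that

    \sum_{x_2} p(x_4 \mid x_3, x_2, x_1)\, p(x_2 \mid x_1)

is a function of $x_3$ and $x_4$ alone, i.e., it does not vary with $x_1$ — an equality constraint that is not itself a conditional independence.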


Student Posters

Maximum Likelihood Estimation in Cyclic Linear Gaussian Models with Correlated Errors

Christopher Fox and Mathias Drton

Linear structural equation models relate the variables of interest through linear functions with Gaussian noise. These models are in wide use because of their natural causal interpretations and close connection with Gaussian graphical models. Maximum likelihood estimation is an important problem in linear structural equation modeling. Drton et al. (2009) introduced an algorithm, known as Residual Iterative Conditional Fitting (RICF), for computing maximum likelihood estimates in a subclass of recursive linear models. We generalize this algorithm to facilitate maximum likelihood estimation in cyclic linear models. In contrast to RICF, the extended algorithm involves the maximization of fractional quadratic functions, which is achieved via QR decompositions.
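To fix notation (the standard linear SEM setup, our summary rather than the poster's): the model is

    X = B X + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \Omega), \qquad \Sigma(B, \Omega) = (I - B)^{-1}\, \Omega\, (I - B)^{-\top},

where the support of $B$ is given by the directed edges and the support of $\Omega$ by the error correlations. In the recursive (acyclic) case $\det(I - B) = 1$ and RICF's row-wise updates reduce to least-squares problems; in the cyclic case the Gaussian log-likelihood retains a $\log\lvert\det(I - B)\rvert$ term, which, as we read the abstract, is what turns the updates into maximizations of fractional quadratic functions.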


Identifying Finite Mixtures of Nonparametric Product Distributions and Causal Inference of Confounders

Eleni Sgouritsa, Dominik Janzing, Jonas Peters and Bernhard Schölkopf

We propose a kernel method to identify finite mixtures of nonparametric product distributions. It is based on a Hilbert space embedding of the joint distribution. The rank of the constructed tensor is equal to the number of mixture components. We present an algorithm to recover the components by partitioning the data points into clusters such that the variables are jointly conditionally independent given the cluster. This method can be used to identify hidden confounders that take finitely many values.
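In symbols, the model class being identified (our transcription of the abstract):

    p(x_1, \dots, x_d) \;=\; \sum_{k=1}^{K} \pi_k \prod_{j=1}^{d} p_{k,j}(x_j),

a mixture of $K$ components within each of which the variables are jointly independent; the rank of the embedding tensor recovers $K$. When a hidden confounder with finitely many states renders the observed variables conditionally independent given its value, identifying the mixture amounts to identifying the confounder.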