Chapter 6 Causal graphical models

A DAG can also encode causal information. Suppose that, in our example from Section 5, we intervene to set the treatment \(A\) to a specific value. Graphically, we represent this by deleting the edges that point into \(A\).

Definition 6.1 Let \(\mathcal{G}\) be a directed acyclic graph representing a causal system, and let \(p\) be a probability distribution over the variables \(X_V\). An intervention on a variable \(X_t\) (for \(t \in V\)) does two things:

graphically we represent this by removing edges pointing into \(t\) (i.e. of the form \(v \to t\));
probabilistically, we replace our usual factorization \[\begin{align*} p(x_V) = \prod_{v \in V} p(x_v \,|\,x_{\mathop{\mathrm{pa}}(v)}) \end{align*}\] with \[\begin{align*} p(x_{V\setminus \{t\}} \,|\,do(x_t)) &:= \frac{p(x_V)}{p(x_t \,|\,x_{\mathop{\mathrm{pa}}(t)})} = \prod_{v \in V \setminus \{t\}} p(x_v \,|\,x_{\mathop{\mathrm{pa}}(v)}). \end{align*}\]
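The graphical half of the definition can be sketched in a few lines of Python. This is only an illustration: the representation of the DAG as a dictionary of parent sets and the function name `intervene` are assumptions made here, and the parent sets are read off the example factorization \(p(w) \cdot p(z) \cdot p(x\,|\,z) \cdot p(a\,|\,w, z) \cdot p(y \,|\,x, a)\) used in this chapter.

```python
def intervene(parents, t):
    """Return the parent map of the graph after intervening on node t.

    `parents` maps each node to the set of its parents. Intervening on t
    removes every edge v -> t, i.e. empties pa(t); all other parent sets
    are copied unchanged. (Hypothetical helper, for illustration only.)
    """
    if t not in parents:
        raise KeyError(f"unknown node: {t}")
    return {v: (set() if v == t else set(pa)) for v, pa in parents.items()}


# Parent sets matching the factorization p(w) p(z) p(x|z) p(a|w,z) p(y|x,a).
parents = {
    "W": set(),
    "Z": set(),
    "X": {"Z"},
    "A": {"W", "Z"},
    "Y": {"X", "A"},
}

mutilated = intervene(parents, "A")
print(mutilated["A"])  # A no longer has any parents
print(mutilated["Y"])  # edges out of A, such as A -> Y, are kept
```

Only edges into the intervened node are removed. Edges out of it remain, which is why the factor \(p(y \,|\,x, a)\) still depends on \(a\) after the intervention.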

For example, in the distribution relating to the graph in Figure 2.1(b), we delete the factor corresponding to \(A\):
\[\begin{align*} p(w, z, x, a, y) &= p(w) \cdot p(z) \cdot p(x\,|\,z) \cdot p(a\,|\,w, z) \cdot p(y \,|\,x, a),\\ p(w, z, x, y \,|\,do(a)) &= p(w) \cdot p(z) \cdot p(x\,|\,z) \cdot \phantom{p(a\,|\,w, z) \cdot {}} p(y \,|\,x, a). \end{align*}\] All other factors are preserved, provided that the causal DAG is correctly specified.
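The following is a minimal numerical sketch of this example, assuming all five variables are binary and using randomly generated conditional probability tables (the tables, the seed and the helper `random_cpt` are illustrative assumptions, not part of the example itself). It checks that dividing the joint distribution by \(p(a\,|\,w, z)\) gives the same result as multiplying the remaining factors, and that the result sums to one for each value of \(a\).

```python
import numpy as np

rng = np.random.default_rng(0)


def random_cpt(*parent_shape):
    """Random conditional table; the last axis is the child and sums to 1."""
    table = rng.random(parent_shape + (2,))
    return table / table.sum(axis=-1, keepdims=True)


p_w = random_cpt()         # p(w),      indexed [w]
p_z = random_cpt()         # p(z),      indexed [z]
p_x_z = random_cpt(2)      # p(x|z),    indexed [z, x]
p_a_wz = random_cpt(2, 2)  # p(a|w,z),  indexed [w, z, a]
p_y_xa = random_cpt(2, 2)  # p(y|x,a),  indexed [x, a, y]

# Observational joint p(w,z,x,a,y), with axes ordered (w, z, x, a, y).
joint = np.einsum("w,z,zx,wza,xay->wzxay", p_w, p_z, p_x_z, p_a_wz, p_y_xa)

# Interventional distribution: the same product without the factor p(a|w,z).
do_dist = np.einsum("w,z,zx,xay->wzxay", p_w, p_z, p_x_z, p_y_xa)

# Equivalent definition: joint divided by p(a|w,z), broadcast over x and y.
ratio = joint / p_a_wz[:, :, None, :, None]

print(np.allclose(do_dist, ratio))        # the two definitions agree
print(do_dist.sum(axis=(0, 1, 2, 4)))     # sums to 1 for each fixed a
```

The division is only well defined where \(p(a\,|\,w, z) > 0\). The random tables used here are strictly positive, so that is not an issue in this sketch.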

For each fixed value \(a\), the function \(p(\cdot \,|\,do(a))\) is an ordinary probability distribution over the remaining variables, and obeys the same rules of conditioning and marginalization. In particular, we can define expectations in the usual way (for discrete \(Y\); the sum is replaced by an integral if \(Y\) is continuous): \[\begin{align*} \mathbb{E}[Y \,|\,do(A=a)] := \sum_{y} y \cdot p(y \,|\,do(a)). \end{align*}\] Equipped with this distribution, we can now define the average treatment effect of a binary treatment \(A\) on \(Y\): \[\begin{align*} \mathop{\mathrm{ATE}}:= \mathbb{E}[Y \,|\,do(A=1)] - \mathbb{E}[Y \,|\,do(A=0)]. \end{align*}\] This is sometimes called the average causal effect (ACE) or the total effect.
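As a small worked illustration, the sketch below computes \(\mathbb{E}[Y \,|\,do(A=a)]\) and the ATE for the example factorization \(p(w) \cdot p(z) \cdot p(x\,|\,z) \cdot p(y \,|\,x, a)\), assuming binary variables. The numerical values in the tables are made up for the purpose of the example.

```python
import numpy as np

# Hypothetical conditional probability tables for binary variables.
p_w = np.array([0.6, 0.4])                  # p(w),     indexed [w]
p_z = np.array([0.7, 0.3])                  # p(z),     indexed [z]
p_x_z = np.array([[0.8, 0.2],
                  [0.3, 0.7]])              # p(x|z),   indexed [z, x]
p_y_xa = np.array([[[0.9, 0.1], [0.7, 0.3]],
                   [[0.6, 0.4], [0.2, 0.8]]])  # p(y|x,a), indexed [x, a, y]
# Note: p(a|w,z) is not needed, since that factor is deleted under do(a).

# p(w,z,x,y | do(a)), with the intervened value a on axis 0.
do_dist = np.einsum("w,z,zx,xay->awzxy", p_w, p_z, p_x_z, p_y_xa)

# Marginalize over w, z and x to obtain p(y | do(a)), indexed [a, y].
p_y_do = do_dist.sum(axis=(1, 2, 3))

# E[Y | do(A=a)] = sum_y y * p(y | do(a)), with y taking values 0 and 1.
y_values = np.array([0.0, 1.0])
e_y_do = p_y_do @ y_values

ate = e_y_do[1] - e_y_do[0]
print(e_y_do, ate)
```

With these tables, \(p(x=0) = 0.65\) and \(p(x=1) = 0.35\), so \(\mathbb{E}[Y \,|\,do(A=1)] = 0.65 \cdot 0.3 + 0.35 \cdot 0.8 = 0.475\) and \(\mathbb{E}[Y \,|\,do(A=0)] = 0.65 \cdot 0.1 + 0.35 \cdot 0.4 = 0.205\), giving an ATE of approximately \(0.27\).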

We can also consider conditional quantities of this sort: for example, the conditional average treatment effect (CATE) given \(X=x\) is \[\begin{align*} \mathop{\mathrm{CATE}}:= \mathbb{E}[Y \,|\,X=x, do(A=1)] - \mathbb{E}[Y \,|\,X=x, do(A=0)], \end{align*}\] where each conditional expectation is taken with respect to \(p(y \,|\,x, do(a))\), obtained from \(p(\cdot \,|\,do(a))\) by conditioning on \(X=x\) in the usual way.
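As a worked instance, consider the example above, where \(p(w, z, x, y \,|\,do(a)) = p(w) \cdot p(z) \cdot p(x\,|\,z) \cdot p(y \,|\,x, a)\). Writing \(p(x) = \sum_{z} p(z) \cdot p(x \,|\,z)\) and assuming \(p(x) > 0\), conditioning on \(X = x\) gives \[\begin{align*} p(y \,|\,x, do(a)) &= \frac{p(x, y \,|\,do(a))}{p(x \,|\,do(a))} = \frac{\sum_{w, z} p(w) \cdot p(z) \cdot p(x\,|\,z) \cdot p(y \,|\,x, a)}{\sum_{w, z, y'} p(w) \cdot p(z) \cdot p(x\,|\,z) \cdot p(y' \,|\,x, a)} \\ &= \frac{p(x) \cdot p(y \,|\,x, a)}{p(x)} = p(y \,|\,x, a). \end{align*}\] In this particular graph, therefore, \[\begin{align*} \mathop{\mathrm{CATE}}= \mathbb{E}[Y \,|\,X=x, A=1] - \mathbb{E}[Y \,|\,X=x, A=0]. \end{align*}\] This simplification relies on the structure of the example and should not be expected to hold for an arbitrary causal DAG.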