Chapter 13 Covariate adjustment

Covariate adjustment is one of the most basic methods for removing confounding effects from our causal estimates. The technique is also referred to as ‘controlling for’ or ‘standardizing’.

The idea is that, if some pre-treatment variable \(X\) is correlated with both the treatment and the outcome, it will induce a spurious dependence between them unless it is accounted for.

13.1 Parent sets

From Section 6 we know that in a causal DAG we have \[\begin{align*} p(y \,|\, do(a)) = \sum_{x_{\mathop{\mathrm{pa}}}} p(x_{\mathop{\mathrm{pa}}}) \cdot p(y \,|\, a, x_{\mathop{\mathrm{pa}}}), \end{align*}\] where \(\mathop{\mathrm{pa}}:= \mathop{\mathrm{pa}}_\mathcal{G}(a)\). This shows that the set of parents of the particular treatment is a valid adjustment set (see Definition 13.1). Unfortunately, it is also the least efficient adjustment set that we can use, for reasons that are explained in Section 14. Work by Pearl (1993) shows that a much larger collection of potential ‘adjustment sets’ can be used.
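The parent adjustment formula above can be checked numerically. Below is a minimal sketch in Python (all probabilities and variable names are invented for illustration), using a binary model in which \(X\) is a common cause of \(A\) and \(Y\); it compares the naive conditional \(p(y \,|\, a)\) with the adjusted quantity \(\sum_x p(x) \, p(y \,|\, a, x)\):

```python
# Hypothetical binary model (invented for illustration): X -> A, X -> Y,
# A -> Y, so pa(A) = {X}.
p_x = {0: 0.7, 1: 0.3}                       # p(X = x)
p_a1_given_x = {0: 0.2, 1: 0.8}              # p(A = 1 | X = x)
p_y1_given_ax = {(0, 0): 0.1, (0, 1): 0.5,   # p(Y = 1 | A = a, X = x)
                 (1, 0): 0.4, (1, 1): 0.9}

def joint(x, a, y):
    """Joint probability p(x, a, y) from the factorization of the DAG."""
    pa = p_a1_given_x[x] if a == 1 else 1 - p_a1_given_x[x]
    py = p_y1_given_ax[(a, x)] if y == 1 else 1 - p_y1_given_ax[(a, x)]
    return p_x[x] * pa * py

# Naive conditional p(Y = 1 | A = 1): biased by the common cause X.
num = sum(joint(x, 1, 1) for x in (0, 1))
den = sum(joint(x, 1, y) for x in (0, 1) for y in (0, 1))
naive = num / den

# Parent adjustment: sum_x p(x) * p(Y = 1 | A = 1, X = x).
adjusted = sum(p_x[x] * p_y1_given_ax[(1, x)] for x in (0, 1))
```

With these numbers `adjusted` is 0.55 while `naive` is roughly 0.716, so ignoring \(X\) overstates the effect of \(A\) on \(Y\) in this example.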

13.2 Back-door adjustment

Definition 13.1 We say that \(C\) is a valid adjustment set for the ordered pair \((A,Y)\) if \[\begin{align*} p(y \,|\, do(a)) = \sum_{x_C} p(x_C) \cdot p(y \,|\, a, x_C). \end{align*}\]

Note that we have already shown that the set of parents of \(A\) always constitutes a valid adjustment set. We will extend this to more general back-door adjustment sets.

Definition 13.2 We say that \(C\) is a back-door adjustment set for the ordered pair \((A,Y)\) if

  1. \(C\) blocks all back-door paths from \(A\) to \(Y\);

  2. \(C\) does not contain any descendants of \(A\).

Note that \(C = \mathop{\mathrm{pa}}(a)\) is a back-door adjustment set: every back-door path has a parent of \(A\) as its first internal vertex, and this parent is a non-collider on the path, so conditioning on \(\mathop{\mathrm{pa}}(a)\) blocks the path; moreover, in a DAG no parent of \(a\) can be a descendant of \(a\).
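The two conditions of Definition 13.2 can be checked mechanically on small graphs. The sketch below is hypothetical helper code (not from the text): it represents a DAG as a map from each vertex to its parent set, enumerates the back-door paths from \(A\) to \(Y\), and tests whether a candidate set blocks them all.

```python
# Minimal back-door criterion checker for small DAGs (hypothetical helper).
# A DAG is a dict mapping each vertex to its set of parents.

def descendants(dag, v):
    """All strict descendants of v, via depth-first search over children."""
    children = {u: {w for w in dag if u in dag[w]} for u in dag}
    found, stack = set(), [v]
    while stack:
        for w in children[stack.pop()]:
            if w not in found:
                found.add(w)
                stack.append(w)
    return found

def undirected_paths(dag, a, y):
    """All simple paths from a to y, ignoring edge directions."""
    nbrs = {u: dag[u] | {w for w in dag if u in dag[w]} for u in dag}
    def extend(path):
        if path[-1] == y:
            yield path
        else:
            for w in nbrs[path[-1]] - set(path):
                yield from extend(path + [w])
    yield from extend([a])

def blocked(dag, path, c):
    """Is this path blocked by the set c, in the d-separation sense?"""
    for i in range(1, len(path) - 1):
        prev, v, nxt = path[i - 1], path[i], path[i + 1]
        collider = prev in dag[v] and nxt in dag[v]
        if not collider and v in c:
            return True                  # a non-collider is conditioned on
        if collider and not (({v} | descendants(dag, v)) & c):
            return True                  # an unconditioned collider
    return False

def is_backdoor_set(dag, a, y, c):
    if c & descendants(dag, a):
        return False                     # condition 2 fails
    return all(blocked(dag, p, c)        # condition 1
               for p in undirected_paths(dag, a, y)
               if p[1] in dag[a])        # back-door: first edge points into a

# Toy DAG (invented): Z -> A, Z -> Y, A -> M, M -> Y.
dag = {"Z": set(), "A": {"Z"}, "M": {"A"}, "Y": {"Z", "M"}}
assert is_backdoor_set(dag, "A", "Y", {"Z"})      # blocks A <- Z -> Y
assert not is_backdoor_set(dag, "A", "Y", set())  # back-door path left open
assert not is_backdoor_set(dag, "A", "Y", {"M"})  # M is a descendant of A
```

In the toy graph the mediator \(M\) fails condition 2, and the empty set fails condition 1, so \(\{Z\}\) is the only back-door adjustment set among these three candidates.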

Lemma 13.1 If \(C\) is a back-door adjustment set for \((A,Y)\) then:

  1. \(C \perp_d A\mid \mathop{\mathrm{pa}}(A)\);

  2. \(Y \perp_d \mathop{\mathrm{pa}}(A) \mid C, A\).

Proof. Since no vertex in \(C\) is a descendant of \(a\), we have that \(A \mathbin{\perp\hspace{-3.2mm}\perp}X_C \mid X_{\mathop{\mathrm{pa}}(a)}\) by the local Markov property. We also claim that \(Y\) is d-separated from \(\mathop{\mathrm{pa}}_\mathcal{G}(a)\) by \(C \cup \{a\}\).

To see this, suppose for contradiction that there is a path \(\pi\) from \(Y\) to some \(t \in \mathop{\mathrm{pa}}_\mathcal{G}(a)\) that is open given \(C \cup \{a\}\). If \(\pi\) is also open given \(C\), then we can append the edge \(t \to a\) to obtain an open path from \(Y\) to \(a\). If \(\pi\) is not open given \(C\), this can only be because there is a collider \(s\) on \(\pi\) that is an ancestor of \(a\) but not of \(C\); hence there is a directed path from \(s\) to \(a\) that does not contain any element of \(C\). In this case, simply concatenate the path from \(Y\) to \(s\) with this directed path (shortening if necessary) to obtain an open path from \(Y\) to \(a\). Either way we obtain a path from \(Y\) to \(a\) that is open given \(C\) and ends with an edge into \(a\); this is a back-door path that is not blocked by \(C\), contradicting our assumption that \(C\) blocks all back-door paths from \(A\) to \(Y\).

We conclude that \(Y\) is d-separated from \(\mathop{\mathrm{pa}}_\mathcal{G}(a)\) by \(C \cup \{a\}\), and hence the global Markov property implies that \(Y \mathbin{\perp\hspace{-3.2mm}\perp}X_{\mathop{\mathrm{pa}}(a)} \mid A, X_C\).
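Both independences in Lemma 13.1 can be verified numerically in a small example. The sketch below uses an invented binary model \(X \to Z \to A\), \(X \to Y\), \(A \to Y\), in which \(\mathop{\mathrm{pa}}(A) = \{Z\}\) and \(C = \{X\}\) is a back-door adjustment set:

```python
# Hypothetical binary model: X -> Z -> A, X -> Y, A -> Y.
# Then pa(A) = {Z} and C = {X} is a back-door adjustment set.
import itertools

p_x1 = 0.4
p_z1 = {0: 0.3, 1: 0.8}          # p(Z = 1 | X = x)
p_a1 = {0: 0.25, 1: 0.7}         # p(A = 1 | Z = z)
p_y1 = {(0, 0): 0.1, (0, 1): 0.5,
        (1, 0): 0.6, (1, 1): 0.9}  # p(Y = 1 | A = a, X = x)

def bern(p, v):
    return p if v == 1 else 1 - p

def joint(x, z, a, y):
    return (bern(p_x1, x) * bern(p_z1[x], z)
            * bern(p_a1[z], a) * bern(p_y1[(a, x)], y))

def marg(**fixed):
    """Marginal probability of the partial assignment in `fixed`."""
    total = 0.0
    for x, z, a, y in itertools.product((0, 1), repeat=4):
        pt = {"x": x, "z": z, "a": a, "y": y}
        if all(pt[k] == v for k, v in fixed.items()):
            total += joint(x, z, a, y)
    return total

# Part 1: X independent of A given Z, i.e. C indep of A given pa(A).
for z, x, a in itertools.product((0, 1), repeat=3):
    lhs = marg(x=x, a=a, z=z) * marg(z=z)
    rhs = marg(x=x, z=z) * marg(a=a, z=z)
    assert abs(lhs - rhs) < 1e-12

# Part 2: Y independent of Z given (X, A), i.e. Y indep of pa(A) given C, A.
for x, a, z, y in itertools.product((0, 1), repeat=4):
    lhs = marg(y=y, z=z, x=x, a=a) * marg(x=x, a=a)
    rhs = marg(y=y, x=x, a=a) * marg(z=z, x=x, a=a)
    assert abs(lhs - rhs) < 1e-12
```

Both loops complete without an assertion failure, matching the two d-separations proved above.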

Theorem 13.1 If \(C\) is a back-door adjustment set then it is also a valid adjustment set.

Proof. For the purposes of the proof we assume that \(C \cap \mathop{\mathrm{pa}}(a) = \emptyset\); the extension to the general case is conceptually simple but notationally annoying. We therefore have: \[\begin{align*} p(y \,|\, do(a)) &= \sum_{x_{\mathop{\mathrm{pa}}(a)}} p(x_{\mathop{\mathrm{pa}}(a)}) \cdot p(y \mid a, x_{\mathop{\mathrm{pa}}(a)})\\ &= \sum_{x_{\mathop{\mathrm{pa}}(a)}} p(x_{\mathop{\mathrm{pa}}(a)}) \sum_{x_C} p(y \mid x_C, a, x_{\mathop{\mathrm{pa}}(a)}) \cdot p(x_C \mid a, x_{\mathop{\mathrm{pa}}(a)})\\ &= \sum_{x_{\mathop{\mathrm{pa}}(a)}} p(x_{\mathop{\mathrm{pa}}(a)}) \sum_{x_C} p(y \mid x_C, a) \cdot p(x_C \mid x_{\mathop{\mathrm{pa}}(a)})\\ &= \sum_{x_C} p(y \mid x_C, a) \sum_{x_{\mathop{\mathrm{pa}}(a)}} p(x_{\mathop{\mathrm{pa}}(a)}) \cdot p(x_C \mid x_{\mathop{\mathrm{pa}}(a)})\\ &= \sum_{x_C} p(x_C) \cdot p(y \mid a, x_C), \end{align*}\] where the implications of Lemma 13.1 are used in the third equality.
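Theorem 13.1 can also be sanity-checked numerically. The following sketch uses an invented binary model \(X \to Z \to A\), \(X \to Y\), \(A \to Y\), where \(\mathop{\mathrm{pa}}(A) = \{Z\}\) and \(C = \{X\}\) is a back-door set that is *not* the parent set; both adjustments recover the same interventional quantity, while the naive conditional does not.

```python
# Hypothetical binary model: X -> Z -> A, X -> Y, A -> Y.
import itertools

p_x1 = 0.4
p_z1 = {0: 0.3, 1: 0.8}          # p(Z = 1 | X = x)
p_a1 = {0: 0.25, 1: 0.7}         # p(A = 1 | Z = z)
p_y1 = {(0, 0): 0.1, (0, 1): 0.5,
        (1, 0): 0.6, (1, 1): 0.9}  # p(Y = 1 | A = a, X = x)

def bern(p, v):
    return p if v == 1 else 1 - p

def joint(x, z, a, y):
    return (bern(p_x1, x) * bern(p_z1[x], z)
            * bern(p_a1[z], a) * bern(p_y1[(a, x)], y))

def marg(**fixed):
    """Marginal probability of the partial assignment in `fixed`."""
    total = 0.0
    for x, z, a, y in itertools.product((0, 1), repeat=4):
        pt = {"x": x, "z": z, "a": a, "y": y}
        if all(pt[k] == v for k, v in fixed.items()):
            total += joint(x, z, a, y)
    return total

def adjust(var):
    """sum_c p(c) * p(Y=1 | A=1, C=c), with C being 'z' or 'x'."""
    return sum(marg(**{var: c})
               * marg(**{"y": 1, "a": 1, var: c}) / marg(**{"a": 1, var: c})
               for c in (0, 1))

# Interventional truth from the mutilated model: under do(A=1) the vertex Z
# marginalizes out, leaving sum_x p(x) * p(Y=1 | A=1, X=x).
truth = sum(bern(p_x1, x) * p_y1[(1, x)] for x in (0, 1))
naive = marg(y=1, a=1) / marg(a=1)       # ordinary conditional, confounded

assert abs(adjust("z") - truth) < 1e-12  # parent adjustment (C = pa(A))
assert abs(adjust("x") - truth) < 1e-12  # non-parent back-door adjustment
```

Here `naive` differs from `truth`, illustrating that the back-door paths do need to be blocked.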

13.3 Examples

Consider the causal DAG shown in Figure 13.1, and suppose we are interested in the causal effect of \(A\) on \(Y\). There is one back-door path, highlighted in red; this path must be blocked, so we have to condition on \(Z\), \(X\), or both. We must not condition on \(A\), \(M\) or \(Y\), since these would block part of the causal effect and hence would induce bias. For similar reasons we should avoid including \(K\) or \(R\), as doing so would partially block the causal path. We are free to condition on any other subset of \(\{W,S,L\}\). In addition, we can also include \(D\) if we want to, though the result would not be a back-door adjustment set.

In total this means that there are 24 back-door adjustment sets in this graph, consisting of \(\{Z\}\), \(\{X\}\) and \(\{Z,X\}\) together with the eight subsets of \(\{W,S,L\}\). We will see that the ‘optimal’ adjustment set involves taking \(X\) and then including \(S\) and \(L\), but not \(W\).
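The count of 24 follows directly from the description above, as a short enumeration confirms (vertex labels as in Figure 13.1):

```python
# Enumerate the back-door adjustment sets: each must contain {Z}, {X} or
# {Z, X} to block the back-door path, plus any subset of the free vertices.
from itertools import combinations

blockers = [{"Z"}, {"X"}, {"Z", "X"}]
free = ["W", "S", "L"]
free_subsets = [set(s) for r in range(len(free) + 1)
                for s in combinations(free, r)]

backdoor_sets = [frozenset(b | s) for b in blockers for s in free_subsets]
assert len(set(backdoor_sets)) == 24   # 3 blocking choices x 8 free subsets
```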

Figure 13.1: A causal directed graph.

References

Pearl, Judea. 1993. “Comment: Graphical Models, Causality and Intervention.” Statistical Science 8 (3): 266–69.