Chapter 21 Causal Forests

21.1 Random Forests

Random forests are a nonparametric regression method with very good prediction properties. See the ‘two cultures’ paper of Breiman (2001).

The method draws \(B\) subsamples of the data, grows a tree on each subsample to predict the outcome, and then averages the resulting predictions. The final estimate is \[\begin{align*} \mu(z) &= \frac{1}{B} \sum_{b=1}^B \sum_{i=1}^n \frac{\mathbb{I}\{L_b(Z_i) = L_b(z)\}}{|L_b(z)|} Y_i, \end{align*}\] where \(L_b(z)\) is the leaf of the \(b\)th tree that contains the covariate value \(z\). Each tree is grown by repeatedly selecting a single variable and a split point that divide the current node into two groups, so as to maximize the heterogeneity in the outcome between them. Tuning parameters control the depth of the tree, and usually a minimum sample size per leaf is imposed.
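As a minimal illustration, here is a sketch using regression_forest() from the grf package (which reappears in Section 21.4); the simulated data are hypothetical:

library(grf)
# hypothetical data: one signal variable among five, plus noise
n <- 2000
Z <- matrix(runif(n * 5), n, 5)
Y <- sin(2 * pi * Z[, 1]) + rnorm(n, sd = 0.5)
# each of num.trees trees is grown on a subsample; predictions are
# the per-tree leaf averages, averaged over trees as in the formula above
rf <- regression_forest(Z, Y, num.trees = 500)
mu_hat <- predict(rf, Z)$predictions  # mu-hat(z) at the training points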

21.2 Causal Forests

Random forests choose splits to maximize the variation in the outcome between the two groups. Causal forests instead choose splits with the aim of estimating treatment heterogeneity well (Athey, Tibshirani, and Wager 2019; Athey and Wager 2019). The procedure is as follows:

  1. Split the data into two parts, \(\mathcal{D}_{\text{th}}\) and \(\mathcal{D}_{\text{eff}}\);

  2. learn random forest models for the propensity score and outcome using \(\mathcal{D}_{\text{th}}\);

  3. learn the expected outcome in each leaf using \(\mathcal{D}_{\text{eff}}\), giving the local estimate \[\begin{align*} \hat\beta(z) &= \frac{\sum_{i=1}^n \alpha_i(z) \{Y_i - \hat{\mu}(Z_i)\}\{X_i - \hat{\pi}(Z_i)\}}{\sum_{i=1}^n \alpha_i(z)\{X_i - \hat{\pi}(Z_i)\}^2}, \end{align*}\] where \(\alpha_i(z) = \frac{1}{B}\sum_{b=1}^B \mathbb{I}\{L_b(Z_i) = L_b(z)\}/|L_b(z)|\) is the forest weight given to observation \(i\) at the point \(z\). Because leave-one-out (out-of-bag) estimation is very cheap for random forests, we do not use the \(i\)th observation when predicting its own outcome \(\hat\mu(Z_i)\) and propensity \(\hat\pi(Z_i)\). A computational sketch of this estimator is given below.
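To make the weighted residual-on-residual form concrete, here is a hedged sketch: grf's get_forest_weights() extracts the \(\alpha_i(z)\), and the forest's out-of-bag fits Y.hat and W.hat play the roles of \(\hat\mu\) and \(\hat\pi\). The simulated data are hypothetical, and grf's own predict() solves a closely related weighted estimating equation, so the two need not agree exactly.

library(grf)
# hypothetical simulation with a known heterogeneous effect
n <- 2000
Z <- matrix(runif(n * 5), n, 5)
W <- rbinom(n, 1, 0.5)            # treatment (X in the notation above)
tau <- pmax(Z[, 1], 0.5)          # true effect beta(z)
Y <- tau * W + Z[, 2] + rnorm(n)

cf <- causal_forest(Z, Y, W)
alpha <- get_forest_weights(cf)   # row j holds the weights alpha_i(z_j)
Ry <- Y - cf$Y.hat                # out-of-bag residuals Y_i - mu-hat(Z_i)
Rw <- W - cf$W.hat                # out-of-bag residuals X_i - pi-hat(Z_i)
beta_hat <- as.numeric(alpha %*% (Ry * Rw)) / as.numeric(alpha %*% Rw^2)
head(cbind(beta_hat, predict(cf)$predictions))  # compare with grf's estimate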

21.3 Asymptotics

Suppose that forests are grown on subsamples of size \(s = n^\beta\), for \(\beta_{\min} < \beta < 1\), where \(\beta_{\min}\) is an expression involving parameters relating to the chosen splits.
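In grf the subsample size is controlled through the sample.fraction argument, a constant fraction of \(n\) rather than \(n^\beta\) directly; the \(n^\beta\) scaling is the theoretical device used in the proofs. A hypothetical call, reusing Z, Y, W from the sketch in Section 21.2:

# grow each tree on a subsample of s = 0.2 * n observations
cf_small <- causal_forest(Z, Y, W, sample.fraction = 0.2)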

Theorem 21.1 Under some regularity conditions, one can show that \(\hat\beta(z)\) is consistent for \(\beta(z)\) and \[\begin{align*} \sqrt{\frac{n}{s}}(\hat\beta(z) - \beta(z)) \to^d N(0, \sigma^2_n(z)), \end{align*}\] where \(\sigma^2_n(z) = \operatorname{polylog}(n/s)^{-1}\), the inverse of a function that is polynomial in \(\log(n/s)\) (and bounded below).

Note that this is a pointwise result, holding at each fixed \(z\) rather than uniformly over \(z\). \(\sigma^2_n\) can be estimated from the fitted models.
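In grf, this variance estimate is available through predict() with estimate.variance = TRUE; a sketch, assuming the fitted forest cf from Section 21.2's example:

pred <- predict(cf, estimate.variance = TRUE)
se <- sqrt(pred$variance.estimates)   # estimated sigma_n(z) at each point
# pointwise 95% confidence intervals for beta(z)
lower <- pred$predictions - 1.96 * se
upper <- pred$predictions + 1.96 * se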

21.4 R Implementation

The R package grf implements causal forests with causal_forest().

library(grf)
# hold out 10% of the observations for later evaluation
ho <- sample(nrow(dat2), floor(nrow(dat2) / 10))
# grf's arguments: X = covariates (Z in the notation above),
# Y = outcome, W = treatment (X in the notation above)
out <- causal_forest(X[-ho, ], dat2$y[-ho], W = dat2$z[-ho])

Here we plot the true individual causal effect against the estimate from the causal forest.
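A sketch of such a plot, under the assumption that the data are simulated so that the true individual effects are known and stored in a (hypothetical) vector tau:

# estimated effects on the held-out observations
pred <- predict(out, X[ho, ])$predictions
# tau is hypothetical: the true effects, known only because the data are simulated
plot(tau[ho], pred, xlab = "True individual effect", ylab = "Causal forest estimate")
abline(0, 1, lty = 2)  # points on the dashed line are estimated perfectly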

21.5 Burn the forest

The objects created by the package are quite large, so we should remember to remove them when we’re done.

pryr::object_size(out)
## 96.63 MB
rm(out)

References

Athey, Susan, Julie Tibshirani, and Stefan Wager. 2019. “Generalized Random Forests.” Annals of Statistics 47 (2): 1148–78. https://doi.org/10.1214/18-AOS1709.
Athey, Susan, and Stefan Wager. 2019. “Estimating Treatment Effects with Causal Forests: An Application.” Observational Studies 5 (2): 37–51. https://doi.org/10.1353/obs.2019.0001.
Breiman, Leo. 2001. “Statistical Modeling: The Two Cultures (with Comments and a Rejoinder by the Author).” Statistical Science 16 (3): 199–231. https://doi.org/10.1214/ss/1009213726.