Chapter 21 Causal Forests
21.1 Random Forests
Random forests are a nonparametric regression method with very good prediction properties; see the ‘two cultures’ paper of Breiman (2001).
The method draws \(B\) subsamples of the data, grows on each subsample a tree that predicts the outcome optimally, and averages the resulting predictions. The final estimate is: \[\begin{align*} \mu(z) &= \frac{1}{B} \sum_{b=1}^B \sum_{i=1}^n \frac{\mathbb{I}\{L_b(z_i) = L_b(z)\}}{|L_b(z)|} Y_i, \end{align*}\] where \(L_b(z)\) is the leaf of the \(b\)th tree that contains the covariates \(z\). Each tree is grown by repeatedly selecting a single variable and splitting its values into two groups so as to maximize the heterogeneity between them. Tuning parameters control the depth of the tree, and usually a minimum sample size is imposed in each leaf.
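The averaging formula above can be sketched directly. This is a minimal illustration, not a tree-growing implementation: the leaf assignments \(L_b(z_i)\) are generated at random here, standing in for the leaves of \(B\) grown trees, and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: n observations, leaf assignments for B trees.
# leaf[b, i] stands in for L_b(z_i), the leaf index of observation i
# in tree b; in practice these come from the grown trees.
n, B = 200, 50
Y = rng.normal(size=n)
leaf = rng.integers(0, 8, size=(B, n))

def forest_predict(z_leaf, leaf, Y):
    """Average over trees of the mean outcome in the leaf containing z.

    z_leaf[b] stands in for L_b(z), the leaf of tree b containing the
    query point z; the b-th term is
    sum_i I{L_b(z_i) = L_b(z)} Y_i / |L_b(z)|.
    """
    B = leaf.shape[0]
    preds = np.empty(B)
    for b in range(B):
        in_leaf = leaf[b] == z_leaf[b]   # indicator I{L_b(z_i) = L_b(z)}
        preds[b] = Y[in_leaf].mean()     # leaf mean = sum / |L_b(z)|
    return preds.mean()                  # average over the B trees

# Leaf memberships of a (hypothetical) query point z in each tree.
z_leaf = rng.integers(0, 8, size=B)
mu_hat = forest_predict(z_leaf, leaf, Y)
```

Each per-tree prediction is a within-leaf average, so the forest estimate is necessarily a convex combination of the observed outcomes.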
21.2 Causal Forests
A random-forest split maximizes variation in the predicted outcome between the two groups. For causal forests, the aim is instead to estimate treatment heterogeneity well (Athey, Tibshirani, and Wager 2019; Athey and Wager 2019). The procedure is:
- Split data into two parts, \(\mathcal{D}_{\text{th}},\mathcal{D}_{\text{eff}}\); 
- learn random forest models for the propensity score and outcome using \(\mathcal{D}_{\text{th}}\); 
- learn the expected outcome in each leaf using \(\mathcal{D}_{\text{eff}}\), yielding the weighted residual-on-residual estimate \[\begin{align*} \hat\beta(z) &= \frac{\sum_{i=1}^n \alpha_i(z) \{Y_i - \hat{\mu}(Z_i)\}\{X_i - \hat{\pi}(Z_i)\}}{\sum_{i=1}^n \alpha_i(z)\{X_i - \hat{\pi}(Z_i)\}^2}, \end{align*}\] where \(\alpha_i(z) = \frac{1}{B}\sum_{b=1}^B \mathbb{I}\{L_b(z_i) = L_b(z)\}/|L_b(z)|\) is the forest weight attached to observation \(i\) at the point \(z\). Because out-of-bag (leave-one-out) prediction is very cheap for random forests, the \(i\)th observation is not used when predicting its own outcome \(\hat\mu(Z_i)\) and treatment \(\hat\pi(Z_i)\).
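Once the weights and cross-fitted nuisance estimates are in hand, the display for \(\hat\beta(z)\) is a one-line computation. In this toy sketch the propensity \(\hat\pi\) and conditional mean \(\hat\mu\) are set to their true values under a simple model with constant treatment effect \(\beta = 2\), and the weights \(\alpha_i(z)\) are uniform; in practice all three would come from the fitted forests.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy model (all values illustrative): binary treatment with known
# propensity 0.5 and a constant treatment effect of 2.
n = 500
X = rng.binomial(1, 0.5, size=n).astype(float)
Y = 1.0 + 2.0 * X + rng.normal(scale=0.1, size=n)

# Stand-ins for the cross-fitted nuisances mu_hat(Z_i), pi_hat(Z_i)
# (in the text these are forests grown on D_th) and the forest
# weights alpha_i(z); here: true values and uniform weights.
mu_hat = 1.0 + 2.0 * 0.5          # E[Y | Z] in this toy model
pi_hat = 0.5                      # E[X | Z] in this toy model
alpha = np.full(n, 1.0 / n)

def causal_forest_estimate(alpha, Y, X, mu_hat, pi_hat):
    """Weighted residual-on-residual regression: the display for beta_hat(z)."""
    ry = Y - mu_hat               # Y_i - mu_hat(Z_i)
    rx = X - pi_hat               # X_i - pi_hat(Z_i)
    return np.sum(alpha * ry * rx) / np.sum(alpha * rx**2)

beta_hat = causal_forest_estimate(alpha, Y, X, mu_hat, pi_hat)
```

With uniform weights this reduces to the usual partialling-out (residual-on-residual) regression; the localized weights \(\alpha_i(z)\) are what make the estimate vary with \(z\).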
21.3 Asymptotics
Suppose that forests are grown on subsamples of size \(s = n^\beta\), for \(\beta_{\min} < \beta < 1\), where \(\beta_{\min}\) is an expression involving parameters relating to the chosen splits.
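For concreteness, with \(n = 10{,}000\) and \(\beta = 0.7\) (an arbitrary value in the allowed range; the actual \(\beta_{\min}\) depends on the splitting parameters), each tree would be grown on a subsample of size:

```python
n = 10_000
beta = 0.7          # illustrative; must satisfy beta_min < beta < 1
s = int(n ** beta)  # subsample size s = n^beta for each tree
```

Because \(\beta < 1\), the subsample size \(s\) grows with \(n\) but \(s/n \to 0\), which is what drives the asymptotics below.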
Theorem 21.1 Under some regularity conditions, one can show that \(\hat\beta(z)\) is consistent for \(\beta(z)\) and \[\begin{align*} \sqrt{\frac{n}{s}}\,(\hat\beta(z) - \beta(z)) \to^d N(0, \sigma^2_n(z)), \end{align*}\] where \(\sigma^2_n(z) = \operatorname{polylog}(n/s)^{-1}\), the inverse of a function that is polynomial in \(\log(n/s)\) (and bounded below).
Note that this is a pointwise result; \(\sigma^2_n\) can be estimated from the fitted models.
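Since the result is pointwise, a normal-approximation interval built from \(\hat\sigma_n(z)\) is valid at each fixed \(z\), not simultaneously over all \(z\). A minimal sketch, with illustrative numbers standing in for the fitted point estimate and standard error:

```python
# Hypothetical fitted values at a single query point z; in practice
# beta_hat and sigma_hat come from the fitted causal forest.
beta_hat, sigma_hat = 1.8, 0.4

# Pointwise 95% confidence interval for beta(z).
lo = beta_hat - 1.96 * sigma_hat
hi = beta_hat + 1.96 * sigma_hat
```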
 Here we plot the true individual causal effect against the estimate
from the causal forest.