Chapter 22 Bayesian Additive Regression Trees

Another approach under strong ignorability is to learn the response surface, i.e. how \(Y\) responds to \(A\) at different levels of \({\boldsymbol X}\). Hill (2011) uses Bayesian Additive Regression Trees (BART) to estimate each of \[\begin{align*} f_a({\boldsymbol x}) &= \mathbb{E}[Y \mid A=a, {\boldsymbol x}], \qquad a\in \{0,1\}. \end{align*}\] The approach is essentially a Bayesian analogue of random forests, but with priors on whether each node is terminal, that is, on the splitting stopping. At depth \(d\), the prior probability of a node not being terminal could be \[\begin{align*} 0.95(1+d)^{-2}, \end{align*}\] so deeper nodes are increasingly likely to be leaves and each tree stays shallow. The prior on the mean of the outcome is suggested to be normal, \[\begin{align*} \mathbb{E}[Y \mid A_j, m_{ij}] = \mu_{ij} \sim N(0, \sigma_\mu^2), \end{align*}\] where \(\mu_{ij}\) is the mean in terminal node \(i\) of tree \(j\); shrinking these means towards zero implicitly regularizes the model.
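
To get a feel for this depth prior, we can evaluate it at the first few depths. A quick sketch, where p_split is a helper name of our own, using the default hyperparameters \(\alpha = 0.95\) and \(\beta = 2\):

p_split <- function(d, alpha = 0.95, beta = 2) alpha * (1 + d)^(-beta)
round(p_split(0:4), 4)  ## prior probability of splitting at depths 0 to 4
## [1] 0.9500 0.2375 0.1056 0.0594 0.0380

By depth 2 a node has only about a 10% prior chance of splitting further, so the ensemble is built from many weak, shallow trees.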

22.1 BART on our data

We can use BART with the bartCause package, again on the 401(k) data.

library(bartCause)
bart_out <- bartc(response = dat$net_tfa, 
                  treatment = dat$e401, ## treatment must be 0,1
                  confounders = dat[,2:11],
                  estimand = "att")  ## get ATT as estimand
## fitting treatment model via method 'bart'
## fitting response model via method 'bart'
bart_out
## Call:
## bartc(response = dat$net_tfa, treatment = dat$e401, confounders = dat[, 
##       2:11], estimand = "att")
## 
## Treatment effect (att, conditional average): -1804
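
Because the estimate is built from posterior draws, we can also report its uncertainty rather than a point estimate alone. A minimal sketch, assuming the summary and extract methods that bartCause provides for fitted objects (see ?bartCause::extract):

summary(bart_out)                 ## point estimate with a credible interval
draws <- extract(bart_out)        ## posterior draws of the estimand (the ATT)
quantile(draws, c(0.025, 0.975))  ## 95% credible interval computed by hand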

The causal effect estimate on the 401(k) data is wildly variable: rerunning the sampler can give quite different answers. This may have to do with the default tuning parameters.
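
One mitigation is to draw more posterior samples and fix the seed so results are reproducible. A sketch, assuming (as in the bartCause documentation) that n.samples, n.burn and n.chains are passed through to the underlying dbarts sampler:

set.seed(42)  ## fix the RNG for reproducibility
bart_long <- bartc(response = dat$net_tfa,
                   treatment = dat$e401,
                   confounders = dat[, 2:11],
                   estimand = "att",
                   n.samples = 2000L, n.burn = 500L, n.chains = 4L)
bart_long  ## more draws mean less Monte Carlo error in the estimate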

We should also remember to remove the BART object after use, as it’s really very big!

pryr::object_size(bart_out)  ## check the memory footprint of the fitted object

rm(bart_out)  ## free that memory once we are done

The size of the objects created also means that forest-based methods are relatively slow to produce predictions compared with, for example, post-double selection: each prediction must be aggregated over many trees (and, here, many posterior draws), whereas a fitted linear model needs only a handful of coefficients.
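
For a sense of scale, a hypothetical comparison: a plain linear fit on the same data stores little more than coefficients, residuals and a model frame, and should come out far smaller than the BART object measured above.

lm_fit <- lm(net_tfa ~ ., data = dat)  ## linear benchmark on the same data
pryr::object_size(lm_fit)              ## typically far smaller than the BART fit
rm(lm_fit)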

References

Hill, Jennifer L. 2011. “Bayesian Nonparametric Modeling for Causal Inference.” Journal of Computational and Graphical Statistics 20 (1): 217–40.