Chapter 22 Bayesian Additive Regression Trees
Another approach under strong ignorability is to learn the response surface, i.e. how \(Y\) responds to \(A\) at different levels of \({\boldsymbol X}\). Hill (2011) uses Bayesian Additive Regression Trees (BART) to estimate each of
\[\begin{align*} f_a({\boldsymbol x}) &= \mathbb{E}[Y \mid A=a, {\boldsymbol x}], \qquad a\in \{0,1\}. \end{align*}\]
The approach is essentially a Bayesian version of random forests, but with priors on whether each node is terminal, that is, on whether splitting stops. A node at depth \(d\) is non-terminal with prior probability
\[\begin{align*} \alpha(1+d)^{-\beta}, \end{align*}\]
where the defaults \(\alpha = 0.95\) and \(\beta = 2\) give \(0.95(1+d)^{-2}\), so deep trees are heavily penalized. The prior on the mean of the outcome in each terminal node is suggested to be normal: for terminal node \(m_{ij}\) of tree \(T_j\),
\[\begin{align*} \mathbb{E}[Y \mid T_j, m_{ij}] = \mu_{ij} \sim N(0, \sigma_\mu^2), \end{align*}\]
which implicitly regularizes the model by shrinking each node mean towards zero.
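To get a feel for how strongly this prior favours shallow trees, we can evaluate it at the first few depths; a minimal sketch using the default \(\alpha = 0.95\), \(\beta = 2\):
alpha <- 0.95; beta <- 2
d <- 0:4
round(alpha * (1 + d)^(-beta), 4)  ## prior P(non-terminal) at depths 0,...,4
## [1] 0.9500 0.2375 0.1056 0.0594 0.0380
By depth 3 a node has prior probability of splitting below 6%, which is what keeps the individual trees weak learners.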
22.1 BART on our data
We can use BART with the bartCause package, again on the 401(k) data.
library(bartCause)
bart_out <- bartc(response = dat$net_tfa,
                  treatment = dat$e401,      ## treatment must be 0,1
                  confounders = dat[, 2:11],
                  estimand = "att")          ## get ATT as estimand
## fitting treatment model via method 'bart'
## fitting response model via method 'bart'
bart_out
## Call:
## bartc(response = dat$net_tfa, treatment = dat$e401, confounders = dat[,
##     2:11], estimand = "att")
##
## Treatment effect (att, conditional average): -1804
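Because the fit is Bayesian, the object contains full posterior draws, so we need not stop at a point estimate. A sketch, assuming bartCause's extract() method for bartcFit objects returns posterior samples of the estimand:
ests <- extract(bart_out)        ## posterior draws of the ATT
mean(ests)                       ## posterior mean
quantile(ests, c(0.025, 0.975))  ## 95% credible interval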
The causal effect estimate on the 401(k) data is wildly variable from run to run; this may be due to the default tuning parameters.
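One thing to try is running the sampler for longer; a sketch, assuming that bartc() passes extra sampler arguments such as n.samples, n.burn and n.chains through to the underlying dbarts fitter:
bart_long <- bartc(response = dat$net_tfa,
                   treatment = dat$e401,
                   confounders = dat[, 2:11],
                   estimand = "att",
                   n.samples = 2000L,  ## more posterior draws (assumed pass-through)
                   n.burn = 1000L,     ## longer burn-in
                   n.chains = 8L)      ## more chains, to check mixing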
We should also remember to remove the BART object after use, as it’s really very big!
pryr::object_size(bart_out)  ## memory footprint of the fitted BART object
rm(bart_out)                 ## remove it once we are finished
The size of the objects created also means that forest-based methods are relatively slow to produce predictions compared with (e.g.) post-double selection.