Chapter 23 Meta-Learners
A meta-learner is a generic algorithm that combines flexible, off-the-shelf regression models into an estimator of a causal effect, typically the CATE.
S-learner: (S=‘single’). Learn a (flexible) regression model \(\hat\mu({\boldsymbol x}, a)\) for \(Y \mid {\boldsymbol X}, A\), and estimate \(\mathop{\mathrm{CATE}}({\boldsymbol x}) = \hat\mu({\boldsymbol x}, 1) - \hat\mu({\boldsymbol x}, 0)\).
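A minimal sketch of the S-learner, assuming OLS with a treatment interaction stands in for the flexible regressor; the simulated data and helper functions are illustrative, not from the text:

```python
import numpy as np

def fit_ols(X, y):
    # least-squares fit with an intercept; stands in for any flexible regressor
    Xb = np.column_stack([np.ones(len(X)), X])
    return np.linalg.lstsq(Xb, y, rcond=None)[0]

def predict_ols(beta, X):
    return np.column_stack([np.ones(len(X)), X]) @ beta

# illustrative simulated data: Y = X_1 + 2*A + noise, so the true CATE is 2
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 2))
A = rng.integers(0, 2, size=n)
Y = X[:, 0] + 2.0 * A + rng.normal(scale=0.1, size=n)

# S-learner: a single model for Y given (X, A); the X*A interaction lets the
# effect vary with X (a purely additive model would force a constant effect)
feats = np.column_stack([X, A, X * A[:, None]])
beta = fit_ols(feats, Y)

def cate_s(x):
    f1 = np.column_stack([x, np.ones(len(x)), x])       # set A = 1
    f0 = np.column_stack([x, np.zeros(len(x)), 0 * x])  # set A = 0
    return predict_ols(beta, f1) - predict_ols(beta, f0)
```

The interaction term matters in practice: a base learner that cannot represent \(X \times A\) interactions shrinks the estimated effect toward a constant.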
T-learner: (T=‘two’). Learn separate (flexible) regression models \(\hat\mu_0({\boldsymbol x})\) and \(\hat\mu_1({\boldsymbol x})\) for \(Y \mid {\boldsymbol X}, A=0\) and \(Y \mid {\boldsymbol X}, A=1\), and estimate \(\mathop{\mathrm{CATE}}({\boldsymbol x}) = \hat\mu_1({\boldsymbol x}) - \hat\mu_0({\boldsymbol x})\).
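The T-learner can be sketched in the same style; again the OLS base learner and the data-generating process (true CATE \(= 1 + x_1\)) are illustrative assumptions:

```python
import numpy as np

def fit_ols(X, y):
    # least-squares fit with an intercept; stands in for any flexible regressor
    Xb = np.column_stack([np.ones(len(X)), X])
    return np.linalg.lstsq(Xb, y, rcond=None)[0]

def predict_ols(beta, X):
    return np.column_stack([np.ones(len(X)), X]) @ beta

# illustrative simulated data with heterogeneous effect: CATE(x) = 1 + x_1
rng = np.random.default_rng(1)
n = 4000
X = rng.normal(size=(n, 2))
A = rng.integers(0, 2, size=n)
Y = X[:, 0] + A * (1 + X[:, 0]) + rng.normal(scale=0.1, size=n)

# T-learner: fit separate regressions on the control and treated subsamples
beta0 = fit_ols(X[A == 0], Y[A == 0])   # model for Y | X, A = 0
beta1 = fit_ols(X[A == 1], Y[A == 1])   # model for Y | X, A = 1

def cate_t(x):
    return predict_ols(beta1, x) - predict_ols(beta0, x)
```

Unlike the S-learner, nothing here forces the two arms to share structure, which can hurt when one arm has few observations; that weakness motivates the X-learner below.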
X-learner: (X=‘cross’). Start as for the T-learner, but then impute individual treatment effects \[\begin{align*} D_i^0 &= \hat\mu_1({\boldsymbol X}_i) - Y_i, & D_i^1 &= Y_i - \hat\mu_0({\boldsymbol X}_i), \end{align*}\] for control and treated individuals respectively. Regress the \(D_i^0\) and \(D_i^1\) on the covariates to obtain separate ITE models \(\hat\tau_0\) and \(\hat\tau_1\), and then write \[\begin{align*} \mathop{\mathrm{CATE}}({\boldsymbol x}) &= g({\boldsymbol x}) \cdot \hat\tau_0({\boldsymbol x}) + (1-g({\boldsymbol x})) \cdot \hat\tau_1({\boldsymbol x}) \end{align*}\] for a weight function \(g({\boldsymbol x}) \in [0,1]\). A good choice is an estimate of the propensity score (Künzel et al. 2019).
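The three stages can be sketched as follows, with an imbalanced design (about 80% treated) of the kind the X-learner was designed for; the OLS models, the linear propensity fit, and the simulated data are illustrative assumptions:

```python
import numpy as np

def fit_ols(X, y):
    # least-squares fit with an intercept; stands in for any flexible regressor
    Xb = np.column_stack([np.ones(len(X)), X])
    return np.linalg.lstsq(Xb, y, rcond=None)[0]

def predict_ols(beta, X):
    return np.column_stack([np.ones(len(X)), X]) @ beta

# illustrative data: heavily imbalanced arms, true CATE(x) = 1 + x
rng = np.random.default_rng(2)
n = 4000
X = rng.normal(size=(n, 1))
A = rng.binomial(1, 0.8, size=n)
Y = X[:, 0] + A * (1 + X[:, 0]) + rng.normal(scale=0.1, size=n)

# stage 1: T-learner outcome models
mu0 = fit_ols(X[A == 0], Y[A == 0])
mu1 = fit_ols(X[A == 1], Y[A == 1])

# stage 2: imputed individual effects, then one ITE model per arm
D0 = predict_ols(mu1, X[A == 0]) - Y[A == 0]   # controls
D1 = Y[A == 1] - predict_ols(mu0, X[A == 1])   # treated
tau0 = fit_ols(X[A == 0], D0)
tau1 = fit_ols(X[A == 1], D1)

# stage 3: combine, weighting by a (crude linear) propensity estimate g(x)
g = fit_ols(X, A.astype(float))

def cate_x(x):
    gx = np.clip(predict_ols(g, x), 0.01, 0.99)
    return gx * predict_ols(tau0, x) + (1 - gx) * predict_ols(tau1, x)
```

With \(g\) near 1 the estimate leans on \(\hat\tau_0\), which was fit on the scarce control arm but borrows strength from the well-estimated \(\hat\mu_1\); that is the point of the cross-over construction.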
R-learner: (R=‘residual’, from Robinson (1988); cf. the Frisch-Waugh-Lovell theorem)
First regress both \(A\) and \(Y\) on \({\boldsymbol X}\), and compute residuals \(R_A\), \(R_Y\).
Then regress \(R_Y\) on \(R_A\) using (e.g.) least squares (Nie and Wager 2021).
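The two steps above can be sketched as follows; the OLS nuisance models, the simulated confounded data, and the constant-effect final regression are illustrative assumptions (Nie and Wager allow a flexible effect model via a weighted loss):

```python
import numpy as np

def fit_ols(X, y):
    # least-squares fit with an intercept; stands in for any flexible regressor
    Xb = np.column_stack([np.ones(len(X)), X])
    return np.linalg.lstsq(Xb, y, rcond=None)[0]

def predict_ols(beta, X):
    return np.column_stack([np.ones(len(X)), X]) @ beta

# illustrative confounded data: treatment probability depends on X,
# with a constant true effect of 2
rng = np.random.default_rng(3)
n = 4000
X = rng.normal(size=(n, 1))
p = np.clip(0.5 + 0.2 * X[:, 0], 0.05, 0.95)
A = rng.binomial(1, p)
Y = X[:, 0] + 2.0 * A + rng.normal(scale=0.1, size=n)

# step 1: regress both Y and A on X, keep the residuals
m = fit_ols(X, Y)                 # outcome model for E[Y | X]
e = fit_ols(X, A.astype(float))   # propensity model for E[A | X]
R_Y = Y - predict_ols(m, X)
R_A = A - predict_ols(e, X)

# step 2: regress R_Y on R_A; for a constant effect this is a single
# no-intercept least-squares coefficient
tau_hat = (R_A @ R_Y) / (R_A @ R_A)
```

Residualizing removes the part of \(Y\) and \(A\) explained by \({\boldsymbol X}\), so the final regression isolates the treatment effect even though treatment was confounded.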
DR-learner: (DR=‘doubly robust’, Kennedy (2023))
First use part of the sample to estimate the nuisance parameters \(\pi, \mu_0, \mu_1\) (the propensity score and the two outcome regressions).
Now estimate the pseudo-outcome \[\begin{align*} \widetilde{Y} &= \frac{A- \hat{\pi}({\boldsymbol X})}{\hat{\pi}({\boldsymbol X}) \cdot (1-\hat{\pi}({\boldsymbol X}))} \left\{Y - \hat\mu_A({\boldsymbol X})\right\} + \hat\mu_1({\boldsymbol X}) - \hat\mu_0({\boldsymbol X}), \end{align*}\] on the remaining sample, and regress it on the covariates to obtain \(\widehat{\mathbb{E}}[\widetilde{Y} \mid {\boldsymbol X}={\boldsymbol x}]\), which estimates the CATE.
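A minimal sketch of the DR-learner with a single sample split; the OLS nuisance models and simulated confounded data are illustrative assumptions (in practice the roles of the two halves would be swapped and the estimates averaged):

```python
import numpy as np

def fit_ols(X, y):
    # least-squares fit with an intercept; stands in for any flexible regressor
    Xb = np.column_stack([np.ones(len(X)), X])
    return np.linalg.lstsq(Xb, y, rcond=None)[0]

def predict_ols(beta, X):
    return np.column_stack([np.ones(len(X)), X]) @ beta

# illustrative confounded data, true CATE(x) = 1 + x
rng = np.random.default_rng(4)
n = 6000
X = rng.normal(size=(n, 1))
p = np.clip(0.5 + 0.2 * X[:, 0], 0.05, 0.95)
A = rng.binomial(1, p)
Y = X[:, 0] + A * (1 + X[:, 0]) + rng.normal(scale=0.1, size=n)

half = n // 2
tr, te = slice(0, half), slice(half, n)   # nuisance half / estimation half

# step 1: nuisance estimates pi, mu_0, mu_1 on the first half
pi = fit_ols(X[tr], A[tr].astype(float))
mu0 = fit_ols(X[tr][A[tr] == 0], Y[tr][A[tr] == 0])
mu1 = fit_ols(X[tr][A[tr] == 1], Y[tr][A[tr] == 1])

# step 2: pseudo-outcome on the second half
pix = np.clip(predict_ols(pi, X[te]), 0.05, 0.95)
m0 = predict_ols(mu0, X[te])
m1 = predict_ols(mu1, X[te])
Ate = A[te]
mA = np.where(Ate == 1, m1, m0)
Ytil = (Ate - pix) / (pix * (1 - pix)) * (Y[te] - mA) + m1 - m0

# step 3: regress the pseudo-outcome on the covariates to estimate the CATE
beta_dr = fit_ols(X[te], Ytil)

def cate_dr(x):
    return predict_ols(beta_dr, x)
```

The pseudo-outcome has mean equal to the CATE if either the propensity model or the outcome models are correct, which is where the ‘doubly robust’ name comes from.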