Probability for Machine Learning Seminar

Causal modelling with distribution embeddings: treatment effects, counterfactuals, mediation, and proxies

A fundamental causal modelling task is to predict the effect of an intervention (or treatment) D=d on an outcome Y in the presence of observed covariates X. As a common example from medical science, the treatment might be a particular medicine, the covariates might be relevant patient information (blood pressure, cholesterol levels, age), and the outcome might be whether a cure is achieved.

We can estimate the average effect of a treatment from observations of historical data, by marginalising our estimate of the conditional mean E(Y|X,D) over P(X): for instance, what is the average probability of being cured if we give Medicine A to patients with covariate distribution P(X)? More complex causal questions require taking conditional expectations. For instance, the average treatment effect on the treated (ATT) addresses a counterfactual: what is the outcome of an intervention d' on the subpopulation that received treatment d (for instance, for the population that received Medicine A, what would have happened had we given them Medicine B)? Or we might be interested in the conditional average treatment effect (CATE): what is the average effect of Medicine A for patients with certain blood pressure readings? Finally, we might be interested in the case where the covariates are not observed directly, but indirect "proxy" covariate information is available.
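Written out, these estimands are conditional means marginalised over the appropriate covariate distribution. The following is a hedged summary in standard causal notation, not necessarily the notation of the papers below; V denotes the sub-vector of covariates we condition on (e.g. blood pressure), and X the remaining covariates.

```latex
% Average treatment effect (dose-response) at treatment level d:
\theta_{\mathrm{ATE}}(d) = \int \mathbb{E}[Y \mid X = x, D = d] \, \mathrm{d}P(x)

% Average treatment effect on the treated: outcome of intervention d'
% for the subpopulation that actually received treatment d:
\theta_{\mathrm{ATT}}(d, d') = \int \mathbb{E}[Y \mid X = x, D = d'] \, \mathrm{d}P(x \mid D = d)

% Conditional average treatment effect, conditioning on covariates V = v:
\theta_{\mathrm{CATE}}(d, v) = \int \mathbb{E}[Y \mid X = x, V = v, D = d] \, \mathrm{d}P(x \mid V = v)
```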
 
We address these questions in the nonparametric setting using both kernel and neural network methods, which apply to very general treatments D and covariates X (the presentation will focus primarily on the kernel case for simplicity). We provide strong statistical guarantees under general smoothness assumptions, and a straightforward, robust implementation (a few lines of code). The methods are demonstrated on causal modelling questions arising from the US Job Corps program for Disadvantaged Youth.
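To illustrate the "few lines of code" claim, here is a minimal sketch (our own, not the authors' released code) of a kernel ridge regression dose-response estimator: regress Y on (D, X) with a product of Gaussian kernels, then average the fitted predictions over the observed covariates. The kernel choice, bandwidth, and regularisation constant `lam` are illustrative assumptions, and the data are synthetic.

```python
import numpy as np

def gauss_kernel(A, B, sigma=1.0):
    # Pairwise Gaussian kernel between rows of A and rows of B.
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-d2 / (2 * sigma**2))

def ate_curve(D, X, Y, d_grid, lam=1e-3):
    """Estimate theta(d) = E_X[ E(Y | X, D=d) ] by kernel ridge
    regression of Y on (D, X) with a product kernel, then averaging
    the fitted predictions over the observed covariates x_i."""
    n = len(Y)
    K = gauss_kernel(D, D) * gauss_kernel(X, X)          # K_DD (elementwise*) K_XX
    alpha = np.linalg.solve(K + n * lam * np.eye(n), Y)  # ridge coefficients
    Kd = gauss_kernel(d_grid, D)                         # kernel between grid d's and data
    Kx = gauss_kernel(X, X)                              # kernel between data x's and data
    # theta(d) = (1/n) sum_i sum_j alpha_j k_D(d, d_j) k_X(x_i, x_j)
    return (Kd[:, None, :] * Kx[None, :, :]).mean(1) @ alpha

# Synthetic data: true effect of d on y is linear with slope 2.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
D = rng.normal(size=(200, 1))
Y = 2 * D[:, 0] + X[:, 0] + 0.1 * rng.normal(size=200)
d_grid = np.linspace(-1, 1, 5)[:, None]
theta = ate_curve(D, X, Y, d_grid)   # estimated dose-response curve, increasing in d
```

The closed-form solve is the whole estimator: no optimisation loop is needed, which is what makes the kernel implementation so compact.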

Relevant papers:

Generalized Kernel Ridge Regression for Nonparametric Structural Functions...
https://arxiv.org/abs/2010.04855

Kernel Methods for Multistage Causal Inference...  
https://arxiv.org/abs/2111.03950

Proximal Causal Learning with Kernels...  (ICML 21)
https://arxiv.org/abs/2105.04544

Deep Proxy Causal Learning and its Application.... (NeurIPS 21)
https://arxiv.org/abs/2106.03907

Arthur Gretton is a Professor with the Gatsby Computational Neuroscience Unit, and director of the Centre for Computational Statistics and Machine Learning (CSML) at UCL. He received degrees in Physics and Systems Engineering from the Australian National University, and a PhD with Microsoft Research and the Signal Processing and Communications Laboratory at the University of Cambridge. He previously worked at the MPI for Biological Cybernetics, and at the Machine Learning Department, Carnegie Mellon University.
Arthur's recent research interests in machine learning include the design and training of generative models, both implicit (e.g. GANs) and explicit (exponential family and energy-based models), nonparametric hypothesis testing, survival analysis, causality, and kernel methods.

He was an associate editor at IEEE Transactions on Pattern Analysis and Machine Intelligence from 2009 to 2013, has been an Action Editor for JMLR since April 2013, was an Area Chair for NeurIPS in 2008 and 2009, a Senior Area Chair for NeurIPS in 2018 and 2021, an Area Chair for ICML in 2011 and 2012, and a member of the COLT Program Committee in 2013, and has been a member of the Royal Statistical Society Research Section Committee since January 2020. Arthur was program chair for AISTATS in 2016 (with Christian Robert), tutorials chair for ICML 2018 (with Ruslan Salakhutdinov), workshops chair for ICML 2019 (with Honglak Lee), program chair for the DALI workshop in 2019 (with Krikamol Muandet and Shakir Mohamed), and co-organiser of the Machine Learning Summer School 2019 in London (with Marc Deisenroth).