Chapter 1 Introduction

The most important point in the study of causal inference is that you can only obtain a causally interpretable answer if you start with a causally interpretable question! This is discussed in Section 1.1. From there, we suggest thinking about the ideal ‘target trial’ (Section 1.2), and trying to emulate such a trial from the observational data provided (Section 1.3).

1.1 Statistical vs causal questions

The major difference between standard statistical inference and causal inference is the kind of question that we try to answer. Statistics is typically concerned with either describing a system or predicting (or classifying) an observation. In causal inference the question will, either explicitly or implicitly, involve some sort of intervention that can, at least in principle, be modified.

As an example, the following two questions would fall into the category of descriptive or predictive:

  • “Is this patient at high risk of developing complications during surgery?”

  • “Is this patient suitable for surgery?”

In contrast, the following questions are causal in their nature:

  • “Which type of anaesthetic should this patient receive to minimise the risk of complications during surgery?”

  • “How does the amount of anaesthetic affect the risk of complications during surgery?”

  • “What can be done to reduce the risk of complications during surgery for an average / a particular type of patient?”

The first two questions make no reference to any sort of intervention, while the final three all do, even if only implicitly. Figure 1.1 gives a cartoon illustrating the differences between statistical and causal inference. In the former, we are only interested in the relationship between the outcomes observed under the two settings. For causal inference, we are interested in what would have happened to (a subset of) the population if everyone had been treated, compared with what would have happened if everyone had been given the control.


Figure 1.1: Illustration of the differences between associational and causal questions. Adapted from Hernán and Robins (2025)
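
The contrast can also be written symbolically. The notation here is an assumption introduced for illustration and is not used elsewhere in this chapter: A is a binary treatment indicator, Y is the observed outcome, and Y(a) is the outcome an individual would have had under treatment level a. As a rough sketch, the associational and causal contrasts are then:

```latex
\[
  \underbrace{E[Y \mid A = 1] - E[Y \mid A = 0]}_{\text{associational: treated vs.\ untreated as observed}}
  \qquad \text{versus} \qquad
  \underbrace{E[Y(1)] - E[Y(0)]}_{\text{causal: everyone treated vs.\ everyone control}}
\]
```

The left-hand quantity compares two different groups of people, while the right-hand quantity compares the same population under two different interventions. The two need not coincide in observational data.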

1.2 Target Trials

The concept of the target trial was fully introduced by Hernán and Robins (2016), building on principles given in Hernán et al. (2008). The earlier work examined an apparent discrepancy between randomized controlled trials, which seemed to show that hormone replacement therapy (HRT) had a detrimental effect on cardiovascular outcomes for post-menopausal women, and observational studies, in which the effect appeared beneficial. Hernán et al. showed that, once the randomized trials were emulated in the observational data, both sources of data indicated that HRT has a detrimental effect.

The idea is to either take an ideal trial, or consider how you would estimate the causal effect of interest if you were able to run an ideal trial. This ideal trial is then emulated using the available data, which is usually observational (see Section 1.3). The five main things to consider can be remembered using the mnemonic PICOT.

  • Population: the group in which you wish to estimate the causal effect.

  • Intervention: a precise description of the treatment arm.

  • Comparison: what is the group with which you wish to compare the treatment arm?

  • Outcome: the response you intend to measure.

  • Time: how will you determine the start and end of follow-up?

We will also need to specify a causal effect that we wish to estimate. To make this more concrete, Table 1.1 gives the target trial specification used in the hormone replacement therapy example.

Table 1.1: An example of an analysis plan for a target trial, adapted from Table 1 of Hernán and Robins (2016).
Protocol element | Description
--- | ---
Population | Postmenopausal women within 5 years of menopause and with no history of cancer and no hormone therapy over past 2 years.
Intervention | Initiate estrogen plus progestin hormone therapy at baseline and remain on it during the follow-up unless you are diagnosed with deep vein thrombosis, pulmonary embolism, myocardial infarction, or cancer.
Comparison | Refrain from taking hormone therapy during the follow-up.
Outcome | Breast cancer diagnosed by an oncologist within 5 years of baseline.
Time | Starts at randomization and ends at diagnosis of breast cancer, death, loss to follow-up, or 5 years after baseline, whichever occurs first.
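
When planning an analysis, it can help to record the PICOT elements in a structured form so that each emulation step can refer back to them. The following is a minimal sketch only: the class name `TargetTrialProtocol` and its layout are hypothetical, and the field values are abbreviated from Table 1.1.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class TargetTrialProtocol:
    """Hypothetical container for the five PICOT elements of a target trial."""

    population: str
    intervention: str
    comparison: str
    outcome: str
    time: str


# Entries abbreviated from Table 1.1.
hrt_protocol = TargetTrialProtocol(
    population=(
        "Postmenopausal women within 5 years of menopause, no history of "
        "cancer, no hormone therapy over past 2 years"
    ),
    intervention="Initiate estrogen plus progestin at baseline and remain on it",
    comparison="Refrain from taking hormone therapy during the follow-up",
    outcome="Breast cancer diagnosed by an oncologist within 5 years of baseline",
    time="Randomization until breast cancer, death, loss to follow-up, or 5 years",
)

# The initials of the field names spell out the mnemonic.
mnemonic = "".join(f.name[0].upper() for f in fields(hrt_protocol))
print(mnemonic)
```

Freezing the dataclass is a design choice here: the protocol should be fixed before the data are analysed, so accidental modification raises an error.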

1.3 Target Trial Emulation

Having formulated the ideal trial, it remains to try to emulate it using the available (and generally observational) data. The first action is to apply the inclusion and exclusion criteria specified in the ideal trial to your observational data. Depending upon your data and the criteria you specify, you may end up with a very small number of observations. In this case, it is sensible to consider whether you could weaken your criteria to include more observations.
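
As an illustration of this first step, the sketch below applies the Population row of Table 1.1 to a handful of made-up records. Everything about the data layout is an assumption: the `Record` class, its field names, and the example values are hypothetical, and the handling of boundary cases (exactly 5 years since menopause, exactly 2 years since last hormone use) is one possible reading of the criteria.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    """One person in a hypothetical observational dataset."""

    person_id: int
    years_since_menopause: float
    history_of_cancer: bool
    # None means hormone therapy was never used.
    years_since_last_hormone_use: Optional[float]


def is_eligible(r: Record) -> bool:
    """Apply the eligibility criteria from the Population row of Table 1.1."""
    no_recent_hormones = (
        r.years_since_last_hormone_use is None
        or r.years_since_last_hormone_use >= 2  # assumed boundary handling
    )
    return (
        r.years_since_menopause <= 5  # assumed boundary handling
        and not r.history_of_cancer
        and no_recent_hormones
    )


# Made-up records, for illustration only.
records = [
    Record(1, 3.0, False, None),  # meets all criteria
    Record(2, 7.5, False, None),  # more than 5 years since menopause
    Record(3, 2.0, True, None),   # history of cancer
    Record(4, 4.0, False, 1.0),   # hormone therapy within past 2 years
    Record(5, 1.5, False, 6.0),   # meets all criteria
]

eligible = [r for r in records if is_eligible(r)]
print(f"{len(eligible)} of {len(records)} records are eligible")
```

Keeping each criterion as a separate named condition makes it easy to see how many observations each one removes, which is useful when deciding which criteria to weaken.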

You should next determine which of the remaining individuals should be considered as control units, and which as treated units. Again, there may be some who do not qualify as either, and again the definitions of the two arms may need to be relaxed if the sample size gets too small.
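
A minimal sketch of this classification step follows. The coding of baseline therapy as a string (or `None` for no therapy), the label `"estrogen+progestin"`, and the function `assign_arm` are all hypothetical; real data will need rules tailored to how treatment is recorded.

```python
from collections import Counter
from typing import Optional

# Hypothetical coding of the intervention in the observational data.
TREATED_CODE = "estrogen+progestin"


def assign_arm(therapy_at_baseline: Optional[str]) -> Optional[str]:
    """Map observed baseline therapy to a trial arm, or None if neither fits."""
    if therapy_at_baseline is None:
        return "control"  # not taking hormone therapy
    if therapy_at_baseline == TREATED_CODE:
        return "treated"  # initiates the intervention of interest
    return None  # e.g. a different regimen: qualifies for neither arm


# Made-up eligible individuals, keyed by a person identifier.
baseline_therapy = {
    101: "estrogen+progestin",
    102: None,
    103: "estrogen only",
    104: None,
}

arms = {pid: assign_arm(t) for pid, t in baseline_therapy.items()}
counts = Counter(arms.values())
print(counts["treated"], counts["control"], counts[None])
```

Counting the individuals who fall into neither arm shows how much data the arm definitions discard.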

Then consider how you will measure the outcome in your observational data. Often it is not possible to do this exactly, and you may have to make do with (for example) a disease code being applied at a certain time.

Finally, think about how you will define time zero (‘randomization’) in your treatment and control groups. This is particularly important in the control group, to avoid immortal time bias and prevalent user bias.
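
Once a time zero has been chosen for an individual, the end of follow-up can be computed from the Time row of Table 1.1. The sketch below assumes that event dates are available as `datetime.date` values (or `None` if the event was not recorded); the function name `end_of_follow_up` is hypothetical, and 5 years is approximated as 5 × 365 days.

```python
from datetime import date, timedelta
from typing import Optional

# Approximation of 5 years that ignores leap days.
MAX_FOLLOW_UP = timedelta(days=5 * 365)


def end_of_follow_up(
    time_zero: date,
    diagnosis: Optional[date] = None,
    death: Optional[date] = None,
    lost_to_follow_up: Optional[date] = None,
) -> date:
    """Earliest of diagnosis, death, loss to follow-up, or 5 years after time zero."""
    candidates = [time_zero + MAX_FOLLOW_UP]
    candidates += [d for d in (diagnosis, death, lost_to_follow_up) if d is not None]
    return min(candidates)


time_zero = date(2010, 1, 1)

# Diagnosed two years in: follow-up stops at the diagnosis date.
print(end_of_follow_up(time_zero, diagnosis=date(2012, 1, 1)))

# No events recorded: follow-up stops at the 5-year limit.
print(end_of_follow_up(time_zero))
```

This sketch does not check that event dates fall on or after time zero. In a real emulation, individuals with an event before time zero would need to be handled when eligibility is assessed.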

References

Hernán, Miguel A, Alvaro Alonso, Roger Logan, Francine Grodstein, Karin B Michels, Walter C Willett, JoAnn E Manson, and James M Robins. 2008. “Observational Studies Analyzed Like Randomized Experiments: An Application to Postmenopausal Hormone Therapy and Coronary Heart Disease.” Epidemiology 19 (6): 766–79.
Hernán, Miguel A, and James M Robins. 2016. “Using Big Data to Emulate a Target Trial When a Randomized Trial Is Not Available.” American Journal of Epidemiology 183 (8): 758–64.
———. 2025. Causal Inference: What If. Boca Raton: Chapman & Hall/CRC.