More than averages: using causal quartets to illustrate variability

Article Title: Causal Quartets: Different Ways to Attain the Same Average Treatment Effects [DOI:10.1080/00031305.2023.2267597]
Authors & Year: A. Gelman, J. Hullman, and L. Kennedy (2023)
Journal: The American Statistician
Review Prepared by Peter A. Gao 

Causal inference research commonly focuses on estimating average treatment effects: in a target population, what is the difference in mean outcomes between individuals who receive the treatment and individuals who receive the control? For example, imagine an experiment investigating whether limiting daily phone usage improves academic performance among high school students. Subjects are randomly assigned to either a treatment group (limited to one hour of phone time daily) or a control group (unrestricted), and over the course of a semester, their academic performance is measured using exams. Because assignment is random, the average treatment effect can be estimated simply as the average exam score of the treated students minus the average score of the control students. If this estimate is large and positive, the results may suggest that limiting phone usage causes improved academic performance.
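To make the estimate concrete, here is a minimal Python sketch of a simulated version of the phone experiment. It is not taken from the article or its R package: the score distribution (mean 70, standard deviation 10), the constant 5-point effect, and the function name simulate_phone_experiment are all hypothetical. Each simulated student has two potential outcomes, but only the one matching their random assignment is observed, and the difference in means between the two groups lands close to the true average treatment effect.

```python
import random
import statistics


def simulate_phone_experiment(n=1000, effect=5.0, seed=42):
    """Hypothetical simulation of the phone-limit experiment.

    Assumptions (not from the article): control exam scores ~ N(70, 10),
    and every student's score rises by the same `effect` points under
    treatment. Returns (true ATE, difference-in-means estimate).
    """
    rng = random.Random(seed)
    # Potential outcomes: each student's score without (y0) and with (y1) the limit.
    y0 = [rng.gauss(70.0, 10.0) for _ in range(n)]
    y1 = [score + effect for score in y0]
    # Knowable only in a simulation, where both outcomes exist for everyone.
    true_ate = statistics.mean(b - a for a, b in zip(y0, y1))

    # Random assignment: half of the students have their phone time limited.
    treated = set(rng.sample(range(n), n // 2))

    # In the "real" experiment, only one potential outcome per student is observed.
    treated_scores = [y1[i] for i in range(n) if i in treated]
    control_scores = [y0[i] for i in range(n) if i not in treated]
    estimate = statistics.mean(treated_scores) - statistics.mean(control_scores)
    return true_ate, estimate


if __name__ == "__main__":
    true_ate, estimate = simulate_phone_experiment()
    print(f"true ATE = {true_ate:.2f}, difference in means = {estimate:.2f}")
```

The estimate will not equal the true effect exactly, because which students happen to land in each group adds noise; randomization only guarantees that the estimate is correct on average across possible assignments.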

In some cases, this focus on average treatment effects can obscure meaningful variability within the target population. Traditional significance tests are often used to assess null hypotheses under which the average treatment effect is exactly zero, but statistical significance may not be meaningful if, for example, a treatment works exceptionally well for some individuals but has no effect, or even the opposite effect, for others. In their recent paper, “Causal Quartets: Different Ways to Attain the Same Average Treatment Effects,” researchers Andrew Gelman, Jessica Hullman, and Lauren Kennedy show that the same average treatment effect can arise under many different scenarios, each with diverging implications. Drawing upon Anscombe’s quartet, a collection of four datasets with nearly identical summary statistics but very different distributions, they propose a new set of visualizations called causal quartets, which illustrate different ways to produce the same average treatment effect. They also provide an R package that can be used to reproduce the article’s figures. The image below illustrates an example similar to Figure 1 of the original paper.

Four scatter plots show causal effects for participants 1 to 20 who received treatment x: (a) all have the same slightly positive effect (0.2); (b) minor variation, from 0 to 0.3; (c) larger dispersion, with values from -0.2 to 0.6; and (d) almost all zero except four at 0.6.
Example causal quartet illustrating four scenarios yielding the same average treatment effect: (a) homogeneous individual-level treatment effects; (b) effects that vary but are always positive; (c) effects that vary in magnitude and sign; (d) effects that are usually zero but are large for a small subgroup.

This quartet illustrates four different scenarios with the same average treatment effect. In each panel, each point represents an individual-level treatment effect: the difference between an individual's outcome under the treatment and their outcome under the control. In practice, individual-level effects are unobservable: each individual is assigned to either the treatment or the control, so only one of their two potential outcomes is ever observed and the difference cannot be calculated. The quartet therefore depicts four hypothetical scenarios, which serve as a conceptual tool for thinking through effect heterogeneity. Panel (a) illustrates the simple case in which the treatment effect is the same for all individuals. Panel (b) shows treatment effects that are always positive but vary in magnitude. Panel (c) depicts high variation, with some individuals experiencing negative effects. Finally, Panel (d) shows a scenario in which the treatment effect is usually zero but large for a few individuals, representing a situation in which the treatment is effective only for a small subgroup. In a real data setting, the researcher may not know which, if any, of these scenarios fits the observed data. However, considering these possibilities may help them interpret their results.
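A causal quartet can also be sketched numerically in a few lines. The Python below builds four hypothetical sets of 20 individual-level effects that all average 0.2, loosely patterned on the panels above. The specific values (for instance, four individuals with an effect of 1.0 in scenario (d)) are illustrative assumptions rather than the data behind the figure, and this is not the authors' R package. The summary shows identical means alongside very different spreads and shares of positive and negative effects.

```python
import statistics

N = 20     # hypothetical number of individuals, as in the figure
ATE = 0.2  # hypothetical average treatment effect shared by all four scenarios


def linspace(lo, hi, n):
    """Return n evenly spaced values from lo to hi, inclusive."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


# Four hypothetical vectors of individual-level effects, each averaging 0.2.
quartet = {
    "(a) constant": [ATE] * N,
    "(b) varying, always positive": linspace(0.05, 0.35, N),
    "(c) varying in sign": linspace(-0.2, 0.6, N),
    "(d) zero except a small subgroup": [1.0] * 4 + [0.0] * (N - 4),
}


def summarize(effects):
    """Summaries that the average alone hides."""
    n = len(effects)
    return {
        "mean": statistics.mean(effects),
        "sd": statistics.pstdev(effects),
        "share_positive": sum(e > 0 for e in effects) / n,
        "share_negative": sum(e < 0 for e in effects) / n,
    }


if __name__ == "__main__":
    for name, effects in quartet.items():
        s = summarize(effects)
        print(f"{name:34s} mean={s['mean']:.2f} sd={s['sd']:.2f} "
              f"positive={s['share_positive']:.2f} "
              f"negative={s['share_negative']:.2f}")
```

In scenario (c), for example, a quarter of the hypothetical individuals are harmed by the treatment even though the average effect is positive, which is exactly the kind of detail a single average conceals.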

Since individual effects are unobserved, causal quartets are useful primarily for thinking through possible forms of effect heterogeneity. In some cases, treatment effects may be expected to be highly variable: interventions in childhood may affect subsequent income, but these effects are likely to be complicated and heterogeneous. In other cases, treatments may be expected to work on only a subset of the population: limiting screen time will have little impact on students who rarely use their phones. Identifying potential scenarios can aid in interpreting results and designing subsequent policy. For outcomes that are known to be highly variable (e.g., income), a scenario like the one presented in Panel (c) may be plausible, meaning that even an intervention with a large average treatment effect may not be universally effective. In this vein, understanding these scenarios can also help researchers assess the generalizability of their results. Similarly, thinking through these scenarios may help policymakers develop plans for implementation. For example, a proposed intervention may have a large average treatment effect, but if it works for only a small proportion of the population, it may be wise to spend additional resources to target those individuals. Though the data may be insufficient to identify conclusively which scenario is correct, prior knowledge and domain expertise can narrow the possibilities.
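Why the data alone may not settle the question can be illustrated with a small simulation, again a hypothetical Python sketch rather than anything from the paper. It runs the same randomized experiment on a population in which every effect equals 0.2 (as in Panel (a)) and on one in which 20% of individuals have an effect of 1.0 and everyone else has none (as in Panel (d)). The standard normal baseline outcomes, the population size, and the function name diff_in_means are assumptions made for illustration.

```python
import random
import statistics


def diff_in_means(effects, seed):
    """Hypothetical randomized experiment on a population with given effects.

    Assumption: baseline outcomes y0 ~ N(0, 1), and y1 = y0 + effect.
    Half the population is randomly treated; only one outcome per person
    is observed. Returns the difference-in-means estimate.
    """
    rng = random.Random(seed)
    n = len(effects)
    y0 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y1 = [base + e for base, e in zip(y0, effects)]
    treated = set(rng.sample(range(n), n // 2))
    treated_mean = statistics.mean(y1[i] for i in treated)
    control_mean = statistics.mean(y0[i] for i in range(n) if i not in treated)
    return treated_mean - control_mean


N = 2000
constant = [0.2] * N                                # like Panel (a)
subgroup = [1.0] * (N // 5) + [0.0] * (N - N // 5)  # like Panel (d): 20% respond

# Same seed: identical baseline outcomes and identical treatment assignment.
est_a = diff_in_means(constant, seed=1)
est_d = diff_in_means(subgroup, seed=1)

if __name__ == "__main__":
    print(f"constant effects:      estimate = {est_a:.3f}")
    print(f"subgroup-only effects: estimate = {est_d:.3f}")
```

Both estimates come out near 0.2. Because the shared seed gives both populations the same baseline outcomes and the same assignment, the small gap between them reflects only how many responders happened to be treated. The average by itself cannot tell the two scenarios apart, which is where prior knowledge and domain expertise come in.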