Why only one treatment at a time?

I remember learning in Gary King’s Gov 2001 class back in the fall of 2004 (!) that we should really only ask about the effect of one cause at a time. That is, having run our analysis we should not expect to be able to infer the effect of more than one variable from it. I remember not really seeing why this was the case: why can’t we run a regression and learn about the effect of more than one of the variables? I thought I would record here some of my current thoughts on this question.

Let me start by noting that we can infer the effects of more than one cause at a time given the right dataset. Suppose we have run a conjoint experiment, i.e. we have conducted a survey in which respondents are asked to compare or evaluate items (political candidates, products, reform proposals) whose attributes are jointly randomized. (So, one respondent is asked to choose her preferred candidate between a young female banker and a middle-aged female teacher; another is asked to choose between a young female lawyer and an old male banker; etc.) Given these data, we can run a regression of the outcome (the respondents’ choices) on the attributes they were shown, and the coefficient on each attribute will tell us the effect of that treatment averaged over the joint distribution of the other treatments. (We can also estimate interaction effects, but that is more in the spirit of “subgroup analysis” in designs that are focused on only one treatment.) The randomization of a given attribute makes it possible to straightforwardly measure the effect of that attribute; it shouldn’t be surprising that we can randomize more than one attribute at once and thus measure more than one effect in the same analysis.
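To make the conjoint logic concrete, here is a minimal simulation sketch. Everything in it is made up for illustration: the three binary attributes, the "true" effects, and the simplification that each respondent rates a single profile rather than choosing between two. It only shows that when attributes are independently randomized, one regression recovers all of the effects at once.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000  # hypothetical number of rated candidate profiles

# Each attribute is randomized independently of the others (0/1 coin flips).
female = rng.integers(0, 2, n)
young = rng.integers(0, 2, n)
banker = rng.integers(0, 2, n)

# Assumed (invented) effects on the probability of supporting the candidate.
p_support = 0.5 + 0.10 * female + 0.00 * young - 0.05 * banker
support = rng.binomial(1, p_support).astype(float)

# Linear probability model: regress the outcome on all attributes at once.
X = np.column_stack([np.ones(n), female, young, banker])
coef, *_ = np.linalg.lstsq(X, support, rcond=None)
estimates = dict(zip(["intercept", "female", "young", "banker"], coef))
print(estimates)
```

With a sample this large, the three attribute coefficients should land close to the assumed effects (0.10, 0.00 and -0.05), each one being an effect averaged over the distribution of the other attributes.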

In some very lucky circumstances we could also use standard methods of design-based inference to simultaneously measure the effect of more than one treatment in observational data. We could have a regression discontinuity design (RDD) where treatment A depends on whether the forcing variable is above threshold Z1 and treatment B depends on whether the same forcing variable (or another forcing variable) is above threshold Z2; we could use this situation to measure the effect of both A and B, where each effect is conditional on the value of the other treatment at the relevant cutoff. As with the conjoint experiment, we could of course study these effects totally separately, though combining the analysis may make it more efficient.
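A rough sketch of the two-cutoff RDD, under invented assumptions: a single forcing variable `r`, cutoffs `Z1 = -0.3` and `Z2 = 0.4`, additive effects of 1.0 for A and 2.0 for B, and a hand-picked bandwidth. The helper `rd_jump` is a hypothetical name for a plain local linear regression, not a function from any RDD package.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

r = rng.uniform(-1.0, 1.0, n)   # forcing variable
Z1, Z2 = -0.3, 0.4              # assumed cutoffs for treatments A and B
a = (r > Z1).astype(float)
b = (r > Z2).astype(float)
# Invented outcome model: A adds 1.0, B adds 2.0, plus a smooth trend in r.
y = 1.0 * a + 2.0 * b + 0.5 * r + rng.normal(0.0, 0.5, n)

def rd_jump(r, y, cutoff, bandwidth=0.2):
    """Jump in y at the cutoff from a local linear fit on each side."""
    keep = np.abs(r - cutoff) < bandwidth
    dist = r[keep] - cutoff
    above = (dist > 0).astype(float)
    # Separate intercepts and slopes on the two sides of the cutoff.
    X = np.column_stack([np.ones(dist.size), above, dist, above * dist])
    coef, *_ = np.linalg.lstsq(X, y[keep], rcond=None)
    return coef[1]

effect_a = rd_jump(r, y, Z1)  # effect of A at Z1, where B is still 0
effect_b = rd_jump(r, y, Z2)  # effect of B at Z2, where A is already 1
print(effect_a, effect_b)
```

Because the simulated effects are additive, it does not matter here that A is estimated among units with B = 0 and B among units with A = 1; with interactions, each estimate would be specific to that value of the other treatment.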

Actually, we can simultaneously get unbiased estimates of the effects of multiple treatments in any situation where each treatment is a function of a set of covariates X (but not the other treatments) and we can do covariate adjustment for X. This could mean a regression that includes all of the treatments and X (with the proper functional form assumptions); we could also create matched groups for each combination of treatments and measure effects by averaging across groups. When the treatments are conditionally independent of each other and determined only by observable covariates, we can say for each treatment that, conditional on X, the value of the treatment is independent of the potential outcomes (which Angrist & Pischke call the conditional independence assumption, or CIA), and thus the effects can all be estimated in one regression.
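Here is a small simulated illustration of that case, with invented numbers throughout: two binary treatments that each depend on an observed covariate `x` plus independent noise (and not on each other), and a linear outcome with effects 1.0 and 2.0. The `ols` helper is a hypothetical convenience wrapper around NumPy's least squares routine.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

x = rng.normal(size=n)  # observed covariate
# Each treatment depends on x and its own noise, but not on the other treatment.
t1 = (x + rng.normal(size=n) > 0).astype(float)
t2 = (-x + rng.normal(size=n) > 0).astype(float)
# Invented outcome model, linear in x so a linear adjustment is correct.
y = 1.0 * t1 + 2.0 * t2 + 3.0 * x + rng.normal(size=n)

def ols(y, *columns):
    """Least-squares slopes (intercept dropped) of y on the given columns."""
    X = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1:]

adjusted = ols(y, t1, t2, x)   # both treatments plus the covariate
unadjusted = ols(y, t1, t2)    # same regression without the covariate
print("adjusted:", adjusted[:2])
print("unadjusted:", unadjusted)
```

Adjusting for `x` should recover both assumed effects in a single regression, while dropping `x` should push both coefficients well away from them. The sketch relies on the outcome really being linear in `x`; that is the functional form assumption doing its work.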

So clearly “one treatment at a time” is not an epistemological law. Why might it still be a good guide?

The key point is that although there are circumstances where the CIA is met for more than one treatment at a time (as illustrated above), these circumstances are not common. The more typical situation in observational studies is that the treatments affect each other in some way. When that is the case (for example when treatment T1 affects treatment T2), we must choose a treatment. Why? To estimate the effect of T2, we need to include T1 in the regression unless we have reason to think that T1 has no effect on the outcome (in which case we don’t really care about estimating the two treatment effects anyway). To estimate the effect of T1, however, we don’t want to include T2 in the regression, because T2 is “post-treatment” with respect to T1 and thus a “bad control” in Angrist & Pischke’s terms: controlling for it removes the part of T1’s effect that operates through T2. Because T1 affects T2, the regression that estimates the effect of T1 cannot also estimate the effect of T2; we must choose one of the two effects to estimate.
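A minimal simulation of this trade-off, with made-up coefficients: T1 is randomized, T2 equals T1 plus noise, and the outcome depends on both. Under these assumptions the total effect of T1 is 2.0 (direct) + 1.0 × 1.0 (through T2) = 3.0, and the effect of T2 is 1.0. The sketch is deliberately benign in that nothing unobserved affects both T2 and the outcome; with such a confounder, controlling for T2 would not even isolate the direct effect of T1.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

t1 = rng.integers(0, 2, n).astype(float)      # randomized first treatment
t2 = 1.0 * t1 + rng.normal(size=n)            # T1 affects T2
y = 2.0 * t1 + 1.0 * t2 + rng.normal(size=n)  # invented outcome model

def ols(y, *columns):
    """Least-squares slopes (intercept dropped) of y on the given columns."""
    X = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1:]

t1_alone = ols(y, t1)[0]            # design for T1: leave T2 out
t1_with_t2, t2_with_t1 = ols(y, t1, t2)  # design for T2: include T1
t2_alone = ols(y, t2)[0]            # T2 without T1: confounded by T1
print(t1_alone, t1_with_t2, t2_with_t1, t2_alone)
```

The regression of the outcome on T1 alone should come out near the total effect of 3.0. Adding T2 should give the right coefficient for T2 (near 1.0) but shrink the T1 coefficient toward its direct effect of 2.0, and dropping T1 should bias the T2 coefficient upward. No single regression here reports both the total effect of T1 and the effect of T2.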

I think post-treatment bias is probably the context in which Gary King was talking about the “one treatment at a time” rule in 2004, because he was thinking about post-treatment bias around that time. (I’d like to go back and look at that work.) It occurs to me now that the causal ordering rationale for the “one treatment at a time” rule also has a useful connection to the literature on mediation effects by e.g. Kosuke Imai that I would like to flesh out at some point.

Let me restate all of this a bit differently (and a bit more succinctly): We can estimate the effect of a treatment whenever the conditional independence assumption (CIA) is met, but outside of conjoint experiments it is unusual for the CIA to be simultaneously met for more than one treatment. Fundamentally this is because the world is complicated and the treatments we’re interested in typically affect each other. In these situations we might still be able to estimate the effects of the various treatments of interest. (It depends on the plausibility of the conditional independence assumption in each case.) But the design will need to be different for each treatment.