2.2. Establishing causality is hard
Establishing causality is, in general, much harder than establishing associations (like correlations). Statements about causality rely on counterfactual reasoning. In particular, we will say that the
Causal effect of some factor \(X\) is the difference between what actually happened and what would have happened… if \(X\) had been different in some way.
We can be even more specific; we can say that the
Causal effect of the treatment \(X\) on the outcome for a specific case at a specific time is the difference between the actual outcome and the hypothetical outcome that would have occurred…in the same case, at the same time, had \(X\) not been present.
The term “treatment” may remind you of medical trials, but it does not need to connote a drug, specifically. The “outcome” could also be medical (e.g. got better or not), but need not be. For now, we will think of both the treatment and the outcome as being binary, meaning they take one of two values. That is, the entity under study received the treatment or did not receive the treatment (and we will generally refer to the non-treated group as “control”), and then the outcome is one of two options. A “case” here will be whatever unit the causal effect (and the treatment) applies to: a patient, a resident of London, an animal, a country, etc.
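One compact way to write this definition down uses potential-outcomes notation. The symbols here are assumptions introduced for illustration and are not used elsewhere in this section: \(X_i \in \{0, 1\}\) records whether case \(i\) received the treatment, and \(Y_i(1)\) and \(Y_i(0)\) are the binary outcomes case \(i\) would have with and without the treatment, at the same time.

```latex
\[
\tau_i = Y_i(1) - Y_i(0), \qquad Y_i(1),\; Y_i(0) \in \{0, 1\}
\]
```

With binary outcomes, \(\tau_i\) can only be \(-1\), \(0\), or \(1\): the treatment changed the outcome in one direction, made no difference, or changed it in the other direction.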
In the case of cholera in 1854, an example of this reasoning would be
the causal effect of drinking contaminated water (treatment) on the infection status (outcome) for a particular person, on a given day, is the difference between their infection status after drinking the water, and the infection status they would have had on the same day had they not drunk contaminated water.
As this example suggests, there is a Fundamental Problem of Causal Inference:
we never get to see both scenarios for the same unit (person) at the same time, and so we can never know the causal effect with certainty!
In other words, if I observe a particular person drinks from the pump on a particular day, I can record their outcome. But I cannot then go back in time and see what would have happened to that specific person had they not drunk from the pump.
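A small simulation can make this problem concrete. The sketch below is only an illustration: the infection probabilities are made up, and the function names (`simulate_units`, `observe`) are hypothetical. Inside a simulation we can store both scenarios for each person, but the record an analyst would actually see contains only one of them.

```python
import random


def simulate_units(n, seed=0):
    """Build n hypothetical people, each with BOTH potential outcomes.

    y1: infection status (1 = infected) if the person drinks from the pump.
    y0: infection status if the person does not drink from the pump.
    Real data never contain both values for the same person and day.
    """
    rng = random.Random(seed)
    units = []
    for _ in range(n):
        y0 = 1 if rng.random() < 0.10 else 0  # made-up risk without drinking
        y1 = 1 if rng.random() < 0.60 else 0  # made-up risk after drinking
        drank = rng.random() < 0.5            # whether they actually drank
        units.append({"y0": y0, "y1": y1, "drank": drank})
    return units


def observe(unit):
    """Return what an analyst sees: treatment status and ONE outcome."""
    outcome = unit["y1"] if unit["drank"] else unit["y0"]
    return {"drank": unit["drank"], "outcome": outcome}


for unit in simulate_units(5):
    hidden_effect = unit["y1"] - unit["y0"]  # knowable only in a simulation
    print(observe(unit), "| hidden unit-level effect:", hidden_effect)
```

The `hidden_effect` column is exactly what the Fundamental Problem says we cannot compute from real observations, because `observe` discards whichever potential outcome did not happen.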
However, all is not lost. What we can do, and what Snow did ultimately, is find good comparison cases—some of whom receive treatment and some of whom do not (“control”). Under some important conditions, we can compare those groups in terms of their outcomes, and calculate the causal effect of treatment.
Experiments
Experiments allow us to estimate causal effects. We say “estimate” because, to reiterate, we will never know the effect for certain owing to the Fundamental Problem of Causal Inference. The key components of an experiment are:
1. a treatment group and a control group, with a placebo
2. random assignment of treatment or control to subjects
3. double-blind design
Components 1 and 2 make the experiment a randomized control trial (RCT) (also called a “randomized controlled trial”); while component 3 is not technically required for an RCT, it is preferred if at all possible.
In some cases, it may not be practical or ethical for a given experiment to have all the components. This does not mean we cannot learn anything about causation for those questions, but it does make life more challenging. We will deal with each component in turn.
1. Treatment and Control
The treatment group is
the group of subjects (the units on which the experiment is being conducted) receiving the treatment.
To reiterate, the treatment could be almost anything: water from the pump, a leaflet to read, a particular job training scheme, a life experience etc.
The control group is
the group of subjects who do not receive the treatment.
Members of the control group may, however, receive a placebo. This is
a pretend or “sham” treatment that the experimenter knows does not affect the outcome in and of itself.
A classic example in a medical trial is a sugar pill: it will not have any therapeutic value in terms of curing the underlying disease being studied. So why give it? We use placebos to make sure that subjects in the control group do not know they are in the control group: they are receiving a “treatment” (though it doesn’t do anything), and so will not know for sure which group they are in. This is important because simply knowing which group (treatment or control) you are in, and having an expectation about how the treatment will work, can affect how people feel, and thus the outcomes they report. But we want to know the real effect of the drug, not the psychological effect of believing you did or did not receive it.
To calculate the causal effect of the treatment, we compare the outcomes of the treatment group with those of the control group. We ask how different they were, on average. In practice, this often means we must take into account a placebo effect, which is the effect that receiving the placebo has on the reported outcomes in the control group. In many cases, control group members will report feeling better because they believe they received the treatment (but they didn’t). So, in practice, we calculate the difference in the outcomes net of (that is, after subtracting off) any placebo effect.
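The comparison described above can be sketched as a difference in average outcomes. The data below are invented purely for illustration, and `difference_in_means` is a hypothetical helper name. The sketch assumes the control group received a placebo, so any placebo effect is present in both groups' rates and is subtracted off when we take the difference.

```python
def mean(values):
    """Average of a list of numbers."""
    return sum(values) / len(values)


def difference_in_means(treated_outcomes, control_outcomes):
    """Estimated effect: average treated outcome minus average control outcome."""
    return mean(treated_outcomes) - mean(control_outcomes)


# Hypothetical binary outcomes: 1 = reported improvement, 0 = did not.
treated = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]  # 8 of 10 improved
placebo = [1, 0, 0, 1, 0, 1, 0, 0, 0, 0]  # 3 of 10 improved

estimate = difference_in_means(treated, placebo)
print(f"treated improvement rate: {mean(treated):.2f}")  # 0.80
print(f"placebo improvement rate: {mean(placebo):.2f}")  # 0.30
print(f"estimated effect:         {estimate:.2f}")       # 0.50
```

Reading only the treated rate (0.80) would overstate the drug's effect here, because 0.30 of the placebo group also reported improving without receiving the drug.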
A key observation here is that without a control group, you cannot make causal inferences. For example, Snow needs to observe people who did not drink contaminated water along with those who did. Otherwise, he has no one to compare them with. Similarly, in a drug trial, claiming that, e.g., a treatment causes colds to improve after three days does not mean much unless you have a control group (because the cold may naturally improve within three days, and the treatment actually does nothing to speed it up).
For questions of public policy, finding a control group can be very difficult. For example, suppose you want to know what the effect of a given state’s new contraceptive policy is on its teen birth rate. Maybe the teen birth rate changes when the policy is introduced—but the birth rate change may be due to other factors that occurred at the same time. Perhaps we could compare the teen birth rate to other states that don’t have that treatment, but those states may be different in other ways.
There are even harder problems: consider asking what “the effect” of a particular President of the United States was on the US economy. We don’t have a US economy not subject to that President, so drawing firm causal conclusions is tough. In practice, beyond this course, there are other things we can do to set up an “as if” control group, but we have to accept we will move away from our ideal experiment logic.
2. Random Assignment
Suppose we were conducting an experiment for a new drug. We would prefer that subjects did not select into the treatment or control group. By “select into” we mean subjects know which group will get the drug and which will get the placebo, and they choose which group to be in and thus whether they get the drug. One reason for this is that we imagine people who want to take the drug, as opposed to the placebo, may be different in some way from others. Perhaps they are less risk averse, perhaps they are sicker, or perhaps they just like taking drugs. In all these cases, it seems plausible that their reported outcomes (in terms of, say, how they feel or what side effects they experience) will be different, above and beyond the causal effect of the drug itself.
One way to avoid this selection problem is to randomize subjects into treatment and control. This might mean tossing a coin for each subject: if the coin comes down heads, they go to treatment, and if it comes down tails, they are assigned to control. Randomization ensures that the treatment and control groups are the same, on average. This removes the systematic differences between the groups that would otherwise make it difficult to assess the causal effect.
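The coin-toss procedure can be sketched in a few lines. Everything here is an assumption made for illustration: the subjects are simulated, `severity` is a made-up pre-treatment characteristic (how sick someone is before the trial), and `assign_by_coin_toss` is a hypothetical helper. The point is to check that the two groups look alike on that characteristic, on average, even though nobody tried to match them.

```python
import random


def assign_by_coin_toss(subjects, seed=0):
    """Send each subject to treatment (heads) or control (tails) at random."""
    rng = random.Random(seed)
    treatment, control = [], []
    for subject in subjects:
        if rng.random() < 0.5:  # "heads"
            treatment.append(subject)
        else:                   # "tails"
            control.append(subject)
    return treatment, control


def mean(values):
    """Average of a list of numbers."""
    return sum(values) / len(values)


# Simulated subjects with a made-up pre-treatment severity score.
rng = random.Random(42)
subjects = [{"id": i, "severity": rng.gauss(50, 10)} for i in range(10_000)]

treatment, control = assign_by_coin_toss(subjects, seed=1)
gap = (mean([s["severity"] for s in treatment])
       - mean([s["severity"] for s in control]))

print("group sizes:", len(treatment), len(control))
print(f"difference in mean severity: {gap:.2f}")
```

With many subjects, the gap in average severity should be close to zero. With only a handful of subjects it can be sizable by chance, which is one reason "the same, on average" is a statement about what randomization does in expectation rather than a guarantee for any single small experiment.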
3. Double-blind Design
We would prefer that subjects do not know whether they have been assigned to treatment or control. Indeed, one reason we give a placebo is to ensure the subjects cannot easily tell, and thus over- or under-state the outcomes they experience. We say an experiment is single-blind if
subjects do not know whether they are assigned to treatment or control.
Ideally, we would prefer that the experimenter also not know who is in treatment or control, at least when conducting the experiment. This is because the scientist might consciously or unconsciously record outcomes differently or signal information of different kinds to people they know were given treatment versus control. For example, someone in the control group might believe they have benefited from the drug and report this at the end of the experiment. But if the experimenter knows this is not possible, they may ask more skeptical follow-up questions to the subject, who may in turn change what they report. This might make the drug look more efficacious than it really is.
We say an experiment is double-blind if
neither subjects nor the experimenter knows who is assigned to treatment or control.
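One way blinding is sometimes arranged is to label each subject's drug or placebo with an opaque code, and to keep the list linking codes to groups away from both subjects and experimenters until outcomes are recorded. The sketch below illustrates that idea only; the kit-code format and the function name `blind_allocation` are hypothetical and are not a description of any particular trial's procedure.

```python
import random


def blind_allocation(subject_ids, seed=0):
    """Randomly assign subjects and hide the assignment behind kit codes.

    Returns two things:
      blinded_sheet  - what subjects and experimenters see (no group labels)
      allocation_key - kit code -> group, to be held by a third party
    """
    rng = random.Random(seed)
    # Draw distinct six-digit codes so that no two kits share a label.
    codes = rng.sample(range(100_000, 1_000_000), len(subject_ids))
    blinded_sheet = []
    allocation_key = {}
    for subject_id, code in zip(subject_ids, codes):
        kit = f"KIT-{code}"
        allocation_key[kit] = rng.choice(["treatment", "placebo"])
        blinded_sheet.append({"subject": subject_id, "kit": kit})
    return blinded_sheet, allocation_key


sheet, key = blind_allocation(["s1", "s2", "s3", "s4"], seed=7)
for row in sheet:
    print(row)  # contains a kit code but no indication of the group
```

Only once outcomes have been recorded would someone combine `sheet` with `key` to learn which outcomes belong to treatment and which to control.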
In practice, ensuring double-blind, or even single-blind, conditions can be difficult. Consider, for example, a surgical intervention, like bariatric surgery for weight loss. Even if we could randomize patients to treatment and control, finding a placebo might be very challenging. That is, subjects would know if they had surgery or not (as would the experimenter) and this could be problematic for our causal effect estimates. Some studies use sham surgeries as a placebo, wherein the doctors essentially pretend to operate on control group subjects.
Back to Snow
In Snow’s case, the people living in Soho are not randomized into drinking (treatment) or not drinking (control) from the pump. Indeed, we can imagine that those who drink from the pump differ in various other important ways from those who do not. For example, perhaps poorer, less healthy people are compelled to use this pump and others are able to travel further to other pumps. So he cannot easily do a randomized control trial (RCT) that has the three components we mentioned above. (Note that we sometimes use the term “field experiment” when such an experiment takes place outside the lab, in the “real world” in which our subjects live.)
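To see why this lack of randomization matters, consider a simulation of the worry just described. This is not Snow's data: every number is invented, and the names (`simulate_person`, `naive_difference`, `TRUE_EFFECT`) are hypothetical. The simulation assumes poorer residents are both more likely to use the pump and more likely to fall ill for unrelated reasons, and then compares drinkers with non-drinkers as if they were otherwise alike.

```python
import random

TRUE_EFFECT = 0.30  # made-up increase in infection probability from drinking


def simulate_person(rng):
    """Create one hypothetical resident who self-selects into drinking."""
    poor = rng.random() < 0.5
    # Assumption: poorer residents are more likely to rely on the pump...
    drinks = rng.random() < (0.8 if poor else 0.2)
    # ...and also have a higher infection risk for unrelated reasons.
    baseline = 0.30 if poor else 0.10
    infected = rng.random() < baseline + TRUE_EFFECT * drinks
    return {"poor": poor, "drinks": drinks, "infected": infected}


def infection_rate(people):
    """Share of the given people who are infected."""
    return sum(p["infected"] for p in people) / len(people)


def naive_difference(people):
    """Infection rate among drinkers minus the rate among non-drinkers."""
    drinkers = [p for p in people if p["drinks"]]
    others = [p for p in people if not p["drinks"]]
    return infection_rate(drinkers) - infection_rate(others)


rng = random.Random(0)
people = [simulate_person(rng) for _ in range(50_000)]
print(f"effect built into the simulation:    {TRUE_EFFECT:.2f}")
print(f"naive drinkers-vs-others difference: {naive_difference(people):.2f}")
```

Under these made-up parameters the naive difference comes out around 0.42 in expectation, noticeably larger than the 0.30 built into the simulation. The extra amount comes from drinkers being disproportionately poor, not from the water itself.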
But even if he could do an RCT, there are surely ethical issues. It would not be ethical to assign people to drink contaminated water if we know it is likely to kill them. Of course, the cholera outbreak is not unique in this regard: we cannot do an ethical RCT on the effects of smoking on lung cancer by assigning people to smoke cigarettes (or not) for 25 years. Instead, we will rely on observational data, which is
data where the analyst does not control who (which units) gets treatment and who does not.
Indeed, in many observational studies, the analyst does not even know the exact way in which a treatment (like smoking) came to be “assigned” to subjects.