2.1. Case Study: John Snow

John Snow’s investigation is a classic case study in attempting to establish causality. Snow was a medical doctor who observed an outbreak of cholera in London in 1854. At the time, cholera was common, and a single wave could lead to thousands of fatalities.

Snow noticed three important features of cholera epidemics:

once a person contracted the disease, death was fast: typically within days.
infection and death were typically local to a household: that is, while a family living in the same home might all become infected and die, a family living right next door to them might not.
the symptoms of cholera were digestive: people who caught it would have stomach pains and diarrhea.

The question is: why? That is, what was causing cholera and these features of its spread? At the time, scientists believed that cholera was caused by a miasma. Here, miasmas are “bad air” thought to be more common in poorer parts of the city. But the facts of cholera were not compatible with miasma theory. For one, it is hard to understand why people in a single household would be affected, but not their neighbors, who were surely breathing the same air. Plus, the digestive symptoms suggest the disease comes from something people are eating or drinking, rather than breathing. Finally, if it were miasma (which is everywhere, all the time), it is hard to understand why some people suddenly fell ill but others did not. Snow’s prime suspect was water contaminated by sewage.

Theory and Hypotheses

We will define a theory as being a:

conjecture about the causes of some phenomena of interest

Here the phenomenon of interest is infection and death from cholera. Snow’s conjecture is that it is caused by the drinking of contaminated water. The conjecture of miasma theory is that the infections and deaths are caused by “bad air” of some kind.

We will compare the merits of these particular theories momentarily. First, note that “good” theories generally have the following properties:

they have observable implications. This means that they predict something specific about the world which we can check via data.
they are falsifiable. This means that we can imagine some observations in the world that are not compatible with our theory, such that we would have reason to doubt that the theory is correct.

Many “theories” we come across in the world lack one or both of these properties. For example, many “conspiracy theories” are not falsifiable (they are “unfalsifiable”). This means that any evidence we find that runs contrary to the theory can be immediately accommodated by the theory. For example, when shown there is no evidence of a conspiracy, someone in favor of a conspiracy theory might say that this simply demonstrates how thorough the conspiracy was (!).

Note that a theory cannot be “proved correct”. For one thing, “proving” implies you can produce a “proof”, but that is specific to statements that can be shown to be mathematically true for all time given the assumptions you make or axioms you state. An example would be proving that \(\pi\) is an irrational number (its decimal expansion goes on forever without ever settling into a repeating pattern). We can prove theorems (like Pythagoras’ Theorem), but not theories.

What we may be able to do, however, is show that one theory is better than another. Or rather, more specifically, show that it is less wrong, as no theory is perfect. In general, we like theories that can explain more variation than others, which essentially means that they can predict an outcome (e.g. who will die of cholera) more accurately. A trade-off we typically face here is that theories that explain more are often more complicated. That is, they have more moving parts or more variables we have to record before making a prediction. Generally, we prefer simpler theories. Indeed, there is a name for this idea: Occam’s Razor. This is a rule that says that if two theories have the same explanatory power, we should prefer the simpler one. But if they don’t have the same explanatory power, and the simpler one predicts less well, Occam’s Razor won’t help much. Ultimately then, we may need to trade off explanatory power for simplicity (sometimes called “parsimony”).
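To make “explanatory power” a little more concrete, here is a minimal sketch that treats each theory as a simple prediction rule and measures how often it predicts the outcome correctly. Everything in it is an assumption made for illustration: the `households` records are invented (they are not historical data), and `water_theory`, `miasma_theory` and `accuracy` are hypothetical names for deliberately crude versions of the two theories.

```python
# Toy households, invented for illustration only (NOT historical data).
# Each tuple: (drinks_from_pump, lives_in_poor_area, had_cholera_death)
households = [
    (True, True, True),
    (True, False, True),
    (False, True, False),
    (False, True, False),
    (True, True, True),
    (False, False, False),
]


def water_theory(drinks_from_pump, lives_in_poor_area):
    """Crude stand-in for Snow's theory: predict a death if the household uses the pump."""
    return drinks_from_pump


def miasma_theory(drinks_from_pump, lives_in_poor_area):
    """Crude stand-in for miasma theory: predict a death if the household is in a poor area."""
    return lives_in_poor_area


def accuracy(theory, households):
    """Share of households whose outcome the theory predicts correctly."""
    correct = sum(theory(pump, poor) == died for pump, poor, died in households)
    return correct / len(households)


print("water theory: ", accuracy(water_theory, households))
print("miasma theory:", accuracy(miasma_theory, households))
```

In this toy example both rules use a single variable, so they are equally simple, and we would prefer whichever predicts better. The trade-off described above only arises when the better predictor also needs more variables.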

To return to Snow, his theory was that sewage contamination in drinking water was causing cholera infections. Some local people had mentioned to him that the source of this contaminated water might be a pump on Broad Street. This allowed Snow to form a hypothesis, which is a testable explanation. In particular, for Snow this will be that

if we compare people who did and did not drink the water from the pump (“treatment”), we should see differences in infection rates (“outcome”).
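As a rough illustration of the comparison this hypothesis calls for, the sketch below computes the infection rate in each group and the difference between them. The `records` are invented for illustration (they are not Snow’s data), and `infection_rate` is a hypothetical helper name.

```python
# Toy records, invented for illustration only (NOT Snow's data).
# Each pair is (drank_from_pump, got_cholera).
records = [
    (True, True), (True, True), (True, True), (True, False),
    (False, False), (False, False), (False, True), (False, False),
]


def infection_rate(records, drank):
    """Share of people in one group (drank or did not drink) who were infected."""
    outcomes = [infected for treated, infected in records if treated == drank]
    return sum(outcomes) / len(outcomes)


rate_pump = infection_rate(records, drank=True)    # "treatment" group
rate_other = infection_rate(records, drank=False)  # comparison group
difference = rate_pump - rate_other

print(f"Drank from pump: {rate_pump:.2f}")
print(f"Did not drink:   {rate_other:.2f}")
print(f"Difference:      {difference:.2f}")
```

A clear gap between the two rates is what the hypothesis predicts we should see. On its own, though, such a gap only tells us that drinking from the pump and infection go together in the data.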

We will define these terms in more detail below, but first, let’s look at Snow’s initial evidence as regards this hypothesis.

Snow’s Visualization

Snow is famous for his systematic marking of the cholera epidemic in Soho, a part of London. A copy of his original map is below.

At the end of August 1854, cholera struck in the overcrowded Soho district of London. As the deaths mounted, Snow recorded them diligently, using a method that went on to become standard in the study of how diseases spread: he drew a map. On a street map of the district, he recorded the location of each death.

Here is Snow’s original map. Each black bar represents one death. When there are multiple deaths at the same address, the bars corresponding to those deaths are stacked on top of each other. The wide street in the middle of the map is Broad Street. And in the middle of Broad Street is a water pump. Just from casual inspection, it is clear that cholera deaths fan out around that point.

Fig. 2.1 Snow’s first cholera map
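The logic of the map can be sketched in a few lines of code: stack the deaths recorded at each address, then ask how many fall close to the pump. The coordinates below are hypothetical grid positions invented for illustration (they are not taken from Snow’s map), and `radius` is an arbitrary cutoff chosen for the example.

```python
import math
from collections import Counter

# Hypothetical grid coordinates, invented for illustration (NOT Snow's data).
pump = (0.0, 0.0)
# One entry per death, giving the address where it occurred.
deaths = [(0.5, 0.0), (0.5, 0.0), (0.0, 1.0), (1.0, 1.0), (3.0, 4.0)]

# Stack deaths at the same address, like the bars on the map.
deaths_per_address = Counter(deaths)


def distance(a, b):
    """Straight-line distance between two points on the grid."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


# Count deaths within an arbitrary radius of the pump.
radius = 2.0
near = sum(1 for d in deaths if distance(d, pump) <= radius)
far = len(deaths) - near

print("Deaths per address:", dict(deaths_per_address))
print(f"Within {radius} of the pump: {near}; farther away: {far}")
```

Straight-line distance is a simplification: people walk along streets, so a more careful version would measure walking distance to the pump instead.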

While the general pattern was clear, there were still some initially puzzling exceptions: for example, right next door to the pump, there were no deaths at the brewery. But it turned out that the brewery had its own water supply. There were also some deaths scattered in streets around the pump: these were children who drank from it on the way to school. Indeed, there were even some deaths very far from the pump in other parts of London: it turned out that these were people who liked the taste of Broad Street water so much that they had it brought to them deliberately.

Snow subsequently discovered a cesspit that had been leaking into the well that supplied the pump. This is what had been giving people cholera. So, Snow had the handle removed from the pump, in an effort to stop people using it. The John Snow memorial in central London is a pump without its handle.

Snow had provided some evidence that infected water is associated with contracting cholera. But he had not shown that it caused cholera. And therefore, he did not yet know for sure how to stop the spread. In fact, this problem is very general: it is much easier to establish associations than it is to establish causation. The latter needs much more care.
