2. Thinking like a scientist

Consider the following questions, which cover a broad range of substantive areas:

  • Why do some people get cancer and others don’t?

  • Why does war exist?

  • Why might we experience depression or anxiety?

  • How can we eradicate malaria? What interventions are optimal?

  • What makes one football team better than another?

  • How can we predict which startup companies will succeed?

Ultimately these are questions about causation and causal effects:

  • What are the causes of cancer?

  • What are the causes of war?

  • What are the causes of depression?

  • What causes malaria?

  • What causes success, rather than failure?

Without understanding causal relationships, it is very difficult to design useful drugs, give sound advice, or otherwise act on the world in a scientifically grounded way. Central to “thinking like a scientist”, and to uncovering causal relationships, is the scientific method. We can lay it out as a series of (potentially iterative) steps:

  1. Observation: observe the world; describe the patterns we see.

  2. Question: ask why these patterns exist.

  3. Theory: think about what might explain these patterns.

  4. Hypothesis: make a testable statement about what our theory would predict.

  5. Test: look at the support for that hypothesis.

  6. Update theory: perhaps refine it, or put it aside if a better theory emerges.

  7. Repeat as desired.

Below, we will study how a particular scientist, John Snow, went about this process in his investigation of cholera in London in the 1850s. Before doing so, notice that only a few of the steps above involve data directly: when we make our observations, when we gather data to test our hypothesis, and potentially when we repeat the process. The point is that while being a data scientist involves data, the science part is essential (and easily overlooked).