9. Prediction 1

Let’s step back for a moment and ask: Why are we doing Data Science?

When we apply data science to a given problem, we are typically trying to do one of three things:

  1. Explanation. These are mostly “why” questions, and usually involve the causal inference lessons we discussed earlier. Sometimes, we are trying to understand why a particular event occurred (“What caused the Second World War?”, “what \(X\)s caused \(Y\)?”); sometimes, we are trying to understand how a particular (causal) factor changes outcomes (“What happens to your heart health if you follow the Keto diet?”, “how does \(Y\) depend on \(X\)?”). An example above was Snow trying to understand why some people contracted cholera.

  2. Policy evaluation. Here, the problem involves some causal reasoning, but is narrower in terms of what we are investigating and trying to conclude. In policy evaluation we are studying the effects of a given intervention, which could be a government action, or some change that a firm made to a product, or even some norm shift in the public. We focus particularly on whether a policy “worked” or not, and in what ways it was successful. We don’t necessarily try to explain why it worked (or did not). An example above might be the various government acts to clean up the water supply in 19th Century London. There was a change in policy, and we want to understand what its effects were—potentially, but not necessarily, because we are contemplating rolling out the policy to other places.

  3. Prediction. In prediction, we are not necessarily interested in causality at all, or at least not directly. Instead, we want to understand the relationship, causal or not, between some \(X\) and some \(Y\). The goal now is to forecast, often with careful attention to uncertainty, what will happen in the future. The “future” here could be in a few minutes (e.g. predicting the outcome of some soccer match), or a few days (e.g. the weather), or many years (e.g. demographic dependency ratios). Importantly, we may not be able to explain exactly why \(X\) affects \(Y\) in the way it does, and by extension we may not be able to suggest policy interventions based on our work. But we can, nonetheless, be good at estimating what will happen.

It perhaps goes without saying that prediction is important for much of our daily lives. The gamut runs from fairly trivial things like suggesting movies a Netflix user is likely to enjoy, to very serious things like inferring what treatment someone will need depending on what drug they have taken. In any case, it is a core part of data science—especially “machine learning”—so we will spend some time on it here.


9.1. Note

This chapter is largely adapted from materials in the 2017 commit of Inferential Thinking. Please refer to the first page of our text for copyright details.