More causality terminology
=========================



As we have seen, establishing that $X$ causes $Y$ is hard.  Here we add to our discussion by introducing some basic principles to keep in mind when making a causal claim, along with terminology that is important to know in such circumstances.

## General Principles of Causality
Establishing causal relationships typically requires that, at a minimum, we demonstrate: 
1. **covariation** between the variables
2. **temporal precedence**: that treatment comes before outcome
3. we have **controlled for third variables**

We now briefly discuss each one.

### 1. Covariation
In general, we can objectively determine the association between two variables.  Such measures include things like correlation, which we will cover later.  If no association exists, this suggests that the variables do not covary, and it seems unlikely that $X$ is causing $Y$.

But this is not enough.  For one thing, associations (e.g. correlations) do not imply causation: we can think of many situations where two variables are associated (like buying a dog and the owner's heart health), but may not be causally related.  

More difficult still, we sometimes have situations where a causal relationship exists but there is no obvious association, at least in the data we have access to.  For example, within the NBA (the National Basketball Association, a professional basketball league), height is not correlated with player performance, player value or player salary. These things do not covary with height.  Studying only the NBA might lead us to believe height is unrelated to basketball ability in the broader world.  But this would be wrong.  The mistake arises from a problem called **conditioning on a collider**, which is a more advanced topic.
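
To make the NBA intuition concrete, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: height and skill are drawn independently, "ability" is simply their sum, and the "league" admits only people whose ability exceeds an arbitrary cutoff. It is not a model of real NBA data.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def corr_cols(rows, i, j):
    """Correlation between columns i and j of a list of tuples."""
    return pearson([r[i] for r in rows], [r[j] for r in rows])


random.seed(3)

# Hypothetical population: height and skill are independent by construction,
# and both contribute to basketball ability.
population = []
for _ in range(100_000):
    height = random.gauss(0, 1)
    skill = random.gauss(0, 1)
    ability = height + skill
    population.append((height, skill, ability))

# Selection into the "league" depends on ability (the collider).
# The cutoff of 3.0 is an arbitrary assumption.
league = [p for p in population if p[2] > 3.0]

pop_height_ability = corr_cols(population, 0, 2)
league_height_ability = corr_cols(league, 0, 2)
pop_height_skill = corr_cols(population, 0, 1)
league_height_skill = corr_cols(league, 0, 1)

print(f"height vs ability: population {pop_height_ability:.2f}, league {league_height_ability:.2f}")
print(f"height vs skill:   population {pop_height_skill:.2f}, league {league_height_skill:.2f}")
```

In this toy setup, the association between height and ability is markedly weaker inside the league than in the population, and height and skill become negatively related among league members even though they are independent overall. Looking only at the selected group distorts the associations we see.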

### 2. Temporal Precedence
In general, we want our treatment to precede our outcome: it is very hard to make claims about causality without this! But of course, there are many examples where one event consistently comes before another without there being any causal relationship: roosters crow in the morning, but they are not causing the sun to rise.  More subtly, there is no evidence that vaccines cause autism, though certain vaccines may be administered just before children are typically diagnosed with autism.  

### 3. Confounders and Controls
As we said, confounders---factors that influence both treatment status and outcomes---are a constant threat.  One way to reduce our concern is to randomize subjects into treatment and control (and hope they comply with their group assignment).

If we cannot randomize, we may be able to **"control"** for known confounders.  Here "control" means "take into account", and essentially means we are comparing within levels of the confounder, to see if there is any relationship between treatment and outcome.

A classic example here is the relationship between ice cream consumption and drowning deaths.  Months with more ice cream eaten (treatment) are also months with more drowning deaths (outcome).  But this is probably not causal: it seems likely that the ambient weather or air temperature is a confounder, $Z$.  When the temperature goes up, people eat more ice cream and also swim more, which leads to more drowning deaths.  If we "control" for the weather $Z$, we are implicitly comparing days *at the same given temperature* (say, 85 degrees) and asking whether in that set, the days with more ice cream consumption also have more drowning deaths.  This is almost certainly not the case, and we can conclude the relationship is not causal.  
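
The idea of "controlling" for temperature can be sketched with a small simulation. The numbers below are invented: temperature takes one of four values, it drives both ice cream consumption and drownings, and ice cream has no effect on drownings by construction. We then compare the overall correlation with the correlation *within* each temperature level.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


random.seed(0)
TEMPERATURES = [65, 75, 85, 95]  # hypothetical levels of the confounder Z

days = []
for _ in range(5000):
    temp = random.choice(TEMPERATURES)            # confounder Z
    ice_cream = 2.0 * temp + random.gauss(0, 10)  # treatment X depends on Z only
    drownings = 0.1 * temp + random.gauss(0, 1)   # outcome Y depends on Z only
    days.append((temp, ice_cream, drownings))

# Ignoring temperature: ice cream and drownings look strongly related.
overall = pearson([d[1] for d in days], [d[2] for d in days])

# "Controlling" for temperature: compare only days at the same temperature.
within = {}
for t in TEMPERATURES:
    same_temp = [d for d in days if d[0] == t]
    within[t] = pearson([d[1] for d in same_temp], [d[2] for d in same_temp])

print(f"overall correlation: {overall:.2f}")
for t, r in within.items():
    print(f"correlation at {t} degrees: {r:.2f}")
```

Under these assumptions the overall correlation is clearly positive, while the within-temperature correlations hover around zero. That is the pattern we would expect if temperature, rather than ice cream, explains the association.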






## Causal Terminology
When we talk about causality, it is helpful to use certain terms in a specific way.  Below, we first consider *types* of causality; second, we mention a particular but common problem in assessing causal relationships ("endogeneity"); third, we discuss logical conditions as they may pertain to causal relations; finally, we discuss "mechanisms".

### Deterministic and Probabilistic Causality
How should we think about causality, in general?  Here we mean not just the definition of a causal effect, but the philosophical nature of causes. There are two broad possibilities: 
1. **Deterministic** causation: here, if $X$ causes $Y$, then $X$ occurring will mean $Y$ occurs.  This is how we might model certain physical laws, such as gravity.
2. **Probabilistic** causation: here, we say that if $X$ occurs, $Y$ is *more likely* (or less likely) to occur.  

Deterministic causation claims are somewhat rare in social science, and mostly turn up when talking about logical systems, like what constitutions say must happen in the event of some condition applying.  

Some scholars argue that essentially all causation is deterministic but appears to us as probabilistic. That is, we might believe there are specific factors which we could in principle know and include in our models for events, but in practice we simply cannot observe them all, so causation appears probabilistic.  For example, whether a tossed coin comes down heads or tails appears random to us, but if we knew the various factors acting on the coin (toss speed, air pressure, spin velocity, etc.) we would be able to predict the result with certainty.
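
The coin example can be illustrated with a toy sketch. The rule below is a made-up stand-in for the physics of a coin toss, not a real physical model: the outcome is a fixed function of two hypothetical factors, spin rate and flight time. An observer who knows both factors can reproduce every outcome, while one who cannot see them observes what looks like a fair coin.

```python
import random


def coin_lands_heads(spin_rate, flight_time):
    """Toy deterministic rule (an assumption, not real physics):
    heads if the coin completes an even number of half-turns."""
    half_turns = int(2 * spin_rate * flight_time)
    return half_turns % 2 == 0


random.seed(1)

# Hypothetical hidden factors for each toss:
# spin rate in revolutions per second, flight time in seconds.
tosses = [(random.uniform(20, 40), random.uniform(0.4, 0.6)) for _ in range(10_000)]
outcomes = [coin_lands_heads(spin, time) for spin, time in tosses]

# Observer without access to the hidden factors: sees only the frequency of heads.
share_heads = sum(outcomes) / len(outcomes)

# Observer with access to the hidden factors: applies the rule and is never wrong.
predictions = [coin_lands_heads(spin, time) for spin, time in tosses]
accuracy = sum(p == o for p, o in zip(predictions, outcomes)) / len(outcomes)

print(f"share of heads: {share_heads:.3f}")
print(f"accuracy when the hidden factors are known: {accuracy:.3f}")
```

The process is fully deterministic, yet without the hidden inputs the best description available is "heads about half the time".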


### Endogeneity 
You will regularly hear analysts talk about the problem of **"endogeneity"**.  This can mean many things, but in general refers to the idea that an independent variable (perhaps the treatment of interest) is being determined by (or is at least associated with) something not included in our experimental setup, and that this factor is also affecting the outcome.

This sounds vague, so in this course we will think about endogeneity as being the specific problem that occurs when $X$ causes $Y$ and $Y$ causes $X$.  This is sometimes referred to as **reciprocal causation**.

It is straightforward to think of examples: 

- democracy and development: we can imagine that higher levels of democracy might help countries become richer (perhaps via more stable legal rights that citizens can rely on to build businesses); but becoming richer may in turn lead to more stable democracy (perhaps because citizens see that democracy increases living standards, and become invested in it). Stronger democracy then encourages further development, and so on.  
- stress and physical illness: being very stressed can make one physically ill.  But being very physically ill may lead to stress, and so on.

In both of these cases, it is no simple matter to tease apart cause and effect. We cannot simply declare that one of the variables is the outcome and the other is the treatment, and then learn the causal effect of (what we say is) the treatment by comparing the outcomes of units with different treatment levels. 
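
A small simulation sketch shows why naive comparisons go wrong under reciprocal causation. The setup is entirely hypothetical: $X$ raises $Y$ by 0.5 per unit, $Y$ raises $X$ by 0.5 per unit, and each variable also has its own independent random shock. We generate values that satisfy both relationships at once, then compute the slope we would get by simply comparing units with different levels of $X$.

```python
import random
from statistics import mean

random.seed(2)

EFFECT_X_ON_Y = 0.5  # hypothetical true causal effect of X on Y
EFFECT_Y_ON_X = 0.5  # hypothetical true causal effect of Y on X

xs, ys = [], []
for _ in range(20_000):
    u = random.gauss(0, 1)  # independent shock to X
    v = random.gauss(0, 1)  # independent shock to Y
    # Solve the two equations  x = EFFECT_Y_ON_X * y + u  and
    # y = EFFECT_X_ON_Y * x + v  simultaneously.
    denom = 1 - EFFECT_X_ON_Y * EFFECT_Y_ON_X
    x = (u + EFFECT_Y_ON_X * v) / denom
    y = (v + EFFECT_X_ON_Y * u) / denom
    xs.append(x)
    ys.append(y)

# Naive estimate: slope of Y on X across units, cov(X, Y) / var(X).
mx, my = mean(xs), mean(ys)
cov_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
var_x = sum((x - mx) ** 2 for x in xs)
naive_slope = cov_xy / var_x

print(f"true effect of X on Y: {EFFECT_X_ON_Y}")
print(f"naive slope of Y on X: {naive_slope:.2f}")
```

With these particular (assumed) values, the naive slope comes out near 0.8 rather than the true 0.5, because part of the association reflects $Y$ feeding back into $X$.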


### Necessary and Sufficient Conditions
When discussing causal relationships, it is helpful to be clear about whether one is asserting that a factor is necessary, sufficient, both or neither for a given outcome to occur. 

If $X$ is a **necessary condition** for $Y$, then

- if $Y$, then $X$
- $X$ is required for $Y$; $Y$ cannot occur without $X$
- every time you observe $Y$, $X$ must be present


If $X$ is a **sufficient condition** for $Y$, then

- if $X$, then $Y$ (though $Y$ can also occur without $X$ occurring)
- $X$ is sufficient for $Y$; nothing else is required
- every time $X$ occurs, $Y$ will occur 

Just to give a few examples: 

- being a US citizen is *necessary* but not sufficient to vote for president.  You cannot vote for president unless you are a US citizen.  But that is not enough: you also have to fill in registration forms, not be a felon (in some places), etc.
- birth in the US is *sufficient* but not necessary to be a US citizen.  Simply being born in the US is (generally) enough to get citizenship, but you can get it other ways---like naturalization.
- the date being July 4th is *necessary and sufficient* for it to be Independence Day (in the US).  It is sufficient, because you need nothing else.  It is necessary, because the US does not celebrate independence on any other day.
- having a US passport is *neither necessary nor sufficient* to run for President of the United States.  You can run for president without owning a US passport.  But owning one, and thus being a citizen, is not enough---you must be "natural born" etc.
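
The definitions above can be written as simple checks over observed cases. The sketch below is illustrative only: the function names and the tiny hand-made datasets are assumptions, with each observation recorded as a pair `(x, y)` of booleans. Note that data can only be *consistent* with a necessity or sufficiency claim; a single counterexample refutes it, but no finite set of cases proves it.

```python
def is_necessary(observations):
    """X looks necessary for Y if Y never occurs without X."""
    return all(x for x, y in observations if y)


def is_sufficient(observations):
    """X looks sufficient for Y if Y occurs every time X occurs."""
    return all(y for x, y in observations if x)


# Hypothetical cases, each a pair (x, y).
citizen_and_can_vote = [(True, True), (True, False), (False, False)]
us_birth_and_citizen = [(True, True), (False, True), (False, False)]
july_4_and_independence_day = [(True, True), (False, False)]
passport_and_runs_for_president = [
    (True, True), (True, False), (False, True), (False, False),
]

examples = {
    "citizen -> can vote": citizen_and_can_vote,
    "US birth -> citizen": us_birth_and_citizen,
    "July 4th -> Independence Day": july_4_and_independence_day,
    "passport -> runs for president": passport_and_runs_for_president,
}

for name, obs in examples.items():
    print(f"{name}: necessary={is_necessary(obs)}, sufficient={is_sufficient(obs)}")
```

The four datasets mirror the four bullet points: necessary only, sufficient only, both, and neither.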

### Causal Mechanisms
A **causal mechanism** is 

> a sequence of processes leading from $X$ (the treatment) to $Y$ (the outcome)

We need some sense of a possible causal mechanism to know what we should test and how.  In the case of Snow, he suspected that consuming contaminated water led to cholera. This idea allowed him to set up his natural experiment via the water companies. He suspected the issue was fecal-oral transmission, but he didn't know the exact mechanism by which this happened.

Later, scientists (like Filippo Pacini and Robert Koch) would refine the mechanism, and understand the importance of the cholera bacterium.  Refining the mechanism allows for better understanding of how to prevent the disease spreading.  
