6.2. Measurement#
We’ve said that data is not Truth, and it doesn’t speak for itself. We’ll talk more about why data on its own can’t speak, but first we’ll turn to a closer look at what data is. Data is a momentary snapshot taken at a particular time and place that only exists because someone decided it was worth measuring, and decided how to measure it. We hope that data is reflective of reality – and in this and the next section we’ll discuss ways to start to assess this – but even a “perfect” dataset – e.g., one with zero errors and perfectly precise measurement – is still only a snapshot. Unless we can put the entire world into a dataset (and then it would just be the world, not a dataset), we humans are making some kind of decision about what to put in our data and, necessarily, what to leave out. Thus, even before we conduct any analyses or even make any decisions about what analyses we might want to make, we’ve already narrowed reality considerably to fit into our necessarily finite dataset.
Further, we are almost never actually putting “reality” into a dataset. We’re usually inputting descriptions of reality that a human has decided are worth describing. Yes, there is a True amount of rain in the world, but we are describing how much of it fell in a particular place and time using inches recorded each day. Yes, there is a True temperature, but we’ve translated it into a number using a thermometer that we’ve decided should go in one spot of a city rather than another.
Importantly, saying that data is not True does not mean we throw it away, or dismiss or disregard it because we disagree with it. Quite to the contrary: data is incredibly powerful. It helps us understand trends, patterns, and relationships in the world that we would not be able to see without it. But data is not perfect, and it only exists because we’ve decided we need to describe one particular aspect of reality over another, and we’ve decided how to describe that reality in a particular way. Recognizing that data is necessarily limited and imperfect allows us to make even stronger inferences about the world than if we just took data as True. Reviewing data with a critical and rigorous eye is just as important as reviewing someone’s statistical methods or research design. There is no perfect statistical method, no perfect research design (and you should be suspicious of anyone who claims as much!), and no perfect dataset. Acknowledging this and then asking questions – How can we make our data better? What can we still learn? What are the limitations to our current knowledge, and what might be the next steps? – is how we keep learning more about the world.
In this and the next section, we’ll look more closely at specific ways to consider the limitations of data, and what to do about them. Up first is measurement, which is how we turn Truth into data. It’s a very influential step in data science, and yet one that is frequently overlooked in practice. Too often, we take data as given without interrogating where it came from. It came from measurement, which has three big steps.
As we proceed, recall that we said in the beginning of the book that data science is at the intersection of math and statistics, computer science, and substantive expertise. Measurement typically requires a fair amount of substantive expertise. If you’re working in an area that you know well (for example, we have chosen the example of “democracy” below specifically because we are familiar with research and data on this subject), you’ll be better equipped to measure the concepts you care about as well as evaluate others’ measures of them (though of course you should always share your measures with others and get input and ideas for improvement). If you are working in an area that you don’t know very well (and let’s be honest, part of the fun in data science is we do get to explore lots of interesting and different topics), we recommend collaborating with area experts for input on your work, ideally before, during, and after your research. At minimum, humility is important, especially as we attempt to measure things that do not obviously lend themselves to numbers.
1. Conceptualization#
Conceptualization is the step where we decide what we mean by the thing we wish to measure. For example, suppose I want to understand whether the world has become more or less democratic over time. In order to do this, the first thing I need to do is narrow down all the possible things one might mean by “democratic” into something we can actually eventually turn into a number (or a word, or other piece of information, but we’ll use numbers in this example for simplicity). Our ideal dataset might be something that includes some kind of “score” for how democratic a country is each year for the past, say, 100 years (notice already I’m narrowing down the scope of my study from “all places ever”, as well as defining how “democratic” the world is by the democracy score by country, as opposed to, say, by population, or something else).
So, where do I get that “score”? Democracy means a lot of different things to a lot of different people. Many of us think of elections, but it might also mean certain freedoms, separations of power, rule by the people in some form that doesn’t necessarily require elections, an independent press, and so on. Step one of turning something as abstract and complex as “democracy” into data requires that we make a choice about which aspects we think are essential (while keeping in mind the feasibility of the study). While we could have more than one conceptualization in a full study, let’s start with just one. Suppose we conceptualize democracy as a country having elections. (You could imagine a second variable on independent press, and so on.)
2. Operationalization#
Operationalization is the next step, where we turn the concept we’ve just selected into a number. Narrowing all the possible things we could mean by the idea of “democracy” to elections is a good start, but it doesn’t tell us specifically how to turn “elections” into a number. Some ideas we might consider (with a code sketch after the list):
Give a country a 1 if they’ve had an election in the past 100 years; 0 if they have not (this scheme of coding something 0 or 1 to indicate the absence or presence of it is called a dummy variable)
Write down the number of elections a country has had in the past 100 years
Do something more complicated: assign a score of 1, 2, 3, 4, or 5 to a country depending on whether it meets a series of specific requirements around elections, such as whether they happen on a regular basis, whether they’re competitive, and whether citizens are free to vote in them.
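To make the contrast concrete, here is a minimal sketch in Python of the first two options (and a note on the third). The country names and election years are entirely invented for illustration; they do not come from any real dataset.

```python
# Hypothetical election years for made-up countries (illustrative only).
elections = {
    "Country A": [1928, 1960, 1992, 2020],
    "Country B": [],
    "Country C": [2012, 2016, 2020],
}

CURRENT_YEAR = 2024

for country, years in elections.items():
    recent = [y for y in years if y >= CURRENT_YEAR - 100]

    # Option 1: a dummy variable -- 1 if any election in the past 100 years, else 0.
    had_election = 1 if recent else 0

    # Option 2: the raw count of elections in the past 100 years.
    n_elections = len(recent)

    # Option 3 would require additional inputs (regularity, competitiveness,
    # suffrage) and a set of coding rules -- see the questions below.
    print(f"{country}: dummy={had_election}, count={n_elections}")
```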
Which one is the most compelling operationalization? I’d say the third one. But, notice that it’s much harder to collect, and now actually gives rise to a number of other decisions we need to make, including:
What counts as “regular”? Is it enough that there’s been more than one election? Do they need to be evenly spread out? Do we need to worry if they’re too spread out?
How do we measure competitiveness? The number of candidates or parties who are running? The number that actually stand a chance? The evenness of the proportions of votes ultimately won?
How do we decide if citizens are free to vote? Is it as simple as the percentage of population eligible? Or who actually turn out on election day? Do we need to worry about how citizens learn about whom to vote for, or whether they feel they can safely vote for the candidate they prefer?
There are no obvious answers to these questions – and our decisions about how to answer them are unlikely to be trivial for our study. And all of these questions arise from just one operationalization of one conceptualization of democracy! Not surprisingly, there are a lot of democracy measures out there, and most of them get very complicated and detailed very quickly, usually with multiple conceptualizations and operationalizations, each with multiple sub-parts. So, how do we know which one(s) to use? How do we know which one is “right”?
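To see how consequential a single coding decision can be, here is a toy sketch (again with invented election years) showing how simply changing what counts as a “regular” election schedule flips a country’s coding:

```python
# Invented election years for two hypothetical countries.
election_years = {
    "Country A": [1990, 1994, 1998, 2002],  # gaps of 4 years
    "Country B": [1990, 1997, 2004, 2011],  # gaps of 7 years
}

def longest_gap(years):
    """Longest stretch between consecutive elections."""
    return max(later - earlier for earlier, later in zip(years, years[1:]))

for country, years in election_years.items():
    gap = longest_gap(years)
    # Two plausible rules for "regular" -- the choice changes the coding:
    # Country B is "regular" under one rule and not the other.
    print(f"{country}: regular if gaps <= 5 yrs? {gap <= 5}; "
          f"regular if gaps <= 10 yrs? {gap <= 10}")
```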
Conceptualization vs. operationalization#
The distinction between these two steps of measurement is often a source of confusion among students. The key distinction between the two is that conceptualization answers the question, “What do I mean by the thing I want to study?” Operationalization answers the question, “How am I going to turn that concept into data?”
Conceptualization must come first because it’s the step where we take all the possible dimensions, meanings, or interpretations of whatever it is that we’re interested in and narrow them down to something manageable. Again, you could have more than one of these, but those will usually be distinct variables; e.g., we might be interested in the “size” of a horse, which we conceptualize as height and weight in two separate columns.
Operationalization comes second because once we’ve narrowed our concept, we then specifically consider how we might turn that concept into data that we could put in a dataset. In some cases this is more straightforward than others. In the case of the height variable, operationalization would look like deciding what units to measure height in (horses tend to be measured in “hands”, but we could prefer feet or something else, especially if we eventually want to compare horse height to other heights), as well as figuring out how we are going to get that information. Does this information exist on each horse’s documentation somewhere? Do we have to manually go out and measure a bunch of horses?
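As a small illustration of how much detail hides in even a simple operationalization, here is a sketch of converting horse heights recorded in hands to centimeters. Note that the conventional notation “15.2” means 15 hands and 2 inches, not 15.2 decimal hands – exactly the kind of trap a codebook needs to document.

```python
# 1 hand = 4 inches = 10.16 cm.
CM_PER_HAND = 10.16
CM_PER_INCH = 2.54

def hands_notation_to_cm(notation: str) -> float:
    """Convert 'hands.inches' notation (e.g., '15.2' = 15 hands 2 inches) to cm."""
    hands, _, inches = notation.partition(".")
    return int(hands) * CM_PER_HAND + int(inches or 0) * CM_PER_INCH

# 157.48 cm -- NOT 15.2 * 10.16 = 154.4 cm, as naive decimal reading would give.
print(hands_notation_to_cm("15.2"))
```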
In short, conceptualization is about what concept we are interested in. To keep it clear in your mind, the focus is on how do we think about the thing we want to study – what specifically do we mean by it? Operationalization is how we operate on that concept – it’s the physical steps we will take to turn our concept into data. Generally, when you’re reading codebooks about a dataset, the operationalization section is much longer. You might see something like, “We conceptualize democracy as fair and free elections,” followed by a paragraph or more about the specific rules by which the researchers turned those concepts into data.
How we conceptualize and operationalize variables is ultimately going to be the result of a mix of our own interests and research goals, as well as practicality. For example, when we study how democratic a country is, we might really be interested in the extent to which citizens feel that they have a say in how the country is run. Unless you have the budget for or access to a large, representative survey that asks exactly that question (and even then, as we’ll see in the next section, this doesn’t guarantee that we’ll actually be capturing what we’re looking for), you may need to start with a related conceptualization for which there is data – such as how many elections a country has had over a certain number of years. It won’t be perfect, but no single study ever is.
The task then is to be transparent about your measures and update your inferences based on the strengths and weaknesses of that measure. For example, when presenting your results, you might say that you found an increase in the number of elections per decade around the world, but that the study does not (yet!) take into account the levels of participation in those elections. It might be tempting to think that calling out the shortcomings of a study is a bad practice, but really science is all about being honest about the decisions we are making and the inevitable limitations of our work. This is how knowledge moves forward.
3. Validation#
Whew! Validation is the third and final step of measurement, and it’s when we evaluate whether we’ve actually measured what we think we measured. There is a wide world of techniques for validation, most of which are beyond the scope of this book, but one technique is to check the outcome of your measurement efforts against your own and others’ expertise about the world. This of course is not perfect, but it’s an important start.
Suppose we went ahead with our third operationalization above. We will end up with a list of countries and their democracy scores on a scale of 1-5 for each year of the past 100 years (notice: creating this will likely take a very long time and a lot of research!). Now, go through the list, perhaps starting with the most recent year. Have you coded countries as democracies that seem like they are democracies? Or have you coded some countries you think of as autocracies as democracies? Who are you “ruling in” as democracies, and who are you “ruling out”? And is that consistent with your expectation of the world? If not, is the number of errors that you’re making more than you think is acceptable? If so, you may need to revise your measure.
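One way to make this spot check systematic is a small script like the following, where both the scores and the list of expected democracies are invented placeholders standing in for your own coding and substantive expertise:

```python
# Hypothetical 1-5 scores for the most recent year of our dataset.
scores = {"Country A": 5, "Country B": 2, "Country C": 3, "Country D": 4}

# Countries we strongly expect to be democracies, based on expertise.
expected_democracies = {"Country A", "Country C"}

THRESHOLD = 4  # treat scores >= 4 as "ruled in" as democracies for this check

ruled_in = {c for c, s in scores.items() if s >= THRESHOLD}

# Disagreements between the measure and our expectations of the world.
print("Expected democracies ruled out:", expected_democracies - ruled_in)
print("Unexpected countries ruled in:", ruled_in - expected_democracies)
```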
At this point, this should all feel very unsatisfactory, and perhaps far too subjective. The good news is that most measures of democracy (and a lot of other things) undergo many validation checks that are more sophisticated than this starting one. The bad news is that not every dataset’s subject is so lucky (and as mentioned, many concepts that are even more abstract and complex than democracy have not been turned into data at all). Whenever you encounter a dataset, the first thing you want to ask is: How was this measured? Is it measuring what it claims to be measuring? How do I know?
One best practice to be aware of going forward is that researchers will often use multiple measures of the same thing in their analyses, and then compare the results. This is a type of robustness check. If I want to understand whether the world has become more democratic over time, I might first conduct my analysis on the dataset we’ve designed here. Then, to make sure my results are not just a consequence of something related to my measurement decisions, I might then do the same analysis on, say, the Economist Intelligence Unit dataset or the Polity 5 dataset, and other notable ones. If my results are similar across all datasets, I might say my results are robust to different measures.
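Here is a sketch of what such a robustness check might look like, with entirely invented summary numbers standing in for the real datasets (each real measure has its own scale and structure, so in practice you would rerun your full analysis on each):

```python
# Invented average democracy scores by year, standing in for three measures.
measures = {
    "our_measure": {1990: 2.8, 2020: 3.4},   # hypothetical 1-5 scale
    "measure_B":   {1990: 5.1, 2020: 6.0},   # hypothetical 0-10 scale
    "measure_C":   {1990: 0.42, 2020: 0.47}, # hypothetical 0-1 index
}

for name, series in measures.items():
    change = series[2020] - series[1990]
    direction = "up" if change > 0 else "down"
    print(f"{name}: average score moved {direction} ({change:+.2f})")

# If the direction of the trend agrees across measures, the finding is
# robust to (these) different measurement choices -- more confidence, not Truth.
```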
Does this mean I’ve now gotten to the Truth? No, but it gives us a bit more confidence that we are measuring what we think we’re measuring.
Measurement in the wild#
One way to notice the importance of measurement in real life is to read claims in newspapers about the results of scientific studies. You may come across headlines such as “Democracy is in decline!” or “Coffee causes cancer!” or “Money causes happiness!” In addition to being skeptical of their methodologies and causal claims, you can now also be skeptical of their measures: What do they mean by “money”? (Income? Wealth? Disposable income? A one-time bonus cash deposit?) How are they measuring “happiness”?
Again, none of this is to say that this critical thinking gives us license to ignore data and results that are counter to our beliefs or values. We cannot just ignore data because we don’t like it. But we absolutely can and should be critical of data, which means asking: How was this measured? How was it conceptualized, operationalized, and validated? Do I, a smart person who thinks about the world, find this compelling? Why or why not? How might I improve it? What can I still learn from this data even if I notice flaws in how it is measured?
These questions will help you make much stronger and more useful inferences about the world – regardless of whether your analyses that follow are simple trend plots or machine learning algorithms. If nothing else, from now on, whenever someone shows you the result of a study, one of the first questions that we hope you ask is, “How was this measured?”