1.1. Data science in this course

In part because the field is so new, in part because the field is so interdisciplinary, and in part because we all have different opinions, training, and priorities, there are about as many definitions and interpretations of data science as there are data scientists.

For our purposes, data science is:

using data and science to better understand the world.

This might be through one or more of the following activities:

  1. Exploring data; i.e., identifying patterns and trends in the world

  2. Drawing inferences from data; i.e., quantifying how reliable a pattern or trend is

  3. Making predictions about the world using data; i.e., making informed guesses about patterns or trends in the future
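To make these three activities concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the toy data (hours studied and exam scores) is made up, `fit_line` is a hypothetical helper, and the bootstrap is just one of many ways to gauge how reliable a trend is. It is not the method this course prescribes.

```python
import random
import statistics

# Made-up toy data (an assumption for illustration only):
# hours studied and exam scores for eight students.
hours = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [52, 55, 61, 60, 68, 71, 75, 80]


def fit_line(x, y):
    """Hypothetical helper: least-squares slope and intercept for y = intercept + slope * x."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# 1. Explore: is there a trend? Fit a straight line to the data.
slope, intercept = fit_line(hours, scores)

# 2. Infer: how reliable is that trend? Resample the data with
#    replacement (a bootstrap) and see how much the slope varies.
rng = random.Random(0)  # fixed seed so the sketch is repeatable
pairs = list(zip(hours, scores))
boot_slopes = []
for _ in range(1000):
    sample = [rng.choice(pairs) for _ in pairs]
    xs, ys = zip(*sample)
    if len(set(xs)) > 1:  # skip resamples where every x is identical
        boot_slopes.append(fit_line(xs, ys)[0])
boot_slopes.sort()
low = boot_slopes[int(0.025 * len(boot_slopes))]
high = boot_slopes[int(0.975 * len(boot_slopes)) - 1]

# 3. Predict: make an informed guess for a student who studies 9 hours.
predicted = intercept + slope * 9

print(f"Trend: about {slope:.2f} points per extra hour")
print(f"Rough 95% bootstrap interval for the trend: {low:.2f} to {high:.2f}")
print(f"Predicted score after 9 hours: {predicted:.1f}")
```

The same three steps apply to real data sets; only the tools and the care needed at each step change.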

One of many ways we might visualize where data science “fits” in the broader scheme of science and established disciplines is as follows (image modified from Drew Conway):

[Image: Venn diagram]

Fig. 1.1 Where data science fits

All three components (programming, math and statistics, and subject matter expertise) are necessary for data science. A statistician, a computer scientist, and a “traditional” researcher or area expert could be three different people who collaborate, and this does happen in both industry and research. But you can think of this course as an introduction to all three. Later in your data science career you may choose to specialize in, say, programming or subject matter expertise, but in our experience, developing skills in all three areas is very helpful.