4. Statistics 1

Welcome to the first statistics chapter of this book! Here are a few preparatory notes as we head into the wonderful world of statistics. First, we assume you’ve seen some statistics before, but it’s absolutely fine if your experience is minimal: we’ve designed the statistics material for readers with very limited experience, so don’t worry. We assume some exposure only because it’s unlikely (statistically!) that you made it through high school without encountering at least some of the fundamentals!

That said, even if you’ve already seen a fair amount of statistics, or a field-specific variant such as econometrics or biostatistics, we encourage you to read this chapter carefully and work through the code in your own notebook so that you’re crystal clear on the terminology we use.

Finally, a few thoughts for everyone, regardless of your prior experience with statistics:

Statistics has a specialized and precise vocabulary, and some of it can be tricky to keep straight. In our experience, this is especially true if it’s been a while since your last exposure to statistics, so pay close attention to the terminology here to make sure you use and understand it correctly.
Beyond being very particular, the vocabulary also contains many terms that sound similar but refer to different concepts: for example, the distribution of a sample versus a sampling distribution, which we’ll discuss shortly. Please read this chapter closely with an eye to these almost-overlapping terms. We endeavor to make the distinctions as clear as possible, but a bit of memorization will likely help as you go.
Statistics can feel rather unintuitive to many people at first. In our many years of teaching and practicing, we’ve found that multiple exposures to the same material are extremely helpful for building intuition for the fundamentals. The sooner you feel completely at ease with these concepts, the smoother your data science life will be.
Just like programming, statistics is absolutely central to data science, but it is not the whole story. In this book, we introduce many of the fundamentals you need for more advanced work in data science, but it is by no means a substitute for a full statistics course (just as it’s not a substitute for a full computer science course). Consider this a survey of the minimum you need to know. And if you enjoy it, which we hope you will, we encourage you to seek out further statistics courses and materials.
As with programming, and really all of data science, learning by doing is going to be your friend here. Don’t just read the examples; practice them yourself. Write out the code, try modifying it, run the simulations, and experiment lots. Remember, error messages can help you learn, so don’t worry if you generate a fair number of them early on. There’s a lot of material here, and it’s meant to be very hands-on.
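To make the distinction between the distribution of a sample and a sampling distribution a little more concrete before we define these terms formally, here is a minimal simulation sketch using NumPy. It is not code from this chapter: the population is hypothetical (10,000 values drawn from an exponential distribution), and the sample size of 50 and the 5,000 repetitions are arbitrary choices for illustration.

```python
import numpy as np

# Fixed seed so the simulation gives the same results every time it runs
rng = np.random.default_rng(seed=42)

# Hypothetical population: 10,000 values from an exponential distribution
population = rng.exponential(scale=10.0, size=10_000)

# ONE sample of 50 values, drawn without replacement from the population.
# The distribution of a sample is just the spread of these 50 values.
sample_size = 50
sample = rng.choice(population, size=sample_size, replace=False)
print("Values in the sample:", len(sample))
print("Sample mean:", sample.mean())
print("SD of the individual sample values:", sample.std())

# A SAMPLING distribution: repeat the sampling many times and record a
# statistic (here, the sample mean) from each repetition.
n_repetitions = 5_000
sample_means = np.array([
    rng.choice(population, size=sample_size, replace=False).mean()
    for _ in range(n_repetitions)
])
print("Number of sample means:", len(sample_means))
print("Population mean:", population.mean())
print("Average of the sample means:", sample_means.mean())
print("SD of the sample means:", sample_means.std())
```

The 50 values in `sample` make up the distribution of one sample, while the 5,000 values in `sample_means` make up the sampling distribution of the sample mean. Try changing `sample_size` and rerunning: the sample means cluster much more tightly than the individual values do, and more tightly still as the sample size grows.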

With that said, statistics offers a fabulously useful lens for understanding, explaining, and even predicting the world(s) around us. You can think of it as a journey into philosophy expressed through mathematics (later on, we’ll see a different philosophy, Bayesian statistics), and we hope you’ll find it as rewarding and interesting as we do, even if (or perhaps especially if!) it also feels challenging at times.

4.1. Note

This chapter is largely adapted from materials in the 2017 commit of Inferential Thinking. Please refer to the first page of our text for copyright details.

Data Science for Everyone
