1.3. Why have a special course in data science?

If data science is so interdisciplinary, why not just have students take one course in programming, one in statistics, and one in some research discipline – say, biology, politics, business, or whatever interests them? For a long time this was, indeed, how data science was taught. Today, however, three recent changes have made it not just helpful but necessary to combine all three subjects into one course:

Big data

If you’ve paid attention to anything in the world – news, tech, sports, health, government – you’ve already heard that the amount and availability of data today are unprecedented in human history, and by a lot. If you have access to the Internet (which you do if you’re reading this book!), you have access to more data than any of us will ever have time to make sense of in our lifetimes (sorry to break the news). Given the availability and size of the data out there in the world, the more of us who are specifically trained to analyze and interpret data, and to apply data science to our many pressing problems and opportunities, the better off we will all be.

Computation

At the risk of sounding like your parents, the amount of computation we are all now able to do on our personal computers and even phones is also unprecedented. This has allowed sophisticated computational techniques to develop and flourish, not just for analyzing data, but also for generating, collecting, visualizing, and sharing data with one another, among other activities. There are many applications of computer programming (or “coding”) beyond just working with data – such as game and software development – but the sophistication of data science-specific programming techniques increasingly warrants a programming specialization of its own.

Indeed, if you’ve already done some programming in a computer science context, you’ll see that the basics are the same, but we’ll quickly move to more data science-specific syntax that you may not have encountered in a traditional computer science introductory course.

Randomization

We can also use data and computing together to improve our ability to conduct experiments and make causal inferences. We’ll talk much more about this in coming chapters – including the difference between observational and experimental data and research designs – but randomization is an incredibly powerful part of data science when it comes to making inferences about the world, especially inferences about cause and effect. You may have come across terms like “A/B testing”, which is a great example of this. We’ll talk about that, as well as permutation analyses and simulations.
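
To give a first taste of what a permutation analysis of an A/B test can look like, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the click data are invented, and the names `permutation_test`, `version_a`, and `version_b` are hypothetical rather than taken from later chapters. The idea is to shuffle the pooled outcomes many times, re-split them into two groups, and see how often chance alone produces a difference at least as large as the one observed.

```python
import random


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Illustrative sketch only; the function name and defaults are hypothetical.
    Returns the observed difference (B minus A) and an approximate p-value.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = sum(group_b) / len(group_b) - sum(group_a) / len(group_a)

    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0

    for _ in range(n_permutations):
        # Shuffling breaks any real link between group label and outcome.
        rng.shuffle(pooled)
        sim_a = pooled[:n_a]
        sim_b = pooled[n_a:]
        diff = sum(sim_b) / len(sim_b) - sum(sim_a) / len(sim_a)
        if abs(diff) >= abs(observed):
            at_least_as_extreme += 1

    return observed, at_least_as_extreme / n_permutations


# Invented data: 1 = visitor clicked, 0 = visitor did not click.
version_a = [1] * 20 + [0] * 80  # 20 of 100 visitors clicked
version_b = [1] * 35 + [0] * 65  # 35 of 100 visitors clicked

observed, p_value = permutation_test(version_a, version_b)
print(f"Observed difference in click rate: {observed:.2f}")
print(f"Approximate p-value: {p_value:.4f}")
```

A small p-value here would suggest that a difference this large rarely arises from random shuffling alone. Because visitors in a real A/B test are assigned to versions at random, that kind of evidence can support a causal reading, which is the point of randomization.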