3.2. Why Python?

There are many excellent programming languages out there, and Python is just one of them. It’s not perfect by any means, but all programming languages have strengths and weaknesses, and we believe that the balance of both in Python makes it a great starting language for data science. Here are a few things we like about Python (though note that not everyone agrees these are advantages – it really depends on what you’re trying to do):

  1. High level: You don’t have to think about underlying computational details (such as how the computer manages memory) in order to use it.

  2. Readable: If you speak English, Python is much easier to “read” than many other programming languages, as it uses a lot of actual English words, whitespace, and math syntax you already know.

  3. General purpose: We’re using Python for data science, of course, but Python can be used for lots of programming applications. This also means we can do, e.g., statistical analysis and data visualization using the same language (though there are different programs that specialize in each).

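  4. Interpreted: Some programming languages require running code through a “compiler” that translates human-readable source code into machine code (that is, computer-readable code) before the program can run. Compiled programs generally run faster than interpreted ones, but compiling makes it harder to troubleshoot lines of code as you go. Python instead uses an interpreter, which runs your code line by line, so you can test small pieces and see results right away.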

  5. Free & open source: We like free things! And, as we saw in the previous section, we’re going to rely on libraries (or “packages”) quite a bit going forward. Because Python is so popular, and because it’s open source, there are new packages being developed all the time with new functionalities that make our lives as data scientists easier. (And, as you progress as a data scientist, you can contribute your own packages to the community!)

  6. Lots of online support: You’ll quickly learn that online explanations of how a particular line of code works, what a particular error message means, or how to solve a particular sub-problem are among a data scientist’s most important resources. Python’s popularity also means there are mountains of resources available, often for even the most specific problems.

  7. Many libraries specifically for scientific computing: We’re going to use a number of packages that have been specifically created for data wrangling, statistical analysis, and plotting. Again, none of these are perfect, but they work well together and offer a robust set of tools for data science all within the same language.
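
To make the “high level” and “readable” points above a bit more concrete, here is a minimal sketch. The list of exam scores and the variable names are made up for illustration; the point is only that the code reads close to plain English and that we never have to tell the computer how to store or loop over the numbers.

```python
# Hypothetical exam scores, invented for this example
scores = [88, 92, 79, 95]

# Built-in functions handle the details of adding and counting for us
average = sum(scores) / len(scores)

# Keep only the scores that are at or above the average
high_scores = [score for score in scores if score >= average]

print(average)      # 88.5
print(high_scores)  # [92, 95]
```

Because Python is interpreted, you could run these lines one at a time and inspect `average` before writing the next line.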

This textbook has not been sponsored by Python or anything like that, and we don’t mean to imply there aren’t other excellent languages we could be using or that you can expect to encounter as you continue in data science. Another common and useful language, widely used in the quantitative social sciences, is R. If you’ve previously taken computer science courses, it’s reasonably likely you learned Java. You may have also heard of others – everything from classics like C++ and Ruby, to database-oriented languages like SQL, or even (slightly) more recent ones like Go and Scala. All programming languages (well, many!) have advantages and disadvantages, and it’s likely that if you continue as a data scientist you’ll pick up a few more. The good news about learning a programming language is that – much like learning a human language – the more you know, the easier it is (usually) to pick up new ones. While the specific syntax may be (quite) different from one to the other, many of the underlying components of code – and the deeper instincts of designing a program and breaking it into sub-problems – are close to universal.

All of that said: we think Python is pretty great, relatively fun(!) to use, flexible enough to handle everything from organizing data, to “traditional” statistics, to machine learning, to data visualization, and more, and quite in demand in both industry and research. So, here we are with Python.

Frequently asked questions from past students about programming languages

1. Hang on, what if I want to be a data scientist but don’t want to learn Python?

Well, there are lots of resources out there that offer data science education in languages other than Python. That said, learning any language is useful, and will generally make your next language easier to learn. Plus, Python is widely used, so, like English or Mandarin or Spanish, it’s generally an advantage to know wherever you go.

2. What if I learn Python but then get (or want to get) a job that requires SQL (or something else)?

As we said earlier in this chapter, one of our teaching philosophies when it comes to coding is not just teaching you useful code, but also teaching you how to find, understand, and implement code that goes beyond the specific snippets we provide. This skill applies just as well to teaching yourself a new language in the future. Of course, you can always take future courses, but we bet that if, for example, in a few years you find yourself needing to use R instead of Python, and you have strong intuition about coding and an ability to make sense of online documentation (again, two strong focuses in this book), you’ll be able to teach yourself pretty quickly. How awesome is that!

3. Is a programming language a “language” like a human language?

Yes and no. Comparing programming languages to human languages is useful for sure, but the mapping isn’t perfect. Here are a few distinctions:

  • Unlike human languages, programming languages have (almost) zero tolerance for error. If you’re speaking Spanish and forget or mispronounce a word, your conversation partner may still be able to understand you given the context. If you have so much as a comma missing in Python, you’ll typically get an error message.

  • Unlike human languages, programming languages don’t really require things like conjugating verbs or memorizing how to change tenses or anything like that. Programming languages do have a grammar, in the sense that there are rules for the order in which to put together commands so your program does what you want it to, but it’s a comparatively simple version that mostly relies on logic. Also, you don’t have to worry about irregular verbs or strange spellings!

  • One way to think of programming languages is as somewhere in between math and a human language. You need to give instructions for the computer to carry out certain things (such as import, drop, and groupby), but the instructions will almost always be verbs followed by some kind of mathematical or logical expression (“divide each number by two”). You can kind of think of it as a numbers-centric shorthand for communicating tasks.

  • One of the biggest misperceptions about programming, at least in the data science context, is that one becomes “fluent” in a programming language and can sit down and write a program from start to finish just like you’d write an essay. Of course, with both programming and essay-writing, you do go back and edit, rearrange, remove, and otherwise tidy up your work, but writing a data science program is typically far more iterative, as you need to regularly check in with the computer and make sure it’s understanding what you’re saying. Because computers are sensitive to errors, you also will almost always spend a lot of time looking up syntax (“is it [( )] or ([ ])?”). Some people do memorize this, but not many.
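
The bullets above can be sketched in a few lines of Python. This is only an illustration: the list `numbers` and its values are made up, and we use Python’s built-in `compile` function simply as a way to ask “is this text valid Python?” without actually running it. The first part carries out the “divide each number by two” instruction; the second part shows that the same kind of line with one comma missing is rejected.

```python
# A made-up list of numbers for illustration
numbers = [10, 4, 7]

# "Divide each number by two": an instruction plus a math expression
halved = [n / 2 for n in numbers]
print(halved)  # [5.0, 2.0, 3.5]

# The same list written with a missing comma between 10 and 4
broken_line = "numbers = [10 4, 7]"

# compile() checks the text and raises SyntaxError if it is not valid Python
try:
    compile(broken_line, "<example>", "exec")
    is_valid = True
except SyntaxError:
    is_valid = False

print(is_valid)  # False
```

A human reader would guess what `[10 4, 7]` was supposed to mean; Python will not.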

4. Got any tips for learning to code?

I’m so glad you asked! Here are a few (note that most apply to any skill you want to learn):

  • PRACTICE. Coding is something you do and the best way to learn how to do it is to … do it. Just like learning a human language (finally, a similarity!), passive learning will only get you so far. You have to get out and speak and be willing to make mistakes.

  • Error messages are your friends! Students new to coding often think of error messages as something to be avoided. Of course, our final program can’t have errors, but along the way error messages give helpful feedback about why something isn’t working. They seem scary, and can be a little hard to decipher sometimes, but they are information. Sometimes I run code that I know will produce an error just to get some information about where to start my revisions.

  • Write out code we provide, or that you find online, rather than simply copying and pasting it. We’ve said this before but it’s worth repeating: it feels trivial, but it makes a big difference in understanding what’s happening, especially since code can be somewhat unintuitive to read.

  • Practice OFTEN. Notice a theme? The more frequently you open up code and try it out, the less scary it will become. Also, coding is definitely something that can take a surprisingly long time, so don’t wait until the last minute for any assignments, but especially not assignments with lots of coding. Sometimes a line of code that you think should take 30 seconds to write will take 3 hours due to a bug (i.e., an error in your code, often a tiny one that is hard to spot).

  • Take breaks. Writing code is kind of like solving a puzzle you’ve created. If you are stuck, the temptation can be to sit for hours obsessing over each line. Go for a walk, clear your head, and you might be surprised by what you notice when you return with fresh eyes.
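
As a small illustration of the “error messages are your friends” tip above, here is a sketch that triggers an error on purpose and reads what Python has to say. The dictionary of scores and the names in it are hypothetical. Normally Python would stop and display the error on its own; the `try`/`except` wrapper is used here only so that we can capture the message and print it ourselves.

```python
# Hypothetical data: quiz scores for two students
scores = {"Ana": 91, "Ben": 84}

try:
    # "Cal" is not in the dictionary, so this lookup fails
    scores["Cal"]
except KeyError as error:
    # The error names its type (KeyError) and the key it could not find
    message = f"KeyError: {error}"
    print(message)  # KeyError: 'Cal'
```

The message tells us both what kind of problem occurred (a missing key) and which key caused it, which is usually enough to know where to start looking.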