7. Programming 2

Welcome back to more programming! This is the second (and last!) chapter that will explicitly focus on programming. Our aim is to walk through a few additional fundamentals of coding that are at the heart of much more advanced coding to come.

You may have noticed, however, that we’ve already presented a fair amount of code in previous chapters that we have not explicitly explained. For example, in the Chapter 5 (Statistics 2) discussion of bootstrapping, we found the median of our population using the function np.percentile(). We did not start by explaining what np.percentile() does. Yet you were likely able to figure out what it does, and why the 50 is so important in that same line of code. (If you don’t see it, go back and try inputting different numbers to see what changes.)
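If you would like to check your guess about the 50, here is a minimal sketch. The numbers are made up for illustration (they are not the population from Chapter 5), and the variable names are our own. It compares the 50th percentile with NumPy's np.median(), then tries two other percentiles so you can see what changes.

```python
import numpy as np

# Hypothetical data for illustration only; NOT the Chapter 5 population.
population = np.array([3, 7, 8, 12, 15, 21, 40])

# The 50th percentile is the value with half of the data below it,
# which is exactly what the median is.
median_via_percentile = np.percentile(population, 50)
median_direct = np.median(population)
print(median_via_percentile, median_direct)

# Changing the second argument asks for a different percentile.
lower_quartile = np.percentile(population, 25)
upper_quartile = np.percentile(population, 75)
print(lower_quartile, upper_quartile)
```

Try swapping in other numbers between 0 and 100 and see whether the results match what you expect.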

Believe it or not, all of this is by design. As we’ve stated many times, our goal in this book is not just to teach you useful things (though that’s a big part of it). We are also here to teach you how to figure out new things on your own. Every dataset, every problem, and every circumstance will inevitably require you to find new code, new techniques, or (usually) both – and no single course can possibly anticipate or cover them all. Indeed, what will make you a better data scientist is not just knowing a lot of code; it’s knowing how to read, understand, implement, and modify (with appropriate credit to the original coder!) other code out there in the world. (Again, you do not need to explicitly memorize code, though you’re welcome to, and you probably will as a by-product of using some syntax a lot.)

Another way to think of this is that programming generally – and especially so within Python – comes with a strong community. There are lots of coders out there who have already encountered problems like the one you’re working on – or something similar enough – who generously share their code and their work online. Being able to find new code, work through it and make sense of it on your own, and modify it for your needs isn’t cheating – it’s building on the work of others, and saving you from reinventing the wheel. (Again, you want to be careful to cite anyone whose code you rely on!)

That said, this is a class and we don’t want to leave you totally in the dark. So, here are three more techniques that are absolutely crucial in data science (and programming generally). As with what we’ve seen before, the techniques should be fairly intuitive conceptually, but not necessarily simple to put into practice. Now, as ever, we encourage you to practice with these techniques as we give them to you – both by replicating the examples we provide and by thinking up and testing some of your own. The sooner you internalize how these techniques work, the smoother your coding life will be going forward.