3.5. Libraries

Libraries, which we’ll also call packages (the terminology varies across programming languages, but we’ll treat the two terms interchangeably here), add functionality to Python so that we don’t have to write everything from scratch. At times in this book, to build intuition for the techniques we’re practicing, we will write fairly complicated programs from scratch. But in many cases, particularly for widely used data science techniques, we can instead import packages into our programs. You can think of a package as a collection of ready-made, specialized functions that we don’t have to write or define ourselves.

Python comes with its own set of built-in functions, such as print, len, and sum. You can think of packages as a way of installing additional “built-in” functions, each package focused on a specific area such as advanced calculation, data manipulation, data visualization, or statistical analysis.

To import a package, we use Python’s import statement (import is a keyword, not a function, which is why it isn’t followed by parentheses). First, let’s import a package called NumPy, which offers useful scientific computing functionality (“NumPy” is short for Numerical Python, and its official documentation is well worth bookmarking).

Importing the NumPy package to create NumPy arrays

import numpy as np                # import the package NumPy and call it np, so we can refer to it later as np

Notice the second half of the line above: as np. This tells Python that we will refer to numpy simply as np in our program, so whenever we want a function from the NumPy package, we can write np instead of numpy. We don’t have to do this: we could simply write import numpy, but then we would have to write numpy in full every time we call one of its functions.
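The two import styles are interchangeable, because an alias is just a second name for the same module. A quick check, assuming NumPy is installed:

```python
import numpy                 # full name: every call is written numpy.something
import numpy as np           # the same package, bound to the shorter name np

# Both names point to the very same module object, so numpy.array and np.array
# are one and the same function.
print(np is numpy)               # True
print(np.array is numpy.array)   # True
```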

Let’s make this concrete with an example that also introduces the array. Like a list, an array is a variable that can hold more than one value at a time. Here we’ll work with one-dimensional arrays, which behave much like lists but offer more functionality. Later in the book we’ll work with two-dimensional arrays, which are effectively tables: their values are arranged in rows and columns.

Below, we create our first array. We’re creating a NumPy array, which is common, though other packages can create arrays too. To create a NumPy array, we write np.array([ ]). (This is where, had we not written as np above, we would have to write numpy.array instead, which is perfectly acceptable but slows us down! Note also that np is not a required abbreviation. Any valid Python name would work: letters, digits, and underscores, not starting with a digit, and not a reserved keyword such as class. It’s also wise to avoid a name that is already in use, since the alias would replace it. But np is the convention, and it’s what you’ll find in NumPy documentation all over the internet, so we’ll stick with it.)
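Python’s string method isidentifier() and the standard-library keyword module give a quick way to check whether a name could serve as an alias. A small sketch (the candidate names are invented for illustration):

```python
import keyword

# Candidate aliases: a valid alias must be a Python identifier
# (letters, digits, underscores, not starting with a digit) and not a keyword.
candidates = ["np", "num_py", "numpy2", "2np", "num-py", "class"]

results = {}
for name in candidates:
    results[name] = name.isidentifier() and not keyword.iskeyword(name)
    print(f"{name!r}: {'allowed' if results[name] else 'not allowed'}")
```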

my_array = np.array([6, 7, 8, 9])               # create an array, call it my_array
print(my_array)                                 # print the contents of my_array
[6 7 8 9]
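For a quick preview of how the one- and two-dimensional arrays mentioned earlier differ, NumPy’s ndim and shape attributes report an array’s number of dimensions and its size along each one. A minimal sketch (two_d is an illustrative name):

```python
import numpy as np

my_array = np.array([6, 7, 8, 9])        # one-dimensional: a single row of values
two_d = np.array([[1, 2, 3],
                  [4, 5, 6]])            # two-dimensional: 2 rows, 3 columns

print(my_array.ndim, my_array.shape)     # 1 (4,)
print(two_d.ndim, two_d.shape)           # 2 (2, 3)
print(two_d[1, 2])                       # row 1, column 2 (counting from 0) -> 6
```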

We can also use the familiar built-in type function to inspect the type of my_array. Notice the returned type specifies that it is a NumPy array: numpy.ndarray, where ndarray stands for “n-dimensional array”.

type(my_array)                                  # inspect the type of my_array
numpy.ndarray
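Two related checks are often handy: isinstance confirms that my_array is an ndarray, and the dtype attribute reports the single data type shared by every element. The exact integer width (for example int64) can depend on the platform and NumPy version, so this sketch doesn’t hard-code it:

```python
import numpy as np

my_array = np.array([6, 7, 8, 9])

# True whenever my_array is a NumPy array
print(isinstance(my_array, np.ndarray))   # True

# Every element shares one dtype; for whole numbers this is an integer type
# such as int64 (the width may differ by platform).
print(my_array.dtype)

# Mixing an integer with a decimal makes NumPy choose a floating-point dtype.
print(np.array([6, 7.5]).dtype)           # float64
```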

We can also convert my_array to a list, as before:

list(my_array)                                  # convert my_array to a list
[6, 7, 8, 9]
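One subtlety: list(my_array) produces a list, but its items are still NumPy integers rather than plain Python ints (recent NumPy versions may display them as np.int64(6) and so on). The array method tolist() converts the items as well. A short sketch:

```python
import numpy as np

my_array = np.array([6, 7, 8, 9])

as_list = list(my_array)        # a list whose items are still NumPy integers
as_plain = my_array.tolist()    # a list of ordinary Python ints

print(type(as_list[0]))    # a NumPy integer type, e.g. <class 'numpy.int64'>
print(type(as_plain[0]))   # <class 'int'>
```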

Also as before, if we wanted the converted list to persist, we would need to assign it to a variable (something like my_new_array_which_is_a_list = list(my_array)). For now, we want to keep my_array as an array, and we can confirm that it still is one:

my_array                                        # see for ourselves that my_array is still an array
array([6, 7, 8, 9])

We’ll do more with arrays later, but for now notice that multiplying an array by a number multiplies each element individually:

my_array * 2                                    # multiply every element of my_array by 2
array([12, 14, 16, 18])
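This element-by-element behavior is one of the main differences between arrays and lists, which treat * as repetition. A side-by-side sketch (my_list is a name introduced here for illustration):

```python
import numpy as np

my_list = [6, 7, 8, 9]
my_array = np.array(my_list)

print(my_list * 2)          # list: repeats the list -> [6, 7, 8, 9, 6, 7, 8, 9]
print(my_array * 2)         # array: doubles each element -> [12 14 16 18]
print(my_array + my_array)  # adds element by element -> [12 14 16 18]
print(my_array + 1)         # adds 1 to every element -> [ 7  8  9 10]
```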

Modules

Modules are the building blocks of packages: a module is a single file of Python code, and a package (or library) is a collection of modules. Sometimes we only need one part of a package, or a single function from a module, and Python lets us import just that part. We might do this for code readability and simplicity (so others know specifically which functionality your program uses) and for convenience (as we’ll see, depending on how we import something, we may have less to write each time we use it). Importing only part of a package does not save memory, however: Python still loads the whole module behind the scenes and only changes which names are available in our program.
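NumPy itself is organized this way: its linear-algebra tools live in a submodule named linalg. A minimal sketch that imports just that submodule and uses its norm function to compute the length of a vector (the vector is made up for illustration):

```python
import numpy as np
from numpy import linalg          # import just NumPy's linear-algebra submodule

vector = np.array([3, 4])
print(linalg.norm(vector))        # length of the vector (3, 4) -> 5.0
```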

For example, to calculate the mean of my_array, we can import statistics, a module that comes with Python’s standard library (so there is nothing extra to install). Once we import statistics, we can use its mean function by writing statistics.mean.

import statistics                              # import the statistics module (part of Python’s standard library)
statistics.mean(my_array)                      # calculate the mean of my_array
7

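Notice the result: 7, even though the mean of 6, 7, 8, and 9 is 7.5. The statistics module is designed for plain Python numbers, and when it is given a NumPy array of integers it converts its answer back to the array’s integer type, which drops the decimal part. For NumPy arrays, NumPy’s own np.mean(my_array), which returns 7.5, is the safer choice; here we use statistics only to illustrate different ways of importing.

Alternatively, if we know that all we want from the statistics module is its mean function, we can import just that function. Written this way, we no longer need the statistics. prefix when we call mean: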

from statistics import mean                   # import only the mean function from the statistics module
mean(my_array)                                # calculate the mean of my_array
7
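Both import styles reach the same function; the from-import only changes the name we use to call it. A quick check:

```python
import statistics
from statistics import mean

# The name mean, brought in by the from-import, refers to the very same
# function object as statistics.mean; only the spelling of the call differs.
print(mean is statistics.mean)   # True
```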

Which version you use in your code generally depends on a mix of personal preference, the tasks you plan to carry out, and, often, simply convention (i.e., it’s what other people are doing, and thus it’s easier to check and share your code).
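When the data are already in a NumPy array, the conventional choice is NumPy’s own np.mean. A minimal sketch comparing it with statistics.mean, assuming we first turn the array into a plain Python list with tolist() (the mean of 6, 7, 8, and 9 is 7.5):

```python
import statistics
import numpy as np

my_array = np.array([6, 7, 8, 9])

# NumPy's mean works directly on the array and returns a float.
print(np.mean(my_array))                    # 7.5

# statistics is built for plain Python numbers, so we convert with tolist() first.
print(statistics.mean(my_array.tolist()))   # 7.5
```

The method form my_array.mean() gives the same result as np.mean(my_array).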

We could also have imported statistics under an abbreviation, just as we did with import numpy as np. For instance, we could have written:

import statistics as stats                 # import the statistics module and call it stats
stats.mean(my_array)                       # calculate the mean of my_array
7

Feel free to experiment and confirm for yourself that you can abbreviate a package under any valid name you like, though for simplicity and easier debugging down the road, we advise sticking with convention (how unimaginative, we know!) as much as possible.

Other libraries

As we continue, we’ll use – and have ample opportunity to practice with – more libraries, including:

  • pandas (Panel Data): working with data structures, data manipulation, and some analysis

  • SciPy (Scientific Python): scientific computing, mathematical functions, optimization, and statistics

  • Matplotlib (named for its MATLAB-style plotting interface): plotting/graphing

  • scikit-learn, imported as “sklearn” (SciPy Toolkit): machine learning; built on NumPy and SciPy

  • Seaborn and Bokeh: two packages we’ll spend less time with, but are useful (and fun) for making more aesthetically pleasing visualizations, including interactive ones