Bi 1x 2016: Setting up a Python distribution

This tutorial was generated from a Jupyter notebook. You can download the notebook here.

In this tutorial, you will set up a Python computing environment for scientific computing. There are two main ways people do this.

  1. By downloading and installing packages one at a time with tools like apt-get, pip, etc.
  2. By downloading and installing a Python distribution that contains binaries of many of the scientific packages needed. The major distributions are Anaconda and Enthought Canopy. Both contain IDEs.

In this class, we will use Anaconda, with its associated package manager, conda. It has recently become the de facto package manager/distribution for scientific use.

Python 2 vs Python 3

We are at an interesting point in Python's history. Python is currently in version 3.5 (as of the start of Bi 1x 2016). The problem is that Python 3.x is not backwards compatible with Python 2.x. Many scientific packages were written in Python 2.x, and have been very slow to update to Python 3. However, Python 3 is Python's present and future, so all packages eventually need to work in Python 3. Today, most important scientific packages work in Python 3. All of the packages we will use do, so we will use Python 3 in this course.
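
To make the incompatibility concrete, here is a small sketch of two well-known behaviors that differ between the versions: print is a function in Python 3, and / between two integers performs true division. The variable names are only illustrative.

```python
# Two well-known differences between Python 2 and Python 3.

# 1. In Python 3, print is a function and needs parentheses.
#    The Python 2 statement form `print "hello"` is a SyntaxError in Python 3.
print("hello")

# 2. In Python 3, / on two integers gives a float (true division).
#    In Python 2, 7 / 2 would give 3.
true_quotient = 7 / 2     # 3.5 in Python 3
floor_quotient = 7 // 2   # 3 in both versions

print(true_quotient, floor_quotient)
```

Code written for Python 2 that relies on either behavior has to be changed before it runs correctly under Python 3, which is part of why package updates took time.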

For those of you who are already using Anaconda with Python 2, you can create a Python 3 environment. The rest of you, please continue reading.

Downloading and installing Anaconda

Downloading and installing Anaconda is simple.

  1. Go to the Anaconda homepage and click "Download Anaconda."
  2. You will be prompted for your email address, which you should provide. You may wish to use your Caltech email address because educational users get some of the non-free goodies in Anaconda (like MKL routines that will increase performance).
  3. Be sure to download Anaconda for Python 3.5.
  4. Follow the on-screen instructions for installation.

That's it! After you do that, you will have a functioning Python distribution.
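
One way to confirm which Python you are running is to ask Python itself. The sketch below assumes you have started the interpreter (for example, by typing python at a command prompt); the helper name is_python3 is ours, for illustration only.

```python
import sys

def is_python3():
    """Return True if the running interpreter is Python 3 or later."""
    return sys.version_info.major >= 3

# Full version string of the running interpreter
print(sys.version)

# Path of the interpreter executable; if Anaconda's Python is the one
# running, this path should point inside your Anaconda folder
print(sys.executable)

print("Running Python 3:", is_python3())
```

If the last line reports False, a different Python installation on your machine is probably being found first.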

The conda package manager

conda is a package manager for keeping all of your packages up-to-date. It has plenty of functionality beyond our basic usage in class, which you can learn more about by reading the docs. We will use conda to install and update packages.

conda works from the command line. To get a command line prompt, do the following.

  • Mac: Fire up the Terminal application. It is typically in the /Applications/Utilities folder. Otherwise, hold down Command-space bar and type "terminal" in the search box, and you can select the Terminal Application.
  • Windows: Fire up PowerShell. To do this, select "Search programs and files" from the Start menu and type "powershell" and hit enter. This works on Windows 7 and presumably also on Windows 8 and 10.
  • Linux: If you're using Linux, it's a good bet you already know how to navigate a terminal.

Now that you have a command line prompt, you can start using conda. The first thing we'll do is update conda itself. To do this, enter the following on the command line:

conda update conda

If conda is out of date and needs to be updated, you will be prompted to perform the update. Just type y, and the update will proceed.

Now that conda is updated, we'll use it to see what packages are installed. Type the following on the command line:

conda list

This gives a list of all packages and their versions that are installed. Now, we'll update all packages, so type the following on the command line:

conda update --all

You will be prompted to perform all of the updates. There may even be some downgrades. This happens when there are package conflicts where one package requires an earlier version of another. conda is very smart and figures all of this out for you, so you can almost always say "yes" (or "y") to conda when it prompts you.

Finally, we will use conda to install a package that is not included in the Anaconda distribution that we would like to use. We will install Seaborn, which is a nice package for data visualization. To do this, type the following on the command line:

conda install seaborn

You will again be prompted to approve the installation. Go for it! Seaborn is pretty cool.
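
After installing, you can also check from within Python that the packages import and see which versions you have. The following is a minimal sketch and is not part of conda. The helper name installed_versions is hypothetical, and it assumes each package exposes a __version__ attribute, which numpy, matplotlib, and seaborn do but not every package does.

```python
import importlib

def installed_versions(package_names):
    """Map each package name to its version string.

    The value is None if the package cannot be imported, and
    "unknown" if it imports but has no __version__ attribute.
    """
    versions = {}
    for name in package_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
            continue
        versions[name] = getattr(module, "__version__", "unknown")
    return versions

# Report on the three packages used in the test plot for this tutorial
for name, version in installed_versions(["numpy", "matplotlib", "seaborn"]).items():
    print(name, version if version is not None else "NOT INSTALLED")
```

Any package reported as NOT INSTALLED can be added with conda install, as was done for Seaborn above.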

Firing up a Jupyter notebook

We will use Jupyter notebooks during tutorials and also to do analysis in the homework. We will cover Jupyter notebooks in our tutorial on Python for scientific computing. For now, you should just launch a Jupyter notebook and make a plot to verify that your installation works. If it does not, we will troubleshoot it in class.

To launch a Jupyter notebook, enter

jupyter notebook

on the command line and hit enter. Jupyter will launch in a browser window. At the upper right, you can use a pulldown menu to create a new Python 3 Jupyter notebook. This will open a new tab or window with a fresh notebook.

In the first cell, copy and paste the code below. Then press Shift+Enter, and you should see the graphic displayed below. If you do, you have successfully set up your Python environment for Bi 1x!

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline

# Generate x values
x = np.linspace(0.0, 10.0, 100)
y = x * np.exp(-x)

# Generate the plot
plt.plot(x, y, 'k-')
plt.xlabel(r'$x$', fontsize=18)
plt.ylabel(r'$y$', fontsize=18)
[Plot: y = x exp(-x) versus x for x from 0 to 10, drawn as a black line with axes labeled x and y]
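
If you want a numerical check on the plot, the curve y = x exp(-x) should rise from zero, peak near x = 1 at a height of about 1/e (roughly 0.37), and then decay toward zero. The sketch below rebuilds the same 100 grid points using only the standard library, so it runs even if numpy fails to import, and locates the peak. The variable names are ours.

```python
import math

# Same grid as np.linspace(0.0, 10.0, 100):
# 100 evenly spaced points with both endpoints included
n_points = 100
x = [10.0 * i / (n_points - 1) for i in range(n_points)]
y = [xi * math.exp(-xi) for xi in x]

# Locate the largest y value and the x at which it occurs
y_peak = max(y)
x_peak = x[y.index(y_peak)]

print("peak at x =", x_peak, "with y =", y_peak)
print("1/e =", math.exp(-1))
```

Because the grid does not land exactly on x = 1, the reported peak is the nearest grid point, and its height is slightly below 1/e.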