(c) 2020 Justin Bois. This work is licensed under a Creative Commons Attribution License CC-BY 4.0. All code contained herein is licensed under an MIT license.

This document was prepared at Caltech with support financial support from the Donna and Benjamin M. Rosen Bioengineering Center.

*This tutorial was generated from an Jupyter notebook. You can download the notebook here.*

You should have already installed a Python distribution using Anaconda. Anaconda contains most of what we need to do scientific computing with Python. At the most basic level, it has Python 3.7. It contains other packages we will make heavy use of, the three most important ones being NumPy, Bokeh, and Jupyter. We will also make heavy use of and scikit-image throughout the course. Importantly, it also has JupyterLab, which we will use to do all of our assignments in Bi 1x.

In this tutorial, we will first learn some of the basics of the Python programming language at the same time exploring the properties of NumPy's very useful (and as we'll see, ubiquitous) `ndarray`

data structure, which we will refer to as a "NumPy array." Finally, we'll load some data and use Bokeh to generate plots.

You should have already learned how to launch a Jupyter notebook for JupyterLab from the tutorial on setting up your environment.

As a new programmer, your first task is to greet the world with your new set of skills! We'll start by printing "`Hello, world.`

" In a code cell in your Jupyter notebook, type the following:

In [1]:

```
print('Hello, world.')
```

Do you see "`Hello, world.`

" printed to the screen right below your command? The Python interpreter is taking your input and printing it out in the output of the cell in your Jupyter notebook.

Here we see the syntax for function calls in Python. Function arguments are enclosed in parentheses. We also learned another important piece of Python syntax. Strings are enclosed in single (or double) quotes.

Now try the following:

In [2]:

```
# This prints hello world
print ('Hello, world.')
```

Notice how even though you wrote text above the `print`

statement, only "`Hello, world.`

" is printed. The `#`

starts a comment; anything after the `#`

, will not be read or interpreted by Python. This is how you add notes to your program about what certain lines do. Including comments in your code is essential so that other people can read your code and understand what you are trying to accomplish.

To assign a variable in python, we use an `=`

, just like you would expect from any math class. Arithmetic operations are also as expected: `+`

, `-`

, `*`

, `/`

.

Try the following:

In [3]:

```
a = 3
b = a**3
c = (b + 2*a) / 2
d = (b + 2*a) // 2
print('a is', a)
print('b is', b)
print('c is', c)
print ('d is', d)
```

In the assignment of the variable `b`

, we used the `**`

operator. This means "raise to the power of." In the assignment of the variable `d`

, we used the `//`

operator. This does integer division, returning the floor of the result of the division as an integer.

A list is a mutable array, meaning it can be edited. Let's explore below! Notice that a list is created using `[]`

.

In [4]:

```
my_list = [1,2,3,4]
print('my_list is', my_list)
my_list[0] = 3.14
print('my_list is now', my_list)
print('the last element in my_list is', my_list[-1])
```

Notice that indexing is done with brackets, `[]`

. Importantly, **in Python, indexing starts are zero.** Also notice that we can index the last element of a list with `-1`

.

A **tuple** is like a list, but it's immutable, meaning that once created, it cannot be changed. Let's try the same thing as above, but with a tuple, created with `()`

. But notice that the indexing of the tuple is still denoted with `[]`

.

In [5]:

```
# Create a tuple and print it
my_tuple = (1,2,3,4)
print('my tuple is', my_tuple)
# This will make Python scream at us because tuples are immutable.
my_tuple[0] = 3.14
```

What's the error? The Python interpreter is objecting because it cannot replace the `1`

in `my_tuple[0]`

with `3.14`

; that operation is not supported. If you try printing out `my_tuple`

again, you will see it hasn't been changed.

A **string** is just a bunch of letters, or letters and numbers, strung together in an order. It can also be indexed, like we did above with the list and the tuple. Let's look at our favorite phrase.

In [6]:

```
my_string = 'Hello, world.'
print('The fifth letter in the phrase is', my_string[4])
print('The first four letters are', my_string[0:4])
```

IMPORTANT! Python interprets `[0:4]`

as [0,4), so be careful when pulling out strings of specific length. Pulling small strings out of our larger string is called slicing, and can also be done with lists and tuples. This can be very powerful, as we can even pull out pieces at regular intervals.

In [7]:

```
my_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
a = my_list[0:5]
print(a)
b = my_list[5:]
print(b)
c = my_list[0:10:2]
print(c)
d = my_list[1:10:3]
print(d)
```

Make sure you notice how we create lists `c`

and `d`

. We select the entries in the list from position 0 to 9, selecting every 2 or 3, respectively.

Python is object-oriented, and all values in a program are objects. An object consists of an **identity** (where it is stored in memory), a **type** (a definition of how the object is represented), and **data** (the value of the object). An object of a given type can have various methods that operate on the data of the object. How do we keep track of what type our variables are? Fortunately, Python has a function for this, called `type()`

. Let's try it out. First, we'll create some new objects.

In [8]:

```
a = 4
b = 4.6
my_list = [1, 3.49, 'bi1x']
print('the type of a is', type(a))
print('the type of b is', type(b))
print('the type of my_list is', type(my_list))
print('the type of my_list[0] is', type(my_list[0]))
print('the type of my_list[1] is', type(my_list[1]))
print('the type of my_list[2] is', type(my_list[2]))
```

What is most important to notice here is that `my_list`

is a list, and that it can contain many different objects, from numbers to strings.

The data are the numbers and values that you associate with your variable.

Finally, objects have methods that can perform operations on the data. A method is called similarly to a function. This is best seen by example.

In [9]:

```
my_list = [1 , 5 , 4 , 13 , 3 , 5 , 19 , 31 , 3 , 1 , 17]
print("the number of 5's in the list is", my_list.count(5))
print("the number of 4's in the list is", my_list.count(4))
# Sort the list in place
my_list.sort()
print(my_list)
# Tack on a string to the end of the list
my_list.append('bi1x')
print(my_list)
```

As you can see, a `list`

object has several methods including `count()`

and `sort()`

. They are called just like functions, with arguments in parentheses. The name of the method comes after the object name followed by a dot (`.`

).

The `count()`

method takes a single argument and returns the number of times that argument appears in the list. The sort function takes no arguments (but still requires open and closed parentheses to be called), and sorts the list in place. Note that `my_list`

has been changed and is now sorted. We also use the method `append()`

, which adds another element to the end of a list.

Within JupyterLab, you can see what methods and data are available by entering the object name followed by a dot (`.`

), and then pressing tab. Try it!

It is common that scientific software packages such as Matlab and Mathematica are optimized for a flavor of scientific computing (such as matrix computation in the case of Matlab) and are rather full-featured. On the other hand, Python is a generic programming language. It was not specifically designed to do scientific computing. So, plain old Python is very limited in scientific computing capability.

However, Python is very flexible and allows use of **packages**. A package contains classes, functions, attributes, data types, etc., beyond what is built in to the core language. In order to use a module, you need to import it to make it available for use. So, as we begin working on data analysis and simulations in Bi 1x, we need to import modules we will use.

It is important to note that until we import a module, its capacities are not available. Keep that in mind: *Plain old Python won't do much until you import a package.*

Let's now import one of the major workhorses of our class, NumPy!

In [10]:

```
# Importing is done with the import statement
import numpy as np
# We now have access to some handy things, i.e. np.pi
print('circumference / diameter = ', np.pi)
print('cos(pi) = ', np.cos(np.pi))
```

Notice that we used the `import <module> as <abbrev>`

construction. This enabled us to abbreviate the name of the module so we do not have to type `numpy`

each time. Also, notice that to access the (approximate) value of $\pi$ in the `numpy`

module, we prefaced the name of the attribute (`pi`

) with the module name followed by a dot (`np.`

). This is generally how you access attributes in modules.

We're already getting dangerous with Python. So dangerous, in fact, that we'll write our own function!

We will create a function that finds the roots of the quadratic equation

\begin{align} ax^2 + bx + c = 0. \end{align}To do this, we will first write a function to compute the discriminant and then one to return the roots.

In [11]:

```
def discriminant(a, b, c):
"""
Returns the discriminant of a quadratic polynomial
a * x**2 + b * x + c = 0.
"""
return b**2 - 4.0 * a * c
def roots(a, b, c):
"""
Returns the roots of the quadratic equation
a * x**2 + b * x + c = 0.
"""
delta = discriminant(a, b, c)
root_1 = (-b + np.sqrt(delta)) / (2.0 * a)
root_2 = (-b - np.sqrt(delta)) / (2.0 * a)
return root_1, root_2
```

There is a whole bunch of syntax in there to point out.

- A function is defined with the
statement. It has the function prototype, followed by a colon.`def`

**Indentation in Python matters!**Everything indented after the`def`

statement is part of the function. Once the indentation goes back to the level of the`def`

statement, you are no longer in the function.- The
`return`

statement is used to return the result of a function. If multiple objects are returned, they are separated by commas. - The text within triple quotes are
**doc strings**. They say what the function does. These are essential for people to know what your code is doing.

Now, let's test our new function out!

In [12]:

```
# Python has nifty syntax for making multiple definitions on the same line
a, b, c = 3.0, -7.0, -6.0
# Call the function and print the result.
root_1, root_2 = roots(a, b, c)
print('roots:', root_1, root_2)
```

Very nice! Now, let's try another example. This one might have a problem....

In [13]:

```
# Specify a, b, and c that will give imaginary roots
a, b, c = 1.0, -2.0, 2.0
# Call the function and print the result
root_1, root_2 = roots(a, b, c)
print('roots:', root_1, root_2)
```

Oh no! It gave us `nan`

, which means "not a number," as our roots. It also gave some warning that it encountered invalid (negative) arguments for the `np.sqrt()`

function. The roots should be $1 \pm i$, where $i = \sqrt{-1}$. We will use this opportunity to introduce Python's control flow, starting with an `if`

statement.

We will decree that our quadratic equation solver only handles real roots, so it will raise an **exception** if an imaginary root is encountered. So, we modify the `roots()`

function as follows.

In [14]:

```
def roots(a, b, c):
"""
Returns the roots of the quadratic equation
a * x**2 + b * x + c = 0.
"""
delta = discriminant(a, b, c)
if delta < 0.0:
raise RuntimeError('Imaginary roots! We only do real roots!')
else:
root_1 = (-b + np.sqrt(delta)) / (2.0 * a)
root_2 = (-b - np.sqrt(delta)) / (2.0 * a)
return root_1, root_2
```

We have now exposed the syntax for a Python `if`

statement. The conditional expression ends with a colon, just like the `def`

statement. Note the indentation of blocks of code after the conditionals. (We actually did not need the `else`

statement, because the program would just continue without the exception, but I left it there for illustrative purposes. It is actually preferred not to have the `else`

statement.)

Let's try out our new function.

In [15]:

```
# Pass in parameters that will give imaginary roots (use qd.roots)
a, b, c = 1.0, -2.0, 2.0
root_1, root_2 = roots(a, b, c)
```

This threw the appropriate exception.

If we want to do something many times, we can use a for loop. This is again best learned by example. Let's make a function that counts the number of times a substring is present in a sequence of DNA.

In [16]:

```
def n_substring(seq, substr):
"""
Given a sequence seq , returns the number of occurrances of substring.
"""
# First make sure the length of substring is shorter than seq.
if len(substr) > len(seq) :
return 0
# We loop through the sequence to check for matches
num_substr = 0
for i in range(0, len(seq)-len(substr)+1):
if seq [i:i+len(substr)] == substr:
num_substr += 1
# We are done looping now , so return the number of subsequences
return num_substr
```

Let's see how it works!

In [17]:

```
# Specify sequences
seq = 'ACTGTACGATCGAGCGATCGAGCGAGTCATTACGACTGAGATCC'
substr = 'GAT'
# Call function to get number of GAT substrings
n_gat = n_substring(seq, substr)
# Print the result
print('There are %d GATs in the sequence.' % n_gat)
```

Now that we know it works, let's look at how the loop was constructed. In the statement at the beginning of the loop, we use the function `range()`

to define an *iterator*. Calling `range(n)`

creates an object akin to a list of integers from `0`

to `n-1`

. The `for`

statement says that the variable `i`

will successively take the values of the iterator.

Before concluding our quick trip through the very basics of Python and on to NumPy, I want to show a very handy tool in Python, **keyword arguments**. Before, when we defined a function, we specified its arguments as variable names separated by commas in the def statement. We can also specify keyword arguments. Here is an example from our quadratic equation solver.

In [18]:

```
def roots(a, b, c, print_discriminant=False, message_to_the_world='Bi 1x rules'):
"""
Returns the roots of the quadratic equation
a * x**2 + b * x + c = 0.
If print_discriminant is True, prints discriminant to screen
"""
delta = discriminant(a, b, c)
if print_discriminant:
print('discriminant =', delta)
if message_to_the_world is not None:
print('\n' + '*'*len(message_to_the_world))
print(message_to_the_world)
print('*'*len(message_to_the_world) + '\n')
if delta < 0.0:
raise ValueError('Imaginary roots! We only do real roots!')
else:
root_1 = (-b + np.sqrt(delta)) / (2.0 * a)
root_2 = (-b - np.sqrt(delta)) / (2.0 * a)
return root_1, root_2
```

The function quadratic_roots now has two keyword arguments. They are signified by the equals sign.

The function has three required arguments, `a`

, `b`

, `c`

. If the keyword arguments are omitted in the function call, they take on the default values, as specified in the function definition.

In the example, the default for `print_discriminant`

is `False`

and the default for `message_to_the_world`

is `'Bi 1x rules!'`

. Furthermore, ordering of keyword arguments does not matter when calling the function. They are called in the function similarly to the way they are defined in the function definition.

Let's try it!

In [19]:

```
a, b, c = 3.0, -7.0, -6.0
root_1, root_2 = roots(a, b, c)
print('roots:', root_1, root_2)
```

Since we did not specify the keyword arguments, the defaults were used. We can specify other values.

In [20]:

```
root_1, root_2 = roots(
a,
b,
c,
print_discriminant=True,
message_to_the_world="Bi 1x TAs are the best!"
)
print("roots:", root_1, root_2)
```

In the above function call to `roots()`

, notice that I put the arguments on separate lines. This is valid Python syntax and, when used appropriately, can make your code more readable.

Here are some words to live by: **If you are trying to do a task that you think might be common, it's probably part of NumPy or some other package.** Look, or ask Google, first. In this case, NumPy has a function called `roots()`

that computes the roots of a polynomial. To figure out how to use it, we can either look at the doc string, or look in the NumPy and SciPy documentation online (the documentation for `np.roots()`

is available here). To look at the doc string, you can enter the following in a code cell:

In [21]:

```
np.roots?
```

We see that we need to pass the coefficients of the polynomial we would like the roots of using an "`array_like`

" object. We will discuss what this means in a moment, but for now, we will just use a **list** to specify our coefficients and call the `np.roots()`

function.

In [22]:

```
# Define the coefficients in a list (using square brackets)
coeffs = [3.0, -7.0, -6.0]
# Call np.roots. It returns an np.ndarray with the roots
roots = np.roots(coeffs)
print('Roots for (a, b, c) = (3, -7, -6):', roots)
# It even handles complex roots!
roots = np.roots([1.0, -2.0, 2.0])
print('Roots for (a, b, c) = (1, -2, 2): ', roots)
```

`array_like`

data types¶In the previous example, we used a list as an `array_like`

data type. Python has several native data types. We have already mentioned `int`

s and `float`

s. We just were not very explicit about it. Python's native `array_like`

data types are **lists** and **tuples**. Internally, these things are converted into **NumPy arrays**, which is the most often used `array_like`

data type we will use.

Lists and tuples can be useful, but for many applications in data analysis, the `np.ndarray`

, which we will colloquially call a "**NumPy array**," is most often used. They are created using the `np.array`

function with a list or tuple as an argument. Once created, we can do all sorts of things with them. Let's play!

Let's make some arrays to see what they look like:

In [23]:

```
array_1 = np.array([1, 2, 3, 4])
print('array 1:')
print(array_1, '\n')
array_2 = np.array([[1, 2], [1, 2]])
print('array_2:')
print(array_2, '\n')
array_3 = np.array([[1, 2, 3], [1, 2, 3], [1, 2, 3]])
print('array_3:')
print(array_3)
```

Sometimes you want an array of all zero values, and that can also be done with `numpy`

.

In [24]:

```
zero_array = np.zeros((3,4))
print(zero_array, '\n')
print('the dimesions are', zero_array.shape)
```

Now let's see how we can do operations on some arrays.

In [25]:

```
a = np.array([1, 2, 3])
b = np.array([4.0, 5.0, 6.0])
# Arithmetic operations are done elementwise
print('a: ', a)
print('b: ', b)
print('a + b: ', a + b)
print('a * b: ', a * b)
print('1.0 + a:', 1.0 + a)
print('a**2: ', a**2)
print('b**a: ', b**a)
```

We can check the data type of our matrix.

In [26]:

```
print(a.dtype)
print(b.dtype)
```

We can also change the type of the entries of our arrays, for example, from integers to floating point.

In [27]:

```
array_a = a.astype(float)
print(array_a)
array_b = b.astype(int)
print(array_b)
```

Slicing is also intuitive.

In [28]:

```
a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]])
print(a, "\n")
print(a[0:3, 2:3])
```

In [29]:

```
# view second column
a[:,1]
```

Out[29]:

In [30]:

```
# see all entries below the value of 10
a[a<10]
```

Out[30]:

Using slices, we can reassign values to the entries in a NumPy array. For example, say we wanted the third row to be all zeros.

In [31]:

```
a[2, :] = np.zeros(a[2 ,:].shape)
print(a)
```

We can also reshape arrays.

In [32]:

```
a = a.reshape (2 ,8)
print(a, '\n')
a = a.reshape(4,4)
print(a)
```

Here are some useful NumPy functions we think you might want to use! Go through these one-by-one in separate cells to see what they do.

In [33]:

```
# create evenly spaced points
np.linspace(0, 1, 10)
# matrix or vector dot products
np.dot(a, a)
# concatenate in row dimensions
np.concatenate((a, a))
# concatenate in the column dimension
np.concatenate((a, a), axis=1)
# transpose (the result of this final operation will be displayed)
np.transpose(a)
```

Out[33]:

Bokeh is a tool for generating interactive plots that can be views in a browser (and therefore also Jupyter notebooks). Its syntax is intuitive, and best learned by example. First, we need to import packages and set things up so that the graphics can be displayed in the notebook.

In [34]:

```
# These are the necessary submodules
import bokeh.io
import bokeh.plotting
# This is necesssary to display in the notebook
bokeh.io.output_notebook()
```

Now that we have Bokeh ready, we can make some plots.

In [35]:

```
# Make an x-variable for plotting
x = np.linspace(0, 2 * np.pi, 200)
# This is a nice function
y_1 = np.exp(np.sin(x))
# We can make another one
y_2 = np.exp(np.cos(x))
# We can make some random data to plot as well
x_rand = np.random.rand(20) * 2 * np.pi
y_rand = np.random.rand(20) * 3.0
# Set up plot
p = bokeh.plotting.figure(
frame_width=500, frame_height=300, x_axis_label="x", y_axis_label="y"
)
# Populate the plot with glyphs
p.line(x, y_1, line_width=2)
p.line(x, y_2, line_width=2, color="orange")
p.circle(x_rand, y_rand, color="black")
# Show the plot
bokeh.io.show(p)
```

To export the figure for insertion into a document, you can click on the disk image in the Bokeh tools. This will give you a PNG, but for publications, you should use vector graphics. There are ways to save Bokeh plots as vector graphics, but we will not use them here.

**PEP**s (Python Enhancement Proposals) document suggestions and guidelines for the Python language, its development, etc. My personal favorite is PEP8, the Style Guide for Python Code. This was largely written by Guido von Rossum, the inventor of Python and its benevolent dictator for life. It details how you should style your code. As Guido says, code is much more often read than written. I strongly urge you to follow PEP8 the best you can. It's a lot to read, so I will highlight important points here. If you follow these guidelines, your code will be much more readable than otherwise. This is particularly useful because you are working in groups and the TAs need to grade your work.

- Limit line widths to 79 characters (the line break character for Python is
`\`

). - Use whitespace around low-priority operators. E.g.,
`x = a**2 + b**2`

has whitespace around the`+`

operator. Another good example is`y = m*x + b`

. - In function calls, use a space after each comma. Use no spaces before and after the equals sign when using keyword arguments. E.g.,
`my_fun(a, b, c=True)`

. - Do not use excess space when indexing. E.g.,
`a[i]`

, not`a [ i ]`

. - Function names should be lowercase, with words separated by underscores as necessary to improve readability.
- Comment lines should appear immediately before the code they describe. Use in-line comments sparingly.

For me, an easy way to keep my code looking pretty is to use Black. If you have installed black and black cell magic, you can use Black in the Jupyter notebook. In a cell at the beginning of your notebook, execute the following code.

In [36]:

```
%load_ext blackcellmagic
```

Then, in any code cell where you want Black to adjust the formatting, put `%%black`

at the top of the cell and execute it. The code in the cell will not be executed; rather, it will be reformatted according to Black's style.

As a case study of how we can load in data and do analysis with it, we will consider a data set from one of the classic organisms in biology: Darwin's finches. Peter and Rosemary Grant have been working on the Galápagos island of Daphne for over forty years. During this time, they have collected lots and lots of data about physiological features of finches. A couple years ago, they published a book with a summary of some of their major results (Grant P. R., Grant B. R., Data from: 40 years of evolution. Darwin's finches on Daphne Major Island, Princeton University Press, 2014). They made their data from the book publicly available via the Dryad Digital Repository.

We will investigate their data on the heritability of beak depth (the distance, top to bottom, of a closed beak) in the ground finch *Geospiza fortis*. The data set consists of the maternal beak depth, the paternal beak depth, and the mean beak depth of their offspring. You can download the data set, which I got from Dryad and tidied up into a CSV file, here. Our goal it so do a linear regression to see how parental beak depth is related to that of the offspring.

First, we will load in the data set. Before attempting to load the data set, it is important to know where it is on your machine. It should be in the same directory from which you launched Jupyter lab. Otherwise, you can enter the full path below. It is in a comma delimited text file. For this particular file, the data are preceded by comments about their content, with each comment line starting with a `#`

. We will use the wonderful package pandas to read in the data.

In [37]:

```
import pandas as pd
# Load the data
df = pd.read_csv('grant_and_grant_2014.csv', comment='#')
# Take a look
df.head()
```

Out[37]:

This pandas `DataFrame`

has 413 measurements with three columns. We can index the columns to get Numpy arrays containing the data we want.

In [38]:

```
offspring_bd = df['Average offspring beak depth (mm)'].values
paternal_bd = df['Paternal beak depth (mm)'].values
maternal_bd = df['Maternal beak depth (mm)'].values
```

We want to plot mean offspring beak depth vs. mean parental beak depth, so we need to compute the mean parental beak depth. Conveniently, we do not have to write a loop to compute this for each parent pair. NumPy arrays allow for convenient elementwise calculation in a single command.

In [39]:

```
parental_bd = (maternal_bd + paternal_bd) / 2
```

Now we have the arrays we need to make our plot.

In [40]:

```
p = bokeh.plotting.figure(
frame_width=500,
frame_height=300,
x_axis_label="parental beak depth (mm)",
y_axis_label="offpsring beak depth (mm)",
)
p.circle(parental_bd, offspring_bd, alpha=0.3)
bokeh.io.show(p)
```

I chose to use the `alpha`

keyword argument in the `p.circle()`

function because it helps visualize overlapping data points. We see that parental and offspring beak depth are correlated, as we might expect.

Let's perform a linear regression to get the slope and intercept. To perform the linear regression, we will use `scipy.optimize.curve_fit()`

. This function takes a set of data and returns the parameters of the model function that best describe the data. Its first input is the model function (which we have defined as linear); next is the $x$-data (in our case parental beak depth, `parental_bd`

); the third input is the $y$-data (offspring beak depth, `offspring_bd`

). The last piece that we will need to address is a best guess for our parameters. In this case, we have to provide two values, the slope and the intercept. `curve_fit()`

works by finding the curve that minimizes the distance of every point from the curve (in our case, this will be minimizing the vertical distance between measured data points and the line describing the data).

More explicitly, `scipy.optimize.curve_fit()`

will take the sum of the squares of the vertical differences of each data point to the model function and try to minimize this value by adjusting the model parameters. For nonlinear fits, you will need to provide reasonable guesses for the best-fit parameters, as there may be many local minima for minimizing the error. In the case of linear regression, we do not need to be too careful, as there will actually be one minimum that best fits the data. For more on the `scipy.optimize.curve_fit()`

function, please read the documentation.

When we run `scipy.optimize.curve_fit()`

, we will get two arrays out. The first array contains the optimal set of parameters for the curve fit. The second array is a 2d array with the covariance of the optimal parameters. For this course, we will ignore the second array and will only use the first array, which, for linear regressions, will give us the slope and intercept.

In [41]:

```
import scipy.optimize
# Define linear function
def linear_fun(x, slope, intercept):
return slope * x + intercept
# Compute the curve fit (Guess is unit slope and zero intercept)
popt, _ = scipy.optimize.curve_fit(linear_fun, parental_bd, offspring_bd, p0=[1, 0])
# Parse the results
slope, intercept = popt
# Print the results
print("slope =", slope)
print("intercept =", intercept, "mm")
```

So, we got our values for the slope and intercept. We can plot the best fit line along with the data.

In [42]:

```
# Plot best fit line
x = np.array([7, 12])
y = linear_fun(x, slope, intercept)
p.line(x, y, line_width=2, color='orange')
bokeh.io.show(p)
```

This does not look exactly like what we would draw by eye, but this is because of overlapping data points. This is, indeed, the best fit line.

If you want to include images in your notebook, you can do that directly in a Markdown cell using HTML tags. For example:

```
<img src='my_image.png'>
```

You may have scans of equations you want to include, and you can use this technique to do it. However, if you are displaying images from your experiments, you should load them in using `skimage`

and display them as shown in the next tutorial on image processing.

For all of your assignment, you will need to submit a Jupyter notebook and the notebook conversion of the notebook. To convert to HTML, click through the following menu option.

```
File -> Export Notebook As... -> HTML
```

Note that if you are displaying images using HTML tags, you must also include the images with your submission. The best way to do the submission is to zip everything together in a single ZIP file.

This concludes our introductory tour of Python with some NumPy, SciPy, Pandas, and Bokeh thrown in for good measure. There is still *much* to learn, but you will pick up more and more tricks and syntax as we go along.

For the next tutorial, we will use Python to do some image processing. That is, we will write code to extract data of interest from images. As you get more and more proficient, coding (particularly in Python, in my opinion) will be more and more empowering and FUN!

In [43]:

```
%load_ext watermark
%watermark -v -p numpy,scipy,pandas,bokeh,jupyterlab
```