Bi 1x 2016: Nonlinear Regression

This tutorial was generated from a Jupyter notebook. You can download the notebook here. You can download the data set used in this tutorial here.

In [1]:
import numpy as np
import scipy.optimize
import matplotlib.pyplot as plt
import seaborn as sns

# Import magic function for graphics in IPython notebook
%matplotlib inline
%config InlineBackend.figure_formats = {'png', 'retina'}

# Pretty plotting
rc = {'lines.linewidth': 2, 
      'axes.labelsize': 18, 
      'axes.titlesize': 18, 
      'axes.facecolor': 'DFDFE5'}
sns.set_context('notebook', rc=rc)
sns.set_style('darkgrid', rc=rc)

In this tutorial, we will expand on our curve fitting techniques to include nonlinear regression. There is not much difference syntactically than the linear regression we did in our very first tutorial. Nonetheless, this is good practice, because you will need to perform a nonlinear regression in the Drosophila embryogenesis module.

Our sample data set

The data set we will consider in this problem comes from Margot Quinlan's lab at UCLA. You can download the data set here.

The authors were investigating the biochemistry of Spire-actin interactions. Actin is an important protein in the eukaryotic cytoskeleton. Spire is an actin binding protein that can nucleate actin filaments. In particular, it has four domains (called $S_A$, $S_B$, $S_C$, and $S_D$), which bind monomeric actin. These four domains, acting in concert, can line up actin monomers to help in nucleation. In this tutorial, we will determine the dissociation constant, $K_d$, describing binding of $S_D$ to monomeric actin.

The strategy is to perform a titration experiment and then use nonlinear regression to determine $K_d$. Consider the chemical reaction describing $S_D$-actin binding.

\begin{align} \text{actin}\cdot S_D \rightleftharpoons \text{actin} + S_D, \end{align}

which has dissociation constant $K_d$. Let $c_a$ be the equilibrium concentration of actin and $c_d$ be the equilibrium concentration of $S_D$, and $c_{ad}$ be the equilibrium concentration of bound actin-$S_D$. Then, at equilibrium,

\begin{align} K_d = \frac{c_a c_d}{c_{ad}}. \end{align}

Now, if we start with a total actin concentration of $c_a^0$ and a total $S_D$ concentration of $c_d^0$, we also have

\begin{align} c_a^0 = c_a + c_{ad}, \\[1mm] c_d^0 = c_d + c_{ad}, \end{align}

by conservation of mass.

With these relations, we can now write $c_{ad}$ in terms of $c_a^0$ and $c_d^0$, which are known quantities (this is what we pipetted into our solution).

\begin{align} K_d &= \frac{(c_a^0 - c_{ad})(c_d^0 - c_{ad})}{c_{ad}},\\[1mm] \Rightarrow\;&\;\;c_{ad}^2 - (K_d + c_a^0 + c_d^0)c_{ad} + c_a^0 c_d^0 = 0. \end{align}

The solution to this quadratic equation gives $c_{ad}$ as a function of $K_d$. Note that we must choose one of the two roots, the one that is physical. The physical root satisfies $0 < c_{ad} < \min(c_a^0, c_d^0)$. In this case, it is

\begin{align} c_{ad} = \frac{1}{2}\left(K_d + c_a^0 + c_d^0 - \sqrt{\left(K_d + c_a^0 + c_d^0\right)^2 - 4c_a^0c_d^0}\right). \end{align}

So, since we know $c_a^0$ and $c_d^0$, if we can measure $c_{ad}$, we can compute $K_d$. In a titration experiment, we fix $c_d^0$ and vary $c_a^0$, and measure $c_{ad}$ to get a curve. From the curve, we can perform a regression to get $K_d$. Example curves are shown below.

We can write a function to compute $c_{ad}$ for a given $K_d$, $c_a^0$, and $c_d^0$. In an experiment, $c_a^0$ and $c_d^0$ are fixed and known; they are what we pipetted into the reaction solution.

In [25]:
def conc_ad(c_a_0, K_d, c_d_0):
    """
    Compute concentration of actin-S_D for a given value of c_a_0 and c_d_0.
    """
    b = -(K_d + c_a_0 + c_d_0)
    c = c_a_0 * c_d_0
    return (-b - np.sqrt(b**2 - 4*c)) / 2

So, since we know $c_a^0$ and $c_d^0$, if we can measure $c_{ad}$, we can compute $K_d$. In a titration experiment, the authors fix $c_d^0$ at 5 nM and vary $c_a^0$, and measure $c_{ad}$ to get a curve. From the curve, we can perform a regression to get $K_d$. Example curves are shown below.

In [27]:
# Values of K_d to consider, units of micromolar (uM)
K_d_vals = [0.1, 0.3, 1.0, 3.0, 10.0]

# Fixed S_D concentration, units of micromolar (uM)
c_d_0 = 0.005

# Varied actin concentration for plotting (uM)
c_a_0 = np.linspace(0.0, 10.0, 200)

# Make curves and plot
colors = sns.color_palette('Blues', 7)
for i, K_d in enumerate(K_d_vals):
    # Compute c_ad over the values of c_a_0.
    c_ad = conc_ad(c_a_0, K_d, c_d_0)

    # Make plot
    label = u'$K_d = $%g µm' % K_d
    plt.plot(c_a_0, c_ad, '-', color=colors[i+2], label=label)
    plt.xlabel(u'$c_a^0$ (µm)')
    plt.ylabel(u'$c_{ad}^0$ (µm)')
    plt.legend(loc='lower right')

The problem with this approach is that we do not have a direct way of measuring $c_{ad}$. The authors instead employed fluorescence anisotropy. I will not go into the details here of how it works, but will simply say that larger complexes rotate more slowly, and therefore give a higher fluorescence anisotropy signal (which is dimensionless) than do smaller complexes.

So, the authors fluorescently tagged $S_D$. When free in solution, this molecule gives an anisotropy signal of $r_f$. When bound to actin, it gives an anisotropy signal of $r_b$. So, the total anisotropy signal we could detect is

\begin{align} r = \frac{1}{c_{d}^0}\,\left(r_f c_{d} + r_b c_{ad}\right). \end{align}

Clearly, when all $S_{D}$ is free, the anisotropy signal is $r_f$ and when all $S_{D}$ is bound to actin, the signal is $r_b$. Remembering our conservation of mass, $c_{d} = c_{d}^0 - c_{ad}$, we have

\begin{align} r = \frac{1}{c_{d}^0}\,\left(r_f (c_{d}^0 - c_{ad}) + r_b c_{ad}\right) = r_f + \frac{r_b-r_f}{c_{d}^0}\, c_{ad}. \end{align}

Now, returning to our equilibrium expression, we have

\begin{align} c_{ad} = \frac{1}{2}\left(K_d + c_a^0 + c_{d}^0 - \sqrt{\left(K_d + c_a^0 + c_{d}^0\right)^2 - 4c_a^0c_{d}^0}\right), \end{align}

so we can write the measured anisotropy $r$ as a function of $K_d$ and the known quantities $c_a^0$ and $c_{d}^0$. Let's write a function for the anisotropy.

In [49]:
def anisotropy(c_a_0, K_d, r_f, r_b):
    """
    Anisotropy measured from a solution of actin and SD.
    
    Note, we do not pass c_d_0 into this function, assuming it has scope
    beyond the function.
    """
    return r_f + (r_b - r_f) / c_d_0 * conc_ad(c_a_0, K_d, c_d_0)

So, we now have a theoretical curve for anisotropy. We will load our data, plot it, and perform a regression to find $K_d$. Note, though, that we have three parameters for our regression, $K_d$, $r_f$, and $r_b$, since the latter two are not known a priori.

Performing a regression

We proceed to do the curve fit as we did in our first tutorial. First, we will load and plot the data.

In [50]:
# Load in data
data = np.loadtxt('rasson_et_al.csv', comments='#', delimiter=',');
c_a_0 = data[:,0]
r = data[:,1]

# Make plot
plt.plot(c_a_0, r, '.', markersize=12)
plt.margins(0.02)
plt.xlabel('$c_a^0$ (µM)')
plt.ylabel('anisotropy');
Out[50]:
<matplotlib.text.Text at 0x11a77b2e8>

Now, let's proceed with our curve fit. We use exactly the same syntax as before. We need to provide a guess for the three fit parameters.

In [51]:
# Set up initial guess
K_d_guess = 1  # µM
r_f_guess = 60
r_b_guess = 115
p0 = np.array([K_d_guess, r_f_guess, r_b_guess])

Now we are ready to left scipy.optimize.curve_fit() do its job, just like in the first tutorial.

In [61]:
# Perform the curve fit
popt, _ = scipy.optimize.curve_fit(anisotropy, c_a_0, r, p0)

# Show the result
print("""
K_d =   {0:.2f} µM
r_f =  {1:.2f}
r_b = {2:.2f}""".format(*popt))
K_d =   0.18 µM
r_f =  59.39
r_b = 114.46

That's all there is to it! Now, let's plot the resulting curve fit.

In [62]:
# Get smooth curve
c_a_0_smooth = np.linspace(0, 35, 200)
r_fit = anisotropy(c_a_0_smooth, *popt)

# Plot results
plt.plot(c_a_0_smooth, r_fit, lw=2, color=sns.color_palette()[2])
plt.plot(c_a_0, r, '.', markersize=12)
plt.margins(0.02)
plt.xlabel('$c_a^0$ (µM)')
plt.ylabel('anisotropy');

A watch-out

Note that $K_d$ must be positive by construct, as must $r_f$ and $r_b$. The scipy.optimize.curve_fit() function has no way of knowing this. We could, instead, fit the logarithm of the respective values, which keeps us out of trouble. We just need to re-write our anisotropy() function.

In [65]:
def anisotropy_with_logs(c_a_0, log_K_d, log_r_f, log_r_b):
    """
    Anisotropy measured from a solution of actin and SD.
    
    Note, we do not pass c_d_0 into this function, assuming it has scope
    beyond the function.
    """
    K_d = np.exp(log_K_d)
    r_f = np.exp(log_r_f)
    r_b = np.exp(log_r_b)
    
    return anisotropy(c_a_0, K_d, r_f, r_b)

# Perform the curve fit
popt, _ = scipy.optimize.curve_fit(anisotropy_with_logs, c_a_0, r, np.log(p0))

# Show the result, being sure to exponentiate what the solver spit out
print("""
K_d =   {0:.2f} µM
r_f =  {1:.2f}
r_b = {2:.2f}""".format(*np.exp(popt)))
K_d =   0.18 µM
r_f =  59.39
r_b = 114.46

We get exactly the same result.