Hacker stats for Boolean data

(c) 2021 Justin Bois. This work is licensed under a Creative Commons Attribution License CC-BY 4.0. All code contained herein is licensed under an MIT license.

This document was prepared at Caltech with financial support from the Donna and Benjamin M. Rosen Bioengineering Center.

This tutorial was generated from a Jupyter notebook. You can download the notebook here.



In this tutorial, we use data from "coin flips" to estimate the bias of the coin. More specifically, we will assume we flipped a coin n times and got h heads, so a fraction of h/n flips landed heads. From these data, we will estimate the fraction f of flips that would land heads if we repeated our coin flipping experiment again. This is the "bias" of the coin.

This is analogous to many experiments we may encounter in biology. A "coin flip" is any measurement that has a yes (heads) or no (tails) answer. In the C. elegans optogenetics experiment, "heads" is a reversal and "tails" is a non-reversal upon exposure to blue light. We will estimate f, the fraction of worms that reverse after exposure to blue light.

As usual, we begin by importing our favorite modules.
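A minimal set of imports for what follows might look like the sketch below, assuming NumPy for the numerics and iqplot/Bokeh for the plots (iqplot.ecdf() is used later in this tutorial).

```python
import numpy as np

import iqplot

import bokeh.io
import bokeh.plotting

# Display Bokeh plots inline in the notebook
bokeh.io.output_notebook()
```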

Our example problem

Let's say we have two coins, A and B, and we do an experiment where we flip them several times. For coin A, we get 28 out of 84 flips to land heads, and for coin B, we get 12 out of 23 flips to land heads. Our goal is to estimate fA and fB, the biases of these respective coins. Remember that here, "bias" means the fraction of flips that would land heads if we performed the same experiment of 84 or 23 flips again.

Note that this is analogous to the C. elegans reversal experiment, and you can analyze your reversal data in an analogous way.

Hacker stats

Hacker statistics is an umbrella term used to describe statistical analyses that use random number generation and sampling to do statistical inference as opposed to perhaps more familiar methods involving analytical calculation.

Consider one definition of probability: the probability of an outcome of an experiment is the fraction of times that outcome would be achieved if the experiment were performed over and over again. I like to think of hacker stats as a way of understanding probability by directly simulating data acquisition using a computer. That is, we use a computer to repeat the experiment over and over again.

Bootstrapping estimate for the bias

For coin A, we had 84 flips and got 28 heads. We could repeat this experiment in a computer by making an array of flips, assigning a value of 1 for each of the 28 heads and 0 for each of the 56 tails. We then "do" a flip by randomly selecting an element out of this array and storing it. We can do this 83 more times to get a set of 84 flips. Note that it is important that we put the flip we randomly chose back into the array each time. This is called "sampling with replacement." It is as if we did the experiment over again, where each flip has the same probability of being heads as in the original experiment. (Yes, this is exactly equivalent to drawing out of a Binomial(84, 28/84) distribution, but the technique is far more general, so we will proceed as if we didn't know this convenient fact.)

This technique is an example of bootstrapping, and each set of new flips we compute is called a bootstrap sample. We are interested in the fraction of heads in each of these bootstrap samples; this fraction is called a bootstrap replicate. Let's code up this procedure.

First, let's make our array of flips based on the experimental results.
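One way to build these arrays, using 1 for heads and 0 for tails (the names flips_A and flips_B are just illustrative choices here), is sketched below.

```python
# Coin A: 28 heads (1's) out of 84 flips, so 56 tails (0's)
flips_A = np.array([1] * 28 + [0] * 56)

# Coin B: 12 heads out of 23 flips, so 11 tails
flips_B = np.array([1] * 12 + [0] * 11)
```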

Now we can write a function to make a bootstrap sample. The np.random.choice() function is really convenient because it randomly chooses and stores values out of an array, with replacement if desired.
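A sketch of such a function (the name draw_bs_sample() is our choice for illustration):

```python
def draw_bs_sample(data):
    """Draw a bootstrap sample: choose len(data) entries from `data`,
    sampling with replacement (the default for np.random.choice)."""
    return np.random.choice(data, size=len(data))
```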

This function returns a new set of flips. From these, we want to make a replicate, which is in general something we can compute from the sample. In this case, it is the fraction of flips that landed heads. Let's write a function that computes a bootstrap replicate by generating a sample and then computing the quantity of interest.
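A sketch of such a function, here called draw_bs_rep() for illustration; for an array of 0's and 1's, passing np.mean as the summary function gives the fraction of heads.

```python
def draw_bs_rep(data, func):
    """Draw a bootstrap sample and compute a replicate by applying `func`.
    For an array of 0's and 1's, func=np.mean gives the fraction of heads."""
    return func(draw_bs_sample(data))
```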

Great! Now, it would be good to write a function that draws a whole bunch of replicates, which is like doing the experiment over and over again.
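A sketch of such a function, here called draw_bs_reps(), which simply repeats the replicate computation a specified number of times:

```python
def draw_bs_reps(data, func, size=1):
    """Draw `size` bootstrap replicates of `func` applied to `data`."""
    return np.array([draw_bs_rep(data, func) for _ in range(size)])
```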

And now we are ready to get our replicates! Let's draw 100,000 replicates for each of our two coins.
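Using the functions sketched above (the array names bs_reps_A and bs_reps_B are our choice), this might look like:

```python
# 100,000 bootstrap replicates of the fraction of heads for each coin
bs_reps_A = draw_bs_reps(flips_A, np.mean, size=100_000)
bs_reps_B = draw_bs_reps(flips_B, np.mean, size=100_000)
```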

Confidence intervals

We can compute confidence intervals from these replicates. There are a few ways to define a confidence interval. I prefer this one:

If an experiment is repeated over and over again, the estimate I compute for a parameter will lie between the bounds of the 95% confidence interval for 95% of the experiments.

We simulated repeating the experiment over and over again by drawing our bootstrap replicates. To compute the middle 95% confidence interval, we can simply compute the percentiles from the replicates. We use the np.percentile() function, the second argument of which is a list of percentiles we want to compute. For the bounds of the middle 95%, we want the 2.5th and 97.5th percentiles.
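A sketch of the computation, using the replicate arrays from above:

```python
# Middle 95% confidence intervals from the 2.5th and 97.5th percentiles
conf_int_A = np.percentile(bs_reps_A, [2.5, 97.5])
conf_int_B = np.percentile(bs_reps_B, [2.5, 97.5])

print("Coin A 95% conf. int.:", conf_int_A)
print("Coin B 95% conf. int.:", conf_int_B)
```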

It might be easier to look at these on a plot.
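One possible way to sketch such a plot with Bokeh, drawing each confidence interval as a horizontal line with a dot at the observed fraction of heads (the layout and sizing here are just one choice):

```python
# Horizontal line for each confidence interval, dot at the observed fraction
p = bokeh.plotting.figure(
    frame_height=150,
    frame_width=400,
    x_axis_label="fraction of flips landing heads",
    y_range=["coin B", "coin A"],
)
p.line(conf_int_A, ["coin A", "coin A"], line_width=3)
p.circle([28 / 84], ["coin A"], size=8)
p.line(conf_int_B, ["coin B", "coin B"], line_width=3)
p.circle([12 / 23], ["coin B"], size=8)

bokeh.io.show(p)
```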

There is clearly some overlap in the confidence intervals. So, even though coin B had 52% of the flips land heads and coin A had 33% land heads, it is still hard to say definitively that coin B is more biased toward heads than coin A.

CDFs and the difference

While the confidence intervals for the bias of each of the coins are useful, we might be interested in quantifying the difference in the bias between the two coins. As we mentioned before, our previous bootstrapping technique was equivalent to sampling out of a Binomial distribution. But now, we want to know the difference in the bias of the two coins. It is not obvious how this should be distributed. However, using the power of bootstrapping, we can work this out easily: we just subtract the bootstrap replicates!

Let us define δ = fB − fA, the difference of the biases. Let's compute the bootstrap replicates.
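Since the replicates for each coin are stored as NumPy arrays (bs_reps_A and bs_reps_B in the sketch above), the subtraction is element-wise; a sketch:

```python
# Replicates of δ: element-wise difference of the replicate arrays
bs_reps_diff = bs_reps_B - bs_reps_A

# 95% confidence interval for δ
conf_int_diff = np.percentile(bs_reps_diff, [2.5, 97.5])

print("δ 95% conf. int.:", conf_int_diff)
```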

Notice that the confidence interval dips below zero, indicating that coin A might appear to be more biased towards heads than coin B if the experiment were repeated again.

We have summarized the distribution of δ with its 95% confidence interval. We could instead plot the distribution of δ values that we acquired by bootstrapping. To do this, we can plot a histogram to approximate the probability density function of δ.
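One way to do this is with iqplot's histogram function; a sketch, assuming the replicate array bs_reps_diff from above:

```python
# Histogram approximating the probability density function of δ
p = iqplot.histogram(data=bs_reps_diff, q="δ")

bokeh.io.show(p)
```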

While this is descriptive, it suffers from binning bias. We had to choose a number of bins, and counts of values of δ within those bins are added up. In my opinion, a much better way to visualize data like this is with an empirical cumulative distribution function, or ECDF. If we define F(x) as the ECDF,

F(x) = fraction of samples with value ≤ x.     (1)

The iqplot.ecdf() function generates a plot of an ECDF.
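A sketch of the call, together with a direct computation of the fraction of replicates for which δ ≤ 0:

```python
# ECDF of the bootstrap replicates of δ
p = iqplot.ecdf(data=bs_reps_diff, q="δ")

bokeh.io.show(p)

# Fraction of replicates with δ ≤ 0 (coin A appearing at least as biased
# toward heads as coin B)
print(np.sum(bs_reps_diff <= 0) / len(bs_reps_diff))
```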

Looking at the ECDF, there is about a 5% chance of doing the coin flips again and getting coin A to have a greater bias towards heads than coin B.

Computing environment