Using QIIME2¶

(c) 2022 Justin Bois. With the exception of pasted graphics, where the source is noted, this work is licensed under a Creative Commons Attribution License CC-BY 4.0. All code contained herein is licensed under an MIT license.

This document was prepared at Caltech with financial support from the Donna and Benjamin M. Rosen Bioengineering Center.

This tutorial was generated from a Jupyter notebook. You can download the notebook here.

In this tutorial, we get you up and running with QIIME2, which is the software we will use to analyze our kombucha sequencing data sets. QIIME2 is one of the best documented software packages I have seen, and their tutorials are excellent. We will work through their Moving Pictures tutorial to get the hang of using QIIME2. Before working through that tutorial, you need to have a functioning QIIME2 installation. There are two options for this. First, you can install locally if you have a Mac or are using Linux. Second, you can use Amazon Web Services (AWS).

Option 1: Install QIIME2 locally¶

Follow these instructions to install QIIME2 locally and run all of your analyses on your own machine.

Once you complete those instructions and do

conda activate qiime2-2022.2

on the command line, you will need to install JupyterLab and wget for use with QIIME2. To do that, run

conda install jupyterlab wget

on the command line. You should also enable JupyterLab to handle QIIME visualizations. To do that, run the following on the command line.

jupyter serverextension enable --py qiime2 --sys-prefix

It may also be useful to install the iqplot and bi1x packages in the qiime2-2022.2 environment. To do that, do the following on the command line.

pip install iqplot bi1x

After that, you should be good to go for running QIIME2 locally. You are done with this part of the tutorial and can proceed to do the Moving Pictures tutorial on your own machine. Note, though, that you may still want to use AWS, since the machines are likely more powerful and the calculations will be faster.
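
If you want a quick sanity check before starting the Moving Pictures tutorial, a sketch like the following may help. It only checks that the qiime, jupyter, and wget commands are on your PATH in the active environment. The helper name missing_tools is hypothetical and not part of QIIME2.

```shell
# Print the name of every command in the argument list that is NOT on PATH.
# (missing_tools is a hypothetical helper name, not a QIIME2 command.)
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
        fi
    done
}

# Run this after `conda activate qiime2-2022.2`.
missing=$(missing_tools qiime jupyter wget)
if [ -z "$missing" ]; then
    echo "All tools found."
else
    # $missing is left unquoted on purpose so the names print on one line.
    echo "Missing from PATH:" $missing
fi
```

If anything is reported missing, the most likely cause is that the qiime2-2022.2 environment is not activated in that terminal.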

Option 2: Use AWS¶

1. Launch your instance¶

After you have created an AWS account and launched the AWS console, you can launch your instance. We have set up an Amazon Machine Image (AMI), which has the software you need for the course installed and the data sets pre-loaded. The AMI is available in Oregon (us-west-2). Be sure to select this region from the top right corner of the console.

To launch an instance with this AMI, choose EC2 among the services available from your AWS console. You can select EC2 from the Services pulldown menu at the top of your screen.
After selecting EC2, you will see a menu of options in the left pane. Under Images there, click AMIs.
The resulting menu will default to AMIs Owned by me (you likely do not have any). Select instead Public images.
In the search menu, search for Bi1x_2022_sequencing_analysis, and the class AMI should appear. If it does not, double check to make sure your region is Oregon.
You will see the Bi1x_2022_sequencing_analysis AMI listed. Right click on it and select Launch instance from AMI.
In the Launch an instance page, there are several selections to make.
- You can name (tag) your instance. Maybe something like "Bi 1x antibiotic resistance soil analysis."
- The AMI tab should already be filled out.
- For instance type, I recommend selecting a c5.xlarge, which has 4 cores and 8 GB of RAM.
- You will need to make a keypair the first time you log in. Specify a keypair name and use RSA encryption. You should choose a .pem key unless you will use PuTTY to log in to your instance. The keypair will be provided to you only once, so download it and store it locally on your machine. DO NOT, I repeat, DO NOT store it in any git repository, or anything that is backed up to the cloud, like Dropbox. ONLY store it locally on your machine and never, ever let it out to the internet. If it is your second time launching, you can use an existing key pair. After you store your keypair, you may need to make its permissions more restrictive, e.g., by doing chmod 400 ~/key_pairs/bi1x_aws_keypair.pem on the command line.
- Click Edit on the Network settings tab, and then click Add security group rule. The default will be for Custom TCP. Change the Port Range to 8888-8892. Under Source type, select Anywhere. This will allow you to run as many as five notebooks (on ports 8888, 8889, 8890, 8891, and 8892) from AWS. Then, click Add security group rule again. Select HTTPS from the pulldown menu. Under Source type, select Anywhere.
- Under Configure storage, change the storage limit to 30 GiB.
Now that the network configuration is set up, click Launch instance, the orange button to the right.
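
The permission change mentioned in the keypair step can be wrapped in a small check so you can see that it took effect. This is only a sketch. The function name lock_down_key is hypothetical, and the path assumes you saved the key as ~/key_pairs/bi1x_aws_keypair.pem.

```shell
# Make a key file readable only by its owner and print the resulting mode.
# (lock_down_key is a hypothetical helper name.)
lock_down_key() {
    key="$1"
    if [ ! -f "$key" ]; then
        echo "No key file at $key" >&2
        return 1
    fi
    chmod 400 "$key"
    # The first ten characters of `ls -l` are the permission string.
    ls -l "$key" | cut -c1-10
}

# Assumed location of the downloaded keypair.
lock_down_key "$HOME/key_pairs/bi1x_aws_keypair.pem" \
    || echo "Download your keypair and move it into ~/key_pairs/ first."
```

A permission string of -r-------- means only you can read the key, which is what ssh expects.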

After following those steps, your instance will spin up. You can view your running instances by clicking on Instances on the dashboard on the left part of the screen.

It will take a while for your instance to spin up. When the Instance State says running and the Status Checks are complete, your instance is ready for you.

2. Connect to your instance¶

Now that your instance is launched, you can connect to it using your computer and the ssh protocol. The instructions work for Windows, macOS, or Linux, assuming you have a terminal running bash or zsh. In Windows, this is accomplished using GitBash, which you should install if you have not already. For macOS, use Terminal.

Identify where you put your keypair file. For the purposes of this exercise, I will assume that you have a directory in your home directory called key_pairs/ and that your keypair file is ~/key_pairs/bi1x_aws_keypair.pem.
Change permissions on your keypair for security. Do this in the terminal using

chmod 400 ~/key_pairs/bi1x_aws_keypair.pem
Open a new GitBash (Windows) or Terminal (macOS) window.
SSH into your instance in the terminal. To do this, click on your instance on the Instances page in the Management Console. At the bottom of the webpage will appear information about your instance, including the IPv4 Public IP. It will look something like 54.92.67.113. Copy this. In what follows, I refer to this as <IPv4 Public IP>. SSH into your instance by doing

ssh -i "~/key_pairs/bi1x_aws_keypair.pem" ec2-user@<IPv4 Public IP>
(optional, may only work for macOS) To avoid having to use -i "~/key_pairs/bi1x_aws_keypair.pem" each time, you can add your keypair to your zsh profile by doing

echo ssh-add --apple-use-keychain $HOME/key_pairs/bi1x_aws_keypair.pem >> ~/.zshrc; source ~/.zshrc
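
Another way to avoid retyping the key path and user name, which should also work in GitBash, is an entry in ~/.ssh/config. The sketch below only prints such an entry so you can inspect it before using it. The alias bi1x-aws and the function name ssh_config_entry are hypothetical, and the address is the example one from above.

```shell
# Print an ~/.ssh/config entry for the instance.
# (ssh_config_entry and the alias passed to it are hypothetical names.)
ssh_config_entry() {
    alias_name="$1"
    ip="$2"
    printf 'Host %s\n' "$alias_name"
    printf '    HostName %s\n' "$ip"
    printf '    User ec2-user\n'
    printf '    IdentityFile ~/key_pairs/bi1x_aws_keypair.pem\n'
}

# Replace the address with your own <IPv4 Public IP>.
ssh_config_entry bi1x-aws 54.92.67.113
```

If you append the printed lines to ~/.ssh/config, then ssh bi1x-aws should be equivalent to the longer command above. The public IP of an instance can change when you stop and start it, so you may need to update the HostName line.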

3. Launch screen (optional)¶

When you launch JupyterLab, you may want to use screen. By running screen, your JupyterLab session will not get interrupted if you disconnect from your instance. This can happen if you have an unreliable internet connection and need to reconnect to your instance. So, on the command line in your instance, execute

screen

4. Launch JupyterLab¶

You can launch JupyterLab by executing

jupyter lab --no-browser

on the command line. This will launch JupyterLab. You will see output like this:

To access the notebook, open this file in a browser:
    file:///home/ec2-user/.local/share/jupyter/runtime/nbserver-1821-open.html
Or copy and paste one of these URLs:
    http://localhost:8888/?token=b9910e579549381a3b6dvd359fada1624bcdf718422bab95
 or http://127.0.0.1:8888/?token=b9910e579549381a3b6dvd359fada1624bcdf718422bab95

Keep this window open.

In order to use JupyterLab through a browser on your machine, you need to set up a socket. To do so, open up another GitBash or Terminal window and execute the following.

ssh -i "~/key_pairs/bi1x_aws_keypair.pem" -L 8000:localhost:8888 ec2-user@<IPv4 Public IP>

This sets up a socket connecting port 8888 on your EC2 instance to port 8000 on your local machine. You can change these numbers as necessary. For example, in the URL listed above that you got when you launched JupyterLab, the port may be localhost:8889, in which case you need to substitute 8889 for 8888 in your ssh command. You may also want a different local port, e.g., 8001, if you already have a JupyterLab instance running on port 8000. In what follows, I will use port numbers 8000 and 8888, which you will probably use 90% of the time, but you can make changes as you see fit.

After you have set up the socket, you can paste the URL given when you launched JupyterLab on your EC2 instance into your browser, but substitute 8000 for 8888. That is, direct your browser to

http://localhost:8000/?token=b9910e579549381a3b6dvd359fada1624bcdf718422bab95

You will now have JupyterLab up and running!
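
If you would rather not edit the URL by hand, the port substitution can be done with sed. This is a minimal sketch, with to_local_url being a hypothetical helper name. It assumes the URL has the form printed by JupyterLab above.

```shell
# Swap the remote port in a JupyterLab URL for the local port of the socket.
# Arguments: URL, remote port, local port. (to_local_url is a hypothetical name.)
to_local_url() {
    url="$1"
    remote_port="$2"
    local_port="$3"
    printf '%s\n' "$url" | sed "s|:${remote_port}/|:${local_port}/|"
}

to_local_url "http://localhost:8888/?token=b9910e579549381a3b6dvd359fada1624bcdf718422bab95" 8888 8000
# prints http://localhost:8000/?token=b9910e579549381a3b6dvd359fada1624bcdf718422bab95
```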

Note that you may be running JupyterLab locally on your own machine. You should make sure you do not use the same port number as any JupyterLab instance running on your local machine when you launch JupyterLab on AWS. You can specify the port number to be, for example, 8890, by launching JupyterLab with

jupyter lab --no-browser --port 8890

If you do that, make sure you use the corresponding port numbers when setting up your socket.
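
To keep the port numbers straight, it can help to build the ssh command from the two ports and print it before running it. The following is a sketch of that idea. The function name tunnel_cmd is hypothetical, the address is the example one from above, and the key path is the one assumed throughout this tutorial.

```shell
# Print the ssh command that connects a local port to a port on the instance.
# Arguments: local port, remote (JupyterLab) port, IPv4 Public IP.
# (tunnel_cmd is a hypothetical helper name; it prints, it does not connect.)
tunnel_cmd() {
    local_port="$1"
    remote_port="$2"
    ip="$3"
    echo "ssh -i \"~/key_pairs/bi1x_aws_keypair.pem\" -L ${local_port}:localhost:${remote_port} ec2-user@${ip}"
}

# JupyterLab launched with --port 8890 on AWS, viewed on local port 8001.
tunnel_cmd 8001 8890 54.92.67.113
# prints ssh -i "~/key_pairs/bi1x_aws_keypair.pem" -L 8001:localhost:8890 ec2-user@54.92.67.113
```

The number before localhost is the port you type into your browser, and the number after it is the port JupyterLab reported on the instance.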

5. If you get detached¶

If you are not running screen and you get detached, you need to restart your notebook. Remember that if you save often, you will not lose much work at all from detachment. Furthermore, any files you generated in your analysis will still be available.

If you are using screen, you can reattach. Execute screen -r on the command line after SSH-ing back in to your EC2 instance to do this.

You can see what screens are active by doing screen -ls on the command line. You can also detach the current screen by using screen -d.

6. Copying files between AWS and your local machine¶

As you work on notebooks and create new files you want to save, you may want to move them to your local machine. For these files, you can use scp. Within your GitBash or Terminal window on your local machine (you probably have to open yet another), you can copy files as follows.

scp -i "~/key_pairs/bi1x_aws_keypair.pem" ec2-user@<IPv4 Public IP>:~/my_file.csv ./

This command will copy files from your EC2 instance to your present working directory. Simply put the full path to the file you want to transfer after the colon above (remember ~/ means "home directory"). The second argument of scp is where you want to copy the file.

Similarly, you can upload files to your EC2 instance as follows (in this example to the home directory in your instance).

scp -i "~/key_pairs/bi1x_aws_keypair.pem" my_file.txt ec2-user@<IPv4 Public IP>:~/
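
If an analysis leaves you with a whole directory of output files, one option is to bundle the directory into a single archive on the instance and then copy that one file. The sketch below shows the bundling step with tar. The function name bundle_results and the directory name my_results are hypothetical.

```shell
# Bundle a directory into <name>.tar.gz in the current directory and print
# the archive name. (bundle_results is a hypothetical helper name.)
bundle_results() {
    dir="$1"
    name=$(basename "$dir")
    archive="${name}.tar.gz"
    # -C changes into the parent directory so the archive holds relative paths.
    tar -czf "$archive" -C "$(dirname "$dir")" "$name"
    echo "$archive"
}

# Run on the EC2 instance; my_results is an assumed directory name.
if [ -d "$HOME/my_results" ]; then
    bundle_results "$HOME/my_results"
fi
```

You can then download the archive with the same scp pattern as above and unpack it on your local machine with tar -xzf my_results.tar.gz.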

7. Exiting¶

When you are finished with your session, you can shut down your notebook in the browser. Then, in the terminal window, you can shut down JupyterLab by pressing Ctrl-c. After Jupyter is terminated, you should detach your screen by doing screen -d. Finally, you should quit your screen by doing screen -X quit.

In the past, I have had students have their instances littered with detached screens. You should clean house from time to time and run screen -X quit.
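
If several detached screens have piled up, a loop like the following sketch can quit them one at a time. It assumes that screen -ls marks those sessions with the text (Detached), and the function name detached_sessions is hypothetical. Be aware that it quits every detached session, including one that is still running JupyterLab.

```shell
# Read `screen -ls` output on stdin and print the ID of each detached session.
# (detached_sessions is a hypothetical helper name.)
detached_sessions() {
    awk '/\(Detached\)/ {print $1}'
}

# Only attempt the cleanup where screen is actually installed.
if command -v screen >/dev/null 2>&1; then
    for id in $(screen -ls | detached_sessions); do
        echo "Quitting detached screen $id"
        screen -S "$id" -X quit
    done
fi
```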

After you are finished with your work on your instance, you should stop your instance. To do this, go back to the Instances page on your EC2 console. Right click your instance, and navigate to Instance State → Stop. Do not terminate your instance unless you really want to. Terminating an instance will get rid of any changes you made to it.

8. Seriously. Stop your instances if you are not using them.¶

If your instance is not stopped and you leave it running, your free AWS credits will quickly evaporate.

9. Restarting your instances after the initial setup¶

Now that you already have your account set up, to start using AWS again, all you need to do is:

Log in to your AWS account in the browser.
Select EC2 from the Services pulldown menu at the top of your screen.
After selecting EC2, you will see the EC2 Dashboard on the left pane. Under Instances there, click Instances. Alternatively, you can also click the Running Instances link under the Resources main heading at the top of the page.
Right click over the name of the instance you wish to start and go to Instance State → Start.

It may take a little while for your instance to get going. When the Instance State says running and the Status Checks are complete, your instance is ready for you to get working.

10. Terminate your instances after the class is over¶

After the class is over, you might want to terminate your instance. This is because the storage in your instance (stored using AWS's EBS, which is what keeps your repository, installations, etc., all intact) is not free. Once the class ends, you will be removed from the AWS Organization that is funding your usage and your credit card will begin to be charged. If you are new to AWS, you will start getting bills for your EBS usage once your free tier eligibility expires after a year. The storage gets wiped if you terminate your instance, and you will not get billed.

Now that you are finished setting up AWS, you can proceed to do the Moving Pictures tutorial.