(c) 2022 Justin Bois. With the exception of pasted graphics, where the source is noted, this work is licensed under a Creative Commons Attribution License CC-BY 4.0. All code contained herein is licensed under an MIT license.
This document was prepared at Caltech with financial support from the Donna and Benjamin M. Rosen Bioengineering Center.
This tutorial was generated from a Jupyter notebook. You can download the notebook here.
In this tutorial, we get you up and running with QIIME2, which is the software we will use to analyze our kombucha sequencing data sets. QIIME2 is one of the best documented software packages I have seen, and their tutorials are excellent. We will work through their Moving Pictures tutorial to get the hang of using QIIME2. Before working through that tutorial, you need to have a functioning QIIME2 installation. There are two options for this. First, you can install locally if you have a Mac or are using Linux. Second, you can use Amazon Web Services (AWS).
Follow these instructions to install QIIME2 locally and run all of your analyses on your own machine.
Once you complete those instructions and do
conda activate qiime2-2022.2
on the command line, you will need to install JupyterLab and wget for use with QIIME2. To do that, run
conda install jupyterlab wget
on the command line. You should also enable JupyterLab to handle QIIME visualizations. To do that, run the following on the command line.
jupyter serverextension enable --py qiime2 --sys-prefix
It may also be useful to install the iqplot and bi1x packages in the qiime2-2022.2 environment. To do that, do the following on the command line.
pip install iqplot bi1x
After that, you are set up to run QIIME2 locally and are done with this part of the tutorial; you can proceed to the Moving Pictures tutorial on your own machine. Note, though, that you may still want to use AWS, since the machines are likely more powerful and the calculations will be faster.
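As a quick sanity check of the local setup above, the following sketch reports which of the expected command-line tools cannot be found in the active environment. The helper name missing_tools is made up for this illustration, and the tool names (qiime as the QIIME2 command-line entry point, plus jupyter and wget) are assumptions based on the installation steps above.

```shell
# Print the name of every listed tool that is NOT found on the PATH.
# Prints nothing when everything is available.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Assumed tool names: qiime (the QIIME2 command-line entry point),
# jupyter, and wget, as installed in the steps above.
missing_tools qiime jupyter wget
```

If this prints nothing, all three tools were found. If it prints a name, make sure the qiime2-2022.2 environment is activated before looking further.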
After you have created an AWS account and launched the AWS console, you can launch your instance. We have set up an Amazon Machine Image (AMI), which has the software you need for the course installed and the data sets pre-loaded. The AMI is available in Oregon (us-west-2). Be sure to select this region from the top right corner of the console.
Select EC2 from the Services pulldown menu at the top of your screen.
After selecting EC2, you will see the EC2 Dashboard on the left pane. Under Images there, click AMIs.
The list defaults to AMIs Owned by me (you likely do not have any). Select instead Public images.
Search for Bi1x_2022_sequencing_analysis, and the class AMI should appear. If it does not, double check to make sure your region is Oregon.
You should see the Bi1x_2022_sequencing_analysis AMI listed. Right click on it and select Launch instance from AMI.
On the Launch an instance page, there are several selections to make.
For the key pair, make sure the permissions on your keypair file are restricted by running chmod 400 ~/key_pairs/bi1x_aws_keypair.pem on the command line.
Click Edit on the Network settings tab, and then click Add security group rule. The default will be for Custom TCP. Change the Port Range to 8888-8892. Under Source type, select Anywhere. This will allow you to run as many as five notebooks (on ports 8888, 8889, 8890, 8891, and 8892) from AWS. Then, click Add security group rule again. Select HTTPS from the pulldown menu. Under Source type, select Anywhere.
Under Configure storage, change the storage limit to 30 GiB.
Click Launch instance, the orange button to the right.
After following those steps, your instance will spin up. You can view your running instances by clicking on Instances on the dashboard on the left part of the screen.
It will take a while for your instance to spin up. When the Instance State says running and the Status Checks are complete, your instance is ready for you.
Now that your instance is launched, you can connect to it using your computer and the ssh protocol. The instructions work for Windows, macOS, or Linux, assuming you have a terminal running bash or zsh. In Windows, this is accomplished using GitBash, which you should install if you have not already. For macOS, use Terminal.
These instructions assume that your keypair is stored in the directory ~/key_pairs/ and that your keypair file is ~/key_pairs/bi1x_aws_keypair.pem.
Change permissions on your keypair for security. Do this in the terminal using
chmod 400 ~/key_pairs/bi1x_aws_keypair.pem
Open a new GitBash (Windows) or Terminal (macOS) window.
SSH into your instance in the terminal. To do this, click on your instance on the Instances page in the Management Console. At the bottom of the webpage will appear information about your instance, including the IPv4 Public IP. It will look something like 54.92.67.113. Copy this. In what follows, I refer to this as <IPv4 Public IP>. SSH into your instance by doing
ssh -i "~/key_pairs/bi1x_aws_keypair.pem" ec2-user@<IPv4 Public IP>
(optional, may only work for macOS) To avoid having to use -i "~/key_pairs/bi1x_aws_keypair.pem" each time, you can add your keypair to your shell profile (here, ~/.zshrc) by doing
echo ssh-add --apple-use-keychain $HOME/key_pairs/bi1x_aws_keypair.pem >> ~/.zshrc
source ~/.zshrc
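Returning to the chmod 400 step above: ssh will typically refuse a keypair file that other users can read, so it can be worth confirming that the permission change took effect. The following is a minimal sketch of such a check. The helper name key_is_private is hypothetical, and the check relies on the permission string printed by ls rather than on any AWS tooling.

```shell
# Succeed only if the file's mode is exactly 400 (owner read-only),
# which is the state that `chmod 400` produces on a regular file.
key_is_private() {
    # The first ten characters of `ls -ld` output hold the file type
    # and permission bits, e.g. "-r--------" for mode 400.
    perms="$(ls -ld "$1" 2>/dev/null | cut -c1-10)"
    [ "$perms" = "-r--------" ]
}

# Example with the keypair path used in this tutorial.
key_file="$HOME/key_pairs/bi1x_aws_keypair.pem"
if [ -f "$key_file" ] && key_is_private "$key_file"; then
    echo "permissions look right"
else
    echo "keypair missing or permissions are not 400"
fi
```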
When you launch JupyterLab, you may want to use screen. By running screen, your JupyterLab session will not get interrupted if you disconnect from your instance. This can happen if you have an unreliable internet connection and need to reconnect to your instance. So, on the command line in your instance, execute
screen
You can launch JupyterLab by executing
jupyter lab --no-browser
on the command line. You will see output like this:
To access the notebook, open this file in a browser:
file:///home/ec2-user/.local/share/jupyter/runtime/nbserver-1821-open.html
Or copy and paste one of these URLs:
http://localhost:8888/?token=b9910e579549381a3b6dvd359fada1624bcdf718422bab95
or http://127.0.0.1:8888/?token=b9910e579549381a3b6dvd359fada1624bcdf718422bab95
Keep this window open.
In order to use JupyterLab through a browser on your machine, you need to set up a socket. To do so, open up another GitBash or Terminal window and execute the following.
ssh -i "~/key_pairs/bi1x_aws_keypair.pem" -L 8000:localhost:8888 ec2-user@<IPv4 Public IP>
This sets up a socket connecting port 8888 on your EC2 instance to port 8000 on your local machine. You can change these numbers as necessary. For example, in the URL you got when you launched JupyterLab, the port may be localhost:8889, in which case you need to substitute 8889 for 8888 in your ssh command. You may also want a different local port, e.g., 8001, if you already have a JupyterLab instance running on port 8000. In what follows, I will use port numbers 8000 and 8888, which you will probably use 90% of the time, but you can make changes as you see fit.
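Because the two port numbers are easy to swap by mistake, it can help to assemble the ssh command from named pieces and look at it before running it. The sketch below only prints the command and does not connect to anything. The helper name tunnel_command is hypothetical, and the default keypair path is the one assumed throughout this tutorial.

```shell
# Print (do not run) the ssh command that forwards a local port to a
# port on the EC2 instance.
#   $1 = local port, $2 = remote port, $3 = instance IPv4 address
#   $4 = keypair path (optional; defaults to the path used in this tutorial)
tunnel_command() {
    key="${4:-$HOME/key_pairs/bi1x_aws_keypair.pem}"
    printf 'ssh -i "%s" -L %s:localhost:%s ec2-user@%s\n' "$key" "$1" "$2" "$3"
}

# JupyterLab reported port 8889 on the instance, and local port 8000 is
# already taken, so use 8001 locally. The address is the example one
# from above; substitute your own IPv4 Public IP.
tunnel_command 8001 8889 54.92.67.113
```

The local port always comes first in the -L argument and the port on the instance comes last.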
After you have set up the socket, you can paste the URL given when you launched JupyterLab on your EC2 instance into your browser, but substitute 8000 for 8888. That is, direct your browser to
http://localhost:8000/?token=b9910e579549381a3b6dvd359fada1624bcdf718422bab95
You will now have JupyterLab up and running!
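The port substitution described above can also be done on the command line instead of by hand. This is a minimal sketch using sed. The helper name local_url is made up for this illustration, and it assumes the URL has the http://host:port/... form shown in the JupyterLab output above.

```shell
# Replace the port in a JupyterLab URL with the local end of the tunnel.
#   $1 = URL printed by JupyterLab on the instance, $2 = local port
local_url() {
    printf '%s\n' "$1" | sed -E "s#^(https?://[^:/]+):[0-9]+#\1:$2#"
}

# Using the example URL from the output above and local port 8000.
local_url 'http://localhost:8888/?token=b9910e579549381a3b6dvd359fada1624bcdf718422bab95' 8000
```

Only the port directly after the host name is changed; the token is passed through untouched.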
Note that you may be running JupyterLab locally on your own machine. When you launch JupyterLab on AWS, make sure you do not use the same port number as any JupyterLab instance running on your local machine. You can specify the port number to be, for example, 8890, by launching JupyterLab with
jupyter lab --no-browser --port 8890
If you do that, make sure you use the corresponding port numbers when setting up your socket.
If you are not running screen and you get detached, you need to restart your notebook. Remember that if you save often, you will not lose much work at all from detachment. Furthermore, any files you generated in your analysis will still be available.
If you are using screen, you can reattach. To do this, execute screen -r on the command line after SSH-ing back in to your EC2 instance.
You can see what screens are active by doing screen -ls on the command line. You can also detach the current screen by using screen -d.
As you work on notebooks and create new files you want to save, you may want to move them to your local machine. For these files, you can use scp. Within your GitBash or Terminal window on your local machine (you probably have to open yet another), you can copy files as follows.
scp -i "~/key_pairs/bi1x_aws_keypair.pem" ec2-user@<IPv4 Public IP>:~/my_file.csv ./
This command will copy files from your EC2 instance to your present working directory. Simply put the full path to the file you want to transfer after the colon above (remember ~/ means "home directory"). The second argument of scp is where you want to copy the file.
Similarly, you can upload files to your EC2 instance as follows (in this example to the home directory in your instance).
scp -i "~/key_pairs/bi1x_aws_keypair.pem" my_file.txt ec2-user@<IPv4 Public IP>:~/
When you are finished with your session, you can shut down your notebook in the browser. Then, in the terminal window, you can shut down JupyterLab by pressing Ctrl-c. After Jupyter is terminated, you should detach your screen by doing screen -d. Finally, you should quit your screen by doing screen -X quit.
In the past, students have ended up with instances littered with detached screens. You should clean house from time to time and run screen -X quit.
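To see which screens are worth cleaning up, you can filter the output of screen -ls for detached sessions. The sketch below is one way to do that. The helper name detached_sessions is hypothetical, and the sample text stands in for real screen -ls output, whose exact layout is an assumption and may differ slightly between screen versions.

```shell
# Read `screen -ls` output on standard input and print the ID of every
# session marked as detached, one per line.
detached_sessions() {
    awk '/\(Detached\)/ { print $1 }'
}

# On the instance you would run:  screen -ls | detached_sessions
# Here, a hard-coded sample (assumed format) stands in for real output.
printf 'There are screens on:\n\t1821.pts-0.ip-10-0-0-1\t(Detached)\n\t1907.pts-1.ip-10-0-0-1\t(Attached)\n2 Sockets in /run/screen/S-ec2-user.\n' | detached_sessions
```

Each ID that is printed can then be closed individually with screen -S <ID> -X quit, which leaves the screen you are currently attached to alone.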
After you are finished with your work on your instance, you should stop your instance. To do this, go back to the Instances page on your EC2 console. Right click your instance, and navigate to Instance State → Stop. Do not terminate your instance unless you really want to. Terminating an instance will get rid of any changes you made to it.
If your instance is not stopped and you leave it running, your free AWS credits will quickly evaporate.
Now that you already have your account set up, to start using AWS again, all you need to do is:
Log in to your AWS account in the browser.
Select EC2 from the Services pulldown menu at the top of your screen.
After selecting EC2, you will see the EC2 Dashboard on the left pane. Under Instances there, click Instances. Alternatively, you can also click the Running Instances link under the Resources main heading at the top of the page.
Right click over the name of the instance you wish to start and go to Instance State → Start.
It may take a little while for your instance to get going. When the Instance State says running and the Status Checks are complete, your instance is ready for you to get working.
After the class is over, you might want to terminate your instance. This is because the storage in your instance (stored using AWS's EBS, which is what keeps your repository, installations, etc., all intact) is not free. Once the class ends, you will be removed from the AWS Organization that is funding your usage and your credit card will begin to be charged. If you are new to AWS, you will also start getting bills for your EBS usage once your free tier eligibility expires after a year. This storage gets wiped if you terminate your instance, and you will not get billed for it.
Now that you are finished setting up AWS, you can proceed to do the Moving Pictures tutorial.