{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Using QIIME2\n",
"\n",
"(c) 2024–25 Justin Bois. With the exception of pasted graphics, where the source is noted, this work is licensed under a [Creative Commons Attribution License CC-BY 4.0](https://creativecommons.org/licenses/by/4.0/). All code contained herein is licensed under an [MIT license](https://opensource.org/licenses/MIT).\n",
"\n",
"This document was prepared at [Caltech](http://www.caltech.edu) with financial support from the [Donna and Benjamin M. Rosen Bioengineering Center](http://rosen.caltech.edu).\n",
"\n",
"
\n",
"\n",
"*This tutorial was generated from an Jupyter notebook. You can download the notebook [here](qiime2.ipynb).*\n",
"\n",
"
"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this tutorial, we get you up and running with [QIIME2](https://qiime2.org) (pronounced \"Chime Two\"), which is the software we will use to analyze our antibiotic resistance sequencing data sets. QIIME2 is one of the best documented software packages I have seen, and their tutorials are excellent. We will work through their [Moving Pictures tutorial](https://amplicon-docs.qiime2.org/en/latest/tutorials/moving-pictures.html) to get the hang of using QIIME2. Before working through that tutorial, you need to have a functioning QIIME2 installation. There are two options for this. First, you can install locally if you have a Mac of are using Linux. Second, you can use [Amazon Web Services (AWS)](http://aws.amazon.com/)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Option 1: Install QIIME2 locally (recommended)\n",
"\n",
"Follow [these instructions](https://library.qiime2.org/quickstart/amplicon) to install QIIME2 locally and run all of your analyses on your own machine. The instructions are all to be run on the command line of your machine. Be sure to install the Amplicon Distribution.\n",
"\n",
"A couple of notes on the installation.\n",
"\n",
"- QIIME2 currently does not support Windows. If you want to install on Windows, you will need to use a [Windows Subsystem for Linux](https://learn.microsoft.com/en-us/windows/wsl/about). If you do not want to set up a WSL, you may wish to use AWS (described below), instead.\n",
"- You will not have to install Miniconda if you already have Anaconda installed, and you can skip to the [Install the base distribution's conda environment](https://library.qiime2.org/quickstart/amplicon#id-3-install-the-base-distributions-conda-environment).\n",
"\n",
"Once you complete those instructions, be sure to activate the environment using\n",
"\n",
" conda activate qiime2-amplicon-2025.4\n",
" \n",
"on the command line. You will need to activate this QIIME2 environment every time you start a sequence analysis. \n",
"\n",
"This environment does lack a couple packages you should install. First, JupyterLab.\n",
"\n",
" conda install jupyterlab\n",
"\n",
"Next, we should use pip to install a few packages we will need. First, we'll install Polars. If you are using an a Mac with Apple Silicon (M-series processors) or Linux, do\n",
"\n",
" pip install polars\n",
"\n",
"on the command line. If you are using an Intel Mac or a computer withour AVX2 support, do\n",
"\n",
" pip install polars-lts-cpu\n",
"\n",
"Finally, you should install the `bi1x`, `pyarrow`, and `iqplot` packages.\n",
"\n",
" pip install bi1x iqplot pyarrow watermark\n",
"\n",
"You should also enable JupyterLab to handle QIIME visualizations. To do that, run the following on the command line.\n",
"\n",
" jupyter serverextension enable --py qiime2 --sys-prefix\n",
"\n",
"After that, you should be good to go for running QIIME2 locally and you are done with this part of the tutorial, and you can proceed to do the [Moving Pictures tutorial](https://amplicon-docs.qiime2.org/en/latest/tutorials/moving-pictures.html) on your own machine. Note, though, that you may still wish to use AWS, since the machines are likely more powerful and the calculations will be faster."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Option 2: Use AWS (recommended only if you fail to install locally)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Launch your instance\n",
"\n",
"After you have created an AWS account and launched the AWS console, you can launch your instance. We have set up an Amazon Machine Image (AMI), which has the software you need for the course installed and the data sets pre-loaded. The AMI is available in **Oregon** (us-west-2). Be sure to select this region from the top right corner of the console.\n",
"\n",
"1. To launch an instance with this AMI, choose EC2 among the services available from your AWS console. You can select `Compute`, followed by `EC2` from the `Services` pulldown menu at the top of your screen. \n",
"2. After selecting EC2, you will see a menu of options the left pane. Under `Images` there, click `AMIs`. \n",
"3. The resulting menu will default to AMIs `Owned by me` (you likely do not have any). Select instead `Public images`. \n",
"4. In the search menu, search for `Bi1x_2025_sequencing_analysis`, and the class AMI should appear. If it does not, double check to make sure your region is Oregon. Alternatively, if you are just doing the Moving Pictures Tutorial, you can search for `Bi1x_2025_moving_pictures`.\n",
"5. You will see the `Bi1x_2025_fall_sequencing_analysis` (and/or `Bi1x_2025_moving_pictures`) AMI listed. Right click on it and select `Launch instance from AMI`. \n",
"6. In the `Launch an instance` page, there are several selections to make.\n",
" - You can name (tag) your instance. Maybe something like \"Bi 1x antibiotic resistance soil analysis\" or \"Bi 1x moving pictures tutorial.\"\n",
" - The AMI tab should already be filled out.\n",
" - For instance type, I recommend selecting a c5.xlarge, which has 4 cores and 8 GB of RAM. It costs 17 cents per hour to use. If you are a new AWS user, the t2.micro instances are free, but are really underpowered, with a single core and only 1 GB of RAM. You may not be able to complete the analysis with a machine of this size.\n",
" - You will need to make a keypair the first time you log in. Specify a keypair name and use RSA encryption. You should choose a .pem key unless you will use PuTTY to log in to your instance. The keypair will be provided to you only once, so download it and store it locally on your machine. **DO NOT, I repeat, DO NOT store it store it in any git repository, or anything that is backed up to the cloud, like Dropbox. ONLY store it locally on your machine and never, ever let it out to the internet.** If it is your second time launching, you can use an existing key pair. After you store your keypair, you may need to make its permissions more restrictive, e.g., by doing `chmod 400 ~/key_pairs/bi1x_aws_keypair.pem` on the command line.\n",
" - Click `Edit` on the `Network settings` tab. And then click `Add security group rule`. The default will be for `Custom TCP`. Change the `Port Range` to `8888-8892`. Under `Source type`, select `Anywhere`. This will allow you to run as many as five notebooks (on ports 8888, 8889, 8890, 8891, and 8892) from AWS. then, click `Add security group rule` again. Select `HTTPS` from the pulldown menu. Under `Source type`, select `Anywhere`.\n",
" - Under `Configure storage`, change the storage limit to 30 GiB.\n",
"7. Now that the network configuration is set up, click `Launch instance`, the orange button to the right.\n",
"\n",
"After following those steps, your instance will spin up. You can view your running instances by clicking on `Instances` on the dashboard on the left part of the screen. \n",
"\n",
"It will take a while for your instance to spin up. When the `Instance State` says `running` and the `Status Checks` are complete, your instance is ready for you."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Connect to your instance\n",
"\n",
"Now that your instance is launched, you can connect to it using your computer and the *ssh* protocol. The instructions work for Windows, macOS, or Linux, assuming you have a terminal running bash or zsh. In Windows, this is accomplished using [GitBash](https://git-scm.com), which you should install if you have not already. For macOS, use Terminal.\n",
"\n",
"1. Identify where you put your keypair file. For the purposes of this exercise, I will assume that you have a directory in your home directory called `key_pairs/` and that your keypair file is `~/key_pairs/bi1x_aws_keypair.pem`.\n",
"2. Change permissions on your keypair for security. Do this in the terminal using\n",
"\n",
" `chmod 400 ~/key_pairs/bi1x_aws_keypair.pem`\n",
"\n",
"3. Open a new GitBash (Windows) or Terminal (macOS) window. \n",
"4. SSH into your instance in the terminal. To do this, click on your instance on the `Instances` page in the Management Console. At the bottom of the webpage will appear information about your instance, including the IPv4 Public IP. It will look something like `54.92.67.113`. Copy this. In what following, I refer to this as ``. SSH into your instance by doing \n",
"\n",
" `ssh -i \"~/key_pairs/bi1x_aws_keypair.pem\" ec2-user@`\n",
"\n",
"5. (optional, may only work for macOS) To avoid having to use `-i \"~/key_pairs/bi1x_aws_keypair.pem\"` each time, you can add your keypair to your bash profile by doing\n",
"\n",
" `echo ssh-add --apple-use-keychain $HOME/keypairs/bi1x_aws_keypair.pem >> ~/.zshrc;`\n",
" `source ~/.zshrc`"
]
},
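{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a sketch of an alternative to passing `-i` on every connection, the details from the steps above can be stored in an OpenSSH client configuration file, `~/.ssh/config`. The host alias `bi1x-aws` is a hypothetical name chosen for illustration, and `54.92.67.113` is only the example address from above; substitute your own instance's IPv4 Public IP.\n",
"\n",
"```\n",
"# ~/.ssh/config\n",
"# \"bi1x-aws\" is an assumed alias; replace the example IP with your instance's IP\n",
"Host bi1x-aws\n",
"    HostName 54.92.67.113\n",
"    User ec2-user\n",
"    IdentityFile ~/key_pairs/bi1x_aws_keypair.pem\n",
"```\n",
"\n",
"With an entry like this in place, `ssh bi1x-aws` should be equivalent to the full `ssh -i` command in step 4. The public IP of an instance typically changes when the instance is stopped and started again, so the `HostName` line may need updating between sessions."
]
},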
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Launch screen (optional)\n",
"\n",
"When you launch JupyterLab, you may want to use [`screen`](https://en.wikipedia.org/wiki/GNU_Screen). By running screen, your JupyterLab session will not get interrupted if you disconnect from your instance. This can happen if you have an unreliable internet connection and need to reconnect to your instance. So, on the command line in your instance, execute\n",
"\n",
" screen"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Launch JupyterLab\n",
"\n",
"You can launch JupyterLab by executing\n",
"\n",
" jupyter lab --no-browser\n",
" \n",
"on the command line. This will launch JupyterLab. You will see output like this:\n",
"\n",
" To access the notebook, open this file in a browser:\n",
" file:///home/ec2-user/.local/share/jupyter/runtime/nbserver-1821-open.html\n",
" Or copy and paste one of these URLs:\n",
" http://localhost:8888/?token=b9910e579549381a3b6dvd359fada1624bcdf718422bab95\n",
" or http://127.0.0.1:8888/?token=b9910e579549381a3b6dvd359fada1624bcdf718422bab95\n",
"\n",
"Keep this window open.\n",
"\n",
"In order to use JupyterLab through a browser on your machine, you need to set up a socket. To do so, open up another GitBash or Terminal window and execute the following.\n",
" \n",
" ssh -i \"~/key_pairs/bi1x_aws_keypair.pem\" -L 8000:localhost:8888 ec2-user@\n",
" \n",
"This sets up a socket connecting port `8888` on your EC2 instance to port `8000` on your local machine. You can change these numbers as necessary. For example, in the URL listed above that you got with you launched JupyterLab, the port may be `localhost:8889`, in which case you need to substitute `8889` for `8888` in your ssh command. You may also want a different local port if you already have a JupyterLab instance running on port `8000`, e.g., `8001`. In what follows, I will use port number `8000` and `8888`, which you will probably use 90% of the time, but you can make changes as you see fit.\n",
"\n",
"After you have set up the socket, you can paste the URL given when you launched JupyterLab on your EC2 instance into your browser, but substitute `8000` for `8888`. That is, direct your browser to\n",
"\n",
" http://localhost:8000/?token=b9910e579549381a3b6dvd359fada1624bcdf718422bab95\n",
" \n",
"You will now have JupyterLab up and running!\n",
"\n",
"Note that you may be running JupyterLab locally on your own machine. You should make sure you do not use the same port number of any JupyterLab instance running on your local machine when you launch JupyterLab on AWS. You can specify the port number to be, for example 8890, by launching JupyterLab with\n",
"\n",
" jupyter lab --no-browser --port 8890\n",
" \n",
"If you do that, make sure you use the corresponding port numbers when setting up your socket."
]
},
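{
"cell_type": "markdown",
"metadata": {},
"source": [
"The port forwarding described above can also be written, as a rough sketch, as an entry in the OpenSSH client configuration file `~/.ssh/config`, using the `LocalForward` keyword in place of the `-L` option. The alias `bi1x-jupyter` is a hypothetical name, the IP is only the example address used earlier, and the ports are the `8000` and `8888` used in this section.\n",
"\n",
"```\n",
"# ~/.ssh/config\n",
"# \"bi1x-jupyter\" is an assumed alias; replace the example IP with your instance's IP\n",
"Host bi1x-jupyter\n",
"    HostName 54.92.67.113\n",
"    User ec2-user\n",
"    IdentityFile ~/key_pairs/bi1x_aws_keypair.pem\n",
"    # Forward local port 8000 to port 8888 on the instance\n",
"    LocalForward 8000 localhost:8888\n",
"```\n",
"\n",
"Connecting with `ssh bi1x-jupyter` would then open the same forwarding as `-L 8000:localhost:8888`. If JupyterLab is running on a different port on the instance, such as `8890`, the second number in the `LocalForward` line changes accordingly."
]
},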
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5. If you get detached\n",
"\n",
"If you are not running screen, if you get detached, you need to restart your notebook. Remember that if you save often, you will not use much work at all from detachment. Furthermore, any files you generated in your analysis will still be available.\n",
"\n",
"If you are using `screen`, you can reattach. execute `screen -r` on the command line after SSH-ing back in to your EC2 instance to do this.\n",
"\n",
"You can see what screens are active by doing `screen -ls` on the command line. You can also detach the current screen by using `screen -d`."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 6. Copying results to and from AWS to your local machine\n",
"\n",
"The easiest way to get files from your AWS instance to your local machine is to download them via JupyterLab's file management system.\n",
"\n",
"As you work on notebooks and create new files you want to save, you may want to move them to your local machine. For these file, you an use `scp`. Within your GitBash or Terminal window on your local machine (you probably have to open yet another), you can copy files as follows.\n",
"\n",
" scp -i \"~/key_pairs/bi1x_keypair.pem\" ec2-user@:~/my_file.csv ./\n",
"\n",
"This command will copy files from your EC2 instance to your present working directory. Simply put the full path to the file you want to transfer after the colon above (remember `~/` means \"home directory\"). The second argument of `scp` is where you want to copy the file.\n",
"\n",
"Similarly, you can upload files to your EC2 instance as follows (in this example to the home directory in your instance).\n",
"\n",
" scp -i \"~/key_pairs/bi1x_keypair.pem\" my_file.txt ec2-user@:~/"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 7. Exiting\n",
"\n",
"When you are finished with your session, you can shut down your notebook in the browser. Then, in the terminal window, you can shut down JupyterLab by pressing `Ctrl-c`. After Jupyter is terminated, you should detach your screen by doing `screen -d`. Finally, you should quit your screen by doing `screen -X quit`.\n",
"\n",
"In the past, I have had students have their instances littered with detached screens. You should clean house from time to time and run `screen -X quit`.\n",
"\n",
"After you are finished with your work on your instance, you should stop your instance. To do this, go back to the `Instances` page on your EC2 console. Right click your instance, and navigate to `Instance State` → `Stop`. **Do not terminate your instance** unless you really want to. Terminating an instance will get rid of any changes you made to it."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 8. Seriously. Stop your instances if you are not using them.\n",
"\n",
"If your instance is not stopped and you leave it running, your free AWS credits will quickly evaporate."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 9. Restarting your instances after the initial setup\n",
"\n",
"Now that you already have your account set up, to start using AWS again, all you need to do is:\n",
"\n",
"1. Log in to your AWS account in the browser.\n",
"\n",
"2. Select EC2 from the `Services` pulldown menu at the top of your screen. \n",
"\n",
"3. After selecting EC2, you will see the EC2 Dashboard on the left pane. Under `Instances` there, click `Instances`. Alternatively, you can also click the `Running Instances` link under the `Resources` main heading at the top of the page.\n",
"\n",
"4. Right click over the name of the instance you wish to start and go to `Instance State` → `Start`.\n",
"\n",
"It may take a little while for your instance to get going. When the `Instance State` says `running` and the `Status Checks` are complete, your instance is ready for you to get working."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 10. Terminate your instances after the class is over\n",
"\n",
"After the class is over, you might want to terminate your instance. This is because the storage in your instance (stored using AWS's EBS, which is what keeps your repository, installations, etc., all in tact) is not free. Once the class ends, you will be removed from the AWS Organization that is funding your usage and your credit card will begin to be charged. Once your free tier accessibility expires in a year if you are new to AWS, you will start getting bills for your EBS usage. These get wiped if you terminate your instance and you will not get billed.\n",
"\n",
"Now that you are finished setting up AWS, you can proceed to do the [Moving Pictures tutorial](https://amplicon-docs.qiime2.org/en/latest/tutorials/moving-pictures.html)."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.5"
}
},
"nbformat": 4,
"nbformat_minor": 4
}