Getting Started
This tutorial is designed to introduce MacCoss lab members and collaborators to hermie, the peptide mass spectrum data analysis pipeline. It assumes you know a few basic UNIX commands so that you can copy files, list the contents of a directory, and move to different directories. You can jump to a topic on this page with the links in the navigation bar to the left. Experienced users might try the examples page.
Before you begin
For this tutorial you will need an account on proteome and an MS2 file. In order to complete this tutorial quickly, you might want to use a small MS2 file with only a few spectra. There is a sample on proteome at /home/frewen/examples/sample.ms2 which you can copy to your home directory and use for this tutorial. Or if you prefer, you can generate your own. Here's how.
If you have a file, say platypus-01.ms2, you can create a shorter version of it by running this command
$ head -n 15000 platypus-01.ms2 > short-platypus-01.ms2
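Because head cuts at a fixed line count, the last spectrum in the shortened file may be incomplete. The following is a minimal sketch that keeps the first N whole spectra instead. It assumes the usual MS2 layout, in which every spectrum begins with a line starting with S; the function name first_spectra is hypothetical and not part of hermie.

```shell
# first_spectra N FILE: print the file header and the first N complete spectra.
# Assumption: each spectrum starts with a line whose first character is "S".
first_spectra() {
    awk -v n="$1" '
        /^S/      { count++ }   # a new spectrum begins here
        count > n { exit }      # stop before printing spectrum N+1
                  { print }
    ' "$2"
}

# Example use:
# first_spectra 100 platypus-01.ms2 > short-platypus-01.ms2
```

Unlike the head command, this never ends in the middle of a peak list, at the cost of reading the file with awk.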
To count the spectra in the shortened file, run
$ grep -c ^S short-platypus-01.ms2
Hopefully, you were given a web directory on proteome so that you can view your HTML results. To check for that directory, do
$ ls ~/public_html
Before you begin, you might want to check that there are processors available on the cluster. Run this command
$ qstat
An example run
We will assume that you are using the sample MS2 file called sample.ms2 and that it contains spectra from yeast.
- Navigate to a directory where you would like to store your results. Copy your MS2 file to this directory.
- Make sure that hermie is in your $PATH environment variable by running this command.
$ hermie
If you get an error message saying command not found, try this
$ export PATH=$PATH:/net/maccoss/vol2/software/bin
$ hermie
You should see the following text printed to your screen.
FATAL: No organism and/or search mode given
Usage: hermie [options] <organism> <mode>
       hermie [options] <organism> <mode> <directory name> [<directory name>...]
       hermie [options] <organism> <mode> <file name> [<file name>...]
Options:
  -name <output name>   Specify directory name for results. Default is 'pipeline'.
  -check-setup          Quit after printing out setup details.
  -help                 Print the complete list of options.
  -list-organisms       Print a list of established organisms.
  -list-modes           Print a list of predefined modes.
  -verbose <0|1|2|3>    Adjust the level of output to stdout. Default level is 2.
                        0-silent. 1-setup details 2-progress. 3-output of each step.
This is the usage statement preceded by an error message.
- The error message on the first line is telling you that you did not provide the necessary arguments: the organism and the search mode. By specifying an organism, you are determining which protein database and library will be used for the search. By selecting a search mode, you are determining how the pipeline will run. There are several predefined organisms and search modes, or you can create your own (see Custom Modes and the documentation for details). To see a list of available organisms, run the command
$ hermie -list-organisms
You should see yeast in the list, which is the one we will use. Note that other is in the list. This can be used with a custom mode to specify organisms not in the default list. To see a list of available search modes, run the command
$ hermie -list-modes
We will use the standard-perc mode (also see the Choosing a search mode section below).
- Look once more at the help message generated in step 2. The
section beginning with Usage gives the overall
format of hermie commands. The organism
and mode are mandatory and there are some options--information that
isn't necessary but can be included. Some of those options are
listed in the usage statement. You can see all of the options with
the command
$ hermie -help
- Once we have chosen an organism, search mode, and options, we can get a preview of what hermie will do by using the -check-setup option. Run the command
$ hermie -check-setup yeast standard-perc
You should see the following
hermie: Spectrum Analysis Pipeline
Scheduled to run BiblioSpec SEQUEST percolator DTASelect update-library
Using ms2 files: sample.ms2
Using library /net/maccoss/vol2/software/pipeline/libraries/yeast.lib
Using protein database /net/maccoss/vol2/software/pipeline/dbase/yeast/yeast-200209-contam.fasta
and decoy database /net/maccoss/vol2/software/pipeline/dbase/yeast/yeast-200209-contam-rev.fasta
The preview is telling you that hermie will run five steps in the analysis, beginning with BiblioSpec and ending with update-library. It lists the MS2 files that will be used (by default, any .ms2 files in the current directory; in this case, sample.ms2) as well as the names of the library and the FASTA protein databases.
- Now we are ready for a real run. Run the analysis with the command
$ hermie -sleep-time 1 -nodes 1 yeast standard-perc
We added the -sleep-time option so you don't have to wait 20 minutes for hermie to finish and we added the -nodes option so that you are only requesting one node on the cluster. Usually, you will want to use the default values for these options.
Once again, you should see the set-up information printed to the screen. Now it should also be followed by the name of the program being run as the analysis proceeds. Once it has completed, it will print Pipeline complete and give you a prompt.
- Take a look at the results. First look at the contents of the directory ($ ls). You should see two files named log and log-1 and a new directory named pipeline. This is the default name and it can be changed by using the option -name. For example, to name the directory storing the results tutorial, run the command
$ hermie -name tutorial yeast standard
The log files are a more detailed version of the information printed to the screen. The first one was produced when we did the check and the second one came from the actual run. You can control how much information is printed to the screen with the -verbosity option. This does not affect the log file, so you could run hermie with no screen output and still have all the details saved to the log file.
- Within the pipeline directory are all of the intermediate outputs of the programs run. They are organized by sub-directories named for the program. For example, the output from SEQUEST is in pipeline/sequest.
The file you are probably most interested in is pipeline/dtaselect/sequest/DTASelect-filter.txt. There is also the corresponding HTML file for viewing the results. The command
$ cp pipeline/dtaselect/sequest/DTASelect.html ~/public_html/tutorial.html
will put the HTML file in your proteome web directory (see Before you begin) and you can view it by pointing your web browser to proteome.gs.washington.edu/~<username>/tutorial.html. Don't forget to change <username> to your proteome login name.
Note: Don't panic if there is no DTASelect-filter.txt file or if the DTASelect.html file is empty. Remember, we are only looking at a handful of spectra so it's pretty unlikely that it found any good matches. Have a look at the end of the file pipeline/dtaselect/sequest/dtas-messages. There may be a line saying "No proteins passed criteria!"
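The steps of the example run can be collected into a small script. This is only a sketch: the function names ensure_hermie and tutorial_run are hypothetical, the hermie options are the ones shown in this tutorial, and the report is copied only if it exists and is not empty.

```shell
# ensure_hermie: if hermie is not found, append the lab software directory
# (the path given in this tutorial) to PATH, then report whether it is found.
ensure_hermie() {
    if ! command -v hermie >/dev/null 2>&1; then
        PATH="$PATH:/net/maccoss/vol2/software/bin"
        export PATH
    fi
    command -v hermie >/dev/null 2>&1
}

# tutorial_run NAME: preview the setup, run the analysis into directory NAME,
# and copy the HTML report to the web directory when there is one.
tutorial_run() {
    name="${1:-pipeline}"
    ensure_hermie || { echo "hermie not found" >&2; return 1; }

    hermie -check-setup yeast standard-perc
    hermie -name "$name" -sleep-time 1 -nodes 1 yeast standard-perc

    report="$name/dtaselect/sequest/DTASelect.html"
    if [ -s "$report" ] && [ -d "$HOME/public_html" ]; then
        cp "$report" "$HOME/public_html/$name.html"
    else
        # An empty or missing report is expected with only a few spectra.
        echo "nothing to publish; see $name/dtaselect/sequest/dtas-messages" >&2
    fi
}

# Example use, from the directory that holds sample.ms2:
# tutorial_run tutorial
```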
Choosing a search mode
The search mode defines which steps are performed and the arguments
passed to the programs at each step. There are five defined search
modes: high-res-perc, standard, standard-perc,
lib-as-filter, lib-as-filter-perc. In most cases, you will want to
use standard-perc. This mode runs
BiblioSpec, SEQUEST, percolator, DTASelect (on both the BiblioSpec and
SEQUEST search results), and update-library. The
standard mode does not run percolator. Unless you have a
compelling reason not to use it, you should probably choose to run
percolator. The other critical difference between standard
and standard-perc is the set of options passed to DTASelect.
Without percolator, DTASelect chooses good matches based on features
like XCorr and deltaCn. Percolator inserts its new score into the Sp
field so DTASelect must be configured to ignore all of the usual
features and select primarily on Sp score.
The mode high-res-perc is like standard-perc but it is
intended for high resolution MS1 data. It runs two additional steps,
Hardklor and Bullseye, which identify a more accurate
precursor m/z and charge and filter out MS/MS spectra which do
not appear to be derived from a peptide. To use this mode you must have an
.ms1 file for each .ms2 file.
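Before choosing high-res-perc, it can help to confirm that the .ms1 files are in place. A minimal sketch, assuming that each .ms1 file sits next to its .ms2 file and shares its base name (the tutorial does not state the pairing rule, so treat this as an assumption); the function name missing_ms1 is hypothetical.

```shell
# missing_ms1 DIR: print every .ms2 file in DIR that has no .ms1 file
# with the same base name. Prints nothing when all pairs are complete.
missing_ms1() {
    for ms2 in "$1"/*.ms2; do
        [ -e "$ms2" ] || continue        # DIR holds no .ms2 files
        ms1="${ms2%.ms2}.ms1"
        [ -e "$ms1" ] || echo "$ms2"
    done
}

# Example use:
# missing_ms1 .
```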
The other modes are lib-as-filter and
lib-as-filter-perc.
These modes run all the same steps as the two standard
modes. The difference is that with lib-as-filter the
library search results are used to limit which spectra are searched by
SEQUEST. Any spectrum with a good library match is considered to have
been identified and only those without good hits are passed on to
SEQUEST to be searched. The other major difference is that the search
results will be combined together by DTASelect. In theory, filtering
out the easily identified spectra should speed up the SEQUEST search.
However, even in the best circumstances, only a small fraction of the
spectra will be identified so the time savings are marginal.
These modes are meant to cover typical use of hermie and may not meet your specific needs. Features can be combined in any way in a customized mode. See Custom Modes for more details.
Using MS2 files from different locations
In the above example, we put the MS2 files we wanted to analyze in our current working directory. This, however, is not always convenient, so hermie has two additional ways of looking for input files. You may specify any number of MS2 files or directories containing MS2 files on the command line. Consider this example
$ hermie yeast standard ~/research/runs/best-run.ms2 ~/research/others/
../another.ms2. to the list of files.)
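To see which files a mix of file and directory arguments refers to, the arguments can be expanded by hand before starting a run. The sketch below is an illustration only and is not hermie's own code: it assumes that a directory contributes the .ms2 files directly inside it (no recursion), and the function name list_ms2 is hypothetical.

```shell
# list_ms2 ARG...: print the .ms2 files named directly on the command line
# or found directly inside the given directories.
list_ms2() {
    for arg in "$@"; do
        if [ -d "$arg" ]; then
            for f in "${arg%/}"/*.ms2; do
                [ -e "$f" ] || continue   # directory holds no .ms2 files
                echo "$f"
            done
        elif [ -f "$arg" ]; then
            echo "$arg"
        else
            echo "skipping $arg: not a file or directory" >&2
        fi
    done
}

# Example use:
# list_ms2 ~/research/runs/best-run.ms2 ~/research/others/
```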
Running longer analyses
As you well know, SEQUEST can take hours or days to complete. While you and hermie are waiting for the results, the terminal window in which you are running hermie is occupied. You cannot work in that terminal and if you close it, the run will stop. There are several ways of dealing with this.
- Open a new window. This solves the first problem. Now you can continue doing work, but if you close the first window or log off the computer the run will still stop.
- Suspend the run. You can pause hermie by typing Ctrl+z. A line telling you that the process has been stopped will appear and you will be given a prompt. At this point you can work in the window and when you are ready to proceed with hermie, you can run the command
$ fg
Hermie will proceed as usual as long as the window is open.
- Run without interruption. Another option is to tell the computer to continue running hermie even if you close the window. Try this command
$ nohup hermie yeast standard &
The utility nohup runs any program (in this case hermie) even after the terminal that launched it has been closed. It will no longer print the progress of the run to the screen. Since it will keep running even after the window is closed, it needs a more stable place to print its output. Therefore, the information that would have been printed to the screen will be put in a file called nohup.out. Since you are not using the screen to monitor progress, you could run
$ nohup hermie -verbosity 0 yeast standard &
which prints nothing to the screen. Now you can monitor progress by reading the log file instead of the nohup.out file. The next section describes additional ways to monitor the progress of the run.
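One way to check later whether a detached run is still alive is to save its process ID when it starts. A minimal sketch; the function names and the file name hermie.pid are hypothetical and not part of hermie.

```shell
# start_detached CMD...: run CMD under nohup in the background, send its
# output to nohup.out, and save its process ID in hermie.pid.
start_detached() {
    nohup "$@" > nohup.out 2>&1 &
    echo $! > hermie.pid
}

# is_running: succeed if the process recorded in hermie.pid still exists.
is_running() {
    [ -f hermie.pid ] && kill -0 "$(cat hermie.pid)" 2>/dev/null
}

# Example use:
# start_detached hermie -verbosity 0 yeast standard
# is_running && echo "hermie is still running"
```

The PID file is written to the current directory, so the check has to be made from the directory where the run was started.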
Checking on your run's progress
There are several ways to check on your run.
- Screen. Information about the status of the
run is typically printed to
the screen (stdout). The amount of information printed can be
controlled with the
-v <level> option where level is between 0 (no output) and 3 (most output). The default level is 2. As each step in the pipeline begins, it is announced with an output like Running BlibSearch. If you are using nohup, the output will be printed to the file nohup.out.
- Log File. Similar information is printed to the log files. The output to the log files is not affected by -v. Checking the log file provides an advantage when running nohup. If you have multiple runs started from the same directory, nohup will write the output of all of them to the same nohup.out file whereas each run will have a unique log file.
- qstat. Once a job has been submitted to the queue, you can check on its status using the command
$ qstat
It prints out a list of all of the jobs running and waiting to be run. Your runs will be listed with your user name (proteome login) and the name of the run (a code, like "cz" or "seq", a long number, and an identifying number). After your login name is a code that gives the status of the job. The most common codes are 'qw' for a job that is queued and waiting, 'r' for jobs that are running, and 'e' when there is an error.
Note: to view just your own jobs, use
$ qstat -u <username>
Note: there may be a delay between the time your jobs finish and when hermie starts the next step. This is controlled, in part, by the -sleep-time option.
- ps. If you want another confirmation that hermie is still running, you can use ps. For example, run the command
$ ps -u <username>
to get a list of all the processes you have (replace <username> with your proteome login name). This option is now less informative since most processes are being run on the cluster. It is, however, a confirmation that hermie is still running.
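When many jobs are queued, a per-state count is easier to read than the full qstat listing. A minimal sketch, assuming that qstat prints two header lines and puts the state code in the fifth column (the layout of Grid Engine's default listing; check this against the output on proteome); the function name count_states is hypothetical.

```shell
# count_states: read qstat output on standard input and print the number
# of jobs in each state, one "state count" pair per line.
# Assumptions: two header lines; state code in column 5.
count_states() {
    awk 'NR > 2 && NF >= 5 { n[$5]++ }
         END { for (s in n) print s, n[s] }' | sort
}

# Example use (replace <username> with your proteome login name):
# qstat -u <username> | count_states
```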