hermie: tutorial


Getting Started

This tutorial is designed to introduce MacCoss lab members and collaborators to hermie, the peptide mass spectrum data analysis pipeline. It assumes you know a few basic UNIX commands so that you can copy files, list the contents of a directory, and move between directories. Experienced users might try the examples page.

Before you begin

For this tutorial you will need an account on proteome and an MS2 file. To complete this tutorial quickly, you might want to use a small MS2 file with only a few spectra. There is a sample on proteome at /home/frewen/examples/sample.ms2, which you can copy to your home directory and use for this tutorial. If you prefer, you can generate your own as follows.

If you have a file, say platypus-01.ms2, you can create a shorter version of it by running this command:

$ head -n 15000 platypus-01.ms2 > short-platypus-01.ms2

You can count the number of spectra in your new file (each spectrum begins with a line starting with S) like this:

$ grep -c ^S short-platypus-01.ms2
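Because head cuts the file at a fixed number of lines, the last spectrum in the short file may be incomplete. If that matters, the awk sketch below is an alternative. It assumes the usual MS2 layout: header lines beginning with H, then one S line per spectrum followed by that spectrum's Z/I/D lines and peaks. The function name first_spectra is hypothetical and not part of hermie.

```shell
#!/bin/sh
# first_spectra N FILE: print the header plus the first N complete spectra of
# an MS2 file. Assumes every spectrum starts with a line beginning with "S".
first_spectra() {
    awk -v n="$1" '
        BEGIN { count = 0 }
        /^S/ { count++ }         # a new spectrum starts on this line
        count > n + 0 { exit }   # stop at the first spectrum past the limit
        { print }                # header lines and lines of kept spectra
    ' "$2"
}
```

For example, `first_spectra 50 platypus-01.ms2 > short-platypus-01.ms2` keeps the header and the first 50 spectra. The grep command above should then report 50, provided the original file has at least that many.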

Hopefully, you were given a web directory on proteome so that you can view your HTML results. To check for that directory, run:

$ ls ~/public_html

Either it will list the contents of the directory (which may be empty) or it will return an error telling you that no such directory exists.

Note: if you have never run SEQUEST before, you should follow the setup instructions.

Before you begin, you might want to check that there are processors available on the cluster. Run this command:

$ qstat

If it returns nothing, you are in good shape. If it lists jobs that are in the queue but not yet running (state qw), you may have to wait a while for your run to complete.
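When the queue is long, you can count the waiting jobs by filtering the state column of qstat's output. This is a sketch under assumptions: it expects a Sun Grid Engine-style layout with two header lines and the job state in the fifth column. Compare it against what qstat actually prints on proteome before relying on it. The function name count_waiting is hypothetical.

```shell
#!/bin/sh
# count_waiting: read qstat output on stdin and print how many jobs are in
# state "qw" (queued and waiting). Assumes two header lines and the state in
# column 5, as in Sun Grid Engine's default qstat layout.
count_waiting() {
    awk 'NR > 2 && $5 == "qw" { n++ } END { print n + 0 }'
}
```

Usage: `qstat | count_waiting`. A result of 0 means nothing is waiting ahead of you.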


An example run

Note: Hermie can read spectra from .ms2, .cms2 or .bms2 files. For this tutorial, ".ms2" will refer to all three types.

We will assume that you are using the sample MS2 file called sample.ms2 and that it contains spectra from yeast.

Navigate to a directory where you would like to store your results. Copy your MS2 file to this directory.

Make sure that hermie is in your $PATH by running this command:

$ hermie

If you get a "command not found" error message, add hermie's directory to your PATH and try again:

$ export PATH=$PATH:/net/maccoss/vol2/software/bin
$ hermie
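The export command above only lasts for the current shell session. To make the change permanent, you could add a guarded version to your shell startup file. This assumes your login shell reads a file such as ~/.bashrc. The case test prevents the directory from being appended again every time the file is read.

```shell
# Add hermie's directory to PATH unless it is already there.
hermie_bin=/net/maccoss/vol2/software/bin
case ":$PATH:" in
    *":$hermie_bin:"*) ;;                         # already present: do nothing
    *) PATH="$PATH:$hermie_bin"; export PATH ;;   # append it once
esac
```

After adding these lines, open a new terminal (or source the startup file) before running hermie again.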

You should see the following text printed to your screen.

FATAL: No organism and/or search mode given
Usage:  hermie [options] <organism> <mode>
        hermie [options] <organism> <mode> <directory name> [<directory name>...]
        hermie [options] <organism> <mode> <file name> [<file name>...]

Options: -name <output name>  Specify directory name for results.
                                Default is 'pipeline'.
         -check-setup         Quit after printing out setup details.
         -help                Print the complete list of options.
         -list-organisms      Print a list of established organisms.
         -list-modes          Print a list of predefined modes.
         -verbose <0|1|2|3>   Adjust the level of output to stdout.
                                Default level is 2. 0-silent. 1-setup details
                                2-progress. 3-output of each step.

This is the usage statement preceded by an error message.

The error message on the first line is telling you that you did not provide the necessary arguments: the organism and the search mode. By specifying an organism, you determine which protein database and library will be used for the search. By selecting a search mode, you determine how the pipeline will run. There are several predefined organisms and search modes, or you can create your own (see Custom Modes and the documentation for details). To see a list of available organisms, run the command:

$ hermie -list-organisms

You should see yeast in the list, which is the one we will use. Note that other is also in the list; it can be used with a custom mode to specify organisms not in the default list.

To see a list of available search modes, run the command:

$ hermie -list-modes

We will use the standard-perc mode (also see the Choosing a search mode section below).

Look once more at the message printed when you ran hermie with no arguments. The section beginning with Usage gives the overall format of hermie commands. The organism and mode are mandatory, and there are also options: information that isn't required but can be included. Some of those options are listed in the usage statement. You can see all of the options with the command:

$ hermie -help

Once we have chosen an organism, search mode, and options, we can preview what hermie will do by using the -check-setup option. Run the command:

$ hermie -check-setup yeast standard-perc

You should see the following:
```
hermie: Spectrum Analysis Pipeline

Scheduled to run BiblioSpec SEQUEST percolator DTASelect update-library 
Using ms2 files:
	sample.ms2
	
Using library /net/maccoss/vol2/software/pipeline/libraries/yeast.lib
Using protein database /net/maccoss/vol2/software/pipeline/dbase/yeast/yeast-200209-contam.fasta
  and decoy database /net/maccoss/vol2/software/pipeline/dbase/yeast/yeast-200209-contam-rev.fasta
```

The preview is telling you that hermie will run five steps in the analysis, beginning with BiblioSpec and ending with update-library. It also lists the MS2 files that will be used (by default, any .ms2 files in the current directory; in this case, sample.ms2) as well as the names of the library and the FASTA protein databases.
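By default, hermie picks up input files from the current directory, so it can be worth checking what is there before a run. The sketch below lists files with any of the three spectrum extensions mentioned above. It is an assumption that hermie's default scan includes .cms2 and .bms2 files as well as .ms2 files, so compare the result with the "Using ms2 files" list printed by -check-setup. The function name list_spectrum_files is hypothetical.

```shell
#!/bin/sh
# list_spectrum_files [DIR]: print the .ms2, .cms2 and .bms2 files directly
# inside DIR (default: current directory, no subdirectories), sorted by name.
list_spectrum_files() {
    find "${1:-.}" -maxdepth 1 -type f \
        \( -name '*.ms2' -o -name '*.cms2' -o -name '*.bms2' \) | sort
}
```

Running `list_spectrum_files` in this tutorial's directory should show only ./sample.ms2.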

Now we are ready for a real run. Run the analysis with the command:

$ hermie -sleep-time 1 -nodes 1 yeast standard-perc

We added the -sleep-time option so that you don't have to wait 20 minutes for hermie to finish, and the -nodes option so that you request only one node on the cluster. Usually, you will want to use the default values for these options.

Once again, you should see the setup information printed to the screen, now followed by the name of each program as the analysis proceeds. When the run has completed, hermie prints Pipeline complete and returns you to the prompt.

Take a look at the results. First, list the contents of the directory ($ ls). You should see two files named log and log-1 and a new directory named pipeline. This is the default name, and it can be changed with the -name option. For example, to store the results in a directory named tutorial, run the command:

$ hermie -name tutorial yeast standard

The log files are a more detailed version of the information printed to the screen. The first one was produced when we did the check, and the second one came from the actual run. You can control how much information is printed to the screen with the -verbose option. This does not affect the log files, so you could run hermie with no screen output and still have all the details saved to the log.

The pipeline directory contains all of the intermediate outputs of the programs that were run, organized into subdirectories named for each program. For example, the output from SEQUEST is in pipeline/sequest.
The file you are probably most interested in is pipeline/dtaselect/sequest/DTASelect-filter.txt. There is also a corresponding HTML file for viewing the results. The command

$ cp pipeline/dtaselect/sequest/DTASelect.html ~/public_html/tutorial.html

will put the HTML file in your proteome web directory (see Before you begin). You can then view it by pointing your web browser to proteome.gs.washington.edu/~<username>/tutorial.html. Don't forget to change <username> to your proteome login name.

Note: Don't panic if there is no DTASelect-filter.txt file or if the DTASelect.html file is empty. Remember, we are only looking at a handful of spectra, so it's unlikely that any good matches were found. Have a look at the end of the file pipeline/dtaselect/sequest/dtas-messages; there may be a line saying "No proteins passed criteria!"
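The checks described here can be combined into one small helper. This is a sketch, not part of hermie. The function name publish_results is hypothetical, and the default paths are those of the standard-perc run above. The chmod assumes the web server needs the file to be world-readable, which is typical but worth confirming on proteome.

```shell
#!/bin/sh
# publish_results [RESULTS_DIR] [NAME]: report whether DTASelect found any
# proteins and, if so, copy DTASelect.html into ~/public_html as NAME.html.
publish_results() {
    dir=${1:-pipeline/dtaselect/sequest}
    name=${2:-tutorial}
    # dtas-messages ends with this line when nothing passed the filters.
    if grep -q 'No proteins passed criteria!' "$dir/dtas-messages" 2>/dev/null; then
        echo "No proteins passed criteria; nothing to publish."
        return 1
    fi
    if [ ! -d "$HOME/public_html" ]; then
        echo "No ~/public_html directory; ask for a web directory first." >&2
        return 1
    fi
    cp "$dir/DTASelect.html" "$HOME/public_html/$name.html" &&
        chmod a+r "$HOME/public_html/$name.html" &&
        echo "View it at proteome.gs.washington.edu/~<username>/$name.html"
}
```

Run `publish_results` from the directory where you ran hermie, or pass the results directory and page name explicitly, for example `publish_results pipeline/dtaselect/sequest run1`.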


Choosing a search mode

The search mode defines which steps are performed and the arguments passed to the programs at each step. There are five defined search modes: high-res-perc, standard, standard-perc, lib-as-filter, and lib-as-filter-perc. In most cases, you will want to use standard-perc. This mode runs BiblioSpec, SEQUEST, percolator, DTASelect (on both the BiblioSpec and SEQUEST search results), and update-library. The standard mode does not run percolator; unless you have a compelling reason not to, you should run percolator. The other critical difference between standard and standard-perc is the set of options passed to DTASelect. Without percolator, DTASelect chooses good matches based on features like XCorr and deltaCn. Percolator inserts its new score into the Sp field, so DTASelect must be configured to ignore all of the usual features and select primarily on Sp score.

The mode high-res-perc is like standard-perc but it is intended for high resolution MS1 data. It runs two additional steps, Hardklor and Bullseye, which identify a more accurate precursor m/z and charge and filter out MS/MS spectra which do not appear to be derived from a peptide. To use this mode you must have an .ms1 file for each .ms2 file.
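Before choosing high-res-perc, you can check that every .ms2 file has an .ms1 partner. This sketch assumes the partner shares the base name and differs only in extension (for example, run-01.ms2 and run-01.ms1). That naming convention is an assumption, so check the documentation if your files are named differently. The function name missing_ms1 is hypothetical.

```shell
#!/bin/sh
# missing_ms1 [DIR]: print each .ms2 file in DIR (default: current directory)
# that has no matching .ms1 file; return 0 only if every .ms2 has one.
missing_ms1() {
    status=0
    for ms2 in "${1:-.}"/*.ms2; do
        [ -e "$ms2" ] || continue            # the glob matched nothing
        if [ ! -e "${ms2%.ms2}.ms1" ]; then  # swap .ms2 for .ms1
            echo "$ms2"
            status=1
        fi
    done
    return "$status"
}
```

For example, `if missing_ms1; then echo "every .ms2 has an .ms1"; fi` prints the message only when high-res-perc's input requirement appears to be met. Otherwise, the function lists the .ms2 files that are missing a partner.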

The other modes are lib-as-filter and lib-as-filter-perc. These modes run the same steps as the two standard modes. The difference is that with lib-as-filter, the library search results are used to limit which spectra are searched by SEQUEST. Any spectrum with a good library match is considered identified, and only those without good hits are passed on to SEQUEST to be searched. The other major difference is that the library and SEQUEST search results are combined by DTASelect. In theory, filtering out the easily identified spectra should speed up the SEQUEST search. However, even in the best circumstances only a small fraction of the spectra are identified this way, so the time savings are marginal.

These modes are meant to cover typical use of hermie and may not meet your specific needs. Any combination of features can be put together in a custom mode; see Custom Modes for details.


Using MS2 files from different locations

In the above example, we put the MS2 files we wanted to analyze in our current working directory. This, however, is not always convenient, so hermie has two additional ways of looking for input files. You may specify any number of MS2 files or directories containing MS2 files on the command line. Consider this example

$ hermie yeast standard ~/research/runs/best-run.ms2 ~/research/others/ ../another.ms2

This will analyze the two files best-run.ms2 and another.ms2 as well as all of the MS2 files in ~/research/others and NONE of the MS2 files in the current directory. (To also include the ones in the current directory, add . to the list of files.)


Running longer analyses

As you well know, SEQUEST can take hours or days to complete. While you and hermie are waiting for the results, the terminal window in which you are running hermie is occupied. You cannot work in that terminal and if you close it, the run will stop. There are several ways of dealing with this.

Open a new window. This solves the first problem. Now you can continue doing work, but if you close the first window or log off the computer the run will still stop.
Suspend the run. You can pause hermie by typing Ctrl+z. A line telling you that the process has been stopped will appear and you will be given a prompt. At this point you can work in the window and when you are ready to proceed with hermie, you can run the command $ fg. Hermie will proceed as usual as long as the window is open.
Run without interruption. Another option is to tell the computer to keep running hermie even if you close the window. Try this command:
$ nohup hermie yeast standard &
The utility nohup keeps a program (in this case hermie) running after the terminal that launched it has been closed. The trailing & runs it in the background so that you get your prompt back. The progress of the run will no longer be printed to the screen. Because the run continues after the window is closed, it needs a more stable place for its output, so the information that would have been printed to the screen goes into a file called nohup.out instead. Since you are not using the screen to monitor progress, you could run
$ nohup hermie -verbose 0 yeast standard &
which sends no progress output to nohup.out. You can then monitor progress by reading the log file instead. The next section describes additional ways to monitor the progress of the run.
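Every nohup run started from the same directory writes to the same nohup.out. To avoid this, you can redirect each run's output to its own file. The helper below is a hypothetical convenience wrapper, not part of hermie. It uses only nohup and standard shell redirection, and it prints the process ID so that you can find the run later with ps.

```shell
#!/bin/sh
# run_detached OUTFILE COMMAND [ARGS...]: start COMMAND under nohup in the
# background, send its stdout and stderr to OUTFILE, and print its PID.
run_detached() {
    out=$1
    shift
    nohup "$@" > "$out" 2>&1 &
    echo "$!"
}
```

For example, `run_detached yeast-run.out hermie yeast standard` starts the same run as the nohup command above, but writes to yeast-run.out instead of nohup.out. Because its output is redirected, nohup does not create nohup.out.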

Checking on your run's progress

There are several ways to check on your run.

Screen. Information about the status of the run is typically printed to the screen (stdout). The amount of information printed can be controlled with the -verbose <level> option, where level is between 0 (no output) and 3 (most output). The default level is 2. As each step in the pipeline begins, it is announced with a line like Running BlibSearch. If you are using nohup, this output is written to the file nohup.out instead.
Log file. Similar information is written to the log files, and that output is not affected by -verbose. Checking the log file is especially useful when running with nohup: if you start multiple runs from the same directory, nohup writes the output of all of them to the same nohup.out file, whereas each run has its own log file.
qstat. Once a job has been submitted to the queue, you can check on its status using the command $ qstat. It prints a list of all the jobs running and waiting to be run. Your runs will be listed with your user name (proteome login) and the name of the run (a code like "cz" or "seq", a long number, and an identifying number). After your login name is a code that gives the status of the job. The most common codes are 'qw' for a job that is queued and waiting, 'r' for jobs that are running, and 'e' when there is an error.
Note: to view just your own jobs, use $ qstat -u <username>
Note: there may be a delay between the time your jobs finish and when hermie starts the next step. This delay is controlled, in part, by the -sleep-time option.
ps. If you want another confirmation that hermie is still running, you can use ps. For example, run the command
$ ps -u <username>
to get a list of all the processes you have (replace <username> with your proteome login name). This option is now less informative since most processes are being run on the cluster. It is, however, a confirmation that hermie is still running.
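To see which step a run has reached, you can pull the most recent "Running ..." line out of its log file. This sketch assumes the log announces each step with a line containing "Running <program>", as the screen output does. The log file name (for example, log-1) depends on the run, and the function name latest_step is hypothetical.

```shell
#!/bin/sh
# latest_step LOGFILE: print the last line of LOGFILE that announces a step
# (contains "Running "), or a message if no step has been logged yet.
latest_step() {
    step=$(grep 'Running ' "$1" | tail -n 1)
    if [ -n "$step" ]; then
        echo "$step"
    else
        echo "No step has started yet."
    fi
}
```

For example, `latest_step log-1` followed by `qstat -u <username>` shows both where the pipeline is and whether its cluster jobs are still queued or running.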

