hermie: the spectrum-to-protein analysis pipeline

On this Page

Bare-bones usage
Name the output directory
Specify MS2 files
Use a different database
Turn off steps
Add options for component programs
Using a custom mode file
Copying results to the web
Email notification
Running crux

Quick Reference: Examples for the experienced user

This page is meant to act as a quick reference for those who are already familiar with hermie but may need a syntax reminder. Each set of examples begins with one in the general form. Any words inside angle brackets (<>) describe what the user should supply. Any words inside square brackets ([]) are optional and do not need to be included (although they are usually what the example is illustrating). Options may be given in any order, but they must come between hermie and the organism. First-time users may want to read the tutorial for a detailed walk-through.

Bare-bones usage

For a fully automated run, you only need to provide an organism and a search mode.

$ hermie <organism> <search mode>
$ hermie yeast standard-perc
$ hermie human standard-perc
$ hermie fly standard
$ hermie yeast high-res-perc

Any combination of organism and search mode can be used. The commands $ hermie -list-organisms and $ hermie -list-modes list what is available. See Choosing a search mode for more information.

Name the output directory

By default, the results are put into a directory named 'pipeline', which is not very informative. You can specify a different name for the output directory. This is particularly useful when doing multiple runs on the same MS2 files. Suppose you had a set of MS2 files that you wanted to search against two different protein databases. You could run these two commands from the same directory and have the results put into different places.

$ hermie [-name <directory name>] <organism> <search mode>
$ hermie -name mouseResults mouse standard-perc
$ hermie -name ratResults rat standard-perc

Specify MS2 files

The automated version uses all of the MS2 files in the current working directory. For more flexibility, you can specify which MS2 files to run.

$ hermie <organism> <search mode> [list of MS2 files]

Use three MS2 files in the current working directory:
$ hermie yeast standard-perc first.ms2 second.ms2 third.ms2

Use one file from the current directory and two from other locations:
$ hermie yeast standard-perc sample.ms2 ../other/location/ofthe.ms2 ~/third.ms2

Use all files in the current directory whose names start with '040507' and end with 'ms2':
$ hermie yeast standard-perc 040507*ms2

Instead of listing a specific file, you can give a directory, and all files in that directory ending in 'ms2', 'cms2' or 'bms2' will be used. The current directory is specified with '.' (a dot). Directories and file names can be mixed together.

Use all MS2 files in the directory ~/sampleFiles/
$ hermie worm standard-perc ~/sampleFiles/

Use all MS2 files in two directories:
$ hermie worm standard-perc ~/sampleFiles/ ../otherfiles/

Use one file in another directory and ALL of the MS2s in the current directory:
$ hermie worm standard-perc ../bestruns/choice.ms2 .

Use one file in the current directory and ALL of the MS2s in the parent directory:
$ hermie worm standard-perc wormguts.ms2 ../
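
Before starting a long run, it can help to check which files a directory argument will pull in. The function below is a hypothetical helper (it is not part of hermie) that lists the regular files in each directory whose names end in 'ms2', mirroring the rule described above. It assumes hermie only looks at the top level of each directory and does not descend into subdirectories.

```sh
# preview_ms2: hypothetical helper, not part of hermie.
# Prints the regular files in each given directory whose names end in "ms2"
# (which also covers "cms2" and "bms2"). Assumes only the top level is scanned.
preview_ms2() {
  local dir f
  for dir in "$@"; do
    dir=${dir%/}                      # "~/sampleFiles/" -> "~/sampleFiles"
    for f in "$dir"/*; do
      [ -f "$f" ] || continue         # skip subdirectories and unmatched globs
      case $f in
        *ms2) printf '%s\n' "$f" ;;
      esac
    done
  done
}
```

For example, preview_ms2 ~/sampleFiles/ ../otherfiles/ lists the files that the two-directory example above would use. For a wildcard such as 040507*ms2, the shell expands the pattern before hermie runs, so printf '%s\n' 040507*ms2 shows exactly which files will be passed.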

Use a different database

The organism defines the database (fasta file) used for the SEQUEST and/or crux search and the library used for the BiblioSpec search. By using the organism other together with the -fasta option, you can specify a non-standard database. Suppose the database you want is in /share/database/platypus.fasta. You will also need to either specify a library or turn off the library search.

$ hermie [-fasta <file> -noblibSearch -noadd-lib] other <mode>
$ hermie -fasta /share/database/platypus.fasta -noblibSearch -noadd-lib other standard

OR

$ hermie [-fasta <file> -library <file>] other <mode>
$ hermie -fasta /share/database/platypus.fasta -library /share/libraries/platypus.lib other standard

If you are running percolator, you also need to specify a decoy database. For this example, it is in the same location and named 'platypus-rand.fasta'.

$ hermie [-fasta <file> -decoy <file> -noblibSearch -noadd-lib] other <mode>
$ hermie -fasta /share/database/platypus.fasta -decoy /share/database/platypus-rand.fasta -noblibSearch -noadd-lib other standard-perc

Turn off steps

Sometimes you want to skip certain steps in the analysis. This is particularly useful if your run was interrupted and you want to finish the steps that were not completed. For instance, if your run was interrupted during the DTASelect step, you would repeat the same command, adding options that turn off all of the preceding steps.

Original command
$ hermie -name ratbrain rat standard-perc rb-1.ms2 rb-2.ms2

...interrupted...

Follow up (run from the same directory as above)
$ hermie -name ratbrain -nocharge -noblib -noseq -noperc rat standard-perc rb-1.ms2 rb-2.ms2

Note that options can be abbreviated as long as the abbreviation is unique. For example, abbreviating -nocharge-czar as -n will not work, because several options begin with -n, but -noc will.
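
The abbreviation rule amounts to unique-prefix matching: an abbreviation is accepted only if exactly one option name starts with it. The sketch below illustrates that rule; the option list is illustrative (names taken from the examples on this page, not hermie's complete option table), and resolve_option is a hypothetical function, not hermie's own option parser.

```sh
# Illustrative option names taken from the examples on this page.
# This is NOT hermie's complete option list, and resolve_option is a
# hypothetical sketch of the unique-prefix rule, not hermie's own parser.
HERMIE_OPTIONS="name nocharge-czar noblib-search noadd-lib fasta decoy library web mail crux"

# resolve_option: print the one option name that starts with the given
# abbreviation; fail if no option or several options match.
resolve_option() {
  local abbrev opt match count
  abbrev=${1#-}
  abbrev=${abbrev#-}                  # accept both -noc and --noc
  match=""
  count=0
  for opt in $HERMIE_OPTIONS; do
    case $opt in
      "$abbrev"*) match=$opt; count=$((count + 1)) ;;
    esac
  done
  if [ "$count" -eq 1 ]; then
    printf '%s\n' "$match"
  else
    printf 'ambiguous or unknown option: -%s (%d matches)\n' "$abbrev" "$count" >&2
    return 1
  fi
}
```

With this list, resolve_option -noc prints nocharge-czar, while resolve_option -n fails because four names start with n.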

Add options for component programs

Hermie allows the user to pass options to the component programs. For example, you might want to change the threshold DTASelect uses after a percolator run. The default is a maximum FDR of 1%, which hermie applies by running this command.

$ DTASelect -S -0.01 [...other options]

(The values are negated, so DTASelect returns values greater than -0.01, which corresponds to original values less than 0.01.) To use a less stringent threshold of 5% FDR, you would want DTASelect to use the option -S -0.05. When passing it through hermie, write the option-value pair with an = in place of the space (that is, -S=-0.05). You tell hermie which program to pass it to with the option --dta-sequest. The final hermie command will look like this.

$ hermie [<program option> <option>=<value>] <organism> <search mode>
$ hermie --dta-sequest -S=-0.05 yeast standard-perc
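
Because the threshold is negated, a target FDR has to be converted before it reaches DTASelect. The function below is a hypothetical helper (not part of hermie) that turns a percentage into the --dta-sequest argument pair shown above. It uses awk for the arithmetic, since shell integer arithmetic cannot produce 0.05.

```sh
# dta_fdr_args: hypothetical helper (not part of hermie) that turns an FDR in
# percent into the hermie arguments that set DTASelect's negated -S threshold.
dta_fdr_args() {
  local threshold
  # awk does the fractional arithmetic (5 -> 0.05) that shell integers cannot.
  threshold=$(awk -v p="$1" 'BEGIN { printf "%g", p / 100 }')
  # DTASelect returns values greater than -S, so the threshold is negated.
  printf '%s\n' "--dta-sequest" "-S=-$threshold"
}
```

Then hermie $(dta_fdr_args 5) yeast standard-perc expands to the command above. The substitution is left unquoted on purpose so that each output line becomes one argument; this is safe here because the output contains no spaces or wildcards.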

It is possible to pass DTASelect (or any other component program) more than one option. Simply add another --dta-sequest <option>=<value> pair to the command for each one. For instance, you can ignore any proteins whose names contain the string "ribosome" with the option -e ribosome. Adding it to the above command looks like this.

$ hermie --dta-sequest -S=-0.05 --dta-sequest -e=ribosome yeast standard-perc

As another example, suppose you want BiblioSpec to search spectra of all charge states instead of limiting the candidates to those with the same charge as the query spectrum. The charge feature is controlled with the -z option. You would want hermie to run this command.

$ BlibSearch -z query.ms2 yeast.lib output.sqt

In this case the option does not take an argument, so the option-value pair is written as -z= (with nothing after the =), and it is passed to BiblioSpec with --lib. The final hermie command will look like this.

$ hermie --lib -z= yeast standard-perc
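
The option=value convention maps back to an ordinary command line by splitting at the first =: the part before it is the option, and the part after it (if any) is the option's argument. The function below is a hypothetical illustration of that mapping, not hermie's own code. It shows why -S=-0.05 corresponds to -S -0.05 and -z= to a bare -z.

```sh
# pair_to_args: hypothetical illustration (not hermie's code) of how an
# option=value pair corresponds to the arguments a component program sees.
pair_to_args() {
  local pair option value
  pair=$1
  option=${pair%%=*}                  # text before the first "="
  value=""
  case $pair in
    *=*) value=${pair#*=} ;;          # text after the first "=" (may be empty)
  esac
  if [ -n "$value" ]; then
    printf '%s\n' "$option" "$value"  # -S=-0.05  ->  -S -0.05
  else
    printf '%s\n' "$option"           # -z=       ->  -z (no argument)
  fi
}
```

Read this way, --dta-sequest -S=-0.05 --dta-sequest -e=ribosome corresponds to DTASelect -S -0.05 -e ribosome, and --lib -z= corresponds to BlibSearch -z.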

Using a custom mode file

Any of these options can be put into a separate file instead of being given on the command line. A file containing a collection of options is called a mode file (at least in the world of hermie). A detailed description of how to create and use custom mode files is given on the customization page. Here are the basics.

Let's say you put together a mode file and named it myfancysearch. Make sure the file is in your mode path. If your mode file does not contain options for naming the database or library, your command might look like this.

$ hermie <organism> <custom mode>
$ hermie yeast myfancysearch
$ hermie worm myfancysearch first.ms2 second.ms2
$ hermie -name ffs fly myfancysearch

If you included a custom database in your mode file (options -fasta and -decoy), then you need to use the organism "other". If you use "other", you must also specify a library or turn off the library search. Remember that the --noblib-search or -library options could alternatively be included in the mode file.

Examples WITH the -library option in the mode file
$ hermie other myfancysearch
$ hermie -name ofs other myfancysearch
$ hermie other myfancysearch wormguts.ms2 ecoli.ms2

Examples WITHOUT the -library option in the mode file
$ hermie --noblib-search other myfancysearch
$ hermie --noblib-search -name whynamethis other myfancysearch
$ hermie -library ~/mylibs/platypus.lib other myfancysearch

Copying results to the web

The results you are most likely interested in are in the DTASelect.html file. You can have this file copied to your web directory automatically with the -web option. If you provide a directory, the copied file will still be named DTASelect.html. If you provide a directory and a file name, the copy will be given that file name. Either way, an existing file of the same name will be overwritten.

$ hermie [-web <path>] <organism> <mode>
$ hermie -web /net/maccoss/vol3/home/glover/public_html/wormresults/ worm standard-perc
$ hermie -web /net/maccoss/vol1/home/gray/public_html/dtas/040508-fly.html fly standard-perc
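
The destination rule can be summarized as: a directory keeps the name DTASelect.html, and anything else is taken as the new file name. The function below is a hypothetical sketch of that rule (hermie's actual check may differ, for instance for a directory that does not exist yet); it only prints the destination and copies nothing.

```sh
# web_destination: hypothetical sketch (not hermie's code) of where
# -web <path> would put DTASelect.html. Prints the path; copies nothing.
web_destination() {
  local path
  path=$1
  if [ -d "$path" ]; then
    # An existing directory: the copy keeps the name DTASelect.html.
    printf '%s/DTASelect.html\n' "${path%/}"
    return
  fi
  case $path in
    */) printf '%s/DTASelect.html\n' "${path%/}" ;;  # trailing slash: a directory
    *)  printf '%s\n' "$path" ;;                    # a file name: the copy is renamed
  esac
}
```

Under this rule, the first example above would produce .../wormresults/DTASelect.html, and the second would write .../dtas/040508-fly.html.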

Email notification

Hermie runs can take days to complete. Wouldn't it be nice if you didn't have to keep checking back to see if it was done? Now you don't have to. You can have an email sent to you when the run has finished.

$ hermie [-mail <address>] <organism> <mode>
$ hermie -mail schwartzx@u.washington.edu yeast standard-perc

Running crux

There are now two different database search algorithms available in the hermie pipeline, SEQUEST and crux. SEQUEST is run by default. Crux can be run in addition to or instead of SEQUEST. When using pre-defined organisms, crux will use an indexed version of the standard fasta. For custom databases use the -fasta option. Crux generates its own decoy sequences on the fly, so no decoy database is necessary. Crux results will also be analyzed by percolator when that step is turned on.

All crux options should be put in a parameter file, and the name of that parameter file given to hermie with --crux-option. The path to the parameter file is passed to crux as is, so a fully qualified path (one beginning with '/') is safest. Note that hermie actually runs parallel-crux, a wrapper that divides the search into blocks of scans and parcels them out to different cluster nodes. However, the options you specify with --crux-option should be crux options. A few options are disabled when crux is run via parallel-crux; see the parallel-crux documentation for details.

A run including crux might look like this.

$ hermie -crux yeast standard-perc
$ hermie -crux -noseq -noblib -noadd -fasta platypus.fasta other standard-perc
$ hermie -crux --crux-option --parameter-file=/net/maccoss/vol2/home/me/params/mod-search.params ecoli standard
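
Since the parameter-file path reaches crux unchanged, a relative path is fragile. One way to be safe is to convert it to an absolute path before building the option. In the sketch below, abs_path is a hypothetical helper written with cd and pwd so that it does not depend on realpath being installed; it assumes the file's directory already exists.

```sh
# abs_path: hypothetical helper that prints an absolute version of a path
# whose directory exists, using only cd and pwd.
abs_path() {
  local dir base
  dir=$(dirname -- "$1")
  base=$(basename -- "$1")
  # The command substitution runs in a subshell, so the caller's directory is
  # unchanged; CDPATH is cleared so cd cannot print an extra path to stdout.
  dir=$(CDPATH= cd -- "$dir" && pwd) || return 1
  printf '%s/%s\n' "${dir%/}" "$base"
}
```

Used with the last example above (params/mod-search.params is a hypothetical relative location):
$ hermie -crux --crux-option --parameter-file="$(abs_path params/mod-search.params)" ecoli standard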