hermie: the spectrum-to-protein analysis pipeline

Customization: how to adjust hermie parameters

Hermie is quite flexible and can be adjusted to meet your needs. This section covers how to change the inputs to each step of the pipeline including the organism being searched, the fasta file used by the database search, the parameter files used, and the DTASelect criteria. All parameters can be given on the command line. Sometimes this leads to very long, cumbersome commands that you might not want to type more than once. For this reason, all parameters are also controlled by what is called the search mode. Creating and using custom search modes is described below.

Custom Search Modes and Organisms

A search mode is a text file containing a list of the desired options and their arguments. All of the available options can be found with the command

$ hermie -help
Options are available for controlling which steps are performed (e.g. run or skip charge_czar), the inputs to those steps (e.g. the BlibSearch library, the SEQUEST fasta file), and the options for those steps (e.g. DTASelect option -Smn 7). See the documentation for more details on the function of each option.

Syntax. When creating a custom mode file, you might want to refer to one of the default modes, which are found in /net/maccoss/vol2/software/pipeline/modes/. The basic syntax is to write an option name followed by its argument, if any. Options may be separated by any whitespace (space, tab, newline). Comments may be added to the file by beginning a line with a pound sign (#).

Multiple options. Most of the options should be specified only once. If they are included twice, the value of the last occurrence will be used. If they are specified in both the mode file and on the command line, the command-line value will be used. There is an exception. Six options may be given multiple times. These are --dta-sequest, --dta-library, --cz, --lib, --perc-option, --crux-option. With these, you may specify a set of options to be passed to programs run within hermie. Yes, this is slightly confusing; you are using a hermie option to specify, say, a DTASelect option. The general form is

--program option=argument
where program indicates which program this option is for, option is the option as it would be entered on the command line, and argument is the option's argument, if there is one. For instance, for hermie to run
$ DTASelect -Smn 7 -o -a false
you would include the following in the mode file
--dta-sequest -Smn=7 --dta-sequest -o= --dta-sequest -a=false
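
The conversion above is mechanical, so it can be sketched as a small shell function. The helper name to_hermie_opts is hypothetical and not part of hermie. It assumes that every token beginning with a dash is an option and that any other token is the argument of the option before it, so it would misread a negative number as an option.

```shell
# Hypothetical helper (not part of hermie): rewrite a program's own
# command-line options in the "--program option=argument" form.
to_hermie_opts() {
    local prefix="$1" out="" opt arg
    shift
    while [ $# -gt 0 ]; do
        opt="$1"; arg=""
        shift
        # Take the next token as the argument unless it is another option.
        if [ $# -gt 0 ] && [ "${1#-}" = "$1" ]; then
            arg="$1"
            shift
        fi
        out="$out${out:+ }$prefix $opt=$arg"
    done
    printf '%s\n' "$out"
}

to_hermie_opts --dta-sequest -Smn 7 -o -a false
# prints: --dta-sequest -Smn=7 --dta-sequest -o= --dta-sequest -a=false
```

The printed text can be pasted into a mode file or appended to a hermie command.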

Custom organisms. Many model organisms are already established in the defaults. To get a list of available organisms run

$ hermie -list-organisms
The predefined organisms determine the protein database (fasta file) used in the SEQUEST and/or crux search and the library used by BlibSearch. To use a custom database and/or library, use the organism other and include the -fasta, -decoy (for percolator), and -library options in the search mode. These options can only be used with the organism other; with any other organism they will be overridden by the default sources.

Using your custom mode. Once you have written your custom mode file, make sure it is located in a directory in your $MODEPATH (see Understanding $MODEPATH). To use a new mode named, for example, platypus.mode, use the command

$ hermie other platypus.mode my.ms2
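
Putting these pieces together, a custom mode for the organism other might be set up roughly as follows. This is only a sketch: the scratch directory, every file path inside the mode file, and the choice of -Smn 7 are placeholders, and it assumes each option in the mode file is written exactly as it would be on the command line.

```shell
# Scratch directory for the demo; in practice this might be ~/modes.
mode_dir="${TMPDIR:-/tmp}/hermie_modes_demo"
mkdir -p "$mode_dir"

# All paths below are placeholders, not real files.
cat > "$mode_dir/platypus.mode" <<'EOF'
# platypus.mode: hypothetical search mode for use with the organism "other"
-fasta /path/to/platypus.fasta
-decoy /path/to/platypus_decoy.fasta
-library /path/to/platypus_library
--dta-sequest -Smn=7
EOF

# Put the directory first in $MODEPATH so hermie can find the mode file.
export MODEPATH="$mode_dir${MODEPATH:+:$MODEPATH}"

# The run itself would then be started with:
echo "hermie other platypus.mode my.ms2"
```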

Custom sequest.params

The default sequest.params file used for the SEQUEST search will meet most people's needs. However, if you want to include modifications, change the number of matches reported, or make other adjustments to the SEQUEST search, you will need to provide a new sequest.params file and make sure hermie knows where to look for it. Follow these steps.
    Note that the database in sequest.params will NOT be used. See Custom Search Modes for how to specify the database.
  1. Write the sequest.params file with the options you want. If you are bold enough to do this, I'll assume you know what to change. The file can have any name. (SEQUEST requires that it be called "sequest.params", but hermie will make a copy of it and correct the name if necessary.)
  2. Move the file to a convenient location. "Convenient" can mean whatever you like. If you expect to do this sort of SEQUEST search only once, you might want to put the file in the directory where you will start the hermie run. Or if you plan on doing all your searches this way, you might want to put it in your home directory or some other appropriate location (e.g. ~/othermodes).
  3. Add the option -seq-params and the name of your file to your hermie command or to your custom mode file.
  4. The old method of using $MODEPATH still works. If you do not use -seq-params, the first file named sequest.params found in your $MODEPATH will be used.
  5. Check your configuration. Move to the directory where you plan on running hermie and check the setup. The command might look like this
    $ hermie -check -seq-params ~/param_files/phospho.params yeast standard
    Now look in pipeline/sequest/ and read the sequest.params file to make sure it looks right.

Dynamic Modifications

For SEQUEST, dynamic modifications are specified in the sequest.params file (see Custom sequest.params). They are given as a mass shift and a list of residues that could be modified. With percolator version 2, modifications must also be specified by their Unimod number. Hermie requires that you provide both of these values.

  1. Define your modifications in a sequest.params file. To search for phosphorylation, the line might look like this
    diff_search_options = 79.966 STY 0.0 X 0.0 X 0.0 X 0.0 X 0.0 X
  2. Find the Unimod number for the modification. Look at the summary table to find your mod. The table includes the ID number, name, mass shift (monoisotopic and average masses) and a list of residues that might be affected. Try searching for the name or for the mass shift value. Phospho is ID 21.
  3. Include with your hermie command the -mods option followed by the Unimod ID. For our example
    $ hermie -mods 21 -seq-params myseq.params yeast standard-perc
    NOTE: If you forget the -mods option, percolator will exit with an error. In that case, rerun hermie with the -mods option, skipping all steps before percolator.
  4. If you specify more than one modification, they must be given in the same order in the sequest.params file as they are with the -mods option. For example, this line in a sequest.params
    diff_search_options = 79.966 STY 15.99 M 188.032956 K 0.0 X 0.0 X 0.0 X
    would be accompanied by this option
    -mods 21,35,42
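
Because the two lists have to line up, a quick count check before a long run can catch a forgotten entry. The sketch below uses two hypothetical helper functions that are not part of hermie. It assumes the diff_search_options line alternates mass shift and residues as in the example, and it only compares how many modifications are listed; it cannot tell whether the order matches.

```shell
# Hypothetical helpers (not part of hermie).

# Count the non-zero mass shifts on the diff_search_options line of a file.
count_dynamic_mods() {
    awk -F= '
        /^[ \t]*diff_search_options[ \t]*=/ {
            n = split($2, field, " ")
            count = 0
            # Fields alternate: mass shift, residues, mass shift, residues, ...
            for (i = 1; i <= n; i += 2) if (field[i] + 0 != 0) count++
            print count
            found = 1
            exit
        }
        END { if (!found) print 0 }
    ' "$1"
}

# Count the comma-separated Unimod IDs as they would be given to -mods.
count_mod_ids() {
    printf '%s\n' "$1" | awk -F, '{ print NF }'
}

# Demo on a scratch file that contains only the example line.
params_file="${TMPDIR:-/tmp}/hermie_demo_sequest.params"
echo 'diff_search_options = 79.966 STY 15.99 M 188.032956 K 0.0 X 0.0 X 0.0 X' > "$params_file"
mods="21,35,42"

if [ "$(count_dynamic_mods "$params_file")" -eq "$(count_mod_ids "$mods")" ]; then
    echo "counts agree"
else
    echo "counts differ: check sequest.params and -mods" >&2
fi
```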

Understanding $MODEPATH

The environment variable $MODEPATH is a list of directories where hermie will look for the search mode and the sequest.params file. It can be changed for every run you do, or you can set it to always be the same. To find out your current value of $MODEPATH, do

$ echo $MODEPATH
It may return nothing or a list of paths separated by colons (:).

Note that the search mode and sequest.params file do not need to reside in the same directory.
Choosing an appropriate value. If $MODEPATH is undefined, hermie will look in the default location /net/maccoss/vol2/software/pipeline/modes/. This location is reserved for generic mode files, so you will need to add locations to $MODEPATH if you want to use a custom search mode or sequest.params file. The locations are up to you. A common choice is to look first in the current directory (indicated by a dot '.'). Suppose you commonly do searches with modifications and keep a custom sequest.params file in a directory ~/modsearches. You might also have some custom mode files in a directory ~/modes. In that case, your $MODEPATH could be set to .:~/modsearches:~/modes:/net/maccoss/vol2/software/pipeline/modes/ . Whatever value you choose, remember that hermie will look in those locations in the order they are listed and stop once it finds a file with the right name.
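
To predict which file a given $MODEPATH will select, the lookup order can be imitated in the shell. The function find_in_modepath below is a hypothetical helper, not hermie's actual code, and it assumes the default location is consulted only when $MODEPATH is unset or empty.

```shell
# Hypothetical helper that mimics the lookup order described above.
find_in_modepath() {
    local name="$1"
    # Assumption: the default location is used only if $MODEPATH is unset or empty.
    local search="${MODEPATH:-/net/maccoss/vol2/software/pipeline/modes/}"
    local old_ifs="$IFS" dir
    IFS=:
    for dir in $search; do
        IFS="$old_ifs"
        # Stop at the first directory that holds a file with the right name.
        if [ -f "${dir%/}/$name" ]; then
            printf '%s\n' "${dir%/}/$name"
            return 0
        fi
    done
    IFS="$old_ifs"
    return 1
}

# Demo with two scratch directories standing in for ~/modsearches and ~/modes.
demo="${TMPDIR:-/tmp}/hermie_modepath_demo"
mkdir -p "$demo/modsearches" "$demo/modes"
touch "$demo/modsearches/sequest.params" "$demo/modes/sequest.params" "$demo/modes/platypus.mode"
MODEPATH="$demo/modsearches:$demo/modes"

find_in_modepath sequest.params   # the copy in modsearches is listed first, so it wins
find_in_modepath platypus.mode    # found only in modes
```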

Syntax for setting the value. To set any environment variable in a bash shell, use the command

$ export VARNAME=newvalue
There must be no space on either side of the equal sign (=). In our case VARNAME is MODEPATH. The newvalue is a path or a list of paths separated by colons, such as .:~/my/new/path:/another. You can add new paths to the current value by including $MODEPATH in the list of paths. For instance, to add a new path to the end of the list, use the command
$ export MODEPATH=$MODEPATH:~/new/path
or to add to the beginning of the list use
$ export MODEPATH=~/new/path:$MODEPATH

Set the value temporarily. You can set the value for $MODEPATH (or any environment variable) temporarily for the shell you are currently working in by using the export command described above. Once you close the shell, the value disappears.

Set the value for all runs. You may set the value of $MODEPATH for every bash shell you open by putting the export command in the file ~/.bashrc. To do this, simply open the file in a text editor (vim, Emacs, etc.), add the command anywhere in the file (although you might want to put it near the definitions of other environment variables), save the changes, and close the file. You will have to open a new shell to see the changes.
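
If you would rather not edit the file by hand, the line can also be appended from the shell. The sketch below uses a hypothetical helper, add_line_once, that skips the append when the exact line is already present, so running it twice does not leave duplicate definitions. The demo writes to a scratch file; for real use the target would be ~/.bashrc.

```shell
# Hypothetical helper: append a line to a file unless it is already there.
add_line_once() {
    local file="$1" line="$2"
    touch "$file"
    # -x matches whole lines only, -F treats the line as a fixed string.
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Demo on an empty scratch file instead of the real ~/.bashrc.
rc_file="${TMPDIR:-/tmp}/hermie_demo_bashrc"
: > "$rc_file"
modepath_line='export MODEPATH=.:~/modsearches:~/modes:/net/maccoss/vol2/software/pipeline/modes/'

add_line_once "$rc_file" "$modepath_line"
add_line_once "$rc_file" "$modepath_line"   # second call changes nothing
cat "$rc_file"
```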

Running SEQUEST on other clusters

When you log on to proteome and run hermie from there, the processor queue available is specific to the MacCoss lab (nodes named m001-m008). Genome Sciences also hosts a cluster that is available to everyone in the department. You may run hermie from there with a few minor modifications. Follow these steps.

  1. Log on to sage.gs.washington.edu. This is the master node for the cluster.
  2. DO NOT run hermie from sage. Instead, start an interactive session on one of the cluster nodes with the qlogin command.
  3. Your environment will be different from the one on proteome. Set $PATH to include the location of hermie and all of its component parts.
    $ export PATH=$PATH:/net/maccoss/vol2/software/bin64:/net/maccoss/vol2/software/bin
  4. Run the hermie command as usual, using nohup to ensure that it keeps running after you exit your interactive session.
  5. (Optional) If you have permissions to run on the quartz queue (owned by the UW Proteome Resource), you can add the option --queue quartz.q to run on those processors. Log on to tephra.gs to access that queue.
  6. Once everything is running as expected, log out.
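
Once you are on a cluster node, steps 3 and 4 might look like the sketch below. The organism, mode, ms2 file, and log file name are placeholders, and the argument order simply follows the earlier examples. The check for hermie on the $PATH is an addition for safety, not something hermie requires.

```shell
# Step 3: make hermie and its component programs visible in this session.
export PATH="$PATH:/net/maccoss/vol2/software/bin64:/net/maccoss/vol2/software/bin"

# Step 4: start the run in the background so it survives logging out.
if command -v hermie >/dev/null 2>&1; then
    # Placeholder arguments and log file name.
    nohup hermie yeast standard my.ms2 > hermie.log 2>&1 &
else
    echo "hermie was not found on PATH; check the export line above" >&2
fi
```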
