Description: Analyze peptide spectra. The pipeline script hermie is designed to automate the process of analyzing peptide spectra beginning with MS2 files and ending with a summary of peptide and protein identifications. There are several steps in the pipeline.
Some steps are optional: the charge_czar step is turned on with the -charge-czar option, certain steps run only in the high-res-perc mode, and the crux search is enabled with the -crux option.
Usage: hermie [options] <organism> <search mode>
hermie [options] <organism> <search mode> <directory name> [<directory name>...]
hermie [options] <organism> <search mode> <file name> [<file name>...]
Input: In its simplest form, hermie is given only the organism and search mode arguments and analyzes all files in the current working directory with names ending in '.ms2', '.cms2', or '.bms2'. If a directory or list of directories is given, all MS2 files in those directories are analyzed. An individual MS2 file name or a list of file names may also be specified. Options, organisms, and search modes are described in the following sections.
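The file-selection rule above can be sketched in Python. This is an illustration only, not hermie's own code: the function name find_ms2_files is hypothetical, and the sketch assumes that directories are scanned without descending into sub-directories and that suffix matching is case-sensitive, neither of which this page states.

```python
from pathlib import Path

# Suffixes the Input section lists for MS2 files.
MS2_SUFFIXES = (".ms2", ".cms2", ".bms2")


def find_ms2_files(paths=None):
    """Return the MS2 files selected by a list of files and/or directories.

    With no paths, the current working directory is scanned, mirroring the
    simplest form of the hermie command line. (Hypothetical helper.)
    """
    targets = [Path(p) for p in paths] if paths else [Path.cwd()]
    found = []
    for target in targets:
        if target.is_dir():
            # Assumption: only files directly inside the directory count.
            found.extend(
                sorted(
                    f
                    for f in target.iterdir()
                    if f.is_file() and f.name.endswith(MS2_SUFFIXES)
                )
            )
        else:
            # Anything that is not a directory is taken as a named file.
            found.append(target)
    return found
```

Calling find_ms2_files() with no arguments corresponds to running hermie with only an organism and a search mode; passing directory or file names corresponds to the other two usage forms.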
Output: A directory tree containing all
intermediate output files and a
file recording the specifics of each step taken. Details of how the
analysis is progressing are printed to stdout (the screen) and can be
adjusted with the -verbose option (see Options). All output files are written to a new
directory whose name can be set using the -name
option (see Options). Within this
directory are sub-directories for each step of the pipeline, along with
the text file, named 'log', that describes all actions taken.
In some cases, users will want to re-run only select steps in the pipeline. Therefore, if the output directory tree already exists, only those sub-directories that are used in the analysis will be changed. Details of the analysis will be written to a new log file numbered sequentially.
Options: The options below can be specified on the command line to control the pipeline analysis and return help messages. The search mode is a collection of options saved in a file. Several pre-defined search modes are available for typical searches or the user may define a custom search mode file.
-name <output name> |
Specify the name of the directory in which the results are written. Default is 'pipeline'. |
-check-setup |
Quit after printing out setup details. This option is useful for making sure that the input MS2 files are in place and that the mode and options have scheduled the steps you wish to perform. |
-help |
Print the complete list of options available on the command line or in the mode file. |
-list-organisms
| Print all of the predefined organisms. To specify an organism
not listed, use "other" and include the -library and
-fasta options in the mode. |
-list-modes
| Print all of the default modes available. (To read the details of these modes, look in proteome:/mnt/local/pipeline/modes/<mode name>) |
-verbose <0|1|2|3>
| Adjust the level of output to stdout. Default level is 2. 0 is silent. 1 prints the confirmation details from the set up step. 2 additionally displays progress by printing the name of the current step being run. 3 additionally prints all of the output from each step (which by default is written to a file). |
Organisms and Search Modes: The
details of the pipeline analysis are defined by the organism of
interest and what are called search modes. The organism defines which
spectrum library and fasta protein database is searched. Predefined
organisms are already associated with particular databases and
libraries. Use the organism other and a custom search mode to
specify different database and/or library files. A search mode
defines which steps of the analysis are performed, the options used
for each step, and so on. There are five basic search modes defined
(high-res-perc, standard, standard-perc, lib-as-filter, and
lib-as-filter-perc) and custom modes may be defined as well. The
pre-defined search mode files reside at
/net/maccoss/vol2/software/pipeline/modes/
The standard mode searches all spectra with both the library and database searches. The database contains both real protein sequences and a shuffled version of each as decoys. This mode uses common DTASelect criteria for defining good matches. The standard-perc mode adds the use of percolator to define good matches. In this mode, two SEQUEST searches are done for each ms2 file, one with a standard protein database and one with a database of only shuffled sequences. The high-res-perc mode is the standard-perc mode with the addition of two pre-search steps. This mode is for high-resolution MS1 data and requires a .ms1 (.cms1 or .bms1) file for each .ms2 file. The lib-as-filter mode uses the library search as a way of reducing the number of spectra searched by SEQUEST. All spectra with a good match to the library are removed from the SEQUEST search. The lib-as-filter-perc mode adds the use of percolator to define good SEQUEST hits.
To define a new mode, create a text file that contains the desired
parameters (described below). The mode is then referred to by the file
name. The environment variable $MODEPATH defines where the program
looks for mode files. If $MODEPATH is undefined, the default is
/net/maccoss/vol2/pipeline/modes/. A custom sequest.params
file can also be placed in your $MODEPATH to be used instead of the
default one. The database in sequest.params will be replaced by the
one defined by the organism or the one in the mode file.
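As a rough illustration of the lookup described above, the sketch below resolves a mode name to a file path. It is not hermie's implementation: the function name resolve_mode_file is hypothetical, and the sketch assumes that $MODEPATH names a single directory and that an empty value is treated the same as an undefined one.

```python
import os
from pathlib import PurePosixPath

# Default location quoted in the paragraph above for an unset $MODEPATH.
DEFAULT_MODEPATH = "/net/maccoss/vol2/pipeline/modes/"


def resolve_mode_file(mode_name, environ=None):
    """Return the path where a mode file named mode_name would be looked up.

    environ defaults to the real environment; a dict can be passed instead
    to try out different settings. (Hypothetical helper.)
    """
    env = os.environ if environ is None else environ
    # Assumption: an empty MODEPATH falls back to the default directory.
    mode_dir = env.get("MODEPATH") or DEFAULT_MODEPATH
    return PurePosixPath(mode_dir) / mode_name


# With MODEPATH set, the custom directory is used; otherwise the default is.
print(resolve_mode_file("mymode", {"MODEPATH": "/home/user/modes"}))
print(resolve_mode_file("standard-perc", {}))
```

The same directory is where a custom sequest.params would be placed, so one lookup rule covers both kinds of file.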
The format of the mode file is the same as command-line options. Options may be separated by spaces, tabs, or new lines. Any line beginning with # will be ignored and can be used for comments. The following options may be specified.
-charge-czar |
Run the charge_czar step. Off by default. |
-nocharge-czar |
Skip the charge_czar step. |
-noblibSearch |
Skip the library search (BiblioSpec) step. |
-nosequest |
Skip the database search with SEQUEST. |
-queue <queue name> |
Submit jobs to a specific queue. |
-crux |
Run a crux search. Turned off by default. |
-nocrux |
Skip the crux search (Use to override mode). |
-percolator |
A flag to select the use of percolator for scoring SEQUEST
results. A decoy database must also be specified with the
-decoy option. Not necessary with the
standard-perc mode. |
-nopercolator |
Skip the percolator step. Can be used to override a mode file, particularly when repeating select steps in a run. |
-bullseye |
Use Hardklor/Bullseye to assign precursor mass. Requires that
-hk-conf specify the configuration file to use. |
-nobullseye
| Do not run Hardklor/Bullseye. Can be used to override mode file. |
-old-perc |
Use an older version of percolator, v1.07. |
-noadd-lib |
Do not add new spectra to the library. |
-nodtaselect |
Do not run DTASelect on SEQUEST or library search results. |
-fasta <protein database file> |
(Only valid with the organism "other") Full path of the protein database file to be used by SEQUEST |
-decoy <decoy database> |
(Only valid with the organism "other". Required for
percolator) Full path of the random/shuffled protein database
file to be used by SEQUEST. This option is used in conjunction
with -percolator. Two separate SEQUEST searches are
done on each MS2 file, one with the real protein database and one
with a randomized database. |
-index <index name>
| Use index for SEQUEST. Replaces -fasta option. |
-decoy-index <index name> |
Use shuffled index for SEQUEST/percolator. Replaces -decoy option |
-seq-params <file> |
Specify a sequest.params file to be used by SEQUEST. The file
can have any name and does not have to be in the $MODEPATH. The
database in the specified file will NOT be used. The organism
argument or the -fasta option determines the database instead. |
-tryptic <type> |
Use a tryptic digest in SEQUEST search where 'type' is full, partial, or non (i.e. full tryptic digest, partial tryptic digest, or non specific digest). Default partial. |
-hk-conf <file>
| File to replace the default hardklor.conf. |
-nodes <n> |
Number of nodes requested for the SEQUEST search of each MS2 file |
-sleep-time <n> |
Seconds between checks on SEQUEST progress. |
-library <library file> |
(Only valid with the organism "other". Required for library search and library update) Full path of the BiblioSpec library to be used in the search. |
-lib-as-filter |
Use the library search results as a filter for the database search. Only search those spectra which did not have a confident ID based on the library search. |
-mail <address> |
Send an email notification to the address when the hermie run ends, whether it completes successfully or stops because of a fatal error. |
-web <path> |
Copy the DTASelect.html file from the SEQUEST search to the web directory given by path. Something like /mnt/www/localhost/name/myresults.html will rename the file to myresults.html. The rest of the directory structure must already be present. |
| Options for component programs | |
Unlike the above options, these can be specified multiple times to add
multiple values. There is no guarantee as to the order in which they
will be used in the command. The <option> should include any dashes.
Options that do not take arguments (e.g. the percolator -d flag)
should still be followed by a '=' and then no <argument>.
(Example: two different percolator options as they would be passed to hermie.
--perc-option -d= --perc-option --sqt-out=myfile.sqt )
Note that all SEQUEST, Hardklor and Bullseye options are passed via their
respective parameter files. |
|
--cz <option>=<argument> |
Specify the options used with charge-czar.
The format is the same as for the --dta-sequest
option. For available options, see the charge-czar
documentation |
--lib <option>=<argument> |
Specify the options used with the library search (BlibSearch).
The format is the same as for the --dta-sequest
option. For available options, see the BlibSearch
documentation |
--crux-option <option>=<argument> |
Specify options to be used with crux. For available options, see the parallel-crux documentation. |
--perc-option <option>=<argument> |
Specify options to be used with percolator. For available options, run percolator from a command prompt with no arguments. |
--dta-sequest <option>=<argument> |
Specify the options used with DTASelect on the SEQUEST search
results. A full list of available options can be found by
running DTASelect with the --help option. |
--dta-library <option>=<argument> |
Specify the options used with DTASelect on the library search data. |
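The mode-file format and the <option>=<argument> convention for component programs described above can be illustrated with a short Python sketch. It is a reading of the rules on this page, not hermie's actual parser: both function names are hypothetical, and the sketch assumes a comment line must have '#' as its very first character.

```python
def parse_mode_text(text):
    """Split mode-file text into option tokens.

    Tokens may be separated by spaces, tabs, or new lines, and any line
    beginning with '#' is ignored. (Hypothetical helper.)
    """
    tokens = []
    for line in text.splitlines():
        if line.startswith("#"):
            continue  # comment line
        tokens.extend(line.split())
    return tokens


def collect_component_options(tokens, flag):
    """Gather each '<option>=<argument>' value that follows a repeatable flag.

    Returns (option, argument) pairs; the argument is an empty string for
    options written with a trailing '=' and nothing after it.
    """
    pairs = []
    for i, token in enumerate(tokens[:-1]):
        if token == flag:
            option, _, argument = tokens[i + 1].partition("=")
            pairs.append((option, argument))
    return pairs


mode_text = """# custom mode: skip charge-czar, full tryptic digest
-nocharge-czar -tryptic full
--perc-option -d= --perc-option --sqt-out=myfile.sqt
"""
tokens = parse_mode_text(mode_text)
perc_options = collect_component_options(tokens, "--perc-option")
# perc_options is [("-d", ""), ("--sqt-out", "myfile.sqt")]
```

Because a flag such as --perc-option may appear several times, the sketch returns a list of pairs rather than a dictionary, which keeps repeated options from overwriting each other.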
Bugs:
Library search with no SEQUEST search. If a library search
is done with no SEQUEST search, DTASelect will fail on the library
results because there is no sequest.params file. As a work-around,
begin by running hermie with the -check-setup option as though
you would also run SEQUEST. This writes the sequest.params file.
Then do the library search by running hermie without the
-check-setup option and with the -nosequest
option.
Symbolic links and ms1 files. When running Hardklor/Bullseye, there must be an .ms1 file for each .ms2 file and they must be in the same directory. Hermie follows symbolic links back to their source and discards the link location. So unfortunately, putting links to ms1s and ms2s in the same directory will not work. The actual files must be in the same place. This problem is scheduled to be fixed.
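Until that fix is in place, a pre-flight check along the following lines may help catch the problem before a run. This is a sketch written for this page, not part of hermie: the function name check_ms1_pairing is hypothetical, and it assumes that an ms1 file with the same base name and any of the suffixes .ms1, .cms1, or .bms1 satisfies the requirement.

```python
from pathlib import Path

# MS1 suffixes mentioned in the description of the high-res-perc mode.
MS1_SUFFIXES = (".ms1", ".cms1", ".bms1")


def check_ms1_pairing(ms2_files):
    """Report MS2 files likely to fail the Hardklor/Bullseye steps.

    Returns (path, reason) pairs for MS2 files that are symbolic links or
    that have no regular (non-link) MS1 file in the same directory.
    (Hypothetical helper.)
    """
    problems = []
    for ms2 in (Path(p) for p in ms2_files):
        if ms2.is_symlink():
            problems.append((ms2, "ms2 file is a symbolic link"))
            continue
        # Assumption: the MS1 partner shares the MS2 file's base name.
        partners = [ms2.with_suffix(suffix) for suffix in MS1_SUFFIXES]
        if not any(p.is_file() and not p.is_symlink() for p in partners):
            problems.append((ms2, "no regular ms1 file in the same directory"))
    return problems
```

An empty list means every MS2 file is a real file with a real MS1 file next to it, which is the layout the bug description above requires.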
Last updated March 30, 2010