parallel-crux
Usage:
parallel-crux [parallel options] sequest-search [search options] <ms2 file> <protein input>

Description:
Runs crux sequest-search distributed across a number of nodes in a cluster. The spectra in the ms2 file are divided into blocks of 100 (the default block size) and each block is searched separately. The first n blocks are sent out to the n nodes available. As each node finishes, the next block is sent to it. When all blocks have completed, the results are concatenated together into one set of output files.

Input:
- sequest-search – The name of the crux command to run. See the crux documentation for more details. (Note: soon search-for-matches will also be available.)
- <ms2 file> – The name of the file (in ms2 or cms2 format) from which to parse the spectra.
- <protein input> – The name of the fasta file containing protein sequences or the directory containing a protein index from which to retrieve proteins and peptides.

Output:
The output files are the same as those for crux sequest-search. All files are put in the directory named crux-output.

Parallel Options:
- --nodes <filename> – A file containing the names of the nodes available to the process.
- --block-size <num spec searched per call> – The number of spectra searched at a time. Default 100.

Search Options:
All of the options available to crux sequest-search can be used with parallel-crux, with a few exceptions.

DISABLED Search Options:
- --output-dir <directory name> – The results are always written to a directory in the CWD named crux-output.
- --fileroot <file prefix> – The file prefix will be the root name of the .ms2 file searched.
- --overwrite <T | F> – Overwrite is always TRUE. Existing results for the same .ms2 files will be erased.
- --scan-number <range> – All scans will be searched.

Using the cluster
Users are encouraged to run parallel-crux as part of a script that is submitted to a cluster/queue management program, specifically the Sun Grid Engine (SGE). An example of a user-generated script might look like this:

    #!/bin/sh
    #$ -S /bin/sh
    #$ -N pcrux
    #$ -pe mpich 3
    parallel-crux --nodes $TMPDIR/machines sequest-search two-spec.ms2 test.fasta

The script (suppose it is named runpc.sh) would be submitted to the queue by the user from a directory containing the files two-spec.ms2 and test.fasta with a command such as

    qsub -cwd runpc.sh

In the example script, the sh shell is used, its full path being /bin/sh. The job is named with the -N option. The -pe option requests 3 nodes. The last line of the script is the actual parallel-crux command. The SGE writes a file named machines in the directory given by $TMPDIR. The file contains the names of the nodes assigned to this job. When the name of this file is passed to parallel-crux with --nodes, parallel-crux respects the node assignment determined by the queuing system.
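
To get a feel for how the block scheme in the Description plays out for a given job, the arithmetic can be sketched in a few lines of sh. This is only an illustration and not part of parallel-crux: the function names block_count and initial_blocks are hypothetical, and the sketch assumes that a final partial block (fewer spectra than the block size) is searched as a block of its own, which the text above does not state explicitly.

```shell
#!/bin/sh
# Sketch of the block arithmetic described above; not part of parallel-crux.
# Assumption: a leftover partial block counts as one extra block.

# block_count SPECTRA [BLOCK_SIZE]
# Number of blocks needed, rounding up.
block_count() {
    spectra=$1
    block_size=${2:-100}   # 100 is the documented default for --block-size
    echo $(( (spectra + block_size - 1) / block_size ))
}

# initial_blocks SPECTRA NODES [BLOCK_SIZE]
# Blocks sent out immediately: one per node, or fewer if there
# are not enough blocks to occupy every node.
initial_blocks() {
    blocks=$(block_count "$1" "${3:-100}")
    nodes=$2
    if [ "$blocks" -lt "$nodes" ]; then
        echo "$blocks"
    else
        echo "$nodes"
    fi
}

# Example: 1050 spectra on 3 nodes with the default block size.
echo "blocks:  $(block_count 1050)"
echo "initial: $(initial_blocks 1050 3)"
```

Under these assumptions, 1050 spectra give 11 blocks, of which 3 are dispatched at once and the remaining 8 are handed out as nodes finish. Passing a smaller value as the block size, as with --block-size, increases the number of blocks accordingly.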