parallel-crux

Usage:

parallel-crux [parallel options] sequest-search [search options] <ms2 file> <protein input>

Description:

Runs crux sequest-search distributed across a number of nodes in a cluster. The spectra in the ms2 file are divided into blocks of 100, and each block is searched separately. The first n blocks are sent out to the n available nodes. As each node finishes its block, the next unsearched block is sent to it. When all blocks have completed, the results are concatenated into one set of output files.
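The block arithmetic described above can be sketched in a few lines of sh. This is only an illustration, not part of parallel-crux: the helper names spectrum_count and block_count are hypothetical, and the sketch assumes that each spectrum in the ms2 file begins with a line starting with "S", as in the MS2 file format.

```sh
#!/bin/sh
# Hypothetical helpers for illustration; parallel-crux does this internally.

# Count spectra in an ms2 file, assuming each spectrum starts with an "S" line.
spectrum_count() {
    grep -c '^S[[:space:]]' "$1"
}

# Number of blocks needed for a given number of spectra.
# Usage: block_count SPECTRA [BLOCK_SIZE]; the block size defaults to 100.
block_count() {
    spectra=$1
    size=${2:-100}
    # Integer ceiling division: a partly filled last block still counts.
    echo $(( (spectra + size - 1) / size ))
}
```

For example, block_count 250 prints 3: two full blocks of 100 spectra and one block of 50. With 3 nodes, each node would receive exactly one block; with 2 nodes, the third block would go to whichever node finishes first.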

Input:

Output:

The output files are the same as those for crux sequest-search. All files are put in the directory named crux-output.

Parallel Options:

Search Options:

All of the options available to crux sequest-search can be used with parallel-crux, with a few exceptions.

DISABLED Search Options:

Using the cluster

Users are encouraged to run parallel-crux as part of a script that is submitted to a cluster/queue management program, specifically the Sun Grid Engine (SGE). An example of a user-generated script might look like this:

#!/bin/sh
#$ -S /bin/sh
#$ -N pcrux
#$ -pe mpich 3
parallel-crux --nodes $TMPDIR/machines sequest-search two-spec.ms2 test.fasta

The script (suppose it is named runpc.sh) would be submitted to the queue from a directory containing the files two-spec.ms2 and test.fasta, with a command such as:

qsub -cwd runpc.sh

In the example script, the -S option selects the sh shell by its full path, /bin/sh. The -N option names the job pcrux. The -pe option requests 3 nodes. The last line of the script is the actual parallel-crux command. SGE writes a file named machines in the directory given by $TMPDIR; the file contains the names of the nodes assigned to this job. When the name of this file is passed to parallel-crux with the --nodes option, parallel-crux respects the node assignment determined by the queuing system.
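Because the search depends on the machines file being present, a job script may want to check it before calling parallel-crux. The following is a minimal sketch of such a check. The helper name node_count is hypothetical, and the sketch assumes the file lists one node name per line, which is not stated above.

```sh
#!/bin/sh
# Hypothetical pre-flight check for the node list written by SGE.
# Assumes one node name per line in the file.
# Prints the number of listed nodes, or fails if the file is missing or empty.
node_count() {
    file=$1
    if [ ! -s "$file" ]; then
        echo "node list not found or empty: $file" >&2
        return 1
    fi
    # Count non-empty lines, one per assigned node.
    grep -c . "$file"
}
```

In a job script, the check could precede the search, for instance: node_count "$TMPDIR/machines" || exit 1. The script then stops early with a message instead of starting a search without a node list.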