This document gives instructions for setting up your environment to run SEQUEST jobs on the computers proteome, grid, and sage (for the quartz cluster). It covers only a few details of the SEQUEST search itself. You should need to follow these instructions only once per computer, when you first receive your account.
The program that runs SEQUEST is run_ms_ssh. It distributes SEQUEST jobs over a set of nodes, using ssh calls to farm out the processing of each spectrum. Most of this set-up process deals with making sure that ssh works properly.
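To picture what "farming out" work over ssh means, here is a toy sketch that assigns work items to nodes round-robin. It is a hypothetical illustration, not the actual logic of run_ms_ssh: the function name plan_ssh_jobs is made up, the machines-file format (one node name per line) is an assumption, and the function only prints which node would get which item instead of calling ssh.

```sh
#!/bin/sh
# Toy round-robin planner (hypothetical; NOT run_ms_ssh's actual code).
# Usage: plan_ssh_jobs MACHINES_FILE ITEM...
# MACHINES_FILE lists one node name per line; each ITEM is one unit of
# work. Prints "node item" pairs instead of running anything over ssh.
plan_ssh_jobs() {
    machines=$1
    shift
    [ -r "$machines" ] || { echo "cannot read $machines" >&2; return 1; }
    # Count non-empty lines, i.e. the number of nodes available.
    nnodes=$(grep -c . "$machines")
    if [ "$nnodes" -eq 0 ]; then
        echo "no nodes listed in $machines" >&2
        return 1
    fi
    i=0
    for item in "$@"; do
        # Pick node number (i mod nnodes) + 1 among the non-empty lines.
        node=$(grep . "$machines" | sed -n "$((i % nnodes + 1))p")
        # A real runner would do something like: ssh "$node" <command> "$item" &
        echo "$node $item"
        i=$((i + 1))
    done
}
```

With a machines file containing m001 and m002, plan_ssh_jobs machines a.ms2 b.ms2 c.ms2 prints "m001 a.ms2", "m002 b.ms2", "m001 c.ms2". Every one of those ssh calls runs unattended, which is why the rest of this set-up focuses on making ssh work without prompts.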
Your computer keeps track of the other computers you have connected to by keeping a list of "keys", one for each computer (with OpenSSH, in ~/.ssh/known_hosts). If you try to connect to a new computer, ssh gives you a warning and asks you to confirm (by typing 'yes') that you really do want to connect. Because run_ms_ssh runs unattended, nobody will be there to confirm new connections or to type a password. To get around this, we need to set up ssh keys for each node (computer) in the cluster.
.. press return when prompted for the file in which to save the key (the default, /home/user/.ssh/id_dsa)
.. press return when prompted for a passphrase, leaving it empty
NOTE: The step 'cp authorized_keys authorized_keys.bak' is optional; it backs up any existing keys. If you are a new user, you may not have an authorized_keys file yet, in which case cp will report 'No such file or directory'. Ignore the error and continue with the next step.
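The note above implies that the next step adds your new public key to authorized_keys. Assuming your home directory is shared by all the nodes (common on clusters, but check with your administrator), that one file then lets you log in to every node without a password. A minimal sketch of the back-up-and-append step, written as a hypothetical shell function authorize_key (not part of the original instructions):

```sh
#!/bin/sh
# Hypothetical helper: back up authorized_keys (if it exists) and append a
# public key to it, skipping the append if the key is already there.
# Usage: authorize_key PUBKEY_FILE SSH_DIR
authorize_key() {
    pubkey=$1
    sshdir=$2
    auth="$sshdir/authorized_keys"
    mkdir -p "$sshdir" || return 1
    # sshd (StrictModes) may refuse keys if ~/.ssh is writable by others.
    chmod 700 "$sshdir"
    if [ -f "$auth" ]; then
        cp "$auth" "$auth.bak"      # optional backup of existing keys
    fi
    # Append the key only if that exact line is not already present.
    if ! grep -qxF -f "$pubkey" "$auth" 2>/dev/null; then
        cat "$pubkey" >> "$auth"
    fi
    chmod 600 "$auth"
}
```

For example: authorize_key ~/.ssh/id_dsa.pub ~/.ssh. OpenSSH 7.0 and later disable DSA keys by default, so on a newer system the public key may be ~/.ssh/id_ed25519.pub or ~/.ssh/id_rsa.pub instead.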
Next, we need to make sure you are not prompted to verify the host keys of the nodes. Run this script:
SEQUEST v.27 (rev. 9), (c) 1993
Molecular Biotechnology, Univ. of Washington, J.Eng/J.Yates
Licensed to John Yates III @ Univ. of Washington
SEQUEST usage: search27 [options] [dtafiles]
options = -Dstring where string specifies the database to be searched
-Pstring where string specifies an alternate parameter file name
(sequest.params is the default parameters file)
-S sets SEQUEST to not re-search .dta files if .out files exists
For example: sequest *.dta
If you see output like the above, everything is fine. You may need to replace 'm001' with a different computer name if you are not on proteome; take a name from the list that was generated by setup_keys.pl (it could be something like 'maccoss001' or 'q1').
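To check every node in one pass rather than one at a time, a sketch like the following loops over a list of node names, such as the list produced by setup_keys.pl saved to a file. The function name check_nodes and the node-list file are hypothetical. It relies on the standard OpenSSH option BatchMode=yes, which makes ssh fail instead of prompting for a password or for confirmation of an unknown host key, so a node that would interrupt an unattended run shows up as FAIL.

```sh
#!/bin/sh
# Hypothetical helper: try a non-interactive login on every node listed in
# a file (one name per line) and report ok/FAIL for each.
# Usage: check_nodes NODE_LIST_FILE   (returns non-zero if any node fails)
check_nodes() {
    failed=0
    while IFS= read -r node; do
        [ -n "$node" ] || continue          # skip blank lines
        # </dev/null stops ssh from consuming the rest of the node list.
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$node" true </dev/null; then
            echo "ok   $node"
        else
            echo "FAIL $node"
            failed=$((failed + 1))
        fi
    done < "$1"
    [ "$failed" -eq 0 ]
}
```

Running check_nodes nodes.txt prints one line per node. The 5-second ConnectTimeout is an arbitrary choice to keep an unreachable node from stalling the loop.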
If you connect successfully but get a 'command not found' error, your $PATH environment variable needs to be set. Normally $PATH is set in your ~/.bashrc file, but that does not work for the non-interactive sessions used by SEQUEST. Please ask for assistance.
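When asking for assistance, it helps to know exactly which PATH the session sees and which commands it cannot find. The helper below is a hypothetical diagnostic (the name report_path is made up); it prints $PATH and reports whether each named command resolves, using the POSIX 'command -v' builtin.

```sh
#!/bin/sh
# Hypothetical diagnostic: print the PATH this shell sees and report
# whether each named command can be found on it.
# Usage: report_path CMD...   (returns non-zero if any command is missing)
report_path() {
    echo "PATH=$PATH"
    missing=0
    for cmd in "$@"; do
        if found=$(command -v "$cmd"); then
            echo "found   $cmd -> $found"
        else
            echo "missing $cmd"
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}
```

Because the problem is specific to non-interactive ssh sessions, the same check has to run through ssh to be meaningful, for example: ssh m001 'echo "$PATH"; command -v run_ms_ssh' (the single quotes make $PATH expand on the node, not locally).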
SEQUEST, which is run by run_ms_ssh, requires two inputs: one or more MS2 files containing the spectra to be searched, and a file called 'sequest.params'. To run run_ms_ssh on the cluster, you will also need a job script. These are described below. The easiest way to get a sequest.params file is to copy one from someone else.
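Before submitting a job, it can help to confirm that both inputs are actually present. The sketch below is a hypothetical check (the function name check_inputs is made up); it assumes, as the '*.ms2' argument in the job script suggests, that the MS2 files and sequest.params sit together in the directory the job runs in.

```sh
#!/bin/sh
# Hypothetical pre-flight check: confirm that sequest.params and at least
# one .ms2 file are present before submitting a job.
# Usage: check_inputs [DIR]   (defaults to the current directory)
check_inputs() {
    dir=${1:-.}
    status=0
    if [ ! -f "$dir/sequest.params" ]; then
        echo "missing: $dir/sequest.params" >&2
        status=1
    fi
    # Count .ms2 files; an unmatched glob stays literal and fails the -f test.
    count=0
    for f in "$dir"/*.ms2; do
        [ -f "$f" ] && count=$((count + 1))
    done
    if [ "$count" -eq 0 ]; then
        echo "missing: no .ms2 files in $dir" >&2
        status=1
    else
        echo "found $count .ms2 file(s) in $dir"
    fi
    return "$status"
}
```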
A job script for the cluster looks like this:

```sh
#!/bin/sh
#
#$ -S /bin/sh
#
#$ -N your_job_name
#
# Parallel Environment Request
#$ -pe mpich 8-16
#
echo "Got $NSLOTS slots."
PATH=$SGE_O_PATH:$PATH
run_ms_ssh -f "$TMPDIR/machines" *.ms2
```

Change 'your_job_name' to a single word (no spaces) that you would like to name the run. The numbers '8-16' set how many nodes (slots) your job will request. You can change this to a single number (such as 4) or keep a range as shown here.
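If you submit many runs, a small generator can fill in the job name and slot request and enforce the rules above (one word, no spaces; a single number or a range). This is a hypothetical helper (make_job_script is a made-up name); the lines it writes are the job script shown above, with the $TMPDIR path quoted.

```sh
#!/bin/sh
# Hypothetical helper: write the SGE job script with a chosen job name and
# slot request.
# Usage: make_job_script JOB_NAME SLOTS OUTFILE
#   JOB_NAME  one word, no spaces (e.g. yeast_run1)
#   SLOTS     a single number (e.g. 4) or a range (e.g. 8-16)
make_job_script() {
    name=$1
    slots=$2
    out=$3
    case "$name" in
        ''|*' '*)
            echo "job name must be one word with no spaces: '$name'" >&2
            return 1 ;;
    esac
    case "$slots" in
        ''|*[!0-9-]*|-*|*-|*-*-*)
            echo "slots must be a number or a range like 8-16: '$slots'" >&2
            return 1 ;;
    esac
    {
        echo '#!/bin/sh'
        echo '#'
        echo '#$ -S /bin/sh'
        echo '#'
        echo "#\$ -N $name"
        echo '#'
        echo '# Parallel Environment Request'
        echo "#\$ -pe mpich $slots"
        echo '#'
        # Single quotes keep $NSLOTS, $SGE_O_PATH and $TMPDIR unexpanded here;
        # they are expanded when the job runs, from variables Grid Engine sets.
        echo 'echo "Got $NSLOTS slots."'
        echo 'PATH=$SGE_O_PATH:$PATH'
        echo 'run_ms_ssh -f "$TMPDIR/machines" *.ms2'
    } > "$out"
}
```

For example, make_job_script yeast_run1 8-16 job.sh writes job.sh, which can then be submitted with Grid Engine's qsub command (qsub job.sh).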