Troubleshooting
You've run hermie and it has finished...or at least stopped. But you don't have any results. This section will describe the general approach for how to troubleshoot a run that did not successfully complete and how to finish the analysis. There are also notes about the cluster and some common reasons that a run fails.
When your run is interrupted
There are two basic ways that your hermie run may be interrupted: external forces (closing the active window, a kill command, a proteome shutdown) or internal failures from hermie or one of its component programs. In the first case, there is likely no problem with your input data or configuration, so rerunning the missing steps should be straightforward. In the second case, the nature of the error will need to be tracked down and action taken to remedy the problem. Below is an overview of the troubleshooting process.
- Finding the point of failure. The last output of your hermie run should be "Pipeline Complete". Look for it in the last line of your log file. If it's there, your errors are not due to an interruption. Otherwise, the last line will tell you which program was running when the interruption occurred. Hermie runs the component programs in this order:
- charge_czar
- BlibSearch
- SEQUEST
- percolator
- DTASelect
- Looking for error messages. If hermie was interrupted due to an error, it should return a message with details about the error. Error messages are printed to the screen (or nohup.out). More information can be found in the *-messages files in each sub-directory. For instance, if there was an error with BlibSearch then you would look in pipeline/library/lib-messages. Detailed information about all of the possible error messages can be found here.
- Looking at cluster output. Every step of the pipeline is submitted to the cluster. This involves five files for each MS2: the submitted script and four output files from the cluster. The script files end in .sh. The cluster files end in .o#, .e#, .po#, .pe# where # is the process ID number. When a program exits successfully, hermie cleans up all these files, but if there is an error the files will remain. Error messages and output from the component programs can be found in these four files.
- Examining the program output files. We would like to find out how much of the last step was completed. Again, we can read the *-messages and look for the expected output files. Sometimes it is also necessary to look at the contents of all of the output files as some may be empty or truncated.
- Finish the interrupted step. In some cases it is easiest to redo the entire step. For example, if BlibSearch was interrupted while searching the second of twelve MS2 files, you might as well run hermie with all of the input files, starting with the BlibSearch step. For the slowest steps, you might not want to repeat any work that was already done. In that case we would run the interrupted step on only those files that weren't completed. This is where the script files described above come in handy. You can resubmit individual jobs to the cluster and, once they are finished, complete the hermie run as described below. To submit the script seq.1.sh from the directory containing it, use this command
$ qsub -cwd seq.1.sh
- Complete the hermie run. Finally, you can run hermie skipping all of the steps that had already completed. Always run the command from the same directory you did the first time, not from one of the pipeline directories. In cases where charge_czar has completed, you will want to specify that your input files are in pipeline/charge-czar rather than the original MS2 files.
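The first step of the process above, checking whether the log ends with "Pipeline Complete", can be sketched as a small shell function. This is only an illustration: the function name hermie_status is made up for this example, and the log file is whatever file you directed hermie's log to.

```shell
#!/bin/sh
# Hypothetical helper (not part of hermie): report whether a log file
# ends with the completion marker described above.
# Usage: hermie_status LOGFILE
hermie_status() {
    log=$1
    if [ ! -s "$log" ]; then
        echo "no log output found in $log"
        return 2
    fi
    # Use the last non-blank line so a trailing empty line does not hide the marker.
    last=$(grep -v '^[[:space:]]*$' "$log" | tail -n 1)
    case $last in
        *"Pipeline Complete"*)
            echo "complete"
            ;;
        *)
            # The last line names the program that was running at the interruption.
            echo "stopped at: $last"
            return 1
            ;;
    esac
}
```

If the function reports that the run stopped, the printed line is the place to start looking for error messages.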
Cluster issues
Each step of hermie is submitted to the cluster, which creates a few issues to keep in mind when troubleshooting.
- Timing. While jobs are running on the cluster, hermie is waiting and checking every so often to see if they have finished. If the log file says that hermie is waiting for the queue but qstat says that your job has completed, do not fear. Hermie probably hasn't checked since it completed. Wait twenty minutes and check again.
- Interruption. Hermie and the cluster are somewhat independent of each other. If hermie is interrupted (by closing an active window or issuing a kill command), the cluster continues to run. In this case, use qstat to find out if your job is still running. You can either cancel it or wait for it to complete before restarting hermie.
- Extra files. Hermie generates a script file for each job it runs on the cluster. They begin with a prefix like cz for charge-czar or seq for SEQUEST followed by a number (one for each MS2 file) and 'sh'. So a script for running charge-czar on the first MS2 file might be cz.0.sh. These files are useful for resubmitting jobs that failed. They are automatically deleted for jobs that complete successfully.
Each job submitted to the cluster generates four new files. Their names will end in .o#, .e#, .po#, .pe# where # is the process ID number for the job. The output from the program run is put in the .o# file and the error messages from the program are put in the .e# file. (However, hermie often redirects the output to a 'messages' file). You can look in these files for information about how the program failed and you can delete them when you rerun the step.
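Because leftover files mark the jobs that did not finish cleanly, a quick scan of a pipeline sub-directory can show where to look. The sketch below assumes only the file suffixes described above (.sh for scripts, .e# and .pe# for error output); the function names are invented for this example, and the second function prints qsub commands rather than running them.

```shell
#!/bin/sh
# Hypothetical helpers (not part of hermie) for inspecting one directory.
# Both default to the current directory.

# Print the names of cluster error files (.e# and .pe#) that are not empty.
nonempty_error_files() {
    dir=${1:-.}
    for f in "$dir"/*.e[0-9]* "$dir"/*.pe[0-9]*; do
        # An unmatched pattern stays literal and fails the -s test, so it is skipped.
        if [ -s "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
}

# Print (but do not run) a qsub command for each script left behind.
print_resubmit_commands() {
    dir=${1:-.}
    for f in "$dir"/*.sh; do
        if [ -f "$f" ]; then
            printf 'qsub -cwd %s\n' "$(basename "$f")"
        fi
    done
}
```

The printed commands use bare script names, so they are meant to be run from the directory that contains the scripts.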
Common errors
Below is a list of common errors that you might encounter and suggestions for how to fix them. This is an ever-growing list. Let me know if it doesn't cover your problem and I'll add it.
- SEQUEST produces SQT files with a header and no matches. The most common reason that this happens is if the SEQUEST program is not in your $PATH environment variable for non-interactive logins. Try these steps.
  - Log on to proteome and try this command
    $ ssh maccoss001 search27
    If you get a usage statement with the word SEQUEST in it, then $PATH is not the problem, but if you get an error continue with the next step.
  - Fix $PATH in your .bashrc file. Open ~/.bashrc in a text editor like vi or Emacs. If you do not have a .bashrc file, get one with these commands
    $ cp /home/frewen/bashrc ~/.bashrc
    $ cp /home/frewen/bash_profile ~/.bash_profile
    Add the line
    export PATH=/mnt/local/bin:$PATH
    somewhere in the file. Save and close.
  - Try the ssh test command again. Now you should see a usage statement.
- DTASelect.html is empty. It is possible that you didn't have any good matches. Look in pipeline/dtaselect/sequest/dtas-messages to see if it says 'No proteins passed criteria' toward the end of the file. Either there really aren't any good spectra or DTASelect may have been run with the wrong parameters. Look in the log file to see what options were scheduled to be used. If they were not correct, try rerunning DTASelect with the correct options.
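Both checks above can be scripted. The following is a minimal sketch with invented function names: one tests whether a directory such as /mnt/local/bin appears in a PATH string, and the other looks for the 'No proteins passed criteria' message in a DTASelect messages file.

```shell
#!/bin/sh
# Hypothetical helpers (not part of hermie).

# Succeed if DIR is one of the colon-separated entries in PATHVAR
# (PATHVAR defaults to the current $PATH).
path_contains() {
    dir=$1
    pathvar=${2:-$PATH}
    case ":$pathvar:" in
        *":$dir:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Succeed if the given messages file says that no proteins passed.
no_proteins_passed() {
    grep -q 'No proteins passed criteria' "$1"
}

# Example use (shown as comments because it needs the cluster and a finished run):
#   path_contains /mnt/local/bin "$(ssh maccoss001 'echo $PATH')" || echo "fix .bashrc"
#   no_proteins_passed pipeline/dtaselect/sequest/dtas-messages && echo "no matches passed"
```

Passing the PATH reported over ssh, rather than the PATH of your interactive shell, matters here because the problem described above only affects non-interactive logins.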