hermie: Troubleshooting


Troubleshooting

You've run hermie and it has finished...or at least stopped. But you don't have any results. This section describes a general approach to troubleshooting a run that did not complete successfully and to finishing the analysis. It also includes notes about the cluster and some common reasons that a run fails.

When your run is interrupted

There are two basic ways that your hermie run may be interrupted: external forces (closing the active window, a kill command, a proteome shutdown) or internal failures from hermie or one of its component programs. In the first case, there is likely no problem with your input data or setup, so rerunning the missing steps should be straightforward. In the second case, the nature of the error will need to be tracked down and action taken to remedy the problem. Below is an overview of the troubleshooting process.

Finding the point of failure. The last output of your hermie run should be "Pipeline Complete". Look for it in the last line of your log file. If it's there, your errors are not due to an interruption. Otherwise, the last line will tell you which program was running when the interruption occurred. Hermie runs the component programs in this order:

charge_czar
BlibSearch
SEQUEST
percolator
DTASelect
If, for example, the last line of the log file was Running BlibSearch then charge_czar completed, but none of the other steps did.
Looking for error messages. If hermie was interrupted due to an error, it should return a message with details about the error. Error messages are printed to the screen (or nohup.out). More information can be found in the *-messages files in each sub-directory. For instance, if there was an error with BlibSearch then you would look in pipeline/library/lib-messages. Detailed information about all of the possible error messages can be found here.
Looking at cluster output. Every step of the pipeline is submitted to the cluster. This involves five files for each MS2: the submitted script and four output files from the cluster. The script files end in .sh. The cluster files end in .o#, .e#, .po#, .pe# where # is the process ID number. When a program exits successfully, hermie cleans up all these files, but if there is an error the files will remain. Error messages and output from the component programs can be found in these four files.
Examining the program output files. We would like to find out how much of the last step was completed. Again, we can read the *-messages and look for the expected output files. Sometimes it is also necessary to look at the contents of all of the output files as some may be empty or truncated.
Finish the interrupted step. In some cases it is easiest to redo the entire step. For example, if BlibSearch was interrupted while searching the second of twelve MS2 files, you might as well run hermie with all of the input files, starting with the BlibSearch step. For the slowest steps, you might not want to repeat any work that was already done. In that case, run the interrupted step on only those files that weren't completed. This is where the script files described above come in handy. You can resubmit individual jobs to the cluster and, once they are finished, complete the hermie run as described below. To submit the script seq.1.sh from the directory containing it, use this command:
$ qsub -cwd seq.1.sh
Complete the hermie run. Finally, you can run hermie skipping all of the steps that had already completed. Always run the command from the same directory you did the first time, not from one of the pipeline directories. In cases where charge_czar has completed, you will want to specify that your input files are in pipeline/charge-czar rather than the original MS2 files.
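The first and fourth steps above (finding the point of failure and checking for empty output files) can be sketched as two small shell functions. This is only an illustration, not part of hermie: the function names are hypothetical, the log file path is whatever you pass in, and it assumes the last log line contains the program name exactly as written in the list above.

```shell
# Component programs in the order hermie runs them (from the list above).
HERMIE_STEPS="charge_czar BlibSearch SEQUEST percolator DTASelect"

# Hypothetical helper: report where a run stopped, based on the last
# non-blank line of the log file given as the first argument.
hermie_run_status() {
    log=$1
    last=$(grep -v '^[[:space:]]*$' "$log" | tail -n 1)
    case $last in
        *"Pipeline Complete"*)
            echo "complete"
            return ;;
    esac
    completed=""
    for step in $HERMIE_STEPS; do
        case $last in
            *"$step"*)
                # Every step before the one named on the last line finished.
                echo "interrupted: $step (completed:${completed:- none})"
                return ;;
        esac
        completed="$completed $step"
    done
    echo "unknown: $last"
}

# Hypothetical helper: list zero-length files under a directory, which
# may be output that was never written.
empty_outputs() {
    find "$1" -type f -size 0 | sort
}
```

For a log whose last line is Running BlibSearch, hermie_run_status reports that BlibSearch was interrupted and that charge_czar completed. Truncated (non-empty) files are not detected by empty_outputs and still need to be inspected by hand.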

Cluster issues

Each step of hermie is submitted to the cluster, which creates a few issues to keep in mind when troubleshooting.

Timing. While jobs are running on the cluster, hermie is waiting and checking every so often to see if they have finished. If the log file says that hermie is waiting for the queue but qstat says that your job has completed, do not fear. Hermie probably hasn't checked since it completed. Wait twenty minutes and check again.
Interruption. Hermie and the cluster are somewhat independent of each other. If hermie is interrupted (by closing an active window or issuing a kill command), the cluster continues to run. In this case, use qstat to find out if your job is still running. You can either cancel it or wait for it to complete before restarting hermie.
Extra files. Hermie generates a script file for each job it runs on the cluster. The file names begin with a prefix like cz for charge-czar or seq for SEQUEST, followed by a number (one for each MS2 file) and 'sh'. So a script for running charge-czar on the first MS2 file might be cz.0.sh. These files are useful for resubmitting jobs that failed. They are automatically deleted for jobs that complete successfully.
Each job submitted to the cluster generates four new files. Their names will end in .o#, .e#, .po#, .pe# where # is the process ID number for the job. The output from the program run is put in the .o# file and the error messages from the program are put in the .e# file. (However, hermie often redirects the output to a 'messages' file). You can look in these files for information about how the program failed and you can delete them when you rerun the step.
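As a rough sketch of how the leftover files described above could be located and the failed jobs resubmitted, the following shell functions search a directory tree for script files and cluster output files. The function names and the DRY_RUN variable are hypothetical and not part of hermie. The only cluster command used is qsub -cwd, as shown earlier on this page.

```shell
# List leftover job scripts (for example cz.0.sh or seq.1.sh) under a directory.
leftover_scripts() {
    find "$1" -type f -name '*.[0-9]*.sh' | sort
}

# List leftover cluster files whose names end in .o#, .e#, .po#, or .pe#.
leftover_cluster_output() {
    find "$1" -type f | grep -E '\.(o|e|po|pe)[0-9]+$' | sort
}

# Resubmit each leftover script from the directory that contains it.
# With DRY_RUN=1 (the default in this sketch) the commands are only printed.
resubmit_leftover() {
    leftover_scripts "$1" | while IFS= read -r script; do
        dir=$(dirname "$script")
        name=$(basename "$script")
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "(cd $dir && qsub -cwd $name)"
        else
            (cd "$dir" && qsub -cwd "$name")
        fi
    done
}
```

Before resubmitting anything, it is worth checking qstat to make sure the original jobs are no longer running, and reading the printed commands from a dry run first. Setting DRY_RUN=0 would actually submit the jobs.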

Common errors

Below is a list of common errors that you might encounter and suggestions for how to fix them. This is an ever-growing list. Let me know if it doesn't cover your problem and I'll add it.

SEQUEST produces SQT files with a header and no matches. The most common reason that this happens is if the SEQUEST program is not in your $PATH environment variable for non-interactive logins. Try these steps.
1. Log on to proteome and try this command
  $ ssh maccoss001 search27
  If you get a usage statement with the word SEQUEST in it, then $PATH is not the problem, but if you get an error continue with the next step.
2. Fix $PATH in your .bashrc file.
  Open ~/.bashrc in a text editor like vi or Emacs. If you do not have a .bashrc file, get one (along with a .bash_profile) with these commands
  $ cp /home/frewen/bashrc ~/.bashrc
  $ cp /home/frewen/bash_profile ~/.bash_profile
  Add the line export PATH=/mnt/local/bin:$PATH somewhere in the file. Save and close.
3. Try the ssh test command again. Now you should see a usage statement.
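The three steps above can also be scripted. The sketch below is one possible way to do it, with hypothetical function names: it runs the same ssh test command, looks for the word SEQUEST in the response, and appends the export line to a startup file only if that exact line is not already there.

```shell
# Succeeds if the text on stdin contains the word SEQUEST, as a usage
# statement from the search program would.
looks_like_sequest_usage() {
    grep -q 'SEQUEST'
}

# Run the test command from step 1 and report the result.
# maccoss001 and search27 are the host and program named in step 1.
check_sequest_path() {
    if ssh maccoss001 search27 2>&1 | looks_like_sequest_usage; then
        echo "SEQUEST is on the non-interactive PATH"
    else
        echo "SEQUEST not found: check PATH in ~/.bashrc"
    fi
}

# Append the export line from step 2 to the given file (normally ~/.bashrc)
# unless an identical line is already present.
add_path_line() {
    rc=$1
    line='export PATH=/mnt/local/bin:$PATH'
    grep -qxF "$line" "$rc" 2>/dev/null || printf '%s\n' "$line" >> "$rc"
}
```

Calling add_path_line ~/.bashrc followed by check_sequest_path corresponds to steps 2 and 3. Because the line is only added when missing, running it more than once does not duplicate the entry.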
DTASelect.html is empty. It is possible that you didn't have any good matches. Look in pipeline/dtaselect/sequest/dtas-messages to see if it says 'No proteins passed criteria' toward the end of the file. Either there really aren't any good spectra or DTASelect may have been run with the wrong parameters. Look in the log file to see what options were scheduled to be used. If they were not correct, try rerunning DTASelect with the correct options.
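The check of dtas-messages described above can be done with tail and grep. The following is a minimal sketch with a hypothetical function name. It assumes that "toward the end of the file" means within the last 20 lines unless a different count is given as the second argument.

```shell
# Succeeds if 'No proteins passed criteria' appears near the end of the
# messages file given as the first argument. The optional second argument
# is how many trailing lines to search (default 20, an arbitrary choice).
dtaselect_had_no_matches() {
    tail -n "${2:-20}" "$1" | grep -q 'No proteins passed criteria'
}
```

Run from the directory where hermie was started, dtaselect_had_no_matches pipeline/dtaselect/sequest/dtas-messages succeeds when DTASelect reported that no proteins passed. In that case the next thing to check is the DTASelect options recorded in the log file.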
