Testing Sample CRAWDAD Data

Overview

We've created a directory containing MSMAT files and an align_config.xml in $CRAWHOME/Alignments/test_data/short_sample_start. Sample final outputs are in $CRAWHOME/Alignments/test_data/short_sample. Copy the short_sample_start directory to your home directory to try processing!
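The copy step can be sketched in Python as below. This is a minimal sketch that assumes the CRAWHOME environment variable is set; the helper name copy_sample_data and its dest_parent argument are illustrative and not part of CRAWDAD.

```python
import os
import shutil
from pathlib import Path


def copy_sample_data(dest_parent=None):
    """Copy short_sample_start out of $CRAWHOME into dest_parent (default: home)."""
    # Assumption: CRAWHOME points at the CRAWDAD installation.
    crawhome = Path(os.environ["CRAWHOME"])
    src = crawhome / "Alignments" / "test_data" / "short_sample_start"
    dest_parent = Path(dest_parent) if dest_parent else Path.home()
    dest = dest_parent / src.name
    # copytree raises an error if dest already exists, so earlier work is not overwritten.
    shutil.copytree(src, dest)
    return dest


if __name__ == "__main__":
    print(copy_sample_data())
```

A plain recursive copy from the shell does the same job; the Python form is only convenient if you want to script repeated test runs.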

The default craw_config.xml contains some errors, so you'll need to fix these before proceeding. Run craw_conf.py on the craw_config.xml file; it should report the first error it finds and then bail out. There are three errors in all -- proceed below for tips on fixing them.

We'll then cover a bit about how to view the results from this sample data.

Fixing the align_config.xml file

Why are you given a broken file? So that you can learn a bit about some common mistakes in creating these files. You'll have to run craw_conf.py on the craw_config.xml file to see what the mistakes are.
While craw_conf.py produces copious output, you should be able to infer the source of the error from the last 10 lines or so. Note that this section is only meant to help you track down errors that originate in the config file itself.
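Since only the tail of the output matters here, one option is to capture the output and keep just its last lines. The following is a rough sketch, not part of CRAWDAD: the helpers last_lines and run_and_tail are hypothetical, and it assumes craw_conf.py is on your PATH and that the error may appear on either stdout or stderr.

```python
import subprocess
import sys


def last_lines(text, n=10):
    """Return the final n lines of text as a list."""
    return text.splitlines()[-n:]


def run_and_tail(cmd, n=10):
    """Run cmd, merge stderr into stdout, and return (exit code, last n lines)."""
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    return result.returncode, last_lines(result.stdout, n)


if __name__ == "__main__":
    # Same invocation as in this document; only the tail is printed.
    code, tail = run_and_tail(["craw_conf.py", "craw_config.xml"])
    print("\n".join(tail))
    sys.exit(code)
```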

For hints, go down to the end of the document. If you just want to run the data, copy craw_config.xml.fixed to craw_config.xml and proceed to the next section.

What is the sample data?

It is the same data as in the CRAWDAD paper (Finney et al. 2007, Anal. Chem.), except truncated to 40 to 45 minutes of retention time and binned to 1 m/z bins for faster processing. In more detail, it consists of 6 LC/MS runs each from a control and an IPTG-induced E. coli sample, run on an Orbitrap-LTQ at 30,000 MS resolution with up to 5 DDA MS/MS scans. Chromatography was on a Thermo Surveyor pump, using a 2 hr gradient. Refer to the paper for more details.

How do I run the data?

Briefly -- run
craw_conf.py craw_config.xml
You can edit the config file to have this submit to an SGE queue by uncommenting the line with <global_param name='queue'> ... Output will go into individual directories for each step. The final output will go into the directory specified in the 'webdir' parameter within the 'diffs' action.

How do I know it's successful?

1. Has it completed all steps?
The last msmat files produced should be for the 'mn_e2' step. Once they exist, go look at the output directory for this step.

XML File Hints

Or skip ahead to the solutions at the end of the document.

Problem1

This is simply a problem in the XML itself. Recall that XML needs tags to be balanced -- this is a requirement of a well-formed document. You can also open up XML files in a web browser, which will point out the line location of the error. Note to advanced users -- there is currently no DTD for the CRAWDAD config files.
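If you would rather not open a browser, Python's standard XML parser can report the same kind of error. This is a minimal sketch, with check_well_formed as a hypothetical helper name; it only tests well-formedness and knows nothing about what CRAWDAD expects inside the file.

```python
import xml.etree.ElementTree as ET


def check_well_formed(path):
    """Return None if the file parses, else a (line, column, message) tuple."""
    try:
        ET.parse(path)
    except ET.ParseError as err:
        # position is a (line, column) pair pointing at where parsing failed.
        line, column = err.position
        return line, column, str(err)
    return None


if __name__ == "__main__":
    problem = check_well_formed("craw_config.xml")
    if problem is None:
        print("XML is well-formed")
    else:
        print("XML error at line %d, column %d: %s" % problem)
```

Note that the reported line is where the parser gave up, which for an unbalanced tag can be some distance after the tag that was left open.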

Problem2

In this case, there are problems with the way some of the msmats and ms1s are named -- look in the experiment_groups section of the XML file.

Problem3

Drats!! Remember the rules about how action tags are to be set up:
1. Labels need to be unique.
2. The 'previous' tag refers to the _input_ files for a certain action. Therefore, it has to refer to an action that exists.
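The two rules above can be expressed as a small check. This is only an illustration, not how craw_conf.py does it: check_actions is a hypothetical helper, it assumes you have already pulled each action's label and 'previous' value out of the config by hand or otherwise, and it assumes that an action without a 'previous' value is represented as None.

```python
def check_actions(actions):
    """Check (label, previous) pairs against the two rules for action tags.

    actions: list of (label, previous) tuples; previous may be None.
    Returns a list of human-readable problems (empty if none were found).
    """
    problems = []
    seen = set()
    # Rule 1: labels need to be unique.
    for label, _ in actions:
        if label in seen:
            problems.append(f"duplicate label: {label}")
        seen.add(label)
    # Rule 2: 'previous' has to refer to an action that exists.
    for label, previous in actions:
        if previous is not None and previous not in seen:
            problems.append(f"{label}: previous refers to unknown action {previous}")
    return problems


if __name__ == "__main__":
    # Made-up labels, for illustration only.
    example = [("step_a", None), ("step_b", "step_a"), ("step_b", "step_x")]
    for problem in check_actions(example):
        print(problem)
```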

XML File Solutions

Problem1

If you're too lazy to figure it out, I'm too lazy to write it up. Bwah hah hah!

Problem2

Problem3