craw_conf.py Documentation

Overview

CRAWDAD consists of a set of modules that are applied sequentially, as shown in the flowchart below.

Briefly, the steps consist of:

Extracting MS1 and MS2 data -- this is done with MakeMS2 on the instrument machines. The .ms1 and .ms2 files are then moved to proteome
Binning to m/z intervals -- done with the ms1_to_msmat.py script (TODO -- merge into MakeMS2)
Scan interval normalization -- scans are resampled onto a consistent timescale
Chromatogram smoothing -- chromatograms are smoothed using a Savitzky-Golay smoothing window
Alignment -- each run is aligned in the retention time dimension to a common master template run
Detecting differences -- a t-test on the aligned data is used to find regions of differential intensity
Extraction of MS/MS data, SEQUEST, etc. (use hermie)
Associating MS/MS peptide IDs with difference regions -- you load your SQT files, as well as your results from difference detection, into a database and produce a set of annotated differences
Assembling difference regions that are annotated with peptide identifications into lists of proteins and peptides that are changing in abundance
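
As a rough illustration of the binning, scan interval normalization, and smoothing steps listed above, the following Python sketch works on plain lists. It is not the code used by ms1_to_msmat.py or craw_conf.py: the function names, the bin width, the linear interpolation, and the fixed 5-point quadratic Savitzky-Golay window are all assumptions chosen for illustration.

```python
import math


def bin_spectrum(peaks, mz_min, mz_max, bin_width):
    """Sum (m/z, intensity) peaks into fixed-width m/z bins (hypothetical helper)."""
    n_bins = int(math.ceil((mz_max - mz_min) / bin_width))
    bins = [0.0] * n_bins
    for mz, intensity in peaks:
        if mz_min <= mz < mz_max:
            # The clamp guards against floating-point rounding at the upper edge.
            index = min(int((mz - mz_min) / bin_width), n_bins - 1)
            bins[index] += intensity
    return bins


def resample(times, values, step):
    """Linearly interpolate a chromatogram onto a uniform time grid.

    Assumes `times` is strictly increasing and has at least two points.
    """
    n_points = int((times[-1] - times[0]) / step) + 1
    grid = [times[0] + i * step for i in range(n_points)]
    resampled = []
    j = 0
    for t in grid:
        # Advance to the segment [times[j], times[j + 1]] that contains t.
        while j < len(times) - 2 and times[j + 1] < t:
            j += 1
        frac = (t - times[j]) / (times[j + 1] - times[j])
        resampled.append(values[j] + frac * (values[j + 1] - values[j]))
    return grid, resampled


# Standard 5-point quadratic Savitzky-Golay smoothing coefficients.
SG5 = (-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35)


def smooth(values):
    """Apply the 5-point filter; the two points at each edge are left unchanged."""
    out = list(values)
    for i in range(2, len(values) - 2):
        out[i] = sum(c * values[i + k - 2] for k, c in enumerate(SG5))
    return out


# Example: bin one scan, then resample and smooth one chromatogram.
scan = [(400.2, 10.0), (400.7, 5.0), (401.5, 2.0), (405.0, 1.0)]
print(bin_spectrum(scan, 400.0, 402.0, 1.0))
grid, trace = resample([0.0, 1.0, 3.0], [0.0, 10.0, 30.0], 1.0)
print(grid, smooth(trace + [40.0, 50.0]))
```

A production pipeline would more likely use a library routine such as scipy.signal.savgol_filter with a configurable window; the fixed window here only keeps the sketch dependency-free.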

All these steps are controlled through the craw_conf.py application.

craw_conf.py is a control script for CRAWDAD that uses an XML configuration file to define the steps and parameters for pre-processing, aligning, discovering signal differences, and mapping those differences to peptide identifications. Most settings for craw_conf.py are made in the align_config.xml file, which is outlined below.

align_config.xml

A sample align_config.xml file is shown here

The steps, or 'actions', listed above are each given a label in align_config.xml, and the output of each step is saved to a directory named after its label. If the computer on which CRAWDAD is running is able to produce images with Python, then base peak chromatograms will be produced as images from the output of each sample group for each step.
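
To make the label-to-directory convention concrete, here is a minimal Python sketch that reads action labels from an XML string and derives one output directory per label. The element name `action`, the attribute name `label`, the sample labels, and the helper `output_dirs` are hypothetical and do not reproduce the real align_config.xml schema.

```python
import os
import xml.etree.ElementTree as ET

# Hypothetical config: these element and attribute names are invented for
# illustration and are NOT the real align_config.xml schema.
SAMPLE_CONFIG = """
<config>
  <action label="binned"/>
  <action label="smoothed"/>
  <action label="aligned"/>
</config>
"""


def output_dirs(xml_text, base_dir):
    """Return one output directory per labeled action, in document order."""
    root = ET.fromstring(xml_text.strip())
    return [os.path.join(base_dir, action.get("label"))
            for action in root.iter("action")]


for path in output_dirs(SAMPLE_CONFIG, "results"):
    print(path)
```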

Generally speaking, if you edit the config file to run jobs on the queue (using the ), as shown above, craw_conf.py will submit one single-processor job per file per step outlined above. You can process more than two 'sample groups' at once, but at the moment difference detection is limited to comparing two groups.
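
The two-group comparison can be pictured as a t statistic computed at every point of the aligned data. The sketch below is an illustration only, not CRAWDAD's implementation: the source says only "t-test", so the use of Welch's unequal-variance form, the fixed threshold, and the names `welch_t` and `differential_points` are assumptions.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic for two groups of replicate intensities.

    Each group needs at least two values.
    """
    diff = mean(group_a) - mean(group_b)
    se = math.sqrt(variance(group_a) / len(group_a)
                   + variance(group_b) / len(group_b))
    if se == 0.0:
        # No within-group variation: equal means give 0, otherwise +/- infinity.
        return 0.0 if diff == 0.0 else math.copysign(math.inf, diff)
    return diff / se


def differential_points(runs_a, runs_b, threshold):
    """Indices where |t| exceeds `threshold`.

    `runs_a` and `runs_b` are lists of aligned intensity traces of equal length,
    one trace per run in each sample group.
    """
    hits = []
    for i in range(len(runs_a[0])):
        t = welch_t([run[i] for run in runs_a], [run[i] for run in runs_b])
        if abs(t) > threshold:
            hits.append(i)
    return hits


group_a = [[1.0, 10.0, 1.0], [1.2, 11.0, 0.9], [0.9, 10.5, 1.1]]
group_b = [[1.1, 2.0, 1.0], [0.9, 2.5, 1.2], [1.0, 1.5, 0.8]]
print(differential_points(group_a, group_b, threshold=4.0))
```

In practice a p-value with a multiple-testing correction would usually replace the raw threshold, and neighboring hits would be merged into regions.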

Loading SQT into the MySQL database

Running Sample Data

Run visualization programs

We have two classes of run visualization programs:

2D-heatmap based views of an entire LC-MS run

msmat_img.py displays an LC-MS run from an .msmat file as a zoomable 2D heatmap. Instructions are available.

Chromatogram / spectra viewing software

chrom_viewer.py displays one or more spectra or chromatograms (base peak, XIC, or TIC). Instructions are available.

These programs are useful for viewing the output of individual steps, e.g. for assessing alignment quality or determining whether appropriate smoothing has been used.
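
For readers unfamiliar with the three chromatogram types named above, the following Python sketch computes each from a binned run. It assumes the run is held in memory as a list of scans, each a list of intensities per m/z bin; this layout and the function names are illustrative and say nothing about the actual .msmat file format or how chrom_viewer.py is implemented.

```python
def total_ion_chromatogram(scans):
    """TIC: sum of all intensities in each scan."""
    return [sum(scan) for scan in scans]


def base_peak_chromatogram(scans):
    """Base peak chromatogram: intensity of the most intense bin in each scan."""
    return [max(scan) for scan in scans]


def extracted_ion_chromatogram(scans, first_bin, last_bin):
    """XIC: intensity summed over an inclusive range of m/z bins in each scan."""
    return [sum(scan[first_bin:last_bin + 1]) for scan in scans]


# Two scans with three m/z bins each (made-up numbers).
run = [[1.0, 5.0, 2.0],
       [0.0, 3.0, 4.0]]
print(total_ion_chromatogram(run))
print(base_peak_chromatogram(run))
print(extracted_ion_chromatogram(run, 1, 1))
```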

CRAWDAD File

MSMAT format files store the initial binned MS1 data and the intermediate steps of individual runs.

End Products from CRAWDAD

Brief notes on XML