Assumptions

Currently we only support loading in MS/MS IDs from the SQT data format, described in

Caveats

Loading sqts into the database


parse_sqt.pl experiment_name sqt_name ms2_name
The experiment name is arbitrary at this point; in the future it could be used to query groups of runs.
Running parse_sqt.pl without arguments will show some more options.
It is tedious to type this command for each file. To get around this tedium, you can load every SQT file in the current directory with a shell loop:
    for s in *.sqt ; do parse_sqt.pl your_experiment_name "$s" "${s%.sqt}.ms2" ; done
MAKE SURE YOU HAVE THE MS2 FILES for the SQT files in this case. The loop derives each MS2 name by replacing the trailing .sqt extension with .ms2.
Or, you can run it for one file, make sure it is working (it will take ~5-10 minutes), and then copy that command to a file, paste the line repeatedly, and edit the file names in place.
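The two approaches above (looping over the files, or keeping one command per file in a script) can be combined. The following is a minimal sketch, not part of the toolkit: gen_load_commands is a hypothetical helper that only prints one parse_sqt.pl command per SQT file and skips any file whose MS2 is missing. It assumes file names contain no single quotes.

```shell
# Hypothetical helper (not shipped with parse_sqt.pl): print one load
# command per .sqt file in a directory. Nothing is executed here.
gen_load_commands() {
    local experiment dir sqt ms2
    experiment=$1
    dir=${2:-.}                        # default to the current directory
    for sqt in "$dir"/*.sqt; do
        [ -e "$sqt" ] || continue      # the glob matched nothing
        ms2="${sqt%.sqt}.ms2"          # swap only the trailing extension
        if [ -f "$ms2" ]; then
            printf "parse_sqt.pl '%s' '%s' '%s'\n" "$experiment" "$sqt" "$ms2"
        else
            # Report the missing MS2 on stderr so it stays out of the script
            printf 'skipping %s: no %s\n' "$sqt" "$ms2" >&2
        fi
    done
}
```

For example, gen_load_commands your_experiment_name > load_all.sh writes the commands to a file that you can review and edit, and then run with sh load_all.sh.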


Loading FASTA Database annotations into the database


Sequence annotations are also loaded into the database from FASTA-formatted files. The load_fasta_db.pl script is used for this purpose:
Usage: load_fasta_db.pl [ --genbank | --uniprot | --fly | --yeast ] fasta_file organism_name
For organism_name, use the species name in quotes, e.g. 'E. coli' or 'Homo sapiens'.
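Before loading a FASTA file, it can be useful to know how many sequences it contains, so you know roughly what to expect in the database afterwards. The sketch below is an optional sanity check, not part of load_fasta_db.pl; count_fasta_records is a hypothetical helper name. It relies only on the FASTA convention that every record starts with a header line beginning with '>'.

```shell
# Hypothetical helper: count the records in a FASTA file by counting
# header lines, i.e. lines that start with '>'.
count_fasta_records() {
    grep -c '^>' "$1"
}
```

Running count_fasta_records on your FASTA file prints a single number, the count of header lines in that file.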

Brief MySQL primer

If you have made a mistake in loading the files, the following commands let you inspect and fix what is in the database.

  • describe the 'runs' table (this also shows its exact column names): explain runs
  • show sqts in db matching a pattern: SELECT file_name from runs where file_name like '%WORD%'
  • delete an existing sqt in the db: DELETE from runs where file_name = 'something.sqt'
  • Do NOT type 'DELETE from runs;' (without a where clause it deletes every run in the table) or Anton Chigurh will pay you a visit
  • To get some idea of scans per sqt file, you can do: SELECT count(*) from runs r, scans s where r.runID = s.runID and r.filename = 'something.sqt'
    This is a good way to check that things are loaded correctly. To check a whole experiment, do:
    SELECT count(*), r.filename from experiments e, runs r, scans s where e.experiment_name = 'some name' AND r.experimentID = e.experimentID AND s.runID = r.runID GROUP BY r.runID
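
If you check experiments often, the per-experiment query can be generated from the shell instead of retyped. This is a minimal sketch under stated assumptions: scan_count_query is a hypothetical helper, the table and column names are copied from the per-experiment query in this section (confirm them with explain runs), and the experiment name is assumed to contain no backslashes.

```shell
# Hypothetical helper: print the per-run scan-count query for one experiment.
# Table and column names follow the per-experiment query in this section.
scan_count_query() {
    local name
    # Double any single quote so the name is a valid SQL string literal.
    name=$(printf '%s' "$1" | sed "s/'/''/g")
    printf "SELECT count(*), r.filename FROM experiments e, runs r, scans s WHERE e.experiment_name = '%s' AND r.experimentID = e.experimentID AND s.runID = r.runID GROUP BY r.runID;\n" "$name"
}
```

The output can be pasted into a MySQL session, or piped to the mysql command-line client, for example scan_count_query 'your_experiment_name' | mysql your_database, where your_database is a placeholder for the name of your database.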