Loading sqts into the database

Assumptions

Currently we only support loading in MS/MS IDs from the SQT data format, described in

Caveats

Your sqt file must have a unique name across the whole database.
If you make a mistake and load a sqt incorrectly, you will have to go into the database to delete that sqt. See the MySQL primer below.
The problems listed above will be addressed at a later date.

Loading sqts into the database

parse_sqt.pl experiment_name sqt_name ms2_name
The experiment name is arbitrary at this point; in the future it could be used to query groups of runs.
Running it without arguments will show some more options.
It is tedious to type this command for each file. To get around this tedium you can do:
for s in *.sqt ; do parse_sqt.pl your_experiment_name "$s" "${s%.sqt}.ms2" ; done
MAKE SURE YOU HAVE THE MS2S for the sqts in this case.
Or, you can run it for one file, make sure it is working (it will take ~5-10 mins.), and then copy that command to a file, copy and paste the line repeatedly, and edit the file names in place.
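
A slightly more defensive version of the loop above can be sketched as follows. It assumes parse_sqt.pl is on your PATH and that each ms2 file shares its base name with its sqt. The function name load_sqts and the DRY_RUN variable are hypothetical, introduced only for this example. Instead of handing a non-existent ms2 to the parser, it skips that sqt and says so.

```shell
# Sketch: load every sqt in the current directory, skipping ones with no ms2.
# load_sqts and DRY_RUN are hypothetical names, not part of parse_sqt.pl.
load_sqts() {
    experiment=$1
    for s in *.sqt; do
        [ -e "$s" ] || continue          # no sqt files: the glob stayed literal
        m="${s%.sqt}.ms2"                # foo.sqt -> foo.ms2
        if [ ! -f "$m" ]; then
            echo "skipping $s: missing $m" >&2
            continue
        fi
        if [ -n "${DRY_RUN:-}" ]; then
            # Dry run: only print the command that would be executed.
            echo "parse_sqt.pl $experiment $s $m"
        else
            parse_sqt.pl "$experiment" "$s" "$m"
        fi
    done
}
```

Running DRY_RUN=1 load_sqts your_experiment_name first prints the commands without touching the database, which is a cheap way to confirm the file pairing before the slow load starts.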

Loading FASTA Database annotations into the database

Sequence annotations are also loaded into the database from FASTA formatted files. The load_fasta_db.pl script is used for this purpose:
load_fasta_db.pl FASTA_database_file
Usage: load_fasta_db.pl [ --genbank | --uniprot | --fly | --yeast ] fasta_file organism_name
For organism_name, use the species name in quotes, e.g. 'E. coli' or 'Homo sapiens'.

Brief MYSQL primer

If you have made a mistake in loading the files, the commands below let you inspect the database and delete the bad entries.

Connect:
mysql -u sqt -p sqt
(password: diddly)

describe the 'runs' table : explain runs

show sqts in db matching a pattern: SELECT file_name from runs where file_name like '%WORD%'

delete an existing sqt in the db: DELETE from runs where file_name = 'something.sqt'
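
Since a DELETE cannot be undone, it may help to look at what would match before removing anything. The sketch below only prints SQL to paste at the mysql prompt and runs nothing itself. preview_delete is a hypothetical helper name, and the statements assume the runs table and file_name column used in the examples above.

```shell
# Sketch: print a preview SELECT followed by the matching DELETE for one sqt.
# preview_delete is a hypothetical helper; it prints SQL and executes nothing.
preview_delete() {
    name=$1
    # Refuse an empty name so the statements always target a specific file.
    [ -n "$name" ] || { echo "usage: preview_delete file.sqt" >&2; return 1; }
    # Refuse names with a single quote, which would break the SQL string.
    case $name in
        *"'"*) echo "refusing file name containing a quote: $name" >&2; return 1 ;;
    esac
    printf "SELECT file_name FROM runs WHERE file_name = '%s';\n" "$name"
    printf "DELETE FROM runs WHERE file_name = '%s';\n" "$name"
}
```

Run the SELECT first and confirm it returns exactly the row you expect, then run the DELETE.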


Do NOT type 'DELETE from runs;' or Anton Chigurh will pay you a visit. Without a where clause, that statement deletes every run in the table.

To get some idea of scans per sqt file, you can do:
SELECT count(*) from runs r, scans s where r.runID = s.runID and r.filename = 'something.sqt'
This is a good way to check that things are loaded correctly. To check a whole experiment, do:

SELECT count(*), r.filename from experiments e, runs r, scans s where e.experiment_name = 'some name' AND r.experimentID = e.experimentID AND s.runID = r.runID GROUP BY r.runID
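
To have something to compare those counts against, you can count the spectrum records in the sqt file itself. In the SQT format each spectrum record starts with a line whose first field is S. The sketch below assumes the loader stores one row in scans per S line, which this page does not confirm, so treat a mismatch as a prompt to investigate rather than proof of a bad load. count_spectra is a hypothetical helper name.

```shell
# Sketch: count spectrum (S) lines in an sqt file.
# count_spectra is a hypothetical helper, not part of the loader scripts.
count_spectra() {
    # n+0 forces a numeric 0 when the file has no S lines at all.
    awk '$1 == "S" { n++ } END { print n + 0 }' "$1"
}
```

For example, count_spectra something.sqt prints a single number that can be set beside the result of the per-file query above.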