BiblioSpec associated file formats

BiblioSpec makes use of several previously defined file formats for input and output. Below are descriptions of these formats as well as links to additional information.

MS2 format:  This is a text file that stores MS/MS spectrum data. There are four types of lines of data. Lines beginning with 'H' are header lines and contain information about how the data was collected as well as comments. Lines beginning with 'S' are followed by the scan number and the precursor m/z. Lines beginning with 'Z' give the charge state followed by the mass of the ion at that charge state. If a spectrum's charge state cannot be determined, there may be multiple Z lines indicating the possible charges. Finally, a line with a pair of numbers is a peak (the m/z and intensity) in a spectrum. These lines are arranged in the file with all the header lines at the beginning, then each spectrum has an S line, one or more Z lines and then a list of peaks sorted by m/z.
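The line layout described above can be read with a few lines of code. The following is a minimal sketch, not part of BiblioSpec: the names Spectrum and parse_ms2 are hypothetical, and it handles only the four line types described here. It assumes that fields are whitespace-separated, that the scan number is the first field after 'S', and that the precursor m/z is the last field on the S line.

```python
from dataclasses import dataclass, field


@dataclass
class Spectrum:
    scan: int
    precursor_mz: float
    charges: list = field(default_factory=list)  # (charge, mass) pairs from Z lines
    peaks: list = field(default_factory=list)    # (m/z, intensity) pairs


def parse_ms2(lines):
    """Return (header_lines, spectra) from an iterable of MS2 text lines."""
    headers, spectra = [], []
    current = None
    for raw in lines:
        parts = raw.split()
        if not parts:
            continue  # skip blank lines
        tag = parts[0]
        if tag == "H":
            headers.append(raw.rstrip("\n"))
        elif tag == "S":
            # Assumption: scan number first, precursor m/z last on the line.
            current = Spectrum(scan=int(parts[1]), precursor_mz=float(parts[-1]))
            spectra.append(current)
        elif current is None:
            raise ValueError("Z or peak line found before the first S line")
        elif tag == "Z":
            current.charges.append((int(parts[1]), float(parts[2])))
        else:
            # Any other line is treated as a peak: m/z and intensity.
            current.peaks.append((float(parts[0]), float(parts[1])))
    return headers, spectra
```

A spectrum whose charge state could not be determined simply ends up with more than one entry in its charges list.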

A complete description of MS2 files is published in McDonald et al. (2004) Rapid Commun. Mass Spectrom. 18: 2162-2168. There is additional information at the SEQUEST support site.

Here is an example file

SQT format:  This is a text file that stores information about SEQUEST search results. We are borrowing the format for BiblioSpec search results so that it may be used with pre-existing tools that process SQT files.

Each line begins with one of four codes: H, S, M, or L. The code indicates whether the line holds information about the entire search (header, H), a query spectrum (S), a query-library match (M), or a locus (protein, L). BlibSearch does not produce L lines, but they can be inserted with the programs PepGrep or addLoci. The file begins with header lines, which are followed by the search results for all query spectra. The pattern for one spectrum is S[ML^k]^n: one S line followed by n match (M) lines, each of which is followed by its k locus (L) lines.

The tables below outline each field in the spectrum (S) and match (M) lines, in the order in which the fields appear on the line. The BiblioSpec column gives the value entered from a BiblioSpec search; the SEQUEST column gives the value entered from a SEQUEST search, if different.

Spectrum lines

Field  BiblioSpec                            SEQUEST (if different)
1      S
2      scan number
3      scan number
4      charge
5      0                                     process time
6      server
7      precursor m/z                         observed mass
8      total ion current
9      0                                     lowest Sp
10     number of library spectra compared    number of sequence matches

Match lines

Field  BiblioSpec                                             SEQUEST (if different)
1      M
2      rank by primary comparison score                       Xcorr rank
3      duplicates (how many copies of the library spectrum    Sp rank
       occur in the redundant library)
4      library spectrum precursor m/z                         calculated mass
5      DeltaCn
6      primary score                                          Xcorr
7      library ID number                                      Sp score
8      number of matched peaks                                matched ions
9      number of library peaks in pre-processed spectrum      expected ions
10     library spectrum sequence                              sequence matched
11     above or below threshold                               validation status

The validation status is set to Y if the score for the match was above the pre-determined threshold or N if it was below. This is a convenient way of getting DTASelect to filter out all positive matches.
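As an illustration of the S, M and L line layout, the sketch below groups each match under its query spectrum and each locus under its match, using the BiblioSpec field order given above. It is not part of BiblioSpec: the function name parse_sqt and the dictionary keys are hypothetical, and the code assumes that fields are separated by tabs.

```python
def parse_sqt(lines):
    """Return (headers, spectra); M lines nest under S lines, L lines under M lines."""
    headers, spectra = [], []
    for raw in lines:
        if not raw.strip():
            continue  # skip blank lines
        f = raw.rstrip("\n").split("\t")  # assumption: tab-delimited fields
        code = f[0]
        if code == "H":
            headers.append(f[1:])
        elif code == "S":
            spectra.append({
                "scan": int(f[1]),
                "charge": int(f[3]),
                "precursor_mz": float(f[6]),
                "total_ion_current": float(f[7]),
                "n_compared": int(f[9]),
                "matches": [],
            })
        elif code == "M":
            spectra[-1]["matches"].append({
                "rank": int(f[1]),
                "duplicates": int(f[2]),
                "library_mz": float(f[3]),
                "delta_cn": float(f[4]),
                "score": float(f[5]),
                "library_id": int(f[6]),
                "matched_peaks": int(f[7]),
                "library_peaks": int(f[8]),
                "sequence": f[9],
                "above_threshold": f[10] == "Y",
                "loci": [],
            })
        elif code == "L":
            # A locus belongs to the most recent match of the current spectrum.
            spectra[-1]["matches"][-1]["loci"].append(f[1])
    return headers, spectra
```

With the results in this form, the matches that passed the threshold can be selected with a test on the above_threshold key, for example [m for s in spectra for m in s["matches"] if m["above_threshold"]].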

Sequence modifications in the SQT file are not yet implemented.
Sequence modifications are indicated by non-alphanumeric characters that follow the modified residue in the sequence string. The meaning of each modification code is defined in the header. For each code, there is a line of the form
H DiffMod [N]*=[+/-amount]
where [N] is a list of all possible residues that could have this modification, '*' (or other such character) is the code for this modification, and [+/-amount] is the value of the mass shift for this modification.
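The DiffMod convention can be sketched as follows. This is an illustration only, with hypothetical function names, and it rests on several assumptions: the square brackets in the line form above are placeholders rather than literal characters, each code is a single non-alphanumeric character, and the sequence string contains no other punctuation (such as flanking-residue separators).

```python
import re

# Matches lines such as "H<tab>DiffMod<tab>STY*=+79.97" (values are illustrative).
DIFFMOD = re.compile(
    r"^H\s+DiffMod\s+([A-Za-z]+)([^A-Za-z0-9\s])=([+-]?\d+(?:\.\d+)?)\s*$"
)


def read_diffmods(header_lines):
    """Map each modification code to (allowed residues, mass shift)."""
    mods = {}
    for line in header_lines:
        m = DIFFMOD.match(line)
        if m:
            residues, code, amount = m.groups()
            mods[code] = (set(residues), float(amount))
    return mods


def locate_mods(sequence, mods):
    """Return (plain sequence, [(1-based position, residue, mass shift), ...])."""
    plain, found = [], []
    for ch in sequence:
        if ch.isalnum():
            plain.append(ch)
            continue
        residues, shift = mods[ch]  # KeyError if the code is not in the header
        if not plain or plain[-1] not in residues:
            raise ValueError(f"code {ch!r} does not follow an allowed residue")
        # The code follows the residue it modifies.
        found.append((len(plain), plain[-1], shift))
    return "".join(plain), found
```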

A complete description of SQT files is published in McDonald et al. (2004) Rapid Commun. Mass Spectrom. 18: 2162-2168. Here is an example file produced by BiblioSpec, and here is a file produced by SEQUEST.

BiblioSpec also uses several unique file formats for input and output, each of which is described below.

SSL (spectrum-sequence list) format:  BlibBuild requires an SSL file as one of its inputs. This tab-delimited text file contains the list of spectra to be included in a library. The file begins with a (tab-delimited) header line that reads

file scan charge sequence modifications annotation

Each subsequent row describes one spectrum. The first column contains the full path of the MS2 file in which the spectrum is found, followed by the scan number, charge, peptide sequence of the spectrum, any post-translational modifications to the peptide, and the annotation code. Modifications are entered as a string of the same length as the sequence: unmodified residues are encoded as zeros and each modification is encoded as a unique character. If a sequence has no modified residues, a single 0 may be entered as the modifications string. The table of modifications and their codes can be found here. For a description of the annotation codes, see the description of BlibUpdate. An example SSL file might look like this.
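The column order and the length rule for the modifications string can be checked when an SSL file is generated. The sketch below is one way to do this with the standard csv module. The function names are hypothetical, and it does not validate the modification or annotation codes themselves, since those are defined elsewhere.

```python
import csv
import io

SSL_HEADER = ["file", "scan", "charge", "sequence", "modifications", "annotation"]


def check_modifications(sequence, modifications):
    """True if the string is a single 0 or has one character per residue."""
    return modifications == "0" or len(modifications) == len(sequence)


def write_ssl(rows, handle):
    """Write the header line and one tab-delimited row per spectrum."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(SSL_HEADER)
    for row in rows:
        if not check_modifications(row["sequence"], row["modifications"]):
            raise ValueError(
                f"modifications string does not match sequence {row['sequence']!r}"
            )
        writer.writerow([row[name] for name in SSL_HEADER])


# Usage with placeholder values; the path and the annotation code are made up.
buffer = io.StringIO()
write_ssl(
    [{"file": "/data/run1.ms2", "scan": 12, "charge": 2,
      "sequence": "PEPTIDEK", "modifications": "0", "annotation": "0"}],
    buffer,
)
```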

report format:  BlibSearch can produce a second output file containing the results of the search. The header contains the details of the search parameters. Each subsequent row summarizes one query-library match listing the query scan number, library identifier (only relevant if more than one library is searched), library spectrum ID number, the score of the match, the rank of this match for this query spectrum, the precursor m/z of the query, the charge of the query spectrum, the precursor m/z of the library spectrum, the charge of the library spectrum, the annotation for the library spectrum, the "redundancy" of the library spectrum, and the peptide sequence. See BlibUpdate for more on the annotation codes. The "redundancy" number is only relevant for filtered libraries. It is the number of spectra compared to make the selection.

params format:  Both BlibSearch and BlibFilter will take an optional params file in which additional options can be specified. Options are described on the programs' documentation pages. The file should begin with the word 'params' and be followed by options that are specified as they would be at a command line: a dash '-' followed by the letter code of the option or two dashes '--' followed by the name of the option, the name or letter being followed by any required arguments. Parameters may be separated by spaces, tabs or new-line characters.
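The params layout described above can be tokenized in a few lines. The following sketch is not BiblioSpec's own parser, and the function name parse_params is hypothetical. It assumes that any token beginning with a dash is an option unless it looks like a negative number, and that every other token is an argument of the option before it.

```python
import re


def parse_params(text):
    """Return a list of (option name, [arguments]) pairs from params file text."""
    tokens = text.split()  # spaces, tabs and new-lines all separate parameters
    if not tokens or tokens[0] != "params":
        raise ValueError("a params file should begin with the word 'params'")
    options = []
    for tok in tokens[1:]:
        is_negative_number = re.fullmatch(r"-\d+(\.\d+)?", tok) is not None
        if tok.startswith("-") and len(tok) > 1 and not is_negative_number:
            # '-x' or '--name': start a new option and strip the leading dashes.
            options.append((tok.lstrip("-"), []))
        elif not options:
            raise ValueError(f"argument {tok!r} appears before any option")
        else:
            options[-1][1].append(tok)
    return options
```

For example, with made-up option names, parse_params("params\n-x 5\n--some-option value") returns [("x", ["5"]), ("some-option", ["value"])]. The real option names are those listed on the programs' documentation pages.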