Use of Shotgun Proteomics for the Identification, Confirmation, and Correction of C. elegans Gene Annotations

We describe a general mass spectrometry-based approach for gene annotation of any organism and demonstrate its effectiveness using the nematode C. elegans. We detected 6,779 C. elegans proteins (67,047 peptides), including 384 that were annotated in WormBase WS150 but lacked cDNA or other prior experimental support. We also identified 429 new coding sequences that were unannotated in WS150. Nearly half of these (192/429) were confirmed with RT-PCR data. Of the 429 new coding sequences, 33 (~8%) had been predicted to be pseudogenes, 151 (~35%) reveal apparent errors in gene models, and 245 (~57%) appear to be novel genes. In addition, we verified 6,010 exon-exon splice junctions within existing WormBase gene models.
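
The breakdown of the 429 new coding sequences can be checked with a few lines of arithmetic. The sketch below uses only the counts reported above. It confirms that the three categories partition the 429 sequences and recomputes each share as a percentage. The variable names are illustrative and do not come from the original study.

```python
# Counts reported in the abstract for the new coding sequences absent from WS150.
TOTAL_NEW = 429
RT_PCR_CONFIRMED = 192
categories = {
    "predicted pseudogenes": 33,
    "gene-model errors": 151,
    "apparent novel genes": 245,
}

def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)

# The three categories should account for every new coding sequence exactly once.
assert sum(categories.values()) == TOTAL_NEW

shares = {name: percent(count, TOTAL_NEW) for name, count in categories.items()}
for name, share in shares.items():
    print(f"{name}: {categories[name]}/{TOTAL_NEW} = {share}%")
print(f"RT-PCR confirmed: {RT_PCR_CONFIRMED}/{TOTAL_NEW} = "
      f"{percent(RT_PCR_CONFIRMED, TOTAL_NEW)}%")
```

These unrounded shares (7.7%, 35.2%, 57.1%, and 44.8% for RT-PCR confirmation) agree with the approximate figures quoted in the text.
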

Our work confirms that mass spectrometry is a powerful experimental tool for annotating sequenced genomes. The collection of identified peptides should also facilitate future proteomics experiments targeted at specific proteins of interest.

The MS2 and SQT file formats are described in McDonald et al. (2004).
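
Below is a minimal sketch of an MS2 reader, based on our understanding of that format's line-type conventions. In this reading, `H` lines are file headers and an `S` line (first scan, last scan, precursor m/z) starts each spectrum. `Z` lines give a candidate charge state and its [M+H]+ mass, `I` lines carry optional per-spectrum information, and untagged lines are m/z and intensity peak pairs. The class and function names (`Ms2Spectrum`, `parse_ms2`) are hypothetical, and SQT files are not handled. McDonald et al. (2004) remains the authoritative definition of both formats.

```python
from dataclasses import dataclass, field

@dataclass
class Ms2Spectrum:
    first_scan: int
    last_scan: int
    precursor_mz: float
    charges: list = field(default_factory=list)  # (charge, [M+H]+ mass) pairs from Z lines
    peaks: list = field(default_factory=list)    # (m/z, intensity) pairs

def parse_ms2(lines):
    """Sketch of an MS2 reader: H and I lines are skipped, S starts a spectrum,
    Z adds a charge state, and untagged lines are read as peaks."""
    spectra = []
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        tag = line[0]
        if tag in ("H", "I"):
            continue  # file headers and per-spectrum info lines are ignored here
        if tag == "S":
            _, first, last, mz = line.split()[:4]
            current = Ms2Spectrum(int(first), int(last), float(mz))
            spectra.append(current)
            continue
        if current is None:
            raise ValueError(f"line before first S record: {line!r}")
        if tag == "Z":
            _, charge, mass = line.split()[:3]
            current.charges.append((int(charge), float(mass)))
        else:
            mz, intensity = line.split()[:2]
            current.peaks.append((float(mz), float(intensity)))
    return spectra

# Usage with a small hypothetical record (values are made up for illustration).
example = """H\tCreationDate\texample
S\t000001\t000001\t445.12
Z\t2\t889.23
Z\t3\t1333.34
150.1\t1200.0
250.2\t800.5
"""
spectra = parse_ms2(example.splitlines())
print(spectra[0])
```

Keeping every candidate charge from the `Z` lines, instead of only the first one, reflects the fact that a single precursor may be searched under more than one charge state.
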