[midterm key]

Midterm is on October 27

Amino acid tables from QS3 will be provided as testing aids. You are responsible for bringing your own calculator. No graphing calculators allowed.


1. A capillary electropherogram, also known as a chromatogram, is generated for a sequence. Dideoxynucleotide triphosphate-conjugated dyes give the following colors:
ddATP - green
ddCTP - blue
ddGTP - black
ddTTP - red
What is the sequence of the following chromatogram?

Is it the same sequence shown in the following gel?


2. Write the DNA sequence corresponding to this chromatogram:

What is base 66?


3. A single nucleotide substitution at which position in a codon would most likely have the greatest impact on the function of the encoded protein: the first, the second, or the third? Why?


4. What does MALDI stand for?



5. You have a gene isolated in a vector and know the DNA sequence on the ends of the gene. You want to sequence the rest of the gene, so you design two primers, P1 and P2. You put the primers in a tube with the vector and DNA polymerase, do 25 rounds of amplification, and get the following result:

What went wrong?


6. Sequences are available for the following genomes (among others): human, chimp, mouse, chicken, pufferfish (Fugu rubripes), D. melanogaster, C. elegans, Saccharomyces cerevisiae, Arabidopsis thaliana. (A) Rank the genomes by size. (B) Rank the genomes by gene number.


7. Your friend tries to sequence your gene from question #5 for you. In her results, one of her chromatograms has a problem:

She decides to do PCR on the gene sequence using P1 and P2. Here are her results from the gel electrophoresis:

What is the reason for the bad chromatogram?


8. Which of the following point mutations would most likely have the greatest impact on the function of the encoded protein: a single nucleotide substitution (e.g. A mutates to G) or a single nucleotide deletion (e.g. A is deleted from the sequence)? Why?


9. What is the difference between a Southern blot, a Northern blot, and a Western blot?


10. Name 3 pros and 3 cons of protein analysis using the mass spec shotgun approach over the gel-based approach.


11. Although the genetic code is universal, organisms usually have their own preference for codon usage. (For example, the web site http://www.kazusa.or.jp/codon/cgi-bin/showcodon.cgi?species=155864 gives statistics on the codon usage of Escherichia coli.) Your colleague has an EST fragment from E. coli with the following sequence: AAGUCAUUAUUUUCG.

Assuming this is the coding strand, can you help him to identify the most likely translation frame?
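
A minimal sketch of the first step for this kind of problem: splitting the fragment into its three forward reading frames and translating each with the standard genetic code. Choosing the most likely frame still requires comparing each frame's codons against the E. coli codon usage statistics from the site cited above; those frequencies are not reproduced here. The function name `forward_frames` is illustrative only.

```python
# Standard genetic code, built from the conventional U, C, A, G ordering.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def forward_frames(rna):
    """Return {offset: (codons, peptide)} for the three forward reading frames."""
    frames = {}
    for offset in range(3):
        # Only complete codons are kept; trailing bases are ignored.
        codons = [rna[i:i + 3] for i in range(offset, len(rna) - 2, 3)]
        peptide = "".join(CODON_TABLE[c] for c in codons)
        frames[offset] = (codons, peptide)
    return frames


frames = forward_frames("AAGUCAUUAUUUUCG")
for offset, (codons, peptide) in frames.items():
    print(offset, " ".join(codons), peptide)
```

With the codons of each frame listed, the frame whose codons are most frequently used in E. coli is the most plausible one.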


12. The restriction endonuclease EcoRI has the following recognition site:

How frequently do you expect it to cut in the mouse genome? How about the E. coli genome?
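
A rough sketch of the usual back-of-the-envelope estimate, assuming the four bases are equally frequent and independent (real genomes deviate from this). EcoRI recognizes the 6-base site GAATTC. The genome sizes below are approximate round figures chosen for illustration, not values from this study guide.

```python
SITE = "GAATTC"  # EcoRI recognition sequence

# Approximate genome sizes in base pairs (assumed round figures).
MOUSE_GENOME_BP = 2.7e9
ECOLI_GENOME_BP = 4.6e6


def expected_site_count(genome_size_bp, site=SITE):
    """Expected number of sites if each base occurs with probability 1/4."""
    return genome_size_bp / 4 ** len(site)


print("average spacing (bp):", 4 ** len(SITE))
print("mouse:", expected_site_count(MOUSE_GENOME_BP))
print("E. coli:", expected_site_count(ECOLI_GENOME_BP))
```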


13. You put the following in a tube: 10 copies of DNA template, 10 copies of a matching primer, and DNA polymerase. How many copies of DNA do you expect to have after 5 rounds of amplification?


14. What is the mass of this peptide: LNVEASAPQTR


15. What is the +2 m/z of the peptide in question #14?
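
A sketch of how the mass and m/z calculations in questions like #14 and #15 can be checked. It assumes monoisotopic residue masses; the QS3 amino acid tables may list average masses or round differently, so use the table values on the exam. The helper names `peptide_mass` and `mz` are illustrative.

```python
# Monoisotopic residue masses in Da (assumption: the QS3 table may differ).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # one H2O for the free N- and C-termini
PROTON = 1.00728   # mass of each added charge


def peptide_mass(sequence):
    """Neutral peptide mass: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def mz(sequence, charge):
    """m/z of the peptide carrying `charge` protons."""
    return (peptide_mass(sequence) + charge * PROTON) / charge


print(round(peptide_mass("LNVEASAPQTR"), 3))
print(round(mz("LNVEASAPQTR", 2), 3))
```

The same two functions apply to any unmodified peptide written in one-letter code.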


16. Find the intron(s) in the "world's shortest intron-containing gene". In addition, spell out the amino acid sequence it encodes.

ATGCCGTCTAGGTAA
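
One way to approach this systematically, sketched under simplifying assumptions: introns follow the canonical GT...AG rule, there is no minimum intron length, and a valid spliced product must start with ATG, end with a stop codon, and contain no internal stop. Real splice-site recognition involves more signals than this. The function names are illustrative.

```python
STOPS = {"TAA", "TAG", "TGA"}


def is_orf(seq):
    """True if seq starts with ATG, ends with a stop, and has no internal stop."""
    if len(seq) % 3 != 0 or len(seq) < 6 or not seq.startswith("ATG"):
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    return codons[-1] in STOPS and not any(c in STOPS for c in codons[1:-1])


def candidate_introns(gene):
    """List (start, end, intron, spliced) for GT...AG spans whose removal leaves an ORF."""
    results = []
    for start in range(len(gene) - 1):
        if gene[start:start + 2] != "GT":
            continue
        # The intron must at least hold the GT and the AG.
        for end in range(start + 4, len(gene) + 1):
            if gene[end - 2:end] == "AG":
                spliced = gene[:start] + gene[end:]
                if is_orf(spliced):
                    results.append((start, end, gene[start:end], spliced))
    return results


for start, end, intron, spliced in candidate_introns("ATGCCGTCTAGGTAA"):
    print(start, end, intron, spliced)
```

The spliced sequence can then be translated codon by codon with the amino acid table.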


17. There are two general strategies for performing gene prediction: similarity-based approaches and statistics-based approaches. Explain which genes are likely to be missed by the statistics-based approach and which are likely to be missed by the similarity-based approach.


18. Find the a2-b2 ion pair and the first two N-terminal amino acids (in order) of the following spectrum:

[Note: The spectrum used on the exam will be different from the one above.]


19. Name the two most common methods for generating ions in proteomics.


20. What are the two low mass ions that are diagnostic of the y1 ions for a tryptic peptide?


21. How many phases (dimensions) of separation are there in the following MudPIT experiment?

SCX = Strong cation exchange, RP = Reverse Phase


22. You sequence an EST from your laboratory's favorite inbred mouse strain, and BLAST it against the mouse genome to match the renin gene. However, you did not get an exact match:

CATCCGCAAGTTCTATACA-GAGTTTGATCGGCATAACAATCGCGTTGGATTC-GCCTTG
||||||||||||||||||| |||||||||||||||||||||||| |||||||| || | |
CATCCGCAAGTTCTATACACGAGTTTGATCGGCATAACAATCGCATTGGATTCCGCATGG
Is there a mistake in the database? Explain your answer.


23. Explain the difference between DNA amplification and Polymerase Chain Reaction.


24. Below are the recognition sites of two restriction enzymes, BamHI and BclI.

a) Does cleavage by BamHI result in a 5’ or 3’ overhang? What is the sequence of this overhang?

b) Does cleavage by BclI result in a 5’ or 3’ overhang? What is the sequence of this overhang?
c) Given the DNA shown below:

5’ ATTGAGGATCCGTAATGTGTCCTGATCACGCTCCACG 3’
3’ TAACTCCTAGGCATTACACAGGACTAGTGCGAGGTGC 5’
If this DNA was cut with BamHI, how many DNA fragments would you expect? Write out the sequence of these double-stranded DNA fragments.
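
A small sketch for locating restriction sites and splitting the top strand, assuming the standard cut positions G^GATCC for BamHI and T^GATCA for BclI (one base into the site on each strand). Only the top strand is handled; because both sites are palindromic, the bottom strand is cut at the mirrored position, which is what produces the overhang. The function name `cut_top_strand` is illustrative.

```python
def cut_top_strand(top, site, cut_offset):
    """Split the top strand at every site, cutting cut_offset bases into the site."""
    fragments = []
    previous = 0
    position = top.find(site)
    while position != -1:
        cut = position + cut_offset
        fragments.append(top[previous:cut])
        previous = cut
        position = top.find(site, position + 1)
    fragments.append(top[previous:])
    return fragments


TOP = "ATTGAGGATCCGTAATGTGTCCTGATCACGCTCCACG"

bamhi_fragments = cut_top_strand(TOP, "GGATCC", 1)
bcli_fragments = cut_top_strand(TOP, "TGATCA", 1)
print(bamhi_fragments)
print(bcli_fragments)
```

The number of fragments from a linear molecule is always one more than the number of sites found.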


25. What is a genomic DNA library?


26. Why are two antibodies used in Western blotting?


27. Interpret the following Sanger sequencing gel: write the DNA sequence corresponding to the gel bands.


28. Use the information provided in question #27 to answer the following:
If the ratio of ddTTP to dTTP in the 'T' tube were adjusted to 1:4, what would be the probability of generating fragments of at least length 20 bp? [Note: The sequence used on the exam will be different from the one above.]
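
A sketch of the reasoning, assuming the polymerase incorporates ddTTP and dTTP in proportion to their concentrations, so each T position terminates the chain with probability 1/(1+4). A strand reaches a given length only if it incorporated dTTP at every earlier T position. The count of T positions must be read from the gel; the value 3 below is purely hypothetical.

```python
from fractions import Fraction


def prob_extends_past(num_t_sites, dd_to_d_ratio=(1, 4)):
    """Probability that a chain incorporates dTTP at all of num_t_sites positions."""
    dd, d = dd_to_d_ratio
    p_terminate = Fraction(dd, dd + d)
    return (1 - p_terminate) ** num_t_sites


# Hypothetical example: 3 T positions occur before base 20.
print(prob_extends_past(3))
```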


29. What is peak capacity? Explain your answer in terms of SDS PAGE.


30. You have an interesting protein related to heart disease in mice. Careful analysis of the mRNA sequence shows the mutation to be a 3 amino acid deletion in the middle of the protein sequence. You decide to confirm this by peptide fragmentation analysis in a mass spectrometer. However, after database searching on your data (using the mouse reference proteome), you only find peptides upstream or downstream of your region of interest. Why did your approach fail?


31. You design two primers: TTAGGGTTCAGTAGAACTTG (20 bases) and ACAGCAAGGGGGCTAGTGAGGCT (23 bases). Each of them can bind to the human genome in hundreds of different places. Why, when you perform PCR with them, do you get only one band, with a length of approximately 1.5 kb?


32. What does the acronym MudPIT stand for?


33. What is a BAC?


34. What is a contig?


35. Hypothetical organism X has the following DNA sequence. Part of the promoter is indicated by the bolded sequence. Transcription starts at the non-bold A/T base pair.

5’ xxxx TATTTGATAG CTCTATGCAT GCATGGGTCC TGAAGTTCAG ATCTTTGAGT CATAGGAGTC 3’
3’ xxxx ATAAACTATC GAGATACGTA CGTACCCAGG ACTTCAAGTC TAGAAACTCA GTATCCTCAG 5’
a) Give the RNA sequence of the first 25 bases following transcription.
b) What are the first 5 codons of the resulting protein?


36. What is the mass of the peptide LLVVYPWTQR?


37. What is the +2 m/z of the peptide in question #36?


38. Name 3 covalent modifications that occur on proteins post-translationally.


39. What is the molecular weight of the "protein" in this mass spectrum? (note: this is not a tandem mass spectrum)


40. What are the two dimensions in 2D-gel electrophoresis?


41. What is a pseudogene? Why do gene prediction algorithms have difficulty discerning pseudogenes from true genes?


42. Reverse phase chromatography separates molecules based on what general physicochemical property?


43. Where does the enzyme trypsin cleave proteins?


44. Most statistical gene prediction programs require a set of parameters, estimated based on a training set of DNA sequences with genes clearly marked. What are the two major experimental methods used to reliably find a gene?


45. Sequence homology or similarity information is used in both the similarity-based and the comparative genomics approaches for gene prediction. What is the difference between these two approaches?