Today's exercise involves gene prediction. We will use Chris Burge's GENSCAN to search for genes in three short human chromosomal sequences. We will also use Jim Kent's BLAT to map our sequences back to the genome in the UCSC Genome Browser. The first sequence will be analyzed in class; the remaining two are assigned as homework.



Let's take a quick look at the GENSCAN interface.


GENSCAN reports its results in a table that lists the features of each gene it found. Above is a key describing the elements GENSCAN can find. For reference, this key is also printed below your results whenever you run a search.


You can also visualize any genes found in your sequence by clicking on the following links from your results page. There are two options you can choose, depending on which software you have installed on your computer.


The BLAT program quickly searches the human genome for a sequence and returns the genomic regions with high similarity to your query. Follow the link above to use BLAT, or go to the UCSC Genome Browser home page and select BLAT. The tool looks like the picture above. Select the human genome, paste your sequence, and click submit.

One last point: to answer the homework questions, you must get comfortable using tables in the UCSC Genome Browser. Once you have located a CDS with BLAT, you can click on the "Tables" link at the top of the page. This lets you quickly look up the exon boundaries. However, the tool is not intuitive, so make sure you use the following settings, indicated in red boxes:

Your results will look like the image below. You can click on the image for a larger view. If you want the UCSC Genome Browser to focus on a particular exon, copy the coordinates from the underlined regions. For example, to get the first exon, search for "chr11:5203271-5203532".
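When you copy coordinates out of a table, it helps to be explicit about the coordinate convention. The following is a minimal Python sketch, not part of the UCSC tools. The function names are hypothetical, and it assumes the usual UCSC conventions: table fields such as exonStarts and exonEnds are 0-based with an exclusive end, while position strings typed into the browser are 1-based and inclusive.

```python
import re


def parse_position(position):
    """Split a browser position string such as 'chr11:5,203,271-5,203,532'."""
    match = re.fullmatch(r"(\w+):([\d,]+)-([\d,]+)", position.strip())
    if match is None:
        raise ValueError(f"not a position string: {position!r}")
    chrom, start, end = match.groups()
    # Remove the thousands separators the browser sometimes displays.
    return chrom, int(start.replace(",", "")), int(end.replace(",", ""))


def table_exon_to_position(chrom, exon_start, exon_end):
    """Convert a table interval (assumed 0-based, end-exclusive)
    to a 1-based inclusive position string."""
    return f"{chrom}:{exon_start + 1}-{exon_end}"


def table_exon_length(exon_start, exon_end):
    """Exon length in bases under the 0-based, end-exclusive assumption."""
    return exon_end - exon_start


chrom, start, end = parse_position("chr11:5203271-5203532")
print(table_exon_to_position(chrom, start, end))
print(table_exon_length(start, end))
```

Typing the table values directly into the search box still lands on the exon; the conversion matters only when you count bases or compare exon sizes.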


In-class exercise

Sequence #1
The above link leads to a short sequence of chromosomal data that you will search with GENSCAN. Follow these instructions, then answer the following questions:

Step 1: Run GENSCAN. Go to the GENSCAN web site and submit your sequence. Be sure to check "Predicted CDS and peptides" so that you get the predicted CDS, then wait for the result.

Step 2: In another window, display the genomic region in the UCSC Genome Browser. You can map the predicted CDS onto the genome with the browser's BLAT tool. Copy the predicted CDS, open the BLAT tool, and submit your sequence. You will be brought to the display window of the genome browser, where your sequence from the BLAT search is displayed together with several other annotation tracks. Feel free to explore the browser, as it integrates a lot of information. Click on your sequence in the BLAT search track to see the base-by-base display. Make sure you can recognize the signals (start codon, stop codon, splicing signals, etc.).
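If you want to double-check the signals outside the browser, the short Python sketch below tests a pasted CDS for a start codon, a terminal stop codon, and in-frame internal stops, and tests an intron for the canonical GT...AG splice signals. The function names and the toy sequences are hypothetical and are not output from GENSCAN or BLAT; the sketch also assumes the sequence is given on the coding strand.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def clean_sequence(text):
    """Drop FASTA header lines (starting with '>') and all whitespace."""
    lines = [line for line in text.splitlines() if not line.startswith(">")]
    return "".join("".join(lines).split()).upper()


def check_cds(text):
    """Report basic open-reading-frame signals for a predicted CDS."""
    seq = clean_sequence(text)
    whole = len(seq) % 3 == 0
    # Complete codons only; a trailing partial codon is ignored.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    return {
        "starts_with_atg": seq.startswith("ATG"),
        "length_multiple_of_3": whole,
        "ends_with_stop": whole and bool(codons) and codons[-1] in STOP_CODONS,
        # Stop codons in frame anywhere before the last complete codon.
        "internal_stops": sum(1 for c in codons[:-1] if c in STOP_CODONS),
    }


def has_canonical_splice_sites(intron):
    """True if the intron starts with GT (donor) and ends with AG (acceptor)."""
    seq = clean_sequence(intron)
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")


# Toy sequences for illustration only.
print(check_cds(">toy_cds\nATGGCC\nTAA"))
print(has_canonical_splice_sites("gtaagtttcag"))
```

A CDS that fails these checks is not necessarily wrong (GENSCAN can predict partial genes), but it tells you where to look more closely in the base-by-base display.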

Step 3: Compare your predictions against the annotation for known genes and against predictions by other gene-finding algorithms. Start with the known genes track. If it is not shown, you can display it using the drop-down controls under the browser window. Analyze the prediction results for each sequence.

Answer the following questions:
1. In GENSCAN, how many exons are predicted?
2. Does the predicted gene match a known gene in the UCSC browser?
3. If you answered "yes" to #2, then compare the GENSCAN prediction to the UCSC reference sequence. How many exons are missed?
4. GENSCAN gives a probability score for each predicted exon. Are the probability values for exons predicted by GENSCAN informative?
5. For the exons shared by GENSCAN and the UCSC Genome Browser, what is the accuracy at the nucleotide level: are all the predicted exons the same size as the UCSC reference exons? If not, state which exons differ. To extract the exact exon locations for the known genes, go to the browser home page, select Table Browser, and select the knownGene table.
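For questions 3 and 5, one way to organize the comparison is to list the predicted and reference exons as coordinate pairs and compute the overlap. The Python sketch below is one possible approach, not a required method. The function names are hypothetical, the coordinates in the example are made up for illustration, and both lists are assumed to use the same convention (1-based, inclusive start and end).

```python
def overlaps(a, b):
    """True if two inclusive (start, end) intervals share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]


def covered_bases(exons):
    """Set of positions covered by a list of inclusive (start, end) exons."""
    bases = set()
    for start, end in exons:
        bases.update(range(start, end + 1))
    return bases


def compare_exons(predicted, reference):
    """Exon-level and nucleotide-level comparison of two exon lists."""
    pred_bases = covered_bases(predicted)
    ref_bases = covered_bases(reference)
    shared = len(pred_bases & ref_bases)
    return {
        # Exons whose start and end both match the reference exactly.
        "exact_exons": sorted(set(predicted) & set(reference)),
        # Reference exons that no predicted exon touches at all.
        "missed_exons": sorted(
            r for r in reference if not any(overlaps(r, p) for p in predicted)
        ),
        # Fraction of reference bases that were predicted.
        "sensitivity": shared / len(ref_bases) if ref_bases else 0.0,
        # Fraction of predicted bases that are in the reference.
        "precision": shared / len(pred_bases) if pred_bases else 0.0,
    }


# Made-up coordinates for illustration only.
predicted = [(101, 200), (301, 400)]
reference = [(101, 200), (311, 400), (501, 550)]
print(compare_exons(predicted, reference))
```

In this toy case, one exon matches exactly, one overlaps the reference but has a different start, and one reference exon is missed entirely. The gene-finding literature often calls the second ratio "specificity"; it is labeled precision here to avoid confusion with the statistical definition.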


Homework, due Monday at 5pm

homeworkkey

Here are two additional short human chromosomal sequences: Sequence #2, Sequence #3

Use GENSCAN and BLAT to answer the five questions above for sequences #2 and #3. Note that GENSCAN may give more than one gene prediction for a sequence. In that case, answer the five questions for each prediction.

Submit your answers by email to maxboeck@u.washington.edu
Please type your answers in the body of the email (do not send Word documents or attachments).