Today's exercise involves gene prediction. We will use Chris Burge's GENSCAN to search for
genes in three short human chromosomal sequences. We will also use Jim Kent's BLAT to
map our sequences back to the UCSC Genome Browser. The first sequence will be analyzed in class. The remaining two sequences are assigned as
homework.

Let's take a quick look at the GENSCAN interface.
- Organism allows you to select specific rules for searching for genes. There are not many options. For our purposes, we will always use
Vertebrate.
- Suboptimal exon cutoff allows you to relax the constraints that determine where exons exist. For our purposes, we will always use 1.00.
- Sequence name allows you to put a title on your sequence of interest. You may leave this blank.
- Print options sets how detailed your results will be. For this quiz section, select Predicted CDS and peptides.
- You can choose to upload your sequence of interest from some location on your computer. This will not be necessary for today's quiz section.
All sequences can be cut-and-pasted from links on this webpage.
- The large white box is where you can paste in your sequence of interest.
- You can have your results mailed to you by adding an email address in the last white box. This is not necessary to do the assignment.
- The two buttons at the bottom are where you start your search, or clear the fields to try a new search.

GENSCAN reports its results in a table listing several useful features of the genes it found. Above is a key describing the elements
GENSCAN can find. For your reference, this key is also printed below your results whenever you run a search.

You can also visualize any genes found in your sequence by clicking on the following links from your results page. There are two options you can
choose, depending on which software you have installed on your computer.

The BLAT program can quickly look for a sequence in the human genome and return the genomic regions with high similarity to your query sequence.
Follow the link above to use BLAT, or go to the UCSC Genome Browser home page and select BLAT. The tool looks like the picture above. Select the human
genome, paste your sequence, and click submit.
One last point: To answer the homework questions, you must get comfortable using tables in the UCSC Genome Browser. Once you have isolated a
CDS with BLAT, you can click on the "Tables" link at the top of the page. This allows you to quickly look up the exon boundaries. However, the tool
is not intuitive, so make sure you use the following settings, indicated in red boxes:

Your results will look like the image below. You can click on the image for a larger view. If you want the UCSC Genome Browser to focus on a particular
exon, simply use the coordinates from the regions that are underlined. For example, to get the first exon, search for "chr11:5203271-5203532".
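If you prefer to handle these position strings in a script rather than by hand, the following is a minimal sketch of how one might split a "chrom:start-end" string into its parts and compute the length of the region. The function names parse_position and span_length are hypothetical, and the length calculation assumes both coordinates are inclusive, which is how the browser reads coordinates typed in this form.

```python
import re

def parse_position(position):
    """Split a 'chrom:start-end' string into (chrom, start, end)."""
    match = re.fullmatch(r"(chr\w+):([\d,]+)-([\d,]+)", position.strip())
    if match is None:
        raise ValueError(f"not a chrom:start-end position: {position!r}")
    chrom, start, end = match.groups()
    # Coordinates are sometimes written with commas as thousands
    # separators, so strip them before converting to integers.
    return chrom, int(start.replace(",", "")), int(end.replace(",", ""))

def span_length(start, end):
    """Length in bases, ASSUMING both start and end are inclusive."""
    return end - start + 1

# The first-exon example from the text above.
chrom, start, end = parse_position("chr11:5203271-5203532")
first_exon_length = span_length(start, end)
print(chrom, start, end, first_exon_length)
```

Under the inclusive assumption the example region is 262 bases long. If your coordinates come from a source that uses a different convention, adjust span_length accordingly.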
In class exercise
Sequence #1
The above link leads to a short sequence of chromosomal data that you will search with GENSCAN.
Follow these instructions, then answer the following questions:
Step 1: Run GENSCAN. Go to the GENSCAN web site and submit your sequence. Be sure to select "Predicted CDS and peptides" so
that you get the predicted CDS, then wait for the result.
Step 2: In another window, display the genomic region in the UCSC Genome Browser. You can display the predicted CDS with the UCSC
Genome Browser BLAT tool. Copy the predicted CDS, open the BLAT tool, and submit your sequence. You will be brought to the
display window of the genome browser, where your sequence from the
BLAT search is displayed together with several other annotation tracks. Feel free to explore the browser, as it integrates a
lot of information. Click on your sequence in the BLAT search track to see the base-by-base display. Make sure you can
recognize the signals (start codon, stop codon, splicing signals, etc.).
Step 3: Compare your predictions against the annotation for known genes and against predictions by other gene-finding algorithms. Start
with the known genes track. If the known genes track is not shown, you can display it by using the drop
down controls under the browser window. Analyze the prediction results for each sequence.
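As a complement to inspecting the signals by eye in Step 2, here is a minimal sketch of a sanity check you could run on a predicted CDS after copying it from the results page. The names clean_sequence and check_cds and the toy sequence are hypothetical and are not part of GENSCAN. The check covers only the start codon, the stop codon, and the reading frame; splicing signals require the genomic sequence and are not covered here.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def clean_sequence(text):
    """Drop FASTA header lines and whitespace; return uppercase bases."""
    lines = (ln.strip() for ln in text.splitlines())
    return "".join(ln for ln in lines if not ln.startswith(">")).upper()

def check_cds(cds):
    """Report basic start/stop/frame properties of a coding sequence."""
    # Complete codons only; trailing bases that do not fill a codon are ignored.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    return {
        "length_multiple_of_3": len(cds) % 3 == 0,
        "starts_with_ATG": cds.startswith("ATG"),
        "ends_with_stop": cds[-3:] in STOP_CODONS,
        # In-frame stop codons before the final codon.
        "internal_stops": sum(1 for c in codons[:-1] if c in STOP_CODONS),
    }

# Toy input (hypothetical, NOT a real GENSCAN prediction).
toy = """>toy_cds
ATGGCC
TGGTAA"""
report = check_cds(clean_sequence(toy))
print(report)
```

A failed check is not necessarily an error in your work. It may simply mean that the predicted gene is incomplete, which is worth noting when you compare it to the known gene.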
Answer the following questions:
1. In GENSCAN, how many exons are predicted?
2. Does the predicted gene match a known gene in the UCSC browser?
3. If you answered "yes" to #2, then compare the GENSCAN prediction to the UCSC reference sequence. How many exons are missed?
4. GENSCAN gives a probability score for each predicted exon. Are the probability values for exons predicted by GENSCAN informative?
5. Of the exons shared by GENSCAN and the UCSC Genome Browser, what is the accuracy at the nucleotide level? Are all the predicted exons the
same size as the UCSC reference exons? If not, state which exons are different. You can go to the browser home page, select Table Browser,
and select the knownGene table to extract the exact exon locations for the known genes.
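Questions 3 and 5 come down to comparing two lists of exon coordinates. The following is a minimal sketch of that comparison, assuming you have already typed both lists in by hand as (start, end) pairs on the same chromosome and strand. The function names and all coordinates are hypothetical toy values. The sketch also assumes both lists use the same convention (inclusive start and end here) and that exons within a list do not overlap each other. Table Browser output may store starts with a different offset than the browser display, so convert both lists to one convention before comparing.

```python
def overlap(a, b):
    """Number of bases shared by two inclusive (start, end) intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)

def total_bases(exons):
    """Total length of a list of inclusive (start, end) exons."""
    return sum(end - start + 1 for start, end in exons)

def compare_exons(predicted, reference):
    """Compare predicted exons against reference exons."""
    shared = sum(overlap(p, r) for p in predicted for r in reference)
    return {
        # Predicted exons whose boundaries both match a reference exon.
        "exact": [p for p in predicted if p in reference],
        # Reference exons that no predicted exon touches at all.
        "missed": [r for r in reference
                   if all(overlap(r, p) == 0 for p in predicted)],
        # Overlapping pairs whose boundaries differ.
        "different": [(p, r) for p in predicted for r in reference
                      if overlap(p, r) > 0 and p != r],
        # Fraction of reference bases covered by the prediction.
        "fraction_of_reference_found": shared / total_bases(reference),
        # Fraction of predicted bases that fall inside reference exons.
        "fraction_of_prediction_correct": shared / total_bases(predicted),
    }

# Toy coordinates (hypothetical, for illustration only).
predicted = [(100, 200), (300, 400), (520, 600)]
reference = [(100, 200), (300, 410), (500, 600), (700, 750)]
result = compare_exons(predicted, reference)
for key, value in result.items():
    print(key, value)
```

In this toy case one exon matches exactly, two overlap a reference exon with a shifted boundary, and one reference exon is missed entirely. The two fractions are one simple way to express nucleotide-level accuracy; other definitions are possible.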
Homework, Due Monday 5pm
Homework key
Here are two additional short human chromosomal sequences:
Sequence #2,
Sequence #3
Use GENSCAN and BLAT to answer the five questions above for sequences #2 and #3. Note that GENSCAN may give more than one gene prediction
for a sequence. In that case, answer the five questions for each prediction.
Submit your answers by email to maxboeck@u.washington.edu
Please type your answers in the body of the email (do not send Word documents or other attachments).