Exercise II
- Due Oct 21, 2021 by 11:59pm
- Points 100
- Submitting a file upload
- Available until Oct 24, 2021 at 11:59pm
Exercise II: Genome Annotation by Gene Finding
In this exercise, you will perform ORF (Open Reading Frame) detection to annotate a bacterial genome and compare your findings with EasyGene.
Individual work on programming
1. Retrieve a bacterial genome your group is interested in studying from here. Choose Prokaryotes, then open the filters and choose "Bacteria" under "Kingdom" and "Complete" under "Assembly level", as shown in the figure below.
The FASTA file of the genome can be downloaded from the "Assembly" column, and the ground-truth annotation is provided in the "CDS" column, as shown below.
2. Write a program that finds all start and stop codons and reports every ORF longer than k. Try several values of k (for example, k = 60, 100, 200) and keep the one that gives the best result. Compare your predictions with the ground truth to make sure your program is working properly.
3. Submit your code and the output in this submission.
4. Complete the group work and the video summary.
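As a starting point for step 2, here is a minimal sketch of a six-frame ORF scan. It rests on several assumptions that are not part of the exercise text: the function names are hypothetical, k is measured in nucleotides, the start codons are ATG, GTG, and TTG and the stop codons are TAA, TAG, and TGA (the usual bacterial set), each ORF runs from the first start codon after a stop to the next in-frame stop, and coordinates are 0-based with an exclusive end. Adapt it to your own definition of an ORF.

```python
# Minimal six-frame ORF scan (a sketch; names and conventions are assumptions).
START_CODONS = {"ATG", "GTG", "TTG"}   # assumed bacterial start codons
STOP_CODONS = {"TAA", "TAG", "TGA"}    # standard stop codons
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def read_fasta(path):
    """Read a FASTA file into a dict mapping record name to sequence."""
    records, name = {}, None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                name = line[1:].split()[0]
                records[name] = []
            elif name is not None and line:
                records[name].append(line.upper())
    return {key: "".join(parts) for key, parts in records.items()}


def find_orfs_in_strand(seq, k):
    """Return (start, end) pairs for ORFs longer than k nt on one strand.

    'end' is exclusive and includes the stop codon.
    """
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon in START_CODONS:
                start = i                      # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if end - start > k:
                    orfs.append((start, end))
                start = None                   # look for the next ORF
    return orfs


def find_orfs(seq, k):
    """Return sorted (start, end, strand) tuples in forward-strand coordinates."""
    seq = seq.upper()
    n = len(seq)
    found = [(s, e, "+") for s, e in find_orfs_in_strand(seq, k)]
    rc = reverse_complement(seq)
    # Map reverse-strand positions back onto the forward strand.
    found += [(n - e, n - s, "-") for s, e in find_orfs_in_strand(rc, k)]
    return sorted(found)


if __name__ == "__main__":
    demo = "CCATGAAATTTTAGCC"
    for k in (6, 9, 12):
        print(k, find_orfs(demo, k))
```

To run it on a real genome, you could load the downloaded FASTA file with `read_fasta` and call `find_orfs` on each record for k = 60, 100, 200. Since a gene's start codon is often ambiguous, matching predicted stop positions against the CDS annotation is one reasonable way to compare with the ground truth.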
Grading for class participation:
This is an exercise rather than a homework assignment. Your goal in this class participation is to learn the concepts of ORFs and genome annotation, how to use EasyGene to annotate prokaryote genomes, and how to analyze your predictions.
1. The TA will read your submissions to confirm your participation and the quality of the video.
2. As long as your submissions are reasonably complete, you will get all the points. You will not receive feedback unless something significant is wrong or missing.