CSCI 5481 (001)
Homework 3: Phylogenetic tree and parsimony analysis
2021 Fall (08/10/2021-01/05/2022)

  • Due Nov 14, 2021 by 11:59pm
  • Points 100
  • Submitting a file upload
  • Available until Nov 17, 2021 at 11:59pm
This assignment was locked Nov 17, 2021 at 11:59pm.

In this homework assignment, you will use a collection of small subunit ribosomal RNA (rRNA) sequences to infer a phylogenetic tree and to reconstruct the ancestral sequences by parsimony. You will use the neighbor-joining algorithm to build the tree and implement the Sankoff algorithm for the parsimony reconstruction.


Dataset:

  • rRNA sequences: 38 sequences total.

  • Use the provided scoring matrix to implement problem 3.


Problems:

1. (20 points): Use the MATLAB functions multialign, seqpdist, and seqneighjoin, or the Clustal Omega web tool, to align the rRNA sequences and build a phylogenetic tree.
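To show what the distance and tree-building steps compute, here is a minimal pure-Python sketch operating on already-aligned sequences. It assumes the uncorrected p-distance (mismatch fraction over gap-free columns); MATLAB's seqpdist and Clustal Omega may use different distance measures and gap handling, so their numbers can differ. The function names are hypothetical, and the sketch does not perform the alignment itself.

```python
from typing import List, Sequence


def p_distance(s1: str, s2: str) -> float:
    """Uncorrected p-distance: mismatch fraction over gap-free aligned columns."""
    pairs = [(a, b) for a, b in zip(s1, s2) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no gap-free columns to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


def distance_matrix(seqs: Sequence[str]) -> List[List[float]]:
    """Symmetric matrix of pairwise p-distances between aligned sequences."""
    n = len(seqs)
    return [[0.0 if i == j else p_distance(seqs[i], seqs[j]) for j in range(n)]
            for i in range(n)]


def neighbor_joining(names: Sequence[str], dist: Sequence[Sequence[float]]) -> str:
    """Return an unrooted neighbor-joining tree as a Newick string."""
    if len(names) < 3:
        raise ValueError("neighbor joining needs at least three taxa")
    nodes = list(names)               # Newick text of each active cluster
    d = [list(row) for row in dist]   # working copy of the distance matrix
    while len(nodes) > 3:
        n = len(nodes)
        r = [sum(row) for row in d]   # net divergence of each cluster
        # Pick the pair (i, j) minimising Q(i, j) = (n - 2) d(i, j) - r_i - r_j.
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from i and j to their new parent u
        # (can be negative when the distances are far from additive).
        li = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        # Distance from the new node u to every cluster k.
        du = [(d[i][k] + d[j][k] - d[i][j]) / 2 for k in range(n)]
        keep = [k for k in range(n) if k not in (i, j)]
        d = [[d[a][b] for b in keep] + [du[a]] for a in keep]
        d.append([du[a] for a in keep] + [0.0])
        nodes = [nodes[k] for k in keep] + [f"({nodes[i]}:{li:.6g},{nodes[j]}:{lj:.6g})"]
    # Join the last three clusters at one central node (unrooted tree).
    la = (d[0][1] + d[0][2] - d[1][2]) / 2
    lb = (d[0][1] + d[1][2] - d[0][2]) / 2
    lc = (d[0][2] + d[1][2] - d[0][1]) / 2
    return f"({nodes[0]}:{la:.6g},{nodes[1]}:{lb:.6g},{nodes[2]}:{lc:.6g});"
```

For example, the additive distances of the tree ((a:1,b:2):3,(c:1,d:2)) are recovered exactly: `neighbor_joining(["a", "b", "c", "d"], [[0, 3, 5, 6], [3, 0, 6, 7], [5, 6, 0, 3], [6, 7, 3, 0]])` gives `(c:1,d:2,(a:1,b:2):3);`.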

2. (60 points): Implement the Sankoff algorithm and test your algorithm with a few toy examples to demonstrate that your implementation is correct.

* For example, given the input ['AUUCGUGAUU', 'AUUGAA-AUU', 'GUCCUCGGUU', 'GA-CACGAUC'], your implementation should return ['AUUCGUGAUU', 'AUUGAA-AUU', 'GUCCUCGGUU', 'GA-CACGAUC', 'AUUCAUGAUU', 'GUUCACGAUU', 'AUUCAUGAUU'], i.e., the four input sequences followed by the three inferred internal-node sequences.

[Figure: tree for the toy example (Screen Shot 2021-11-05 at 13.19.39.png)]
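The Sankoff recurrence fills, bottom-up, a table S_v(s) = Σ_c min_t [cost(s, t) + S_c(t)] over the children c of each internal node v (leaves score 0 for their observed character and infinity otherwise), then traces the cheapest states back down from the root. Below is a minimal Python sketch under several assumptions: gaps are treated as a fifth state, `unit_cost` is a hypothetical placeholder for the course scoring matrix, the tree is given as nested tuples of leaf indices, and internal-node sequences are returned in post-order after the leaves (the toy example's expected output may order them differently). Ties are broken by alphabet order; because equally parsimonious reconstructions are often not unique, other tie-breaking rules can give different internal sequences with the same score.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple, Union

ALPHABET = "ACGU-"  # assumption: gap treated as a fifth character state

# A tree is a leaf index (int) or a tuple of child subtrees.
Tree = Union[int, Tuple["Tree", ...]]


def unit_cost(a: str, b: str) -> float:
    """Hypothetical placeholder cost (0 for a match, 1 otherwise);
    replace with the course scoring matrix."""
    return 0.0 if a == b else 1.0


def _postorder(tree: Tree, n_leaves: int):
    """Number internal nodes n_leaves, n_leaves + 1, ... in post-order."""
    children: Dict[int, List[int]] = {}
    order: List[int] = []

    def visit(node: Tree) -> int:
        if isinstance(node, int):
            return node
        kids = [visit(child) for child in node]
        node_id = n_leaves + len(order)
        children[node_id] = kids
        order.append(node_id)
        return node_id

    return children, order, visit(tree)


def sankoff(leaves: Sequence[str], tree: Tree,
            cost: Callable[[str, str], float] = unit_cost,
            alphabet: str = ALPHABET) -> Tuple[float, List[str]]:
    """Return (total parsimony score, leaves + internal sequences in post-order)."""
    for seq in leaves:
        bad = set(seq) - set(alphabet)
        if bad:
            raise ValueError(f"characters {sorted(bad)} not in alphabet")
    children, order, root = _postorder(tree, len(leaves))
    internal: Dict[int, List[str]] = {v: [] for v in order}
    total = 0.0
    for pos in range(len(leaves[0])):
        # Bottom-up: score[v][s] = minimum cost of the subtree below v if v has state s.
        score: Dict[int, Dict[str, float]] = {
            i: {s: 0.0 if s == seq[pos] else math.inf for s in alphabet}
            for i, seq in enumerate(leaves)
        }
        for v in order:  # post-order visits children before parents
            score[v] = {
                s: sum(min(cost(s, t) + score[c][t] for t in alphabet)
                       for c in children[v])
                for s in alphabet
            }
        total += min(score[root].values())
        # Top-down traceback; ties go to the earliest state in `alphabet`.
        chosen = {root: min(alphabet, key=lambda s: score[root][s])}
        for v in reversed(order):  # reversed post-order: parents before children
            for c in children[v]:
                if c in children:  # leaves keep their observed character
                    chosen[c] = min(alphabet,
                                    key=lambda t: cost(chosen[v], t) + score[c][t])
        for v in order:
            internal[v].append(chosen[v])
    return total, list(leaves) + ["".join(internal[v]) for v in order]
```

For instance, `sankoff(["AC", "AC", "GC", "GU"], ((0, 1), (2, 3)))` returns `(2.0, ["AC", "AC", "GC", "GU", "AC", "GC", "AC"])` with the placeholder unit cost. On the toy input above, an assumed topology such as `((0, 1), (2, 3))` yields seven sequences, but they should only be compared with the expected output after substituting the course scoring matrix and the tree from the figure.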

3. (20 points): Use your Sankoff implementation to perform a parsimony analysis of the rRNA multiple sequence alignment, using the tree inferred by the neighbor-joining algorithm.

* You may use Biopython to generate the tree structure in this homework.
* You are not allowed to use any existing implementations of the Sankoff algorithm.
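The neighbor-joining tree from problem 1 is commonly saved in Newick format. One way to feed it to a parsimony routine is to convert it into nested tuples of leaf indices, so that leaf i corresponds to the i-th aligned sequence. The sketch below is a hypothetical helper that assumes unquoted leaf names without spaces that match the sequence names exactly; Biopython's `Bio.Phylo.read(handle, "newick")` is an alternative if you prefer a full tree object. Because neighbor-joining trees are unrooted, the top level may have three children. With a symmetric cost matrix, using that central node as the root does not change the total parsimony score.

```python
from typing import Sequence, Tuple, Union

Tree = Union[int, Tuple["Tree", ...]]


def parse_newick(text: str, names: Sequence[str]) -> Tree:
    """Convert a Newick string into nested tuples of indices into `names`.

    Assumes unquoted labels without spaces; branch lengths and internal-node
    labels (e.g. bootstrap values) are read and discarded.
    """
    index = {name: i for i, name in enumerate(names)}
    text = "".join(text.split())  # drop all whitespace
    pos = 0

    def read_label() -> str:
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in ",():;":
            pos += 1
        return text[start:pos]

    def skip_length() -> None:
        nonlocal pos
        if pos < len(text) and text[pos] == ":":
            pos += 1
            read_label()  # a branch length ends at the same delimiters

    def parse_node() -> Tree:
        nonlocal pos
        if text[pos] == "(":
            pos += 1
            kids = [parse_node()]
            while text[pos] == ",":
                pos += 1
                kids.append(parse_node())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1
            read_label()  # optional internal-node label
            skip_length()
            return tuple(kids)
        name = read_label()
        skip_length()
        return index[name]  # KeyError if the leaf name is not in `names`

    return parse_node()
```

For example, `parse_newick("(c:1,d:2,(a:1,b:2):3);", ["a", "b", "c", "d"])` gives `(2, 3, (0, 1))`.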


Submission:

For this assignment, submit the following files:

  • Problem 1: Submit (1) a PDF file showing the multiple sequence alignment results, (2) a text file (.txt) listing the pairwise distances, and (3) a PDF image of the tree.
  • Problem 2: Submit (1) the source code of your Sankoff algorithm (.py or .m), (2) a source file that calls the function on a toy example (.py or .m), and (3) a PDF file showing a toy example and its result in a format like page 83 of the phylogeny slides.
  • Problem 3: Submit (1) the source file that applies your Sankoff implementation from problem 2 to the multiple sequence alignment from problem 1, using the tree inferred in problem 1, and (2) a file reporting the total parsimony score and the inferred (most parsimonious) sequence of each internal node, presented as a multiple sequence alignment together with the 38 given rRNA sequences. (You can label the internal nodes in the tree and indicate the sequence corresponding to each label.)
  • Please submit a README.txt that lists, for each problem, which files it includes, their inputs, and how to compile/run the scripts. (5 points will be deducted if no README.txt is submitted.)
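For the problem 3 report, one simple option is a plain-text file with a score line followed by FASTA-style records for the input sequences and the internal nodes. The sketch below assumes internal nodes are labeled Node1, Node2, ... in the order the reconstruction returns them; this labeling scheme is hypothetical, and any labels that match your tree figure would serve.

```python
from typing import Sequence


def format_report(leaf_names: Sequence[str], sequences: Sequence[str],
                  score: float) -> str:
    """Score line, then one '>label' record per leaf and internal node."""
    n_internal = len(sequences) - len(leaf_names)
    # Hypothetical labels for internal nodes: Node1, Node2, ...
    labels = list(leaf_names) + [f"Node{k}" for k in range(1, n_internal + 1)]
    lines = [f"Total parsimony score: {score:g}"]
    for label, seq in zip(labels, sequences):
        lines.append(f">{label}")
        lines.append(seq)
    return "\n".join(lines) + "\n"
```

For example, `format_report(["s1", "s2"], ["AC", "AG", "AC"], 1.0)` produces a score line of `Total parsimony score: 1` followed by records for s1, s2, and Node1.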

(Note: You may want to use the MATLAB function dir to obtain a list of the file names in the current folder so that you do not have to load every rRNA sequence manually.)
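In Python, the counterpart of the dir-based approach is to glob the folder. The sketch below assumes each file is in FASTA format and uses a hypothetical `.fasta` extension; adjust `pattern` to match the files you downloaded. Multi-line records are concatenated and uppercased, the first whitespace-separated token of a header becomes the sequence name, a file without a header is named after its file stem, and duplicate names overwrite earlier records.

```python
from pathlib import Path
from typing import Dict, Optional


def read_fasta_folder(folder: str, pattern: str = "*.fasta") -> Dict[str, str]:
    """Read every FASTA file in `folder` matching `pattern` into {name: sequence}."""
    records: Dict[str, str] = {}
    for path in sorted(Path(folder).glob(pattern)):
        name: Optional[str] = None
        for line in path.read_text().splitlines():
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                header = line[1:].split()
                name = header[0] if header else path.stem
                records[name] = ""
            else:
                if name is None:  # no header: fall back to the file name
                    name = path.stem
                    records[name] = ""
                records[name] += line.upper()
    return records
```

With 38 files in one folder, `read_fasta_folder("path/to/folder")` should return 38 entries if each file holds exactly one record with a unique name.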
