CSCI 5481 (001)
Homework 1: Anchored Global Sequence Alignment
2021 Fall (08/10/2021-01/05/2022)

  • Due Oct 8, 2021 by 11:59pm
  • Points 100
  • Submitting a file upload
  • Available until Oct 11, 2021 at 11:59pm
This assignment was locked Oct 11, 2021 at 11:59pm.

Our first homework assignment is to implement an anchored version of the standard Needleman-Wunsch algorithm and apply it to align PAX and HOX proteins from human and fruit fly. Anchored global sequence alignment assumes that matched regions between the two sequences are already known, and applies the Needleman-Wunsch algorithm to the unaligned regions between the matched ones. Implement the anchored global sequence alignment algorithm and align the given sequences.

Hint: There are two possible ways to implement this algorithm as a very simple extension of the standard Needleman-Wunsch algorithm. The first is to write a wrapper program on top of Needleman-Wunsch that aligns the subsequences between the anchored regions. The other is to change the substitution scores for matches within the anchored regions so that the optimal alignment always matches the anchored regions. Either way is OK, but think about the advantages and disadvantages of the two strategies.
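The first strategy in the hint can be sketched as a splitting step that runs before any alignment. The function name split_by_anchors is hypothetical, and the sketch assumes the anchor coordinates are 1-based, inclusive, sorted, and non-overlapping. None of that is stated in the assignment, so check it against the provided data.

```python
def split_by_anchors(seq1, seq2, anchors):
    """Split two sequences into alternating free and anchored segment pairs.

    anchors: list of (start1, end1, start2, end2) tuples.
    ASSUMPTION: coordinates are 1-based and inclusive, and the anchors are
    sorted and do not overlap.
    Returns a list of (sub1, sub2, is_anchor) tuples in left-to-right order.
    """
    segments = []
    pos1 = pos2 = 0  # 0-based index of the next unconsumed residue
    for s1, e1, s2, e2 in anchors:
        # Free region before this anchor: a candidate for Needleman-Wunsch.
        segments.append((seq1[pos1:s1 - 1], seq2[pos2:s2 - 1], False))
        # Anchored region: these two substrings must be aligned to each other.
        segments.append((seq1[s1 - 1:e1], seq2[s2 - 1:e2], True))
        pos1, pos2 = e1, e2
    # Free region after the last anchor (may be empty).
    segments.append((seq1[pos1:], seq2[pos2:], False))
    return segments
```

Each segment pair could then be aligned on its own, with the partial alignments concatenated and their scores summed. How to handle an anchored pair whose two substrings differ in length is left open here.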


Dataset:

Click here to download the sequences. In match.txt, the four columns are human_protein_start_pos human_protein_end_pos fly_protein_start_pos fly_protein_end_pos.
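A minimal sketch of reading the match file, assuming each data line holds four whitespace-separated integers. The function name parse_matches is hypothetical. Skipping lines that do not fit that shape (for example a possible header line) is a choice made for this sketch, not something the assignment specifies.

```python
def parse_matches(lines):
    """Parse anchor lines into a sorted list of 4-tuples of ints.

    ASSUMPTION: each data line has four whitespace-separated integers;
    anything else (blank lines, a header row) is skipped.
    """
    anchors = []
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # blank or malformed line
        try:
            anchors.append(tuple(int(field) for field in fields))
        except ValueError:
            continue  # non-numeric line, e.g. a header
    anchors.sort()  # order anchors by start position in the first sequence
    return anchors
```

With a real file, this could be called as parse_matches(open(path)), since a file object yields its lines when iterated.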

Input and Output Format:

The command line for calling your program should be of the form: program_name seq1.fasta seq2.fasta [matches.txt]. The brackets around [matches.txt] mean that the third file is optional. If matches.txt is not provided, your program should run standard Needleman-Wunsch. The output should include both the alignment score for the pair of sequences and the alignment itself, printed with "-" for gaps.
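The input and output handling described above might be sketched as follows. All three function names are hypothetical. The output layout (a "Score:" line followed by the two aligned rows) is only one possible choice, since the assignment fixes only that the score and the gapped alignment are printed.

```python
def parse_args(argv):
    """Return (seq1_path, seq2_path, matches_path_or_None) from argv."""
    if len(argv) not in (3, 4):
        raise SystemExit("usage: program_name seq1.fasta seq2.fasta [matches.txt]")
    matches_path = argv[3] if len(argv) == 4 else None
    return argv[1], argv[2], matches_path


def read_fasta(lines):
    """Return the first sequence of a FASTA stream as a single string.

    ASSUMPTION: each input file holds one record; later records are ignored.
    """
    parts = []
    seen_header = False
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seen_header:
                break  # a second record starts here
            seen_header = True
            continue
        parts.append(line)
    return "".join(parts)


def format_result(score, aligned1, aligned2):
    """Build the text to print: the score, then the two gapped rows."""
    return "Score: {}\n{}\n{}".format(score, aligned1, aligned2)
```

A driver would call parse_args(sys.argv), read both sequences with read_fasta(open(path)), and run the anchored variant only when the third path is not None.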

Treat any special characters the same as the letters of the alphabet, i.e. use the same match and mismatch costs.


Problems:

1. (25 points): Implement the Needleman-Wunsch algorithm (NWA.py or NWA.m) using fixed scores of -3 for a mismatch, 1 for a match, and -2 for a gap.
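For orientation, here is a minimal sketch of Needleman-Wunsch with the fixed scores from Problem 1 as defaults. It is one straightforward way to write the dynamic program, not a reference solution. When several optimal alignments exist, the traceback order used here (diagonal, then gap in the second sequence, then gap in the first) is an arbitrary choice.

```python
def needleman_wunsch(a, b, match=1, mismatch=-3, gap=-2):
    """Globally align strings a and b; return (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,  # align a[i-1], b[j-1]
                              score[i - 1][j] + gap,      # gap in b
                              score[i][j - 1] + gap)      # gap in a

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

For example, needleman_wunsch("ACGT", "AGT") returns (1, "ACGT", "A-GT"): three matches and one gap give 3 - 2 = 1.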

2. (25 points): Implement another version of the Needleman-Wunsch algorithm (NWA_B62.py or NWA_B62.m) using the BLOSUM62 scoring matrix and -5 for a gap. The BLOSUM62 matrix is available for download at https://www.ncbi.nlm.nih.gov/Class/FieldGuide/BLOSUM62.txt.
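A sketch of loading a substitution matrix from text, assuming the file layout is: optional comment lines starting with "#", one header row of column symbols, then one row per symbol with integer scores. The function name parse_substitution_matrix is hypothetical, and the layout should be confirmed against the downloaded file.

```python
def parse_substitution_matrix(lines):
    """Parse a whitespace-delimited scoring matrix into {(row, col): score}.

    ASSUMPTION: '#' lines are comments, the first remaining line lists the
    column symbols, and every later line is a row symbol followed by ints.
    """
    scores = {}
    columns = None
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if columns is None:
            columns = fields  # header row of residue symbols
            continue
        row, values = fields[0], fields[1:]
        for col, value in zip(columns, values):
            scores[(row, col)] = int(value)
    return scores


# Toy two-symbol excerpt for illustration only; the real file has more rows.
toy = ["# example", "   A  R", "A  4 -1", "R -1  5"]
matrix = parse_substitution_matrix(toy)
```

In the dynamic program, the fixed match and mismatch scores would then be replaced by a lookup such as matrix[(x, y)], with -5 used as the gap score.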

3. (25 points): Implement the anchored Needleman-Wunsch algorithm as described above (NWA_anchor.py or NWA_anchor.m), using the BLOSUM62 matrix and -5 for a gap.

4. (25 points): Use the three algorithms to align the two provided pairs of sequences. Report the alignment and the alignment score.


For this problem you should submit the following five files:

  • Source files (your code for Problems 1-3)
  • Readme file (text)
  • Alignment results in a single file (text)