Homework 1: Anchored Global Sequence Alignment
- Due Oct 8, 2021 by 11:59pm
- Points 100
- Submitting a file upload
- Available until Oct 11, 2021 at 11:59pm
Our first homework assignment is to implement an anchored version of the standard Needleman-Wunsch algorithm and to apply it to align PAX and HOX proteins from human and fruit fly. Anchored global sequence alignment assumes that some matched regions between the two sequences are already known, and applies the Needleman-Wunsch algorithm only to the unaligned regions between those matched ones. Implement the anchored global sequence alignment algorithm and align the given sequences.
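A minimal Python sketch of this idea, offered as one possible approach rather than a reference solution. It assumes anchors are given as `(a_start, a_end, b_start, b_end)` tuples that are 1-based, inclusive, non-overlapping, and of equal length in both sequences. The function names `nw` and `anchored_nw` and the `sub(x, y)` substitution-score callback are hypothetical. Needleman-Wunsch runs on each unanchored stretch, and each anchored region is copied column by column without gaps.

```python
def nw(a, b, sub, gap):
    """Plain Needleman-Wunsch with a linear gap penalty; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    F = [[0] * (m + 1) for _ in range(n + 1)]  # F[i][j]: best score of a[:i] vs b[:j]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                          F[i - 1][j] + gap, F[i][j - 1] + gap)
    i, j, out_a, out_b = n, m, [], []
    while i > 0 or j > 0:  # trace back from the bottom-right cell
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return F[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def anchored_nw(a, b, anchors, sub, gap):
    """Anchored global alignment by the wrapper strategy (assumed anchor format, see lead-in)."""
    total, parts_a, parts_b = 0, [], []
    pa = pb = 0  # 0-based index just past the previous anchor in a and in b
    for a_s, a_e, b_s, b_e in sorted(anchors):
        if a_e - a_s != b_e - b_s:
            raise ValueError("anchored regions are assumed to have equal length")
        # 1) Needleman-Wunsch on the unanchored stretch before this anchor
        s, x, y = nw(a[pa:a_s - 1], b[pb:b_s - 1], sub, gap)
        total += s; parts_a.append(x); parts_b.append(y)
        # 2) Anchored region: residues paired one-to-one, no gaps
        seg_a, seg_b = a[a_s - 1:a_e], b[b_s - 1:b_e]
        total += sum(sub(p, q) for p, q in zip(seg_a, seg_b))
        parts_a.append(seg_a); parts_b.append(seg_b)
        pa, pb = a_e, b_e
    # 3) Needleman-Wunsch on the tail after the last anchor
    s, x, y = nw(a[pa:], b[pb:], sub, gap)
    total += s; parts_a.append(x); parts_b.append(y)
    return total, "".join(parts_a), "".join(parts_b)
```

With an empty anchor list this reduces to plain Needleman-Wunsch on the whole pair. Because the DP only ever fills the matrices for the stretches between anchors, it does less work than a single DP over the full sequences.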
Hint: There are two ways to implement this algorithm as a very simple extension of the standard Needleman-Wunsch algorithm. The first is to write a wrapper program that runs Needleman-Wunsch on the subsequences between the anchored regions. The second is to change the substitution scores for the matches in the anchored regions so that the optimal alignment always matches the anchored regions. Either way is OK, but think about the advantages and disadvantages of each strategy.
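A minimal Python sketch of the second strategy, not an official solution. It assumes anchors are `(a_start, a_end, b_start, b_end)` tuples (1-based, inclusive, equal length, non-crossing), and the name `nw_anchor_bonus` and its `sub(x, y)` callback are hypothetical. Every anchored residue pair gets a large constant `bonus` added to its substitution score, so an alignment that keeps all anchors always beats one that drops any of them. The reported score is then recomputed without the bonus.

```python
def nw_anchor_bonus(a, b, anchors, sub, gap, bonus=10**6):
    """Anchored global alignment by boosting the scores of anchored residue pairs."""
    forced = set()
    for a_s, a_e, b_s, _b_e in anchors:
        for k in range(a_e - a_s + 1):
            forced.add((a_s + k, b_s + k))  # DP cell (i, j) pairs a[i-1] with b[j-1]

    def s(i, j):
        base = sub(a[i - 1], b[j - 1])
        return base + bonus if (i, j) in forced else base

    n, m = len(a), len(b)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + s(i, j), F[i - 1][j] + gap, F[i][j - 1] + gap)

    i, j, out_a, out_b = n, m, [], []
    while i > 0 or j > 0:  # standard traceback, using the boosted scores
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    al_a, al_b = "".join(reversed(out_a)), "".join(reversed(out_b))

    # Report the true score: recompute it column by column without the bonus
    true_score = sum(gap if "-" in (x, y) else sub(x, y) for x, y in zip(al_a, al_b))
    return true_score, al_a, al_b
```

The value `10**6` is an assumption. It must exceed any score difference the unboosted alignment can produce, which is ample for proteins a few hundred residues long. This approach needs no segment bookkeeping, but it fills the full DP matrix and depends on choosing a safe bonus.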
Dataset:
Download the sequences here. In match.txt, the four columns are human_protein_start_pos human_protein_end_pos fly_protein_start_pos fly_protein_end_pos.
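A small Python sketch for reading match.txt, assuming whitespace-separated integer columns in the order listed above and 1-based, inclusive positions. Those assumptions are not confirmed by the dataset description, so check them against the actual file. The function name `parse_matches` is hypothetical. A non-numeric header row, if present, is skipped.

```python
def parse_matches(text):
    """Return sorted (human_start, human_end, fly_start, fly_end) tuples from match.txt text."""
    anchors = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # blank or malformed line
        try:
            values = [int(x) for x in fields]
        except ValueError:
            continue  # e.g. a header row naming the columns
        anchors.append(tuple(values))
    anchors.sort()  # process anchors from left to right along the human sequence
    return anchors
```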
Input and Output Format:
The command line for calling your program should be of the form: program_name seq1.fasta seq2.fasta [matches.txt]. The brackets mean the third file is optional: if matches.txt is not provided, your program should run standard Needleman-Wunsch. The output should include both the alignment score for the pair of sequences and the alignment itself, printed with "-" for gaps.
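A minimal Python sketch of the input handling, assuming each FASTA file holds a single protein record. The names `parse_fasta`, `read_fasta` and `parse_args` are hypothetical. Only the standard `sys.argv` / `sys.exit` mechanism is used.

```python
import sys


def parse_fasta(text):
    """Return the sequence of the first FASTA record in text (header line dropped)."""
    seq = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if seq:
                break  # assumption: one record per file; later records are ignored
            continue
        seq.append(line)
    return "".join(seq)


def read_fasta(path):
    with open(path) as fh:
        return parse_fasta(fh.read())


def parse_args(argv):
    """Split argv into (seq1_path, seq2_path, matches_path or None)."""
    if len(argv) not in (3, 4):
        sys.exit(f"usage: {argv[0]} seq1.fasta seq2.fasta [matches.txt]")
    return argv[1], argv[2], (argv[3] if len(argv) == 4 else None)
```

A driver would call `parse_args(sys.argv)`, run plain Needleman-Wunsch when the third value is `None`, and run the anchored version otherwise.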
Treat any special characters the same as letters of the alphabet, i.e., use the same match and mismatch scores.
Problems:
1. (25 points): Implement the Needleman-Wunsch algorithm (NWA.py or NWA.m) using fixed scores of 1 for a match, -3 for a mismatch, and -2 for a gap.
2. (25 points): Implement another version of the Needleman-Wunsch algorithm (NWA_B62.py or NWA_B62.m) using the BLOSUM62 scoring matrix and -5 for a gap. The BLOSUM62 matrix is available for download at https://www.ncbi.nlm.nih.gov/Class/FieldGuide/BLOSUM62.txt.
3. (25 points): Implement the anchored Needleman-Wunsch algorithm as described above (NWA_anchor.py or NWA_anchor.m), using the BLOSUM62 matrix and -5 for a gap.
4. (25 points): Use the three algorithms to align the two provided pairs of sequences. Report each alignment and its alignment score.
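Problems 1 and 2 differ only in the substitution score and the gap penalty, so one DP routine can serve both. Below is a minimal Python sketch under that assumption, not a reference solution. The names `needleman_wunsch`, `fixed_sub` and `parse_blosum` are hypothetical. The parser assumes the NCBI text layout: `#` comment lines, a header row of residue letters, then one row per residue starting with its letter.

```python
def parse_blosum(text):
    """Parse an NCBI-style BLOSUM table into a {(row, col): score} dict."""
    rows = [ln.split() for ln in text.splitlines()
            if ln.strip() and not ln.lstrip().startswith("#")]
    header = rows[0]  # column residues, e.g. A R N D ... *
    return {(row[0], col): int(val)
            for row in rows[1:] for col, val in zip(header, row[1:])}


def fixed_sub(x, y, match=1, mismatch=-3):
    """Problem 1 scoring: every character, special or not, is compared by identity."""
    return match if x == y else mismatch


def needleman_wunsch(a, b, sub, gap):
    """Global alignment with a linear gap penalty; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    F = [[0] * (m + 1) for _ in range(n + 1)]  # F[i][j]: best score of a[:i] vs b[:j]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                          F[i - 1][j] + gap,
                          F[i][j - 1] + gap)
    i, j, out_a, out_b = n, m, [], []
    while i > 0 or j > 0:  # trace back from the bottom-right cell
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return F[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

For problem 1, call `needleman_wunsch(s1, s2, fixed_sub, -2)`. For problem 2, load the downloaded matrix with `parse_blosum`, then pass `lambda x, y: blosum[(x, y)]` with a gap of `-5`. A residue missing from the table raises `KeyError`, so any extra handling of special characters is left to you.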
For this problem you should submit the following five files:
- Source file (your code)
- Readme file (text)
- Alignment results in a single file (text)