A heuristic search algorithm for optimizing phylogenetic trees
In this project, you will implement a tree search algorithm for parsimony analysis and apply the algorithm to analyze population genetics data.
Problems:
1. Download the SNP data of Mitochondrial DNA in the HAPMAP project from
Each file named with "chrM" contains the SNPs of a certain population on Mitochondrial DNA.
2. Use all the SNPs on Mitochondrial DNA and apply the neighbor-joining algorithm to construct a phylogenetic tree of all the individuals from the populations. Note that you can compute the evolutionary distance between each pair of sequences based on the SNPs. You need to derive a function to calculate the distance. Then the distance matrix can be used with neighbor-joining algorithm.
3. Apply your Sankoff algorithm implemented in HW4 on the phylogenetic tree for parsimony analysis of all the SNPs on Mitochondrial DNA .
4. Implement some tree search strategies discussed in Chapter 8 such as Nearest Neighbor Interchange to improve the tree generated by neighbor-joining algorithm.
5. Discuss the results of your new tree. Is it better than the tree constructed by neighbor-joining? Think about how to revise your tree branch interchange algorithm. The sample list is available from ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/working/20130606_sample_info/20130606_sample_info.xlsx.
References:
A global reference for human genetic variation Links to an external site., The 1000 Genomes Project Consortium Links to an external site., Nature 526, 68–74 (01 October 2015) doi:10.1038/nature15393
1000 Genome projects: https://www.internationalgenome.org/ Links to an external site.