HMM for predicting protein secondary structures
In this project, you will implement the Baum-Welch algorithm and apply it to protein secondary structure prediction.
Problems:
1. Design the HMM representation of protein secondary structure prediction, e.g. three states representing helices, sheets and loops, with emissions over the 20 amino acid residues.
2. Use maximum likelihood learning to train an HMM on the training data, called HMM_ml. Both the amino acid sequences and the secondary structure sequences should be used to train HMM_ml.
3. Implement the Baum-Welch algorithm to train an HMM without using the secondary structures given in the training data. Learn an HMM with the test data, called HMM_bw.
4. Apply HMM_ml and HMM_bw to predict the secondary structure of the test data. Report the prediction accuracy of each of the two HMMs and explain your results.
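For items 1 and 2, a minimal Python sketch of a three-state HMM and its maximum likelihood estimation. With labeled data, ML estimation reduces to counting initial states, state-to-state transitions and residue emissions, then normalizing each row. It assumes the labels are coded `h` (helix), `e` (sheet) and `_` (loop) and that every residue is one of the 20 standard one-letter codes; the add-one pseudocount is an assumed smoothing choice (not part of plain ML) so that unseen transitions or emissions do not get probability zero. `train_ml` and `normalize` are hypothetical names.

```python
# Sketch: supervised maximum likelihood estimation of a 3-state HMM.
# Assumed label coding: 'h' = helix, 'e' = sheet/strand, '_' = loop/coil.
STATES = "he_"
AMINO = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids (one-letter codes)


def normalize(row):
    """Scale a row of counts so it sums to 1 (uniform if the row is all zeros)."""
    total = sum(row)
    if total == 0:
        return [1.0 / len(row)] * len(row)
    return [x / total for x in row]


def train_ml(pairs, pseudocount=1.0):
    """Count-and-normalize estimates from (amino_acids, structure) string pairs.

    `pseudocount` is an assumed add-k smoothing term.
    Returns (start, trans, emit) as lists of probabilities indexed by
    positions in STATES and AMINO.
    """
    n, m = len(STATES), len(AMINO)
    start = [pseudocount] * n
    trans = [[pseudocount] * n for _ in range(n)]
    emit = [[pseudocount] * m for _ in range(n)]
    for aa, ss in pairs:
        s = [STATES.index(c) for c in ss]
        o = [AMINO.index(c) for c in aa]
        start[s[0]] += 1  # initial-state count
        for t in range(len(s)):
            emit[s[t]][o[t]] += 1  # residue emitted from its labeled state
            if t > 0:
                trans[s[t - 1]][s[t]] += 1  # labeled state-to-state transition
    return normalize(start), [normalize(r) for r in trans], [normalize(r) for r in emit]


# Toy labeled data, made up for illustration.
start, trans, emit = train_ml([("AC", "hh"), ("AG", "h_")])
print(start)  # [0.6, 0.2, 0.2]
```

For the project, `pairs` would be the (amino acid sequence, structure sequence) pairs from the training file; if the files contain residue codes outside the 20 standard letters, `AMINO.index` will raise and those residues need handling first.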
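For item 3, a minimal sketch of Baum-Welch (EM) built on the scaled forward-backward recursions, which keep the numbers from underflowing on long proteins. Observations are integer symbol indices (e.g. positions in a 20-letter amino acid alphabet). The random initialization, fixed iteration count, and the names `forward_backward` and `baum_welch` are assumptions for illustration. The function also returns the log-likelihood of each iteration; EM should never decrease it, which makes a useful sanity check.

```python
import math
import random


def normalize(row):
    """Scale a row to sum to 1 (uniform if the row is all zeros)."""
    total = sum(row)
    if total == 0:
        return [1.0 / len(row)] * len(row)
    return [x / total for x in row]


def forward_backward(obs, start, trans, emit):
    """Scaled forward-backward; returns (alpha_hat, beta_hat, scale factors)."""
    n, T = len(start), len(obs)
    alpha, scale = [], []
    for t, o in enumerate(obs):
        if t == 0:
            a = [start[i] * emit[i][o] for i in range(n)]
        else:
            a = [sum(alpha[-1][i] * trans[i][j] for i in range(n)) * emit[j][o]
                 for j in range(n)]
        c = sum(a)  # c_t; log P(obs) = sum over t of log c_t
        scale.append(c)
        alpha.append([x / c for x in a])
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        o = obs[t + 1]
        beta[t] = [sum(trans[i][j] * emit[j][o] * beta[t + 1][j] for j in range(n))
                   / scale[t + 1] for i in range(n)]
    return alpha, beta, scale


def baum_welch(seqs, n_states, n_symbols, n_iter=20, seed=0):
    """EM on unlabeled integer sequences; returns (start, trans, emit, loglik history)."""
    rng = random.Random(seed)

    def rand_row(k):
        return normalize([rng.random() + 0.1 for _ in range(k)])

    start = rand_row(n_states)
    trans = [rand_row(n_states) for _ in range(n_states)]
    emit = [rand_row(n_symbols) for _ in range(n_states)]
    history = []
    for _ in range(n_iter):
        s_num = [0.0] * n_states
        t_num = [[0.0] * n_states for _ in range(n_states)]
        e_num = [[0.0] * n_symbols for _ in range(n_states)]
        loglik = 0.0
        for obs in seqs:  # E-step: expected counts under the current parameters
            alpha, beta, scale = forward_backward(obs, start, trans, emit)
            loglik += sum(math.log(c) for c in scale)
            for t, o in enumerate(obs):
                for i in range(n_states):
                    gamma = alpha[t][i] * beta[t][i]  # P(state_t = i | obs)
                    e_num[i][o] += gamma
                    if t == 0:
                        s_num[i] += gamma
                    if t + 1 < len(obs):
                        o_next = obs[t + 1]
                        for j in range(n_states):  # xi_t(i, j)
                            t_num[i][j] += (alpha[t][i] * trans[i][j] * emit[j][o_next]
                                            * beta[t + 1][j] / scale[t + 1])
        history.append(loglik)  # log-likelihood of the parameters used in this E-step
        # M-step: re-estimate parameters by normalizing the expected counts
        start = normalize(s_num)
        trans = [normalize(r) for r in t_num]
        emit = [normalize(r) for r in e_num]
    return start, trans, emit, history


seqs = [[0, 1, 0, 2, 2, 1, 0], [2, 2, 1, 0, 0]]  # toy integer-coded sequences
start, trans, emit, history = baum_welch(seqs, n_states=2, n_symbols=3, n_iter=10)
```

In the project setting, `seqs` would hold the amino acid index sequences, with `n_states=3` and `n_symbols=20`. Because Baum-Welch never sees the labels, its hidden states come out in an arbitrary order, so HMM_bw's states have to be matched to helix, sheet and loop before its accuracy can be computed; trying all 3! = 6 assignments, or labeling each state by majority vote, are two possible (assumed) ways to do this, and whichever one is used should be reported alongside the results.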
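For item 4, predicting with either HMM can be done by Viterbi decoding: find the most probable state path for each test sequence, then compare it position by position with the true labels. A minimal log-space sketch follows; the two-state, three-symbol model is a toy stand-in for HMM_ml or HMM_bw with made-up numbers, and `safe_log`, `viterbi` and `accuracy` are hypothetical names.

```python
import math


def safe_log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(obs, start, trans, emit):
    """Most probable state path for integer observations `obs` (log-space Viterbi)."""
    n = len(start)
    delta = [safe_log(start[i]) + safe_log(emit[i][obs[0]]) for i in range(n)]
    backptrs = []
    for o in obs[1:]:
        ptr, new_delta = [], []
        for j in range(n):
            # best predecessor of state j at this position
            best = max(range(n), key=lambda i: delta[i] + safe_log(trans[i][j]))
            ptr.append(best)
            new_delta.append(delta[best] + safe_log(trans[best][j]) + safe_log(emit[j][o]))
        backptrs.append(ptr)
        delta = new_delta
    state = max(range(n), key=lambda i: delta[i])
    path = [state]
    for ptr in reversed(backptrs):  # follow back-pointers from the last position
        state = ptr[state]
        path.append(state)
    return path[::-1]


def accuracy(predicted, actual):
    """Fraction of positions predicted correctly (often called Q3 for 3 states)."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Toy model (illustrative numbers only): 2 states, 3 observation symbols.
start = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
path = viterbi([0, 1, 2], start, trans, emit)
print(path)                       # [0, 0, 1]
print(accuracy(path, [0, 1, 1]))  # 0.6666666666666666
```

Pooling all test residues into one accuracy figure, rather than averaging per-protein accuracies, is one reasonable convention; either way, the report should say which was used.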
Dataset:
A protein secondary structure dataset is provided. The training data is in protein-secondary-structure.train and the test data is in protein-secondary-structure.test.
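A minimal sketch of reading these files into (amino acid sequence, structure sequence) pairs. The layout is an assumption, not something stated here: one residue per line followed by its structure label (`h`, `e` or `_`), `<>` starting each protein, `end` or `<end>` closing it, and `#` marking comment lines. Check the actual files before relying on it; `parse_protein_file` and `load` are hypothetical names.

```python
def parse_protein_file(lines):
    """Group residue/label lines into (amino_acid_string, structure_string) pairs.

    Assumed layout: '<>' starts a protein, 'end' or '<end>' ends it,
    '#' lines are comments, other lines are '<residue> <label>'.
    """
    proteins, residues, labels = [], [], []

    def flush():
        # store the protein collected so far (if any) and start a new one
        if residues:
            proteins.append(("".join(residues), "".join(labels)))
        residues.clear()
        labels.clear()

    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line in ("<>", "end", "<end>"):
            flush()
            continue
        residue, label = line.split()[:2]
        residues.append(residue)
        labels.append(label)
    flush()  # in case the last protein has no explicit terminator
    return proteins


def load(path):
    """Read and parse one data file."""
    with open(path) as f:
        return parse_protein_file(f)


sample = ["# comment", "<>", "A _", "C h", "end", "<>", "G e", "<end>"]
print(parse_protein_file(sample))  # [('AC', '_h'), ('G', 'e')]
```

With the real files this would be called as `load("protein-secondary-structure.train")` and `load("protein-secondary-structure.test")`.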