HMM for predicting protein secondary structures

In this project, you will implement the Baum-Welch algorithm for protein secondary structure prediction.

Problems:
1. Design the HMM representation for protein secondary structure prediction, e.g., three hidden states representing helices, sheets, and loops, with emissions over the 20 amino acid residues.

2. Use maximum likelihood learning to train an HMM on the training data, called HMM_ml. Note that both the amino acid sequences and the secondary structure sequences should be used to train HMM_ml.

3. Implement the Baum-Welch algorithm for HMM training without using the given secondary structures in the training data. Learn an HMM with the test data, called HMM_bw.

4. Apply HMM_ml and HMM_bw to predict the secondary structure of the test data. Report the prediction accuracy of each HMM and explain your results.
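
The supervised parts of the problems above (the representation, maximum likelihood training, and prediction with accuracy) might be sketched as follows. This is only an illustration under stated assumptions, not a reference solution: the names STATES, AMINO, train_ml, viterbi, and accuracy are hypothetical, the label characters H, E, and _ are an assumed encoding of helix, sheet, and loop, and the counts use add-one smoothing (a small departure from pure maximum likelihood) so that no probability is zero when logs are taken.

```python
import numpy as np

STATES = "HE_"                  # assumed labels: H = helix, E = sheet, _ = loop
AMINO = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes of the 20 standard residues

def train_ml(pairs, pseudo=1.0):
    """Estimate (pi, A, B) by counting over labeled (residues, labels) pairs.

    pseudo is an add-one style smoothing count (an assumption of this sketch).
    """
    n, m = len(STATES), len(AMINO)
    pi = np.full(n, pseudo)
    A = np.full((n, n), pseudo)
    B = np.full((n, m), pseudo)
    for seq, lab in pairs:
        s = [STATES.index(c) for c in lab]
        o = [AMINO.index(c) for c in seq]
        pi[s[0]] += 1                      # initial-state count
        for t in range(len(s)):
            B[s[t], o[t]] += 1             # emission count
            if t > 0:
                A[s[t - 1], s[t]] += 1     # transition count
    return (pi / pi.sum(),
            A / A.sum(axis=1, keepdims=True),
            B / B.sum(axis=1, keepdims=True))

def viterbi(seq, pi, A, B):
    """Most likely state path for one residue string, computed in log space."""
    o = [AMINO.index(c) for c in seq]
    log_pi, log_A, log_B = np.log(pi), np.log(A), np.log(B)
    delta = log_pi + log_B[:, o[0]]
    back = []
    for t in range(1, len(o)):
        # scores[i, j]: best log-probability of a path ending in i, then i -> j
        scores = delta[:, None] + log_A
        back.append(scores.argmax(axis=0))
        delta = scores.max(axis=0) + log_B[:, o[t]]
    path = [int(delta.argmax())]
    for ptr in reversed(back):             # follow back-pointers to the start
        path.append(int(ptr[path[-1]]))
    return "".join(STATES[i] for i in reversed(path))

def accuracy(pred, true):
    """Fraction of positions where the predicted label matches the true one."""
    return sum(p == t for p, t in zip(pred, true)) / len(true)
```

On real data, pairs would hold one (residue string, label string) tuple per protein read from the training file, and accuracy would typically be pooled over all residues of the test proteins.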
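
For the unsupervised training in problem 3, one Baum-Welch (EM) iteration could look roughly like the sketch below. It assumes residues are already encoded as integers 0..19, uses per-position scaling to avoid numerical underflow on long sequences, and the function names forward_backward and baum_welch_step are hypothetical. Treat it as a starting point under these assumptions rather than a definitive implementation.

```python
import numpy as np

def forward_backward(obs, pi, A, B):
    """Scaled forward-backward pass; returns alpha, beta and per-step scales."""
    T, n = len(obs), len(pi)
    alpha, beta, scale = np.zeros((T, n)), np.ones((T, n)), np.zeros(T)
    alpha[0] = pi * B[:, obs[0]]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]
    for t in range(T - 2, -1, -1):
        beta[t] = (A @ (B[:, obs[t + 1]] * beta[t + 1])) / scale[t + 1]
    return alpha, beta, scale

def baum_welch_step(seqs, pi, A, B):
    """One EM iteration over integer-coded sequences.

    Returns the updated (pi, A, B) and the log-likelihood of seqs under the
    parameters that were passed in.
    """
    n, m = B.shape
    pi_acc, A_acc, B_acc = np.zeros(n), np.zeros((n, n)), np.zeros((n, m))
    loglik = 0.0
    for obs in seqs:
        obs = np.asarray(obs)
        alpha, beta, scale = forward_backward(obs, pi, A, B)
        loglik += np.log(scale).sum()
        gamma = alpha * beta               # posterior state probabilities per position
        pi_acc += gamma[0]
        for t in range(len(obs) - 1):
            # xi[i, j] = P(state_t = i, state_{t+1} = j | obs)
            xi = (alpha[t][:, None] * A
                  * (B[:, obs[t + 1]] * beta[t + 1])[None, :]) / scale[t + 1]
            A_acc += xi
        for k in range(m):
            B_acc[:, k] += gamma[obs == k].sum(axis=0)   # expected emission counts
    return (pi_acc / pi_acc.sum(),
            A_acc / A_acc.sum(axis=1, keepdims=True),
            B_acc / B_acc.sum(axis=1, keepdims=True),
            loglik)
```

In practice the step is repeated from a random (non-uniform) initialization until the log-likelihood stops improving. Because the hidden states learned this way carry no labels, they have to be mapped to helix, sheet, and loop before accuracy can be computed, which is worth discussing when explaining the results.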

Dataset:
A protein secondary structure dataset is provided for download. The training data is in protein-secondary-structure.train and the test data is in protein-secondary-structure.test.

References:
https://archive.ics.uci.edu/ml/datasets/Molecular+Biology+(Protein+Secondary+Structure)