Bioinformatics Approach to Classification of Four Classes of Organism in Relation to Their Optimal Growth Temperature
Hanaa M. Hussain 1, Huseyin Seker 2, and Malde Gorania 2
1. The Public Authority for Applied Education and Training, College of Technological Studies, Department of Electronics Engineering Technology, Shuwaikh, Kuwait
2. The University of Northumbria, Department of Computer Sciences and Digital Technology, Faculty of Engineering and Environment, Newcastle upon Tyne, United Kingdom
Abstract—Identifying the temperature class of proteins in prokaryotic organisms is one of the vital problems in enzyme and protein engineering. In this work, efficient K-NN predictive models were developed to discriminate hyperthermophilic, thermophilic, psychrophilic, and mesophilic proteins using amino acid composition and pseudo amino acid composition. The two predictive models were built and tested with a large dataset consisting of 6,631 hyperthermophiles, 11,700 thermophiles, 6,267 psychrophiles, and 67,037 mesophiles. Implementation and analysis results showed that the proposed K-NN based predictive models discriminated the four classes efficiently and with high accuracy: the amino acid composition model achieved 94% accuracy under 10-fold cross-validation and 98% on a hold-out test, while the pseudo amino acid composition model achieved 99% accuracy on a hold-out test.
Index Terms—amino acid composition, data mining, k-nearest neighbors, machine learning, optimal growth temperature, predictive model, proteins, proteomics, pseudo amino acid composition, thermostability
Cite: Hanaa M. Hussain, Huseyin Seker, and Malde Gorania, "Bioinformatics Approach to Classification of Four Classes of Organism in Relation to Their Optimal Growth Temperature," International Journal of Pharma Medicine and Biological Sciences, Vol. 7, No. 4, pp. 78-83, October 2018. doi: 10.18178/ijpmbs.7.4.78-83
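As an illustration of the two ingredients named in the abstract, the following minimal sketch computes an amino acid composition vector for a protein sequence and classifies it with a plain K-NN majority vote. It is not the authors' implementation: the function names, the Euclidean distance, the value of k, and the toy sequences and labels are all assumptions made for the example, and pseudo amino acid composition is not covered.

```python
from collections import Counter
import math

# The 20 standard amino acids, in a fixed order so every vector is comparable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the fraction of each standard residue as a 20-element list."""
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def knn_predict(train, query, k=3):
    """Majority vote over the k training vectors closest to `query`.

    `train` is a list of (feature_vector, label) pairs. Distance is
    Euclidean (an assumption; the abstract does not name a metric).
    Ties between labels are resolved in favour of the label whose
    neighbour is encountered first, i.e. the nearer one.
    """
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: made-up sequences and labels, for illustration only.
toy_training_set = [
    (amino_acid_composition("EEKKRRVVEEKK"), "hyperthermophilic"),
    (amino_acid_composition("EEKKRRVIEEKR"), "hyperthermophilic"),
    (amino_acid_composition("SSTTNNQQSSTT"), "psychrophilic"),
    (amino_acid_composition("SSTTNNQGSSTN"), "psychrophilic"),
]
predicted = knn_predict(toy_training_set, amino_acid_composition("EEKRRVVEKK"), k=3)
```

A real experiment would replace the toy data with labelled protein sequences and estimate accuracy with cross-validation or a hold-out split, as the abstract describes.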