Oversampling Negative Class Improves Contact Map Prediction
Grzegorz Markowski, Krzysztof Grabczewski, and Rafal Adamczak
Department of Informatics, Faculty of Physics, Astronomy and Informatics, Nicolaus Copernicus University, Grudziadzka 5, 87-100 Torun, Poland
Abstract—In this paper we present a contact map predictor trained on an unbalanced training set. The training set is built on a feature space typical for this problem: predicted solvent accessibilities and predicted secondary structures. To show that oversampling the negative class improves prediction accuracy, we built two predictors, based on neural networks and decision trees, respectively. We analyzed the influence of the size of the non-contact class in the training set. Significantly better results are obtained when the non-contact class is at least 4 times larger than the contact class, while the optimal oversampling depends on the type of contacts and the learning algorithm used. Our final predictor, PLCT, took part in CASP11, where it took 3rd place in one of the categories. PLCT is available at http://promap.is.umk.pl/.
Index Terms—neural networks, decision trees, contact maps, contact map prediction
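The class ratio described in the abstract can be illustrated with a minimal sketch. It assumes that residue pairs are already split into contacts and non-contacts, and it draws a fixed number of non-contact examples per contact example. The function name `build_training_pairs`, the `ratio` and `seed` parameters, and the toy residue pairs are hypothetical and are not taken from the paper; the actual PLCT sampling procedure and feature encoding may differ.

```python
import random


def build_training_pairs(contacts, non_contacts, ratio=4, seed=0):
    """Return shuffled (pair, label) examples with up to `ratio`
    non-contact pairs (label 0) per contact pair (label 1).

    Hypothetical helper for illustration only, not the authors' code.
    """
    rng = random.Random(seed)
    # Cap the request at the number of non-contact pairs actually available.
    n_neg = min(len(non_contacts), ratio * len(contacts))
    sampled_neg = rng.sample(non_contacts, n_neg)
    examples = [(p, 1) for p in contacts] + [(p, 0) for p in sampled_neg]
    rng.shuffle(examples)
    return examples


# Toy data (assumed): a 30-residue chain with 10 contacts at sequence separation 6.
contacts = [(i, i + 6) for i in range(10)]
contact_set = set(contacts)
non_contacts = [
    (i, j)
    for i in range(30)
    for j in range(i + 6, 30)
    if (i, j) not in contact_set
]

examples = build_training_pairs(contacts, non_contacts, ratio=4)
n_pos = sum(label for _, label in examples)
n_neg = len(examples) - n_pos
print(n_pos, n_neg)
```

Varying `ratio` (for example 1, 2, 4, 8) and retraining a classifier on each resulting set is one way to reproduce the kind of comparison the abstract reports; the best value would be expected to depend on the contact type and the learning algorithm.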
Cite: Grzegorz Markowski, Krzysztof Grabczewski, and Rafal Adamczak, "Oversampling Negative Class Improves Contact Map Prediction," International Journal of Pharma Medicine and Biological Sciences, Vol. 5, No. 4, pp. 211-216, October 2016. doi: 10.18178/ijpmbs.5.4.211-216