

Margin-based ensemble classifier for protein fold recognition
Authors:Tao Yang  Vojislav Kecman  Longbing Cao  Chengqi Zhang  Joshua Zhexue Huang
Affiliation:1. Department of Psychosis Studies, Institute of Psychiatry, Psychology & Neuroscience, King's College London, London, UK;2. Department of Psychiatry, School of Medicine and Medical Sciences, University College Dublin, St Vincent's Hospital, Dublin, Ireland;3. MRC Social, Genetic & Developmental Psychiatry Centre, Institute of Psychiatry, Psychology & Neuroscience, King's College London, London, UK;4. National Institute for Health Research (NIHR) Mental Health Biomedical Research Centre at South London and Maudsley NHS Foundation Trust and King's College London, UK;5. Department of Psychological Medicine, Institute of Psychiatry, Psychology and Neuroscience, Kings College London, London, UK;6. Data Science & Soft Computing Lab, and Department of Computing, Goldsmiths College, University of London, London, UK;1. College of Computer Science & Technology, Nanjing University of Aeronautics and Astronautics, China;2. Collaborative Innovation Center of Novel Software Technology and Industrialization, China;3. School of Computer & Software, Nanjing University of Information Science & Technology, China;4. College of Technological Innovation, Zayed University, United Arab Emirates;1. Imagerie et Stratégies Thérapeutiques de la Schizophrénie (ISTS), Normandie Univ, UNICAEN, Faculté de médecine, Caen F-14000, France;2. Normandie Univ, UNICAEN, CNRS UMS 3408, GIP Cyceron, Caen F-14000, France;3. CHU de Caen, Service d'Explorations Fonctionnelles du Système Nerveux, Caen F-14000, France;4. CHU de Caen, Service de psychiatrie, Centre Esquirol, Caen F-14000, France
Abstract:Recognition of protein folding patterns is an important step in predicting protein structure and function. Traditional sequence similarity-based approaches fail to yield convincing predictions when proteins have low sequence identity, whereas the taxonometric approach is a reliable alternative. From a pattern recognition perspective, protein fold recognition involves a large number of classes with only a small number of training samples, and multiple heterogeneous feature groups derived from different propensities of amino acids. This raises the need for a classification method that can handle this data complexity with a prediction accuracy high enough for practical applications. To this end, this paper proposes a novel ensemble classifier, called MarFold, which combines three margin-based classifiers for protein fold recognition. The effectiveness of the method is demonstrated on the benchmark D-B dataset with 27 classes. The overall prediction accuracy obtained by MarFold is 71.7%, which surpasses existing fold recognition methods by 3.1–15.7%. Moreover, one component classifier of MarFold, called ALH, obtains a prediction accuracy of 65.5%, which is 4.7–9.5% higher than the accuracies of published methods that use a single classifier. Additionally, the feature set of pairwise frequency information about the amino acids, which is adopted by MarFold, is found to be important for discriminating folding patterns. These results suggest that the MarFold method and its operation engine ALH might become useful vehicles for protein fold recognition, as well as for other bioinformatics tasks. The MarFold method and the datasets can be obtained from: (http://www-staff.it.uts.edu.au/~lbcao/publication/MarFold.7z).
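The abstract does not define the pairwise frequency feature set in detail. As a minimal sketch, assuming it means normalized counts of residue pairs over the 20 standard amino acids (400 values per sequence), such features could be computed as follows. The function name `pairwise_frequencies` and the `gap` parameter are hypothetical and are not taken from the MarFold distribution.

```python
from itertools import product

# The 20 standard amino acids, one-letter codes (assumed alphabet).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# All 400 ordered residue pairs, in a fixed order: AA, AC, AD, ...
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def pairwise_frequencies(sequence, gap=0):
    """Return 400 normalized residue-pair frequencies for one sequence.

    gap=0 counts adjacent residues; gap=1 counts residues separated by
    one position. This is an illustrative assumption about what
    "pairwise frequency information" means, not the paper's definition.
    """
    counts = dict.fromkeys(PAIRS, 0)
    step = gap + 1
    total = 0
    for i in range(len(sequence) - step):
        pair = sequence[i] + sequence[i + step]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(PAIRS)
    return [counts[p] / total for p in PAIRS]
```

Each protein then maps to a fixed-length vector regardless of sequence length, which is the form a margin-based classifier typically expects as input.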
This article is indexed in ScienceDirect and other databases.