Genetic Sequence Classification and its Application to Cross-Species Homology Detection
Authors: Snehasis Mukhopadhyay, Changhong Tang, Jeffery Huang and Mathew Palakal
Affiliation: (1) Department of Computer and Information Science, Indiana University, Purdue University Indianapolis, 723 W. Michigan St., SL280, Indianapolis, IN 46202, USA
Abstract: Although large-scale classification studies of genetic sequence data are under way around the world, very few compare different classification approaches (e.g., unsupervised and supervised) against objective criteria such as classification accuracy and computational complexity. In this paper, we study these criteria for both unsupervised and supervised classification of a relatively large sequence data set. The unsupervised approach uses different sequence alignment algorithms (e.g., Smith-Waterman, FASTA and BLAST) followed by clustering with the Maximin algorithm. The supervised approach feeds a suitable numeric encoding of each sequence (relative frequencies of nucleotide tuples, followed by principal component analysis) to a multilayer backpropagation neural network. Classification experiments conducted on IBM-SP parallel computers show that FASTA with unsupervised Maximin clustering gives the best trade-off between accuracy and speed among all methods, with supervised neural networks the second-best approach. Finally, the different classifiers are applied to the problem of cross-species homology detection.
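Two of the building blocks named in the abstract are compact enough to sketch: the numeric encoding (relative frequencies of nucleotide tuples) and Maximin clustering. The sketch below is illustrative only and makes several assumptions. The paper clusters on alignment-based similarity (Smith-Waterman, FASTA or BLAST) and applies principal component analysis before the neural network. Here PCA and the network are omitted, and Euclidean distance between tuple-frequency vectors stands in for alignment scores so the example stays self-contained. The function names, `k=2`, the threshold `theta=0.5` and the toy sequences are all hypothetical. The Maximin loop follows the common textbook form: the first sample seeds a cluster and the sample farthest from it seeds a second. The point whose distance to its nearest center is largest then becomes a new center only while that distance exceeds `theta` times the mean inter-center distance.

```python
from itertools import combinations, product
import math


def tuple_frequencies(seq, k=2):
    """Relative frequency of each of the 4**k nucleotide k-tuples in seq."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip tuples that contain N or other ambiguity codes
            counts[kmer] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def maximin(points, dist=euclidean, theta=0.5):
    """Maximin clustering: returns (center indices, cluster label per point).

    theta is a hypothetical threshold: a point becomes a new center when its
    distance to the nearest center exceeds theta times the mean distance
    between existing centers.
    """
    n = len(points)
    centers = [0]  # the first sample seeds cluster 1
    # The sample farthest from the first center seeds cluster 2.
    far = max(range(n), key=lambda i: dist(points[0], points[i]))
    if dist(points[0], points[far]) > 0:
        centers.append(far)
    while len(centers) >= 2:
        # Distance from every sample to its nearest existing center.
        nearest = [min(dist(p, points[c]) for c in centers) for p in points]
        cand = max(range(n), key=lambda i: nearest[i])
        pair = [dist(points[a], points[b]) for a, b in combinations(centers, 2)]
        if nearest[cand] > theta * (sum(pair) / len(pair)):
            centers.append(cand)  # the max-min point is far enough: new cluster
        else:
            break
    # Assign every sample to its closest center.
    labels = [min(range(len(centers)), key=lambda j: dist(p, points[centers[j]]))
              for p in points]
    return centers, labels


# Toy usage: three AT-rich and three GC-rich sequences (hypothetical data).
seqs = ["ATATATATAT", "ATATATATTA", "TATATATATA",
        "GCGCGCGCGC", "GCGCGCGGCG", "CGCGCGCGCG"]
vectors = [tuple_frequencies(s, k=2) for s in seqs]
centers, labels = maximin(vectors)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Maximin needs no preset number of clusters, only a distance threshold. That suits exploratory grouping of sequences. Its result does, however, depend on which sample is taken as the first seed.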
Keywords: sequence classification; sequence alignment; artificial neural network; clustering; unsupervised and supervised learning; cross-species homology detection
This article is indexed in SpringerLink and other databases.