
Genetic Sequence Classification and its Application to Cross-Species Homology Detection

Authors:	Snehasis Mukhopadhyay, Changhong Tang, Jeffery Huang, Mathew Palakal

Affiliation:	Department of Computer and Information Science, Indiana University-Purdue University Indianapolis, 723 W. Michigan St., SL280, Indianapolis, IN 46202, USA

Abstract:	Although large-scale classification studies of genetic sequence data are in progress around the world, very few studies compare different classification approaches (e.g., unsupervised and supervised) in terms of objective criteria such as classification accuracy and computational complexity. In this paper, we study such criteria for both unsupervised and supervised classification of a relatively large sequence data set. The unsupervised approach uses different sequence alignment algorithms (e.g., Smith-Waterman, FASTA, and BLAST) followed by clustering with the Maximin algorithm. The supervised approach uses a numeric encoding (relative frequencies of tuples of nucleotides, followed by principal component analysis) that is fed to a multi-layer backpropagation neural network. Classification experiments conducted on IBM-SP parallel computers show that FASTA with unsupervised Maximin clustering gives the best trade-off between accuracy and speed among all methods, with the supervised neural network as the second-best approach. Finally, the different classifiers are applied to the problem of cross-species homology detection.
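
The numeric encoding mentioned in the abstract (relative frequencies of nucleotide tuples) can be illustrated with a minimal sketch. The abstract does not state the tuple length, whether windows overlap, or how ambiguous bases are handled, so the choices below (overlapping windows, `k=2`, the `ACGT` alphabet, skipping windows with other symbols) and the function name `tuple_frequencies` are assumptions made for illustration only. The principal component analysis and neural network steps are not shown.

```python
from itertools import product

ALPHABET = "ACGT"  # assumed nucleotide alphabet; not specified in the abstract


def tuple_frequencies(sequence, k=2):
    """Return relative frequencies of overlapping k-tuples in a fixed order.

    The output has len(ALPHABET) ** k entries, ordered as produced by
    itertools.product (AA, AC, AG, AT, CA, ... for k=2).
    """
    seq = sequence.upper()
    tuples = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(tuples, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # assumption: skip windows with symbols such as N
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(tuples)
    return [counts[t] / total for t in tuples]


# Example: a short sequence encoded as a 16-dimensional vector of 2-tuple frequencies.
features = tuple_frequencies("ACGTACGT", k=2)
```

Each sequence, whatever its length, maps to a vector of the same dimension, which is what makes a fixed-input classifier such as a neural network applicable after dimensionality reduction.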

Keywords:	sequence classification; sequence alignment; artificial neural network; clustering; unsupervised and supervised learning; cross-species homology detection
This article is indexed in SpringerLink and other databases.
