n-gram-based classification and unsupervised hierarchical clustering of genome sequences
Authors: Tomović Andrija, Janicić Predrag, Keselj Vlado
Affiliation: Friedrich Miescher Institute for Biomedical Research, Part of Novartis Research Foundation, Maulbeerstrasse 66, CH-4058 Basel, Switzerland.
Abstract: In this paper we address the problem of automated classification of isolates, i.e., the problem of determining the family of genomes to which a given genome belongs. Additionally, we address the problem of automated unsupervised hierarchical clustering of isolates according only to their statistical substring properties. For both of these problems we present novel algorithms based on nucleotide n-grams, with no required preprocessing steps such as sequence alignment. The experimental results are very positive and suggest that the proposed techniques can be successfully used in a variety of related problems. The reported experiments demonstrate better performance than some of the state-of-the-art methods. We also report a new distance measure between n-gram profiles, which shows superior performance compared to many other measures, including the commonly used Euclidean distance.
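
The idea of comparing genomes through nucleotide n-gram profiles, without alignment, can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract does not define their new distance measure, so the sketch uses the Euclidean distance that the abstract names as a common baseline. The function names (`ngram_profile`, `euclidean_distance`, `classify`), the choice of relative frequencies as the profile, and the nearest-profile rule are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def ngram_profile(sequence, n=3):
    """Relative frequencies of overlapping nucleotide n-grams (illustrative)."""
    seq = sequence.upper()
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: count / total for gram, count in counts.items()}


def euclidean_distance(p, q):
    """Baseline Euclidean distance between two profiles.

    The paper's new distance measure is not given in the abstract,
    so it is deliberately not reproduced here.
    """
    grams = set(p) | set(q)
    return sqrt(sum((p.get(g, 0.0) - q.get(g, 0.0)) ** 2 for g in grams))


def classify(query, families, n=3):
    """Return the label of the family sequence whose profile is nearest.

    `families` maps a hypothetical family label to one representative
    sequence; a real setup would aggregate many genomes per family.
    """
    query_profile = ngram_profile(query, n)
    return min(
        families,
        key=lambda label: euclidean_distance(
            query_profile, ngram_profile(families[label], n)
        ),
    )
```

For example, `classify("ATATATAT", {"at_rich": "ATATATATATAT", "gc_rich": "GCGCGCGCGC"}, n=2)` selects `"at_rich"`, because the query shares its 2-grams only with that sequence. The same pairwise distances could also feed a standard hierarchical clustering routine, which is the second problem the abstract mentions.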
This article is indexed in PubMed and other databases.