CONTOUR: an efficient algorithm for discovering discriminating subsequences
Authors: Jianyong Wang, Yuzhou Zhang, Lizhu Zhou, George Karypis, Charu C. Aggarwal
Affiliation: (1) Tsinghua University, Beijing, 100084, China; (2) University of Minnesota, Minneapolis, MN 55455, USA; (3) IBM T.J. Watson Research Center, Hawthorne, NY 10532, USA
Abstract: In recent years we have witnessed several applications of frequent sequence mining, such as feature selection for protein sequence classification and mining block correlations in storage systems. In typical applications such as clustering, it is not the complete set but only a subset of discriminating frequent subsequences that is of interest. One approach to discovering this subset is to apply an existing frequent sequence mining algorithm to find the complete set of frequent subsequences and then identify the interesting ones among them. Unfortunately, mining the complete set of frequent subsequences is very time-consuming for large sequence databases. In this paper, we propose a new algorithm, CONTOUR, which directly and efficiently mines a subset of high-quality subsequences in order to cluster the input sequences. We focus mainly on designing effective search-space pruning methods to accelerate the mining process, and we discuss how to construct an accurate clustering algorithm based on the output of CONTOUR. We conducted an extensive performance study to evaluate the efficiency and scalability of CONTOUR and the accuracy of the frequent subsequence-based clustering algorithm.
Keywords: Sequence mining, Discriminating subsequence, Summarization subsequence, Clustering
This article is indexed in SpringerLink and other databases.