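Mining Sequential Patterns with Wildcards and the One-Off Condition
Cite this article:WU Xin-Dong,XIE Fei,HUANG Yong-Ming,HU Xue-Gang,GAO Jun.Mining Sequential Patterns with Wildcards and the One-Off Condition[J].Journal of Software (软件学报),2013,24(8):1804-1815.
Authors:WU Xin-Dong  XIE Fei  HUANG Yong-Ming  HU Xue-Gang  GAO Jun
Affiliation:1. School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230009, Anhui, China; Department of Computer Science, University of Vermont, Burlington, VT 05405, USA
2. Department of Computer Science and Technology, Hefei Normal University, Hefei 230601, Anhui, China
3. School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230009, Anhui, China
Funding:National Natural Science Foundation of China; US National Science Foundation; National High Technology Research and Development Program of China (863); National Key Basic Research and Development Program of China (973)
Abstract:Many application domains generate large volumes of sequence data, and mining valuable patterns from such data has become the main task of sequential pattern mining research. This paper studies the following problem: given a sequence S, a support threshold, and gap constraints, mine from S all frequent sequential patterns whose number of occurrences is no less than the support threshold, where the positions in S of any two adjacent pattern elements must satisfy the user-defined gap constraints. An effective algorithm for mining patterns with wildcards, One-Off Mining, is designed. Occurrences of a pattern in the sequence satisfy the One-Off condition, that is, no two occurrences of a pattern share a character at the same position in the sequence. Experimental results on biological DNA sequences show that One-Off Mining achieves better time performance and completeness than related sequential pattern mining algorithms.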

Keywords:data mining  sequential pattern mining  frequent pattern  wildcard  One-Off condition
Received:8/5/2011
Revised:2012/9/12

Mining Sequential Patterns with Wildcards and the One-Off Condition
WU Xin-Dong,XIE Fei,HUANG Yong-Ming,HU Xue-Gang and GAO Jun.Mining Sequential Patterns with Wildcards and the One-Off Condition[J].Journal of Software,2013,24(8):1804-1815.
Authors:WU Xin-Dong  XIE Fei  HUANG Yong-Ming  HU Xue-Gang and GAO Jun
Affiliation:1. School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230009, China; Department of Computer Science, University of Vermont, Burlington, VT 05405, USA
2. Department of Computer Science and Technology, Hefei Normal University, Hefei 230601, China
3. School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230009, China
Abstract:Real-world applications produce large volumes of sequence data, and sequential pattern mining aims to extract important patterns from such data. Given a sequence S, a support threshold, and gap constraints, this paper aims to discover all frequent patterns whose support in S is no less than the given threshold. A pattern P may contain flexible wildcards, and the number of wildcards between any two successive elements of P must satisfy the user-specified gap constraints. The study designs an efficient mining algorithm, One-Off Mining, in which pattern occurrences satisfy the One-Off condition: each character position in the given sequence can be used at most once across all occurrences of a pattern. Experiments on DNA sequences show that this method performs better in time and completeness than related sequential pattern mining algorithms.
Keywords:data mining  sequential pattern mining  frequent pattern  wildcard  One-Off condition
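
To make the gap constraints and the One-Off condition in the abstract concrete, the following is a minimal Python sketch of counting pattern occurrences under both. It is not the One-Off Mining algorithm from the paper, whose details are not given on this page. It is a simple greedy, leftmost-first heuristic, so the count it returns is only a lower bound on the true One-Off support. The function names `find_occurrence` and `one_off_occurrences`, and the use of a single `[min_gap, max_gap]` wildcard range for every pair of adjacent pattern elements, are assumptions made for illustration.

```python
def find_occurrence(seq, pattern, min_gap, max_gap, used, start):
    """Try to build one occurrence of `pattern` whose first element sits at
    index `start`, skipping positions already in `used`.

    Returns a list of sequence indices, or None if no occurrence exists.
    Hypothetical helper for illustration only.
    """
    def extend(positions):
        if len(positions) == len(pattern):
            return positions
        prev = positions[-1]
        wanted = pattern[len(positions)]
        # Between prev and the next position there must be
        # min_gap..max_gap wildcard characters.
        lo = prev + 1 + min_gap
        hi = min(prev + 1 + max_gap, len(seq) - 1)
        for pos in range(lo, hi + 1):
            if pos not in used and seq[pos] == wanted:
                result = extend(positions + [pos])
                if result is not None:
                    return result
        return None

    if start in used or seq[start] != pattern[0]:
        return None
    return extend([start])


def one_off_occurrences(seq, pattern, min_gap, max_gap):
    """Greedily collect occurrences that satisfy the One-Off condition:
    no sequence position is shared by two occurrences.

    Assumption: a greedy left-to-right scan, which may miss the maximum
    possible number of occurrences.
    """
    if not pattern:
        return []
    used = set()
    occurrences = []
    for start in range(len(seq)):
        occ = find_occurrence(seq, pattern, min_gap, max_gap, used, start)
        if occ is not None:
            occurrences.append(occ)
            used.update(occ)  # these positions can no longer be reused
    return occurrences


def is_frequent(seq, pattern, min_gap, max_gap, min_support):
    """A pattern counts as frequent here if the greedy One-Off count
    reaches the support threshold."""
    return len(one_off_occurrences(seq, pattern, min_gap, max_gap)) >= min_support
```

For example, with the sequence `"AAT"`, the pattern `"AT"`, and between 0 and 1 wildcards allowed, the index pairs (0, 2) and (1, 2) both match the gap constraint, but they share position 2, so under the One-Off condition only one of them can be counted.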
This article is indexed in Wanfang Data (万方数据) and other databases.