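Study on Distributed Sequential Pattern Discovery Algorithm
Cite this article: ZOU Xiang, ZHANG Wei, LIU Yang, CAI Qing-Sheng. Study on Distributed Sequential Pattern Discovery Algorithm [J]. Journal of Software, 2005, 16(7): 1262-1269.
Authors: ZOU Xiang, ZHANG Wei, LIU Yang, CAI Qing-Sheng
Affiliations: 1. Research Center, The Third Research Institute of the Ministry of Public Security, Shanghai 200031; Department of Computer Science, University of Science and Technology of China, Hefei, Anhui 230027
2. Department of Computer Science, University of Science and Technology of China, Hefei, Anhui 230027
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 70171052 and 90104030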
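Abstract: This paper proposes FDMSP (fast distributed mining of sequential patterns), an algorithm for mining sequential patterns in a distributed environment. The properties of sequential patterns in a distributed environment are analyzed first. The algorithm uses prefix projection to partition the pattern search space, uses the prefix of a sequential pattern to designate a polling site that tallies the global support count of sequences, and reduces the number of candidate sequences through local pruning, poll pruning, and count pruning. It is also divided into three sub-procedures that run asynchronously. As a result, the algorithm incurs low I/O, memory, and communication costs and generates global sequential patterns efficiently. Experimental results show that, in a LAN environment with massive data, FDMSP outperforms the approach of centralizing the data and then running GSP by 68.5% to 99.5%, and that FDMSP scales well.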

Keywords: data mining; sequential pattern; distributed algorithm
Received: 2003/11/13
Revised: 2/3/2005

Study on Distributed Sequential Pattern Discovery Algorithm
ZOU Xiang, ZHANG Wei, LIU Yang and CAI Qing-Sheng. Study on Distributed Sequential Pattern Discovery Algorithm [J]. Journal of Software, 2005, 16(7): 1262-1269.
Authors: ZOU Xiang, ZHANG Wei, LIU Yang and CAI Qing-Sheng
Abstract: The algorithm FDMSP (fast distributed mining of sequential patterns) is proposed for mining sequential patterns in a distributed environment, and the properties of sequential patterns in such an environment are analyzed. The algorithm uses prefix projection to divide the pattern search space, uses a polling site associated with each prefix to obtain global support counts, and uses local pruning, poll pruning, and count pruning to reduce the number of candidate sequences. It is divided into three sub-procedures that run asynchronously. As a result, the algorithm has lower I/O, memory, and communication costs, and global sequential patterns are generated more efficiently. Experiments show that it outperforms the GSP algorithm run on centralized data by 68.5% to 99.5% and that it is scalable over a LAN with a huge amount of data.
Keywords: data mining; sequential pattern; distributed algorithm
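
The abstract names three ingredients: prefix projection, a polling site chosen from the pattern prefix, and local pruning. The Python sketch below illustrates how these might fit together; it is not the authors' implementation. It makes several assumptions that the abstract does not state: each sequence is a tuple of single items (not itemsets), the polling site is picked by a CRC32 hash of the prefix, local pruning keeps an extension only if it is locally frequent at some site, and all sites are simulated in one process instead of running asynchronously. Poll pruning and count pruning are omitted. The names `project`, `local_counts`, `polling_site`, and `mine` are hypothetical.

```python
from collections import Counter
from zlib import crc32


def project(seq, prefix):
    """Return the part of seq after the earliest match of prefix as a
    subsequence, or None if seq does not contain prefix."""
    pos = 0
    for item in prefix:
        try:
            pos = seq.index(item, pos) + 1
        except ValueError:
            return None
    return seq[pos:]


def local_counts(db, prefix):
    """At one site: count the sequences supporting each one-item extension."""
    counts = Counter()
    for seq in db:
        suffix = project(seq, prefix)
        if suffix is not None:
            counts.update(set(suffix))  # each sequence counts once per item
    return counts


def polling_site(prefix, n_sites):
    """Assumed rule: hash the prefix to pick the site that tallies its counts."""
    return crc32("\x1f".join(prefix).encode()) % n_sites


def mine(sites, min_ratio, prefix=()):
    """Return {pattern: (global_support, polling_site_index)}."""
    total = sum(len(db) for db in sites)
    per_site = [local_counts(db, prefix) for db in sites]
    # Local pruning: a globally frequent extension must be locally
    # frequent at one site or more, so other extensions are never polled.
    candidates = {
        item
        for db, counts in zip(sites, per_site)
        for item, c in counts.items()
        if c >= min_ratio * len(db)
    }
    site = polling_site(prefix, len(sites))  # tallies all extensions of prefix
    patterns = {}
    for item in sorted(candidates):
        support = sum(counts[item] for counts in per_site)
        if support >= min_ratio * total:
            pattern = prefix + (item,)
            patterns[pattern] = (support, site)
            patterns.update(mine(sites, min_ratio, pattern))
    return patterns


site_a = [("a", "b", "c"), ("a", "c")]
site_b = [("a", "b"), ("b", "c")]
result = mine([site_a, site_b], min_ratio=0.5)
```

With two sites of two sequences each and a 50% threshold, `result` contains the three single items plus the patterns `("a", "b")`, `("a", "c")`, and `("b", "c")`. The candidate `("a", "b", "c")` passes local pruning at the first site but fails the global count.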