首页 | 本学科首页   官方微博 | 高级检索  
     

在线挖掘数据流闭频繁项集的高效算法
引用本文:毛伊敏,陈志刚.在线挖掘数据流闭频繁项集的高效算法[J].计算机科学,2013,40(2):229-234.
作者姓名:毛伊敏  陈志刚
作者单位:(江西理工大学应科院 赣州 341000) (中南大学信息学院 长沙 410083)
摘    要:数据流闭频繁项集挖掘算法得到了广泛的研究,其中一个典型的工作就是NewMomen、算法。针对New- Moment算法存在搜索空间大而造成算法时间效率低的问题,提出了一种改进的数据流闭频繁项集挖掘算法A-Ncw- Moment。它设计了一个二进制位表示项目与扩展的频繁项目列表相结合的数据结构,来记录数据流信息及闭频繁项 集。在窗体初始阶段,首先挖掘频繁1一项集所产生的支持度为最大的最长闭频繁项集,接着提出新的“不需扩展策略” 和“向下扩展策略”来避免生成大量中间结果,快速发现其余闭频繁项集,达到极大缩小搜索空间的目的。在窗体滑动 阶段,提出“动态不频繁剪枝策略”来从已生成的闭频繁项集中快速删除非闭频繁项集,并提出“动态不搜索策略”来动 态维护所有闭频繁项集的生成,以降低闭频繁项集的维护代价,提高算法的效率。理论分析与实验结果表明,A-New- Moment算法具有较好的性能。

关 键 词:数据挖掘,数据流,频繁项集,闭频繁项集

Efficient Algorithm for Online Mining Closed Frequent Itemsets over Data Streams
Abstract:Mining closed frequent itemsets from data streams has been extensively studied, in which NewMoment is re- garded as a typical algorithm. However, there is the problem of its big search space causing bad performance in time in NewMoment, This paper presented an algorithm called A-NewMoment which ameliorates NewMoment to mine closed frequent itemsets. Firstly, it designed a combinative data structure which uses an effective bit victor to represent items and an extended frequent item list to record the current closed frequent information in streams. Secondly, the new pru- ning strategics called WSS and CSS were proposed to avoid a large number of intermediate results generated, so the search space is reduced greatly. Finally, the pruning strategy called DNFIPS was also proposed to delete no closed fre- qucnt itemsets from HTC. At the same time, it also designed a novel strategy called DHSS to efficiently and dynamically maintain these operations that all closed frequent itemsedts are added and deleted. Theoretical analysis and experimental results show that the proposed method is efficient.
Keywords:Data mining  Data strcams  Frcqucnt itcmscts  Closcd frequent item
点击此处可从《计算机科学》下载全文
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号