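Mining Frequent Patterns in an Arbitrary Sliding Time Window over Data Streams
Citation: LI Guo-Hui, CHEN Hui. Mining Frequent Patterns in an Arbitrary Sliding Time Window over Data Streams[J]. Journal of Software, 2008, 19(10): 2585-2596
Authors: LI Guo-Hui, CHEN Hui
Affiliation: School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074
Funding: National Natural Science Foundation of China; National High Technology Research and Development Program of China (863 Program)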
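Abstract: Because data streams are fluid and continuous, the knowledge they contain changes over time. Consequently, in most data stream applications, users are far more interested in the knowledge contained in newly arriving stream data than in that of historical stream data. This paper proposes MSW (mining sliding window), a method for mining frequent patterns within a sliding time window of arbitrary size over a data stream. As the data stream flows past, the method uses a sliding window tree (SW-tree) to capture the latest pattern information in a timely manner while scanning the stream data only once. The method also periodically deletes obsolete and infrequent pattern branches from the SW-tree, which reduces the tree's space complexity and maintenance cost. In addition, it applies a time decaying model that gradually lowers the weight of the support counts of patterns from historical transactions, and uses this to distinguish the patterns of recently generated transactions from those of historical ones. The results of extensive simulation experiments show that algorithm MSW is highly efficient and scalable, and that it outperforms other algorithms of the same kind.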

Keywords: data stream; frequent pattern mining; sliding time window; time decaying model
Received: 2007-11-08
Revised: 2008-01-08
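The abstract does not give the exact form of the time decaying model. As a hedged illustration only, the Python sketch below assumes a simple exponential scheme: within a count-based window holding the last `window_size` transactions, a transaction of age k (0 for the newest) contributes `decay ** k` to the support of each itemset it contains, and supports are normalized by the total weight of the window. The class `DecayedWindow` and its parameters are hypothetical, and support is recomputed from scratch on each query for clarity rather than maintained incrementally in a single pass.

```python
from collections import defaultdict, deque
from itertools import combinations


class DecayedWindow:
    """Hypothetical sliding window with exponential time decay.

    Keeps the last `window_size` transactions. A transaction of age k
    (0 = newest) contributes decay ** k to the support of every itemset
    of size <= max_len that it contains.
    """

    def __init__(self, window_size, decay, max_len=2):
        self.window_size = window_size
        self.decay = decay            # assumed 0 < decay <= 1
        self.max_len = max_len        # largest itemset size enumerated
        self.window = deque()         # oldest on the left, newest on the right

    def add(self, transaction):
        self.window.append(frozenset(transaction))
        if len(self.window) > self.window_size:
            self.window.popleft()     # the oldest transaction expires

    def decayed_support(self):
        support = defaultdict(float)
        newest = len(self.window) - 1
        for pos, txn in enumerate(self.window):
            weight = self.decay ** (newest - pos)   # older => smaller weight
            for k in range(1, self.max_len + 1):
                for itemset in combinations(sorted(txn), k):
                    support[itemset] += weight
        return support

    def frequent(self, min_support):
        # Normalize by the total weight of all transactions in the window.
        total = sum(self.decay ** age for age in range(len(self.window)))
        if total == 0:
            return {}
        return {
            itemset: w / total
            for itemset, w in self.decayed_support().items()
            if w / total >= min_support
        }


if __name__ == "__main__":
    w = DecayedWindow(window_size=3, decay=0.5)
    for t in [{"a", "b"}, {"a", "c"}, {"a", "b"}, {"b", "c"}]:
        w.add(t)
    print(sorted(w.frequent(0.5)))  # [('b',), ('b', 'c'), ('c',)]
```

In this run the first transaction expires, and item `a`, which still occurs in two of the three remaining transactions, falls below the 0.5 threshold because both of its occurrences are older and therefore down-weighted; without decay it would count as frequent.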

Mining the Frequent Patterns in an Arbitrary Sliding Window over Online Data Streams
LI Guo-Hui and CHEN Hui. Mining the Frequent Patterns in an Arbitrary Sliding Window over Online Data Streams[J]. Journal of Software, 2008, 19(10): 2585-2596
Authors: LI Guo-Hui and CHEN Hui
Abstract: Because data streams are fluid and continuous, the knowledge embedded in stream data is likely to change over time. Thus, in most data stream applications, people are more interested in the information in recent transactions than in that of older ones. This paper proposes a method for mining the frequent patterns in a sliding window of arbitrary size over data streams. As the data stream flows, its contents are captured in a compact prefix tree by scanning the stream only once, and obsolete and infrequent items are deleted by periodically pruning the tree. To differentiate the patterns of recently generated transactions from those of historical transactions, a time decaying model is also applied. Extensive simulations were conducted, and the experimental results show that the proposed method is efficient and scalable and outperforms other analogous algorithms.
Keywords: data stream; frequent pattern mining; sliding window; time decaying model
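The abstract describes capturing the window in a compact prefix tree during a single scan and periodically pruning obsolete and infrequent items. The Python sketch below illustrates that general idea only; it is not the paper's SW-tree, whose node layout and pruning rules are not given in the abstract. Assumptions: the window is count-based (the last `window_size` transactions) rather than time-based, items are inserted in lexicographic order, and the names `WindowPrefixTree`, `prune`, `min_count`, and `count` are hypothetical.

```python
from collections import deque


class Node:
    def __init__(self):
        self.count = 0
        self.children = {}   # item -> Node


class WindowPrefixTree:
    """Hypothetical prefix tree over the last `window_size` transactions."""

    def __init__(self, window_size):
        self.root = Node()
        self.window_size = window_size
        # For each transaction in the window, the nodes it incremented.
        self.window = deque()

    def insert(self, transaction):
        node, visited = self.root, []
        # Fixed item order (lexicographic here) so that transactions with a
        # common prefix share nodes.
        for item in sorted(set(transaction)):
            child = node.children.get(item)
            if child is None:
                child = node.children[item] = Node()
            child.count += 1
            visited.append(child)
            node = child
        self.window.append(visited)
        if len(self.window) > self.window_size:
            # Expire the oldest transaction by decrementing exactly the nodes
            # it touched; nodes already pruned are detached, so this is harmless.
            for old in self.window.popleft():
                old.count -= 1

    def prune(self, min_count=1):
        """Detach branches whose count is below min_count (>= 1).

        min_count=1 removes only obsolete (zero-count) branches; larger
        values also drop infrequent ones, making later counts approximate.
        """
        stack = [self.root]
        while stack:
            node = stack.pop()
            for item in list(node.children):
                child = node.children[item]
                if child.count < min_count:
                    del node.children[item]
                else:
                    stack.append(child)

    def count(self, items):
        """Count at the node for the sorted path `items` (0 if absent),
        i.e. in-window transactions whose sorted items start with it."""
        node = self.root
        for item in sorted(set(items)):
            node = node.children.get(item)
            if node is None:
                return 0
        return node.count
```

Because `prune(min_count=2)` also discards branches that are infrequent but not yet expired, counts for those prefixes afterwards underestimate the true window counts; in this sketch that is the price paid for the smaller tree.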
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.