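Algorithm Based on Counting for Mining Frequent Items over Data Stream
Cite this article:Zhu Ranwei, Wang Peng, Liu Majin. Algorithm Based on Counting for Mining Frequent Items over Data Stream[J]. Journal of Computer Research and Development, 2011, 48(10)
Authors:Zhu Ranwei  Wang Peng  Liu Majin
Affiliation:School of Computer Science, Fudan University, Shanghai 201203
Funding:Specialized Research Fund for the Doctoral Program of Higher Education (20090071120092); IBMCRLUR Fund project (JSA201007005)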
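Abstract:Mining frequent items over data streams has received wide attention. Classical frequent-item mining algorithms find the frequent items fairly well, but their estimates of those items' frequencies often carry large errors. To address this problem, the SRoEC (segment rotative efficient count), SReEC (segment reserve efficient count) and RFreq (reserve frequent) algorithms inherit the counter-based approach: they partition the counters and define corresponding operations on them, aiming to improve the accuracy of frequency counting and to reduce the influence of noise. Experiments and data analysis show that these algorithms not only guarantee that every item whose frequency exceeds the threshold is found, but also greatly improve the accuracy of the frequency counts for frequent items. At the same space cost, on both synthetic and real datasets, the algorithms achieve a higher frequency accuracy rate, a lower frequency deviation rate and a higher frequency retention rate; the advantage is more pronounced when the data distribution is relatively flat.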

Keywords:frequent item  Top-K  data stream  data mining  frequency estimation

Algorithm Based on Counting for Mining Frequent Items over Data Stream
Zhu Ranwei, Wang Peng, Liu Majin. Algorithm Based on Counting for Mining Frequent Items over Data Stream[J]. Journal of Computer Research and Development, 2011, 48(10)
Authors:Zhu Ranwei  Wang Peng  Liu Majin
Affiliation:Zhu Ranwei, Wang Peng, and Liu Majin (School of Computer Science, Fudan University, Shanghai 201203)
Abstract:Mining frequent items over data streams has drawn great attention, and a large number of efficient algorithms have been proposed over the past decades. Although the classical algorithms are well suited to finding frequent items, they usually do not perform well when estimating the items' approximate frequencies. To solve this problem, we introduce a series of counter-based algorithms called SRoEC (segment rotative efficient count), SReEC (segment reserve efficient count) and RFreq (reserve frequent). They div...
Keywords:frequent item  Top-K  data stream  data mining  frequency estimation
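
For readers unfamiliar with the counter-based approach the abstract builds on, the following is a minimal sketch of one classical baseline, the Misra-Gries ("Frequent") summary. It is not SRoEC, SReEC or RFreq, whose counter partitioning and operations the abstract does not describe. The function name `misra_gries` and the parameter `k` are illustrative assumptions. The sketch shows the behaviour the abstract criticises: the frequent item is found, but its count is underestimated.

```python
def misra_gries(stream, k):
    """Classical counter-based summary using at most k - 1 counters.

    Illustrative baseline only (an assumption for this sketch), not the
    paper's algorithms. Every item occurring more than n / k times in a
    stream of length n survives, and each stored count underestimates
    the true count by at most n / k.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement every counter and drop zeros.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


# Hypothetical toy stream: true counts are a=5, b=3, c=1, d=1.
stream = ["a", "a", "a", "b", "b", "c", "a", "d", "a", "b"]
print(misra_gries(stream, 3))  # prints {'a': 3, 'b': 1}
```

Here `a` exceeds the threshold n / k = 10 / 3 and is retained, but its estimated count is 3 rather than 5. This gap between finding an item and counting it accurately is what the abstract says the proposed algorithms aim to narrow.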
This article is indexed in CNKI, Wanfang Data and other databases.