Research on a Method for Finding Recently Frequent Items in Distributed Data Streams (Finding Recently Frequent Item in Distributed Data Stream)

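Cite this article:	REN Jia-Dong, LI Ke, FENG Jia-Yin, YANG Nan. Research on a method for finding recently frequent items in distributed data streams [J]. Computer Science (计算机科学), 2008, 35(3): 206-208.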

Authors:	REN Jia-Dong (任家东), LI Ke (李可), FENG Jia-Yin (冯佳音), YANG Nan (杨楠)

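Affiliations:	1. College of Information Science and Engineering, Yanshan University, Qinhuangdao 066004; 2. College of Electrical Engineering, Yanshan University, Qinhuangdao 066004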

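Abstract:	The traditional model for mining distributed data streams is a hierarchical model in which mining results are processed layer by layer, so communication bandwidth is a bottleneck. To reduce communication between the nodes of the distributed data streams, this paper applies a biased sampling method based on data density to each stream in the group of distributed data streams and maintains only the most recent elements of the sampled data. For frequent-item mining, a hash-based counting method is designed that, unlike traditional hash counting algorithms, can both increment and decrement the counts; the counts are approximate values with a guaranteed error bound. The algorithm is called FFIDDS. Experimental results show that both its communication load and its processing time are clearly better than those of algorithms based on the traditional HCS model.
Keywords:	distributed data stream; frequent items; algorithm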
Finding Recently Frequent Item in Distributed Data Stream

REN Jia-Dong, LI Ke, FENG Jia-Yin, YANG Nan. Finding Recently Frequent Item in Distributed Data Stream [J]. Computer Science, 2008, 35(3): 206-208.

Authors:	REN Jia-Dong, LI Ke, FENG Jia-Yin, YANG Nan

Affiliation:	REN Jia-Dong (1), LI Ke (1), FENG Jia-Yin (1), YANG Nan (2). 1. College of Information Science and Engineering, Yanshan University, Qinhuangdao 066004; 2. Institute of Electrical Engineering, Qinhuangdao 066004

Abstract:	The traditional method of mining frequent elements in distributed data streams tends to result in excessive communication within layers, and bandwidth is the bottleneck. To minimize communication requirements, we propose a method of sampling from distributed data streams based on data density. We mine frequent items in the data stream composed of the sampled data. In the aggregated data stream, we deal only with the recent data. The proposed method counts the elements with a hash-based approach and can handle bot...

Keywords:	Distributed data stream; Frequent items; Algorithm
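
The abstract does not give the details of the FFIDDS counting method, so the following is only a rough illustration of the general idea it mentions: a hash-based counter whose counts can be both incremented and decremented, kept over the most recent elements only. It is a count-min style sketch, not the authors' algorithm. The class name `WindowedHashCounter` and the parameters `window`, `width`, and `depth` are assumptions made for this example, and the biased sampling step is not modeled.

```python
import hashlib
from collections import deque


class WindowedHashCounter:
    """Count-min style counter over the most recent `window` items.

    Hypothetical illustration only; not the FFIDDS algorithm.
    """

    def __init__(self, window, width=256, depth=4):
        self.window = window      # number of recent items to keep
        self.width = width        # counters per hash row
        self.depth = depth        # number of hash rows
        self.table = [[0] * width for _ in range(depth)]
        self.recent = deque()     # items currently inside the window

    def _cells(self, item):
        # One (row, column) cell per hash row, derived from a salted digest.
        for row in range(self.depth):
            digest = hashlib.md5(f"{row}:{item}".encode()).digest()
            yield row, int.from_bytes(digest[:8], "big") % self.width

    def _update(self, item, delta):
        for row, col in self._cells(item):
            self.table[row][col] += delta

    def add(self, item):
        # Increment for the new item, decrement for the item that expires.
        self.recent.append(item)
        self._update(item, +1)
        if len(self.recent) > self.window:
            self._update(self.recent.popleft(), -1)

    def estimate(self, item):
        # Never below the true in-window count; collisions can inflate it.
        return min(self.table[row][col] for row, col in self._cells(item))


counter = WindowedHashCounter(window=5)
for x in ["a", "b", "a", "c", "a", "a", "d"]:
    counter.add(x)
```

After the loop the window holds the last five items, three of which are `"a"`, so `counter.estimate("a")` is at least 3. This sketch stores the window items exactly in order to know what to decrement, which a memory-constrained stream algorithm would likely avoid.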
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.