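Effective Clustering Algorithm for Probabilistic Data Stream
Cite this article: DAI Dong-Bo, ZHAO Gang, SUN Sheng-Li. Effective Clustering Algorithm for Probabilistic Data Stream [J]. Journal of Software, 2009, 20(5): 1313-1328.
Authors: DAI Dong-Bo, ZHAO Gang, SUN Sheng-Li
Affiliation: Department of Computing and Information Technology, Fudan University, Shanghai 200433, China
Funding: Supported by the National Basic Research Program of China (973 Program) under Grant No. 2005CB321905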
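Abstract: This paper proposes P-Stream, an effective method for clustering over probabilistic data streams. For the probabilistic tuples in a data stream, P-Stream introduces the concepts of strong clusters, transitional clusters, and weak clusters, and designs an effective online candidate-cluster selection strategy that finds, for each continuously arriving tuple, the cluster to which it may reasonably belong. At each checkpoint it stores a snapshot of the micro-clusters to support further offline high-level clustering and evolution analysis. Finally, an "aggressive" two-layer clustering model is designed to judge whether the current first-layer clustering model still fits the probabilistic tuples that have most recently arrived in the stream. The experiments construct probabilistic data streams from the real KDD-CUP'98 and KDD-CUP'99 data sets and from synthetic data sets with changing Gaussian distributions. The results show that P-Stream achieves good clustering quality and fast processing speed, and adapts effectively to evolving data.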

Keywords: probabilistic data stream; clustering; evolution analysis
Received: 2007/11/13
Revised: 2008/3/6

Effective Clustering Algorithm for Probabilistic Data Stream
DAI Dong-Bo, ZHAO Gang, SUN Sheng-Li. Effective Clustering Algorithm for Probabilistic Data Stream [J]. Journal of Software, 2009, 20(5): 1313-1328.
Authors: DAI Dong-Bo, ZHAO Gang, SUN Sheng-Li
Affiliation: Department of Computing and Information Technology, Fudan University, Shanghai 200433, China
Abstract: This paper develops, for the first time, an effective clustering algorithm called "P-Stream" for probabilistic data streams. For the uncertain tuples in a data stream, P-Stream introduces the concepts of strong clusters, transitional clusters, and weak clusters. With these concepts, an effective strategy for choosing candidate clusters is designed, which can find a suitable cluster for every continuously arriving data point. Then, in order to further cluster on the high level and analyze the evolving behav...
Keywords: probabilistic data stream; clustering; evolving analysis
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.