
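Mining Evolutionary Events from Multi-Streams Based on Spectral Clustering
Citation: YANG Ning, TANG Chang-Jie, WANG Yue, CHEN Yu, ZHENG Jiao-Ling. Mining evolutionary events from multi-streams based on spectral clustering[J]. Journal of Software (软件学报), 2010, 21(10): 2395-2409.
Authors: YANG Ning, TANG Chang-Jie, WANG Yue, CHEN Yu, ZHENG Jiao-Ling
Affiliation: College of Computer Science, Sichuan University, Chengdu, Sichuan 610065
Funding: Supported by the National Natural Science Foundation of China under Grant No.600773169; the 11th Five-Year Key Programs for Science & Technology Development of China under Grant No.2006BAI05A01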
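Abstract: To address the difficult problem of mining evolutionary events from multiple data streams, this paper proposes SCAM (spectral clustering algorithm of multi-streams), a spectral clustering algorithm for multiple data streams whose similarity matrix is built from the coupling degree, a measure of the dynamic similarity between two data streams. It then proposes EEMA (evolutionary events mining algorithm), which mines evolutionary events from multiple data streams based on how the clustering model evolves. Clustering model cohesion is defined to measure how compact a clustering is, and an upper bound on cohesion is proved. Based on the distance to this upper bound and the eigengap of the normalized similarity matrix, clustering model quality is defined and used as the optimization objective of EEMA to determine the number of clusters k automatically. O-EEMA is designed as an optimized implementation of EEMA, with time complexity $O(cn^2/2)$. Experimental results on synthetic and real data sets show that EEMA and O-EEMA are effective and feasible.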

Keywords: multi-streams; coupled clustering; evolutionary events; matrix perturbation
Received: 2009-04-22
Revised: 2009-10-10
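The abstract says that clustering model quality uses, among other things, the eigengap of the normalized similarity matrix to choose k automatically. The sketch below shows only the standard eigengap heuristic on $D^{-1/2} W D^{-1/2}$; it does not reproduce the cohesion upper-bound term of the paper's quality index. The coupling degree is not defined on this page, so `coupling_similarity` substitutes the absolute Pearson correlation of each stream's current window, a hypothetical stand-in chosen only for illustration. All function names here are assumptions, not the authors' code.

```python
import numpy as np


def coupling_similarity(X):
    """HYPOTHETICAL stand-in for the paper's coupling degree.

    X has shape (n_streams, window_length). Returns |Pearson correlation|
    between every pair of streams; constant streams would yield NaN.
    """
    return np.abs(np.corrcoef(np.asarray(X, dtype=float)))


def normalized_similarity(W):
    """Return D^{-1/2} W D^{-1/2}, where D is the diagonal degree matrix of W."""
    W = np.asarray(W, dtype=float)
    d = W.sum(axis=1)                    # degree of each stream (assumed > 0)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    return W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def eigengap_k(W, k_max=None):
    """Choose k maximizing lambda_k - lambda_{k+1}, eigenvalues in descending order."""
    # eigvalsh returns eigenvalues of a symmetric matrix in ascending order
    eigvals = np.linalg.eigvalsh(normalized_similarity(W))[::-1]
    if k_max is None:
        k_max = len(eigvals) - 1
    # gaps[i] = lambda_{i+1} - lambda_{i+2} (1-based), so k = i + 1
    gaps = eigvals[:k_max] - eigvals[1:k_max + 1]
    return int(np.argmax(gaps)) + 1, eigvals


if __name__ == "__main__":
    # Two groups of three streams: linear transforms of sin(t) and of cos(t)
    t = np.linspace(0, 2 * np.pi, 200, endpoint=False)
    a, b = np.sin(t), np.cos(t)
    X = np.vstack([a, 2 * a + 1, 3 - a, b, 0.5 * b, 3 * b - 2])
    k, _ = eigengap_k(coupling_similarity(X))
    print(k)  # 2
```

Taking the absolute correlation treats anti-correlated streams as similar; whether that matches the paper's notion of dynamic similarity is not stated here, so treat it as a design choice of this sketch.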

Mining Evolutionary Events from Multi-Streams Based on Spectral Clustering
YANG Ning, TANG Chang-Jie, WANG Yue, CHEN Yu and ZHENG Jiao-Ling. Mining Evolutionary Events from Multi-Streams Based on Spectral Clustering[J]. Journal of Software, 2010, 21(10): 2395-2409.
Authors: YANG Ning, TANG Chang-Jie, WANG Yue, CHEN Yu and ZHENG Jiao-Ling
Abstract: To solve the problem of mining evolutionary events from multi-streams, this paper proposes a spectral clustering algorithm, SCAM (spectral clustering algorithm of multi-streams), to generate clustering models of multi-streams. The similarity matrix in these clustering models is based on the Coupling Degree, which measures the dynamic similarity between two streams. The paper also proposes EEMA (evolutionary events mining algorithm), which discovers evolutionary event points from the drift of clustering models. EEMA takes the Clustering Model Quality index as its optimization objective to determine the number of clusters automatically. Clustering Model Quality combines matrix perturbation theory with Clustering Cohesion, which measures the compactness of a clustering model and has a proven upper bound. Finally, the paper presents O-EEMA (optimized EEMA), an optimized implementation of EEMA with time complexity $O(cn^2/2)$. Extensive experiments on synthetic and real data sets show that EEMA and O-EEMA are effective and practical.
Keywords: multi-streams; spectral clustering; evolutionary event; matrix perturbation
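EEMA is described as finding event points from the drift of clustering models, but the abstract does not give the exact drift criterion. As a hypothetical illustration only, the sketch below measures drift between the clusterings of consecutive windows with a pair-counting distance (1 minus the Rand index), which is unaffected by arbitrary cluster relabeling, and flags a window as an event when the drift exceeds an assumed threshold. `pair_drift`, `detect_events`, and `threshold=0.2` are illustrative names and values, not taken from the paper.

```python
from itertools import combinations


def pair_drift(labels_prev, labels_curr):
    """Fraction of stream pairs whose co-membership changed (1 - Rand index).

    Only unordered pairs i < j are visited, i.e. n(n-1)/2 comparisons, and
    cluster ids need not match between windows (a pure relabeling gives 0).
    """
    n = len(labels_prev)
    if n != len(labels_curr):
        raise ValueError("both clusterings must cover the same streams")
    n_pairs = n * (n - 1) // 2
    if n_pairs == 0:
        return 0.0
    changed = sum(
        (labels_prev[i] == labels_prev[j]) != (labels_curr[i] == labels_curr[j])
        for i, j in combinations(range(n), 2)
    )
    return changed / n_pairs


def detect_events(label_history, threshold=0.2):
    """Indices t whose clustering drifted from window t-1 by more than threshold.

    threshold is a HYPOTHETICAL tuning parameter for this sketch.
    """
    return [
        t
        for t in range(1, len(label_history))
        if pair_drift(label_history[t - 1], label_history[t]) > threshold
    ]


if __name__ == "__main__":
    history = [[0, 0, 0, 1, 1, 1], [1, 1, 1, 0, 0, 0], [0, 0, 1, 1, 1, 1]]
    print(detect_events(history))  # [2]
```

In the example, the first transition only swaps cluster ids (drift 0), while moving stream 2 into the other group changes 5 of the 15 pairs (drift 1/3), so only window 2 is reported.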