
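Density-Based Cluster Structure Mining Algorithm for High-Volume Data Streams (面向海量数据流的基于密度的簇结构挖掘算法)
Cite this article: YU Yan-Wei, WANG Huan, WANG Qin, ZHAO Jin-Dong. Density-based cluster structure mining algorithm for high-volume data streams [J]. Journal of Software (软件学报), 2015, 26(5): 1113-1128.
Authors: YU Yan-Wei; WANG Huan; WANG Qin; ZHAO Jin-Dong
Affiliations: School of Computer and Control Engineering, Yantai University, Yantai, Shandong 264005; Department of Computer Science, University of California, San Diego, USA; School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083; School of Computer and Control Engineering, Yantai University, Yantai, Shandong 264005
Funding: National Natural Science Foundation of China (61403328, 61302065, 61172049); Natural Science Foundation of Shandong Province (ZR2013FM011); Shandong Province Higher Education Science and Technology Program (J14LN24); Open Fund of the Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, Jilin University (93K172014K13)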
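Abstract (translated from the Chinese): This paper proposes a density-based cluster structure mining algorithm (mining density-based clustering structure over data streams, MCluStream for short) to address the difficulty of choosing input parameters and of identifying overlapping clusters in density-based clustering of data streams. First, a tree-topology index structure, the CR-Tree, is designed. It maps each pair of directly core-reachable data points to a parent-child relationship in the tree, so that the CR-Tree, which encodes the dependencies among data points, covers the density-based cluster structures under a series of subEps parameters. Second, MCluStream updates the CR-Tree using a sliding window and maintains the cluster structure of the current window online, which enables fast evolving cluster analysis of massive data streams. Third, a method for quickly extracting the cluster structure from the CR-Tree is designed, so that a reasonable clustering result can be selected from the visualized cluster structure. Finally, experiments on real and synthetic massive data sets show that MCluStream achieves effective mining results, high clustering efficiency, and low space overhead. MCluStream is suitable for adaptive density-based evolving cluster analysis in massive data stream applications.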

Keywords: cluster analysis; density-based clustering; cluster structure; data stream; sliding window
Received: 2014/5/16
Revised: 2014/9/12

Density-Based Cluster Structure Mining Algorithm for High-Volume Data Streams
YU Yan-Wei, WANG Huan, WANG Qin and ZHAO Jin-Dong. Density-Based Cluster Structure Mining Algorithm for High-Volume Data Streams [J]. Journal of Software, 2015, 26(5): 1113-1128.
Authors: YU Yan-Wei; WANG Huan; WANG Qin; ZHAO Jin-Dong
Affiliations: School of Computer and Control Engineering, Yantai University, Yantai 264005, China; Department of Computer Science, University of California, San Diego, USA; School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China; School of Computer and Control Engineering, Yantai University, Yantai 264005, China
Abstract: This paper proposes a density-based cluster-structure mining algorithm, named MCluStream, to address the problems of input parameter selection and overlapping cluster identification in evolving data streams. First, a tree-topology index, named CR-Tree, is designed to map each pair of directly core-reachable data points to a parent-child relationship between nodes. The CR-Tree, which records the relationships among points, represents the cluster structure under a series of subEps settings. Second, MCluStream updates the cluster structure on the CR-Tree online in a sliding-window setting, which effectively maintains clusters over massive evolving data streams. Third, a fast method for extracting the cluster structure from the CR-Tree is implemented, so that users can easily select reasonable clustering results from the visualized cluster structure. Finally, experimental evaluations on massive real and synthetic data sets demonstrate that the proposed algorithm produces effective mining results and performs better than state-of-the-art methods. MCluStream is well suited to self-adaptive density-based clustering over high-volume data streams.
Keywords: cluster analysis; density-based clustering; cluster structure; data stream; sliding window
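
The abstract does not spell out how the CR-Tree is built or updated, so the following Python sketch does not reproduce it. It only illustrates, under DBSCAN-style definitions assumed here, what "cluster structure under a series of subEps settings" over a sliding window can look like: each window is simply re-clustered from scratch at several radii. The names `clusters_at`, `stream_structure`, `sub_eps`, and `min_pts` are hypothetical, and the sketch assigns each border point to a single cluster, so it does not handle the overlapping clusters that MCluStream targets.

```python
from collections import deque
from math import dist


def clusters_at(points, sub_eps, min_pts):
    """Return DBSCAN-style clusters (lists of point indices) for one radius."""
    n = len(points)
    # Neighbour lists within sub_eps; each point counts as its own neighbour.
    nbrs = [[j for j in range(n) if dist(points[i], points[j]) <= sub_eps]
            for i in range(n)]
    core = [len(nb) >= min_pts for nb in nbrs]

    parent = list(range(n))  # union-find forest over point indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Merge every pair of core points that lie within sub_eps of each other.
    for i in range(n):
        if core[i]:
            for j in nbrs[i]:
                if core[j]:
                    parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        if core[i]:
            groups.setdefault(find(i), set()).add(i)
    # Attach each border point to the cluster of its first core neighbour.
    # Points with no core neighbour are noise and are left out.
    for i in range(n):
        if not core[i]:
            for j in nbrs[i]:
                if core[j]:
                    groups[find(j)].add(i)
                    break
    return sorted(sorted(g) for g in groups.values())


def stream_structure(stream, window, eps_values, min_pts):
    """Yield, per arriving point, the clusters of the current window at each radius."""
    win = deque(maxlen=window)  # oldest point expires automatically
    for p in stream:
        win.append(p)
        pts = list(win)
        yield {eps: clusters_at(pts, eps, min_pts) for eps in eps_values}
```

For example, with the points `(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5), (3, 3)` and `min_pts=3`, a radius of 1.0 separates two groups and leaves `(3, 3)` as noise, while a radius of 4.5 merges everything into one cluster. Recomputing every window costs quadratic time per arrival, which is the kind of overhead an incrementally maintained index such as the CR-Tree is meant to avoid.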