
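Data stream clustering algorithm based on dependent function
Cite this article: PAN Lina (潘丽娜), WANG Zhihe (王治和), DANG Hui (党辉). Data stream clustering algorithm based on dependent function[J]. Journal of Computer Applications (计算机应用), 2013, 33(1): 202-206.
Authors: PAN Lina (潘丽娜)  WANG Zhihe (王治和)  DANG Hui (党辉)
Affiliation: School of Computer Science and Engineering, Northwest Normal University, Lanzhou 730070
Funding: Gansu Province Science and Technology Support Program (090GKCA075); 2012 Humanities and Social Sciences Research Project of the Ministry of Education (12YJCZH282)
Abstract: Traditional data stream clustering algorithms are mostly based on distance or density, and neither their clustering quality nor their processing efficiency is high. To address these problems, a data stream clustering algorithm based on the dependent function is proposed. First, data points are modeled as matter-elements, and the dependent function needed to solve the problem is established. Second, the value of the dependent function is computed, and its magnitude is used to judge the degree to which a data point belongs to a cluster. Then, the proposed method is applied to the online-offline framework of data stream clustering. Finally, the algorithm is tested on the real data set KDD-CUP99 and on randomly generated artificial data sets. The experimental results show that the clustering purity of the proposed method is above 92% and that it processes about 6300 records per second. Compared with traditional algorithms, processing efficiency is considerably improved, scalability with respect to dimensionality and the number of clusters is strong, and the method is suitable for large-scale dynamic data sets.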

Keywords: data stream  clustering  matter-element  dependent function  classical domain  joint domain
Received: 2012-07-24
Revised: 2012-08-27

Data stream clustering algorithm based on dependent function
PAN Lina, WANG Zhihe, DANG Hui. Data stream clustering algorithm based on dependent function[J]. Journal of Computer Applications, 2013, 33(1): 202-206.
Authors:PAN Lina  WANG Zhihe  DANG Hui
Affiliation: School of Computer Science and Engineering, Northwest Normal University, Lanzhou, Gansu 730070, China
Abstract: Most traditional data stream clustering algorithms are based on distance or density, and both their clustering quality and their processing efficiency are low. To address these problems, this paper proposes a data stream clustering algorithm based on the dependent function. First, data points are modeled as matter-elements, and the dependent function needed to solve the problem is established. Second, the value of the dependent function is calculated and used to judge the degree to which a data point belongs to a given cluster. Then, the proposed method is applied to the online-offline framework of data stream clustering. Finally, the algorithm is tested on the real data set KDD-CUP99 and on randomly generated artificial data sets. The experimental results show that the clustering purity of the proposed method is above 92% and that it can process about 6300 records per second. Compared with traditional algorithms, its processing efficiency is considerably higher, it scales well with dimensionality and with the number of clusters, and it is suitable for processing large-scale dynamic data sets.
Keywords:data stream  clustering  matter-element  dependent function  classical domain  joint domain  
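
The abstract does not state the formula of the dependent function, so the following is only a minimal sketch of the idea, not the authors' implementation. It assumes the elementary dependent function commonly used in extenics, built from the extension distance to a classical domain and a joint domain. The per-feature interval representation, the equal-weight averaging over features, and all names (`extension_distance`, `dependent_value`, `point_dependence`, `assign_cluster`, `JOINT`, `CLUSTERS`) are hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]  # closed interval <low, high>


def extension_distance(x: float, interval: Interval) -> float:
    """Extension distance rho(x, <low, high>); negative strictly inside the interval."""
    low, high = interval
    return abs(x - (low + high) / 2.0) - (high - low) / 2.0


def dependent_value(x: float, classical: Interval, joint: Interval) -> float:
    """Elementary dependent function K(x) for one feature (assumed form).

    K(x) >= 0 when x lies in the classical domain, K(x) < 0 otherwise.
    The joint domain is assumed to strictly contain the classical domain,
    which keeps the denominator below away from zero.
    """
    c_low, c_high = classical
    j_low, j_high = joint
    if not (j_low < c_low < c_high < j_high):
        raise ValueError("joint domain must strictly contain the classical domain")
    rho_classical = extension_distance(x, classical)
    if c_low <= x <= c_high:
        return -rho_classical / (c_high - c_low)
    rho_joint = extension_distance(x, joint)
    return rho_classical / (rho_joint - rho_classical)


def point_dependence(point: Sequence[float],
                     classical_domains: Sequence[Interval],
                     joint_domains: Sequence[Interval]) -> float:
    """Equal-weight average of per-feature dependent values (an assumption)."""
    values = [dependent_value(x, c, j)
              for x, c, j in zip(point, classical_domains, joint_domains)]
    return sum(values) / len(values)


def assign_cluster(point: Sequence[float],
                   clusters: Sequence[Sequence[Interval]],
                   joint_domains: Sequence[Interval]) -> int:
    """Return the index of the cluster with the largest dependence value."""
    scores: List[float] = [point_dependence(point, c, joint_domains) for c in clusters]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy two-feature example; the intervals are made up for illustration only.
JOINT: List[Interval] = [(0.0, 10.0), (0.0, 10.0)]
CLUSTERS: List[List[Interval]] = [
    [(2.0, 4.0), (1.0, 3.0)],  # classical domains of cluster 0
    [(6.0, 8.0), (5.0, 7.0)],  # classical domains of cluster 1
]
```

With these toy intervals, `assign_cluster((3.0, 2.0), CLUSTERS, JOINT)` selects cluster 0, because the point sits at the centre of both of that cluster's classical domains. In a stream setting, a function of this kind would presumably play the role that a distance or density test plays in other online-offline algorithms, but how the paper maintains the domains as data arrive is not described in the abstract.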
This article is indexed in CNKI and other databases.