
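Fast Density-Based Clustering Algorithm for Location Big Data
Cite this article: YU Yan-Wei, JIA Zhao-Fei, CAO Lei, ZHAO Jin-Dong, LIU Zhao-Wei, LIU Jing-Lei. Fast Density-Based Clustering Algorithm for Location Big Data[J]. Journal of Software, 2018, 29(8): 2470-2484.
Authors: YU Yan-Wei, JIA Zhao-Fei, CAO Lei, ZHAO Jin-Dong, LIU Zhao-Wei, LIU Jing-Lei
Affiliations: School of Computer and Control Engineering, Yantai University, Yantai, Shandong 264005, China (YU Yan-Wei, JIA Zhao-Fei, ZHAO Jin-Dong, LIU Zhao-Wei, LIU Jing-Lei); Computer Science and Artificial Intelligence Laboratory (CSAIL), Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA (CAO Lei)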
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 61403328, 61572419, 61502410; the Key Research & Development Project of Shandong Province under Grant No. 2015GSF115009; and the Shandong Provincial Natural Science Foundation under Grant Nos. ZR2013FM011, ZR2013FQ023, ZR2014FQ016.
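Abstract: For clustering location big data, this paper proposes CBSCAN, a simple but efficient density-based clustering algorithm that quickly discovers arbitrarily shaped clusters and noise in location big data. First, we define the notion of a Cell and propose a Cell-based distance analysis theory; with this analysis, the core points in high-density regions and their density-connected relationships can be determined quickly without any distance computation. Second, we define the Cell-based cluster, which maps a point-based density cluster to a grid-based density cluster; using the density relationships between exclusion grids and their adjacent grids, the grids contained in a Cell-based cluster can be determined quickly. Third, building on the Cell-based distance analysis theory and the Cell-based cluster concept, we implement a fast density-based clustering algorithm that converts DBSCAN's point-based density expansion into Cell-based density expansion, which greatly reduces distance computations in high-density regions and exploits the inherent properties of location data to improve clustering efficiency. Finally, we verify the clustering quality of the proposed algorithm on benchmark datasets; experimental statistics on location big data show that, compared with DBSCAN, DBSCAN optimized with a PR-Tree index, and DBSCAN optimized with a Grid index, CBSCAN is on average 525, 30, and 11 times more efficient, respectively.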

Keywords: clustering analysis; density-based clustering; location big data; Cell grid; Cell-based cluster
Received: 2016/9/3
Revised: 2016/10/3

Fast Density-Based Clustering Algorithm for Location Big Data
YU Yan-Wei, JIA Zhao-Fei, CAO Lei, ZHAO Jin-Dong, LIU Zhao-Wei and LIU Jing-Lei. Fast Density-Based Clustering Algorithm for Location Big Data[J]. Journal of Software, 2018, 29(8): 2470-2484.
Authors: YU Yan-Wei, JIA Zhao-Fei, CAO Lei, ZHAO Jin-Dong, LIU Zhao-Wei and LIU Jing-Lei
Affiliation: School of Computer and Control Engineering, Yantai University, Yantai, Shandong 264005, China (YU Yan-Wei, JIA Zhao-Fei, ZHAO Jin-Dong, LIU Zhao-Wei, LIU Jing-Lei); CSAIL, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA (CAO Lei)
Abstract: This paper proposes a simple but efficient density-based clustering algorithm, named CBSCAN, to quickly discover arbitrarily shaped cluster patterns and noise in location big data. Firstly, we define the notion of a Cell and propose a Cell-based distance analysis principle, which can quickly find core points in high-density areas and their density relationships with other points without distance computation. Secondly, we propose the Cell-based cluster, which maps a point-based density cluster to a grid-based density cluster. By leveraging exclusion grids and their relationships with adjacent grids, we can quickly determine all inclusion grids of a Cell-based cluster. Furthermore, we implement a fast density-based algorithm based on the distance analysis principle and the Cell-based cluster, which transforms the point-based expansion of DBSCAN into Cell-based expansion clustering. The proposed algorithm improves clustering efficiency significantly by using the inherent properties of location data to eliminate a large number of distance calculations. Finally, comprehensive experiments on benchmark datasets demonstrate the clustering effectiveness of the proposed algorithm. Experimental results on massive-scale real and synthetic location datasets show that CBSCAN is on average 525, 30, and 11 times more efficient than DBSCAN, DBSCAN with PR-Tree index optimization, and DBSCAN with Grid index optimization, respectively.
Keywords: clustering analysis; density-based clustering; location big data; Cell grid; Cell-based cluster
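
The abstract states that core points in high-density areas can be found without distance computation, but it does not give the Cell definition. The following is only a minimal sketch of one common way such a property can arise, assuming square 2-D cells with side eps / sqrt(2), so that the cell diagonal equals eps and any two points in the same cell are within eps of each other. Under that assumption, every point in a cell holding at least min_pts points is a DBSCAN core point by occupancy alone. The names `cell_of` and `dense_cell_core_points` and the cell size are illustrative assumptions, not taken from the paper, and this is not the CBSCAN algorithm itself.

```python
import math
from collections import defaultdict


def cell_of(point, eps):
    """Map a 2-D point to its grid cell index.

    Assumption (not from the paper): the cell side is eps / sqrt(2),
    so the cell diagonal is exactly eps.
    """
    side = eps / math.sqrt(2)
    return (math.floor(point[0] / side), math.floor(point[1] / side))


def dense_cell_core_points(points, eps, min_pts):
    """Return points that are core points by cell occupancy alone.

    Any two points sharing a cell are at most eps apart, so a cell with
    at least min_pts points makes each of its points a core point
    (counting the point itself, as in the usual DBSCAN definition).
    No pairwise distance is computed here.
    """
    cells = defaultdict(list)
    for p in points:
        cells[cell_of(p, eps)].append(p)

    core = []
    for members in cells.values():
        if len(members) >= min_pts:
            core.extend(members)
    return core


if __name__ == "__main__":
    sample = [(0.0, 0.0), (0.1, 0.1), (0.2, 0.2), (5.0, 5.0)]
    print(dense_cell_core_points(sample, eps=1.0, min_pts=3))
```

Points in sparser cells are not decided by this shortcut; in such a scheme they would still need ordinary distance checks against neighbouring cells, which is why the saving is concentrated in high-density regions.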