
Algorithm for Ordering Points to Identify Clustering Structure Based on Spark (基于Spark的点排序识别聚类结构算法)

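Cite this article:	QU Yuan, DENG Wei-bin, HU Feng, ZHANG Qi-long, WANG Hong. Algorithm for Ordering Points to Identify Clustering Structure Based on Spark [J]. Computer Science (计算机科学), 2018, 45(1): 97-102, 107.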

Authors:	QU Yuan (瞿原), DENG Wei-bin (邓维斌), HU Feng (胡峰), ZHANG Qi-long (张其龙), WANG Hong (王鸿)

Affiliation (all authors):	Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065

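Funding:	This work was supported by the National Natural Science Foundation of China (61309014, 61379114, 61472056), the Humanities and Social Sciences Planning Fund of the Ministry of Education (15XJA630003), the Chongqing Basic and Frontier Research Program (cstc2013jcyjA40063, cstc2014jcyjA40049), and the Science and Technology Research Project of the Chongqing Municipal Education Commission (KJ1500416).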

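Abstract:	Ordering Points to Identify the Clustering Structure (OPTICS) is a density-based clustering algorithm that derives the intrinsic clustering structure of a dataset in a form that can be visualized, and basic clustering information can be extracted from its cluster ordering. However, because of its high time and space complexity, the algorithm does not scale well to today's large datasets. Building on developments in cloud computing and parallel computing, this paper presents a way to address the complexity limitations of OPTICS: a parallel OPTICS algorithm built on the Spark in-memory computing platform. Experimental results show that it greatly reduces the time and space that OPTICS requires.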
Keywords:	Big data; Spark; OPTICS algorithm; Density-based clustering
Received:	2017/3/3
Revised:	2017/6/13
Algorithm for Ordering Points to Identify Clustering Structure Based on Spark

QU Yuan, DENG Wei-bin, HU Feng, ZHANG Qi-long and WANG Hong. Algorithm for Ordering Points to Identify Clustering Structure Based on Spark [J]. Computer Science, 2018, 45(1): 97-102, 107.

Authors:	QU Yuan, DENG Wei-bin, HU Feng, ZHANG Qi-long and WANG Hong

Affiliation (all authors):	Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, China

Abstract:	Ordering Points to Identify the Clustering Structure (OPTICS) is a hierarchical density-based clustering algorithm that derives the intrinsic clustering structure of a dataset in a form that can be visualized, and basic clustering information can be extracted from its cluster ordering. However, because of its high time and space complexity, it does not scale well to the large datasets that are now common. Drawing on developments in cloud computing and parallel computing, this paper proposes a parallel OPTICS algorithm built on the Spark in-memory computing platform to address this complexity problem. Experimental results show that it greatly reduces the time and space consumption of OPTICS.

Keywords:	Big data; Spark; OPTICS algorithm; Density-based clustering
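
The abstract does not describe the parallel algorithm itself. For orientation only, the following is a minimal serial sketch of textbook OPTICS in plain Python, not the authors' Spark implementation. The function name `optics` and the parameters `eps` and `min_pts` are illustrative assumptions. It shows how the cluster ordering and the reachability values are produced, and why brute-force neighborhood queries make the cost grow quadratically with the number of points.

```python
import heapq
import math


def optics(points, eps, min_pts):
    """Serial OPTICS sketch: return (ordering, reachability) for coordinate tuples."""
    n = len(points)
    reach = [math.inf] * n      # reachability distance per point
    processed = [False] * n
    ordering = []               # the cluster ordering (indices into points)

    def neighbors(i):
        # Brute-force eps-neighborhood, including i itself: O(n) per query,
        # hence O(n^2) overall. This is the cost a parallel version targets.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    def core_distance(i, nbrs):
        # Distance to the min_pts-th nearest neighbor (counting i itself),
        # or None when i is not a core point.
        if len(nbrs) < min_pts:
            return None
        dists = sorted(math.dist(points[i], points[j]) for j in nbrs)
        return dists[min_pts - 1]

    def update(i, nbrs, core, seeds):
        # Lower the reachability of unprocessed neighbors of core point i.
        for j in nbrs:
            if processed[j]:
                continue
            new_reach = max(core, math.dist(points[i], points[j]))
            if new_reach < reach[j]:
                reach[j] = new_reach
                # Outdated heap entries are skipped when popped.
                heapq.heappush(seeds, (new_reach, j))

    for start in range(n):
        if processed[start]:
            continue
        processed[start] = True
        ordering.append(start)
        nbrs = neighbors(start)
        core = core_distance(start, nbrs)
        if core is None:
            continue
        seeds = []
        update(start, nbrs, core, seeds)
        while seeds:
            _, q = heapq.heappop(seeds)
            if processed[q]:
                continue
            processed[q] = True
            ordering.append(q)
            q_nbrs = neighbors(q)
            q_core = core_distance(q, q_nbrs)
            if q_core is not None:
                update(q, q_nbrs, q_core, seeds)
    return ordering, reach


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    order, reach = optics(pts, eps=2.0, min_pts=2)
    for idx in order:
        print(idx, pts[idx], reach[idx])
```

Reading the reachability values in the returned order gives the reachability plot: an infinite value marks a point that was not reachable from anything processed before it, which typically starts a new cluster or is noise, and runs of low values between such peaks correspond to dense regions.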

