首页 | 本学科首页   官方微博 | 高级检索  
     

使用均值距离与关联性标记的并行OPTICS算法
引用本文:郑剑,余鑫.使用均值距离与关联性标记的并行OPTICS算法[J].计算机工程与应用,2023,59(5):232-244.
作者姓名:郑剑  余鑫
作者单位:江西理工大学 信息工程学院,江西 赣州 341000
基金项目:江西省教育厅科学技术研究项目(GJJ170517);
摘    要:针对大数据环境下传统并行密度聚类算法中存在的数据划分不合理,聚类结果准确度不高,结果受参数影响较大以及并行效率低等问题,提出一种MapReduce下使用均值距离与关联性标记的并行OPTICS算法——POMDRM-MR。算法使用一种基于维度稀疏度的减少边界点划分策略(DS-PRBP),划分数据集;针对各个分区,提出标记点排序识别簇算法(MOPTICS),构建数据点与核心点之间的关联性,并标记数据点迭代次数,在距离度量中,使用领域均值距离策略(FMD),计算数据点的领域均值距离,代替可达距离排序,输出关联性标记序列;最后结合重排序序列提取簇算法(REC),对输出序列进行二次排序并提取簇,提高算法局部聚类的准确性和稳定性;在合并全局簇时,算法提出边界密度筛选策略(BD-FLC),计算筛选密度相近局部簇;又基于n叉树的并集型合并与MapReduce模型,提出并行局部簇合并算法(MCNT-MR),加快局部簇收敛,并行合并局部簇,提升全局簇合并效率。对照实验表明,POMDRM-MR算法聚类效果更佳,且在大规模数据集下算法的并行化性能更好。

关 键 词:大数据  密度聚类  MapReduce  OPTICS  PRBP

Parallel OPTICS by Using Mean Distance and Relevance Marks
ZHENG Jian,YU Xin.Parallel OPTICS by Using Mean Distance and Relevance Marks[J].Computer Engineering and Applications,2023,59(5):232-244.
Authors:ZHENG Jian  YU Xin
Affiliation:School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou, Jiangxi 341000, China
Abstract:The main target of this paper is to design a parallel optics algorithm by using mean distance and relevance marks based on MapReduce, noted as POMDRM-MR, to deal with the problems of unreasonable data division, low accuracy of clustering results, the results are greatly affected by parameters and low efficiency of parallelization in parallel density-based clustering algorithm in big data. In POMDRM-MR, an approach called partition with reduced boundary points based on dimension sparsity(DS-PRBP) is proposed to divide the dataset. For each partition, the algorithm called marking and ordering points to identify the cluster structure(MOPTICS) is proposed to construct the correlations between data points and core points and mark the number of iterations, the field mean distance strategy(FMD) is proposed to calculate the field mean distance of data points instead of the reachable distance in measuring distance. After outputting sequence, combined with reordering and extracting clusters algorithm(REC), the sequence is sorted twice which improves the accuracy and stability. In merging global clusters, an approach called using boundary density to filter local cluster(BD-FLC) is used to calculate and filter local clusters with similar density. And based on the union-type merging of n-ary trees and MapReduce, the parallel local cluster merging algorithm(MCNT-MR) is proposed to get the clustering results faster and merge local clusters in parallel which improves efficiency of merging local clusters. The experiments show that POMDRM-MR algorithm has better effect, and better parallelization performance on large-scale datasets.
Keywords:big data  density-based clustering  MapReduce  OPTICS  partition with reduce boundary points(PRBP)  
点击此处可从《计算机工程与应用》浏览原始摘要信息
点击此处可从《计算机工程与应用》下载全文
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号