
UID-DBSCAN Clustering Algorithm of Multi-dimensional Uncertain Data Based on Interval Number
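Cite this article: WEI Fang-yuan, HUANG De-cai. UID-DBSCAN Clustering Algorithm of Multi-dimensional Uncertain Data Based on Interval Number[J]. Computer Science, 2017, 44(Z11): 442-447.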
Authors: WEI Fang-yuan, HUANG De-cai
Affiliation (both authors): College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China
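Funding: This work was supported by the Special Scientific Research Fund for Public Welfare Industry of the Ministry of Water Resources (201401044).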
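Abstract: Clustering methods for uncertain data are attracting growing attention. Among them, the UIDK-means and U-PAM algorithms inherit the weaknesses of partition-based algorithms: they cannot identify clusters of arbitrary shape and are sensitive to noise. The FDBSCAN algorithm assumes that the probability distribution function or probability density function of the uncertain data is known in advance, yet this information is often hard to obtain in practice. To address these shortcomings, this paper proposes UID-DBSCAN, a clustering algorithm for multi-dimensional uncertain data based on interval numbers. The algorithm represents uncertain data with interval numbers combined with the data's statistical information, and measures the similarity between uncertain data objects with an interval-number distance function of low computational complexity. It introduces, for the first time, the concepts of interval-number density, density-reachability and density-connectivity, and uses them to expand clusters. It also selects the density parameters adaptively from the statistical features of the data set to achieve automatic clustering. Experimental results show that UID-DBSCAN identifies noise effectively, handles clusters of arbitrary shape, and achieves high clustering accuracy at low computational complexity.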
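The abstract does not state how the intervals are built or which distance formula is used. As a minimal sketch under stated assumptions, the Python below builds one interval per dimension as [mean - k*std, mean + k*std] from repeated observations of an object, and compares two interval-valued objects with the Euclidean distance over both endpoints of every dimension, which costs O(d) per pair. The names `to_interval_object` and `interval_distance`, the parameter `k`, and both formulas are hypothetical illustrations, not the paper's definitions.

```python
import math
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]      # (lower, upper) bound in one dimension
IntervalObject = List[Interval]     # one interval per dimension


def to_interval_object(samples: Sequence[Sequence[float]], k: float = 1.0) -> IntervalObject:
    """Hypothetical construction (an assumption, not the paper's rule):
    per dimension, [mean - k*std, mean + k*std] over repeated observations."""
    n = len(samples)
    dims = len(samples[0])
    obj: IntervalObject = []
    for d in range(dims):
        values = [s[d] for s in samples]
        mean = sum(values) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)  # population std
        obj.append((mean - k * std, mean + k * std))
    return obj


def interval_distance(a: IntervalObject, b: IntervalObject) -> float:
    """Assumed distance: Euclidean over lower and upper endpoints of every
    dimension; linear in the number of dimensions."""
    total = 0.0
    for (al, au), (bl, bu) in zip(a, b):
        total += (al - bl) ** 2 + (au - bu) ** 2
    return math.sqrt(total)
```

For example, `to_interval_object([[1.0, 2.0], [3.0, 2.0]])` yields `[(1.0, 3.0), (2.0, 2.0)]`: the first dimension varies across observations and becomes a wide interval, while the second is constant and collapses to a degenerate interval.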

Keywords: uncertain data; interval number; clustering algorithm; DBSCAN
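The density-based part of the method summarized in the abstract (interval density, density-reachability, cluster expansion and adaptive density parameters) can be illustrated by plugging an interval distance into the classic DBSCAN loop. This is a sketch under assumptions: the "interval density" of an object is taken here to be the number of objects within `eps` under a hypothetical endpoint-based Euclidean distance, and `eps` is chosen by a hypothetical heuristic (the mean distance from each object to its (MinPts-1)-th nearest neighbour). The paper's actual definitions and parameter-selection rule may differ; `uid_dbscan_sketch`, `adaptive_eps` and `region_query` are illustrative names.

```python
import math
from typing import List, Optional, Tuple

Interval = Tuple[float, float]          # (lower, upper) bound in one dimension
IntervalObject = List[Interval]         # one interval per dimension

NOISE = -1
UNVISITED = 0


def interval_distance(a: IntervalObject, b: IntervalObject) -> float:
    # Assumed distance: Euclidean over the lower and upper endpoints of every dimension.
    return math.sqrt(sum((al - bl) ** 2 + (au - bu) ** 2
                         for (al, au), (bl, bu) in zip(a, b)))


def adaptive_eps(data: List[IntervalObject], min_pts: int) -> float:
    # Hypothetical heuristic: average distance from each object to its
    # (min_pts - 1)-th nearest other object.
    if len(data) < 2:
        raise ValueError("need at least two objects")
    k = max(1, min_pts - 1)
    kth = []
    for i, a in enumerate(data):
        dists = sorted(interval_distance(a, b) for j, b in enumerate(data) if j != i)
        kth.append(dists[min(k, len(dists)) - 1])
    return sum(kth) / len(kth)


def region_query(data: List[IntervalObject], i: int, eps: float) -> List[int]:
    # eps-neighbourhood of object i, including i itself; its size plays the
    # role of the object's (assumed) interval density.
    return [j for j, b in enumerate(data) if interval_distance(data[i], b) <= eps]


def uid_dbscan_sketch(data: List[IntervalObject], min_pts: int,
                      eps: Optional[float] = None) -> List[int]:
    if eps is None:
        eps = adaptive_eps(data, min_pts)
    labels = [UNVISITED] * len(data)
    cluster = 0
    for i in range(len(data)):
        if labels[i] != UNVISITED:
            continue
        neighbours = region_query(data, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE            # may later be relabelled as a border object
            continue
        cluster += 1                     # i is a core object: start a new cluster
        labels[i] = cluster
        seeds = list(neighbours)
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster      # border object: density-reachable, not expanded
            if labels[j] != UNVISITED:
                continue
            labels[j] = cluster
            j_neighbours = region_query(data, j, eps)
            if len(j_neighbours) >= min_pts:
                seeds.extend(j_neighbours)   # j is also core: keep expanding
    return labels
```

With three tight one-dimensional intervals near 0, three near 5 and one isolated interval near 20, `uid_dbscan_sketch(data, min_pts=2)` labels the two groups 1 and 2 and marks the isolated object as noise (-1). Because the heuristic averages over all objects, a few far-away outliers enlarge `eps`; a real adaptive rule would need to be more robust than this illustration.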
