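Research on the U-PAM and UM-PAM Algorithms for Uncertain Data Clustering
Cite this article: HE Yun-bin, ZHANG Zhi-chao, WAN Jing, LI Song. Research on the U-PAM and UM-PAM algorithms for uncertain data clustering [J]. 计算机科学 (Computer Science), 2016, 43(6): 263-269.
Authors: HE Yun-bin  ZHANG Zhi-chao  WAN Jing  LI Song
Affiliation (all four authors): School of Computer Science and Technology, Harbin University of Science and Technology, Harbin 150080
Funding: This work was supported by the Science and Technology Research Project of the Heilongjiang Provincial Department of Education (12511100) and the Natural Science Foundation of Heilongjiang Province (F201302, F201134).
Abstract: The UK-means algorithm is highly sensitive to outliers when processing uncertain data, and it requires the distribution function or probability density of the uncertain data to be known in advance, which is often hard to obtain in practice. To address these shortcomings of UK-means on uncertain measurement data, this paper first proposes U-PAM, an uncertain clustering algorithm that builds on PAM and interval numbers. U-PAM uses interval numbers and the standard deviation to describe the uncertainty of the measurement data and then clusters the data. Second, because massive uncertain measurement data sets are hard to cluster, the paper combines U-PAM with sampling to obtain UM-PAM, which draws a sample, clusters the sample, and then clusters the full data set. Finally, the clustering results are analyzed using U-PAM together with the CH cluster validity index in order to determine the best number of clusters. Experiments and theoretical analysis indicate that the proposed algorithms cluster effectively.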

Keywords: Uncertain data  Interval numbers  Clustering algorithm  PAM
Received: 2015-05-13
Revised: 2015-08-05

Research for Uncertain Data Clustering Algorithm: U-PAM and UM-PAM Algorithm
HE Yun-bin, ZHANG Zhi-chao, WAN Jing, LI Song. Research for Uncertain Data Clustering Algorithm: U-PAM and UM-PAM Algorithm [J]. Computer Science, 2016, 43(6): 263-269.
Authors: HE Yun-bin  ZHANG Zhi-chao  WAN Jing  LI Song
Affiliation (all four authors): School of Computer Science and Technology, Harbin University of Science and Technology, Harbin 150080, China
Abstract: The UK-means algorithm is very sensitive to outliers when dealing with uncertain data, and the probability density or distribution function of the uncertain data must be known in advance, which is often difficult to obtain in practice. To address these shortcomings of UK-means on uncertain measurement data, this paper first proposes a new algorithm, U-PAM, based on the PAM algorithm and interval numbers. U-PAM describes the uncertainty of measurement data with intervals and the standard deviation so that the data can be clustered effectively. Second, because massive data sets are difficult to cluster, the paper applies sampling to U-PAM and proposes the UM-PAM algorithm for massive uncertain measurement data: it first clusters a sample and then clusters the whole data set. Finally, U-PAM is combined with the CH validity index to analyze the clustering results and determine the optimal number of clusters. Experimental results show that the proposed algorithms produce effective clusterings.
Keywords: Uncertain data  Intervals  Clustering  PAM
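
Illustrative sketch (not the authors' implementation): the abstract does not give the paper's distance measure or sampling scheme, so the Python code below only illustrates the general idea of running PAM on interval-valued objects and then clustering a sample before assigning the full data set. It assumes each uncertain measurement is an interval (lower, upper) and uses the Euclidean distance over interval endpoints as a stand-in distance. The names `interval_distance`, `pam`, `assign`, and `sample_then_cluster` are hypothetical, and the standard deviation term and the CH index mentioned in the abstract are not modeled.

```python
import random
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]   # (lower, upper) bound of one uncertain measurement
Obj = Sequence[Interval]         # one object: one interval per attribute


def interval_distance(a: Obj, b: Obj) -> float:
    """Assumed distance: Euclidean distance over the interval endpoints."""
    return sum((al - bl) ** 2 + (au - bu) ** 2
               for (al, au), (bl, bu) in zip(a, b)) ** 0.5


def total_cost(data: Sequence[Obj], medoids: List[int]) -> float:
    """Sum of distances from every object to its nearest medoid."""
    return sum(min(interval_distance(x, data[m]) for m in medoids) for x in data)


def pam(data: Sequence[Obj], k: int, seed: int = 0) -> List[int]:
    """Basic PAM: start from random medoids, accept any cost-reducing swap."""
    rng = random.Random(seed)
    medoids = rng.sample(range(len(data)), k)
    best = total_cost(data, medoids)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for cand in range(len(data)):
                if cand in medoids:
                    continue
                trial = medoids[:i] + [cand] + medoids[i + 1:]
                cost = total_cost(data, trial)
                if cost < best - 1e-12:   # strict decrease guarantees termination
                    medoids, best, improved = trial, cost, True
    return medoids


def assign(data: Sequence[Obj], medoid_objs: Sequence[Obj]) -> List[int]:
    """Label each object with the index of its nearest medoid."""
    return [min(range(len(medoid_objs)),
                key=lambda j: interval_distance(x, medoid_objs[j]))
            for x in data]


def sample_then_cluster(data: Sequence[Obj], k: int, sample_size: int, seed: int = 0):
    """Sampling idea: run PAM on a random sample, then assign all objects."""
    rng = random.Random(seed)
    sample = rng.sample(list(data), min(sample_size, len(data)))
    medoid_objs = [sample[m] for m in pam(sample, k, seed)]
    return medoid_objs, assign(data, medoid_objs)
```

Under these assumptions, `pam(data, k)` returns medoid indices for a small data set, while `sample_then_cluster(data, k, sample_size)` avoids the full swap search on large inputs by restricting it to the sample. The sample size must be at least `k`.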