Density Peak Improvement Algorithm for Clustering Hybrid Data
Cite as: TAN Yang, TANG Dequan, CAO Shoufu. Density Peak Improvement Algorithm for Clustering Hybrid Data[J]. Computer Engineering and Applications, 2020, 56(12): 47-53.
Authors: TAN Yang  TANG Dequan  CAO Shoufu
Affiliation: 1. College of Mathematics and Statistics, Hunan Normal University, Changsha 410081, China 2. Department of Network Technology, Hunan Radio and Television University, Changsha 410004, China 3. Department of Information Technology, Hunan Police Academy, Changsha 410138, China
Funding: Natural Science Foundation of Hunan Province; Scientific Research Project of the Hunan Provincial Department of Education; National Natural Science Foundation of China
Abstract: Mixed data are usually clustered by evaluating samples separately for each type of attribute. However, measuring the attributes in separate subspaces breaks the original unity of a sample's attributes and introduces inconsistent metric deviations into the similarity evaluation between samples. To address this problem, a new clustering algorithm is proposed that encodes the sample attributes in binary and then applies a unified measure to the attribute codes through the Hamming difference. By measuring similarity on mixed data within a single framework, the new algorithm avoids splitting the sample attributes. On this basis, it also assigns different weights to the attributes according to their properties and uses these weights to evaluate the similarity between samples. The experimental results show that the new algorithm clusters mixed data effectively and, compared with other existing clustering algorithms, achieves better clustering accuracy and stability.

Keywords: clustering  hybrid data  density peak  attribute coding  Hamming metric
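
The abstract names the ingredients of the similarity measure (binary attribute codes, a Hamming difference, and per-attribute weights) but does not give the encoding scheme or the weight values. The following is a minimal sketch of that idea under stated assumptions, not the authors' method: categorical attributes are assumed to use one-hot codes, numeric attributes are assumed to use thermometer (unary) codes over a fixed range, and the function names, the `COLORS` list, and the `WEIGHTS` values are hypothetical.

```python
from typing import List, Sequence

def encode_categorical(value: str, categories: Sequence[str]) -> List[int]:
    """One-hot code (assumed scheme): a single 1 at the category's position."""
    return [1 if value == c else 0 for c in categories]

def encode_numeric(value: float, low: float, high: float, bits: int) -> List[int]:
    """Thermometer code (assumed scheme): larger values set more leading bits,
    so the Hamming distance between two codes grows with the value gap."""
    if high <= low:
        raise ValueError("high must be greater than low")
    clipped = min(max(value, low), high)
    level = round((clipped - low) / (high - low) * bits)
    return [1] * level + [0] * (bits - level)

def weighted_hamming(sample_a: Sequence[Sequence[int]],
                     sample_b: Sequence[Sequence[int]],
                     weights: Sequence[float]) -> float:
    """Sum over attributes of weight * number of differing bits."""
    total = 0.0
    for code_a, code_b, w in zip(sample_a, sample_b, weights):
        total += w * sum(x != y for x, y in zip(code_a, code_b))
    return total

# Hypothetical two-attribute samples: a colour (categorical) and a height (numeric).
COLORS = ["red", "green", "blue"]
WEIGHTS = [0.5, 1.0]  # illustrative per-attribute weights, not from the paper

def encode_sample(color: str, height: float) -> List[List[int]]:
    return [encode_categorical(color, COLORS),
            encode_numeric(height, 150.0, 200.0, bits=10)]

s1 = encode_sample("red", 160.0)
s2 = encode_sample("blue", 180.0)
distance = weighted_hamming(s1, s2, WEIGHTS)
print(distance)
```

In this sketch both attribute types end up as bit strings, so one distance function covers the whole sample instead of one metric per attribute type. A density peak clustering step could then consume the resulting pairwise distances, although how the paper combines the two is not stated in the abstract.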
This article is indexed by VIP (维普), Wanfang Data (万方数据), and other databases.