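Coefficient of variation clustering algorithm for non-uniform data
Cite this article: YANG Tianpeng, XU Kunpeng, CHEN Lifei. Coefficient of variation clustering algorithm for non-uniform data[J]. Journal of Shandong University (Engineering Science), 2018, 48(3): 140-145.
Authors: YANG Tianpeng  XU Kunpeng  CHEN Lifei
Affiliations: 1. College of Mathematics and Informatics, Fujian Normal University, Fuzhou 350117, Fujian, China; 2. Digit Fujian Internet-of-Things Laboratory of Environmental Monitoring, Fuzhou 350117, Fujian, China
Funding: National Natural Science Foundation of China (61175123); Natural Science Foundation of Fujian Province (2015J01238); Innovation Team Program of Fujian Normal University (IRTL1704)
Abstract: Existing partition-based clustering algorithms cannot effectively cluster non-uniform data, in which cluster sizes and cluster densities differ greatly. To address this problem, a clustering algorithm based on the coefficient of variation is proposed. From the perspective of the clustering optimization objective, the cause of the "uniform effect" in partitioning algorithms typified by K-means is analyzed. The coefficient of variation is proposed as the measure of distribution dispersion for non-uniform data, and a dissimilarity formula for non-uniform data is defined on that basis. A clustering objective function is then defined from the dissimilarity formula, and the clustering procedure is derived by a local optimization method. Experimental results on synthetic and real datasets show that, compared with the K-means, Verify2, and ESSC clustering algorithms, the proposed coefficient of variation clustering for non-uniform data (CVCN) improves clustering accuracy by 5% to 40%.

Keywords: partition-based clustering  non-uniform data  uniform effect  clustering  K-means  coefficient of variation
Received: 2017-08-24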

Coefficient of variation clustering algorithm for non-uniform data
YANG Tianpeng, XU Kunpeng, CHEN Lifei. Coefficient of variation clustering algorithm for non-uniform data[J]. Journal of Shandong University (Engineering Science), 2018, 48(3): 140-145.
Authors:YANG Tianpeng  XU Kunpeng  CHEN Lifei
Affiliation:1. College of Mathematics and Informatics, Fujian Normal University, Fuzhou 350117, Fujian, China;2. Digit Fujian Internet-of-Things Laboratory of Environmental Monitoring, Fujian Normal University, Fuzhou 350117, Fujian, China
Abstract: Because of the "uniform effect", existing partition-based algorithms cannot effectively cluster non-uniform data, whose clusters differ greatly in size and density, and handling such data remains an open and challenging task. To solve this problem, a clustering algorithm based on the coefficient of variation is proposed. The cause of the "uniform effect" in K-means-type partitioning algorithms is analyzed from the viewpoint of the clustering optimization objective. Instead of the squared error, a new measure of dispersion for non-uniform data is proposed that relies on the coefficient of variation. A new dissimilarity formula for non-uniform data is derived from this measure and used to define the clustering objective function. The clustering procedure is then obtained by a local optimization method. Experimental results on real and synthetic non-uniform datasets show that coefficient of variation clustering for non-uniform data (CVCN) improves clustering accuracy by 5% to 40% over K-means, Verify2, and ESSC.
Keywords:clustering  partition-based clustering  coefficient of variation  K-means  uniform effect  non-uniform data  
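
Illustrative note: the abstract contrasts the squared error with the coefficient of variation as a measure of cluster dispersion. The sketch below is only a toy illustration of that contrast and is not the CVCN dissimilarity formula, which the abstract does not state. It assumes the coefficient of variation is taken over point-to-centroid distances (population standard deviation divided by mean); the helper names and the two sample clusters are hypothetical.

```python
import math
import statistics


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (assumed definition)."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return statistics.pstdev(values) / mean


def distances_to_centroid(points):
    """Euclidean distance from each point to the centroid of its cluster."""
    dims = len(points[0])
    centroid = [statistics.fmean([p[d] for p in points]) for d in range(dims)]
    return [math.dist(p, centroid) for p in points]


# Two hypothetical clusters: same shape, but the second is spread 10x wider.
tight = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 1.0)]
loose = [(10 * x + 50.0, 10 * y + 50.0) for x, y in tight]

for name, cluster in (("tight", tight), ("loose", loose)):
    d = distances_to_centroid(cluster)
    sse = sum(v * v for v in d)          # squared-error dispersion
    cv = coefficient_of_variation(d)     # scale-free dispersion
    print(f"{name}: SSE={sse:.2f}  CV={cv:.4f}")
```

Scaling a cluster by a factor of 10 multiplies its sum of squared errors by 100 but leaves this coefficient of variation unchanged, which is one way to see why a scale-free dispersion measure may treat sparse and dense clusters more evenly than the squared error does.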
This article is indexed in CNKI and other databases.