
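Clustering Algorithm for Mixed-Attribute Data Based on Residual Analysis
Cite this article: Qiu Baozhi, Zhang Ruilin, Li Xiangli. Clustering algorithm for mixed-attribute data based on residual analysis [J]. Acta Automatica Sinica, 2020, 46(7): 1420-1432.
Authors: Qiu Baozhi, Zhang Ruilin, Li Xiangli
Affiliation: 1. School of Information Engineering, Zhengzhou University, Zhengzhou 450001
Funding: Basic and Frontier Technology Research Project of Henan Province (152300410191)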
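Abstract: To address the low accuracy of clustering results on mixed-attribute data and the sensitivity of those results to parameters, a clustering algorithm for mixed data based on residual analysis, RA-Clust, is proposed. The algorithm measures the similarity between objects with an improved entropy-weighted mixed-attribute similarity measure, computes the density of each object with a proposed local density method based on KNN and Parzen windows, pre-selects cluster centers through linear regression and residual analysis, and then determines the true cluster centers with a proposed objective optimization model for cluster centers. Finally, the remaining objects are assigned to the corresponding clusters according to their minimum distance to high-density objects, forming the final clustering. Experimental results on synthetic datasets and UCI datasets verify the effectiveness of the algorithm. Compared with similar algorithms, RA-Clust achieves higher clustering accuracy.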

Keywords: clustering; residual analysis; linear regression; mixed-attribute dataset; cluster centers
Received: 2018-01-12

Clustering Algorithm for Mixed Data Based on Residual Analysis
Affiliation: 1. School of Information Engineering, Zhengzhou University, Zhengzhou 450001
Abstract: Existing clustering algorithms for mixed data suffer from low clustering accuracy and sensitivity to parameters. To address these problems, a clustering algorithm for mixed data based on residual analysis (RA-Clust) is proposed. We use entropy weights to measure the similarity between objects with mixed attributes. Based on KNN and Parzen windows, we propose a method to calculate the local density of objects. Cluster centers are pre-selected by linear regression and residual analysis. Then the true cluster centers are selected according to the objective optimization model proposed in this paper. Finally, the remaining objects are assigned to the corresponding clusters according to their minimum distance from the high-density objects. Experimental results on synthetic datasets and UCI datasets verify the effectiveness of the algorithm. Compared with similar algorithms, RA-Clust achieves higher clustering accuracy.
Keywords: clustering; residual analysis; linear regression; mixed-attribute dataset; cluster centers
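
The pipeline outlined in the abstract (local density, pre-selection of centers by regression residuals, assignment by distance to denser objects) can be illustrated with a rough Python sketch. This is not the authors' implementation. Everything below is an assumption made for illustration: the function names, the Gaussian kernel and its bandwidth, the choice to regress "distance to the nearest denser object" on density, the `z` residual threshold, and the rule that the densest object is always a center. The entropy-weighted mixed-attribute similarity and the objective optimization model are not specified in the abstract, so the sketch simply accepts any precomputed distance matrix and treats every pre-selected candidate as a center.

```python
import numpy as np


def local_density(dist, k):
    """Sum a Gaussian (Parzen-style) kernel over each object's k nearest neighbours."""
    knn = np.sort(dist, axis=1)[:, 1:k + 1]   # column 0 is the distance to itself
    h = knn.mean()                            # assumed bandwidth choice
    return np.exp(-(knn / h) ** 2).sum(axis=1)


def nearest_denser(dist, rho):
    """Distance from each object to the nearest object that precedes it in density order."""
    order = np.argsort(-rho, kind="stable")   # densest first
    delta = np.zeros(len(rho))
    parent = np.full(len(rho), -1)
    delta[order[0]] = dist[order[0]].max()    # the densest object has no denser neighbour
    for rank in range(1, len(order)):
        i, denser = order[rank], order[:rank]
        j = denser[np.argmin(dist[i, denser])]
        delta[i], parent[i] = dist[i, j], j
    return delta, parent, order


def preselect_centers(rho, delta, z=3.0):
    """Flag objects whose delta lies far above a least-squares line fitted on rho."""
    slope, intercept = np.polyfit(rho, delta, 1)
    resid = delta - (slope * rho + intercept)
    return np.flatnonzero(resid > z * resid.std())


def ra_style_cluster(dist, k=5, z=3.0):
    """Hypothetical end-to-end sketch; all candidates are kept as centers."""
    rho = local_density(dist, k)
    delta, parent, order = nearest_denser(dist, rho)
    # Assumption: always keep the densest object so that every object gets a label.
    centers = np.union1d(preselect_centers(rho, delta, z), [order[0]])
    labels = np.full(len(rho), -1)
    labels[centers] = np.arange(len(centers))
    for i in order[1:]:                       # parents are labelled before children
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers


# Usage on two synthetic numeric blobs with a Euclidean distance matrix.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (30, 2)), rng.normal(10, 0.5, (30, 2))])
dist = np.linalg.norm(X[:, None] - X[None, :], axis=2)
labels, centers = ra_style_cluster(dist)
```

Because the functions take a distance matrix rather than raw features, a mixed-attribute dissimilarity (numeric plus categorical) could be substituted for the Euclidean distance used in the usage lines without changing the rest of the sketch.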