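A Subspace Clustering Algorithm of Categorical Data Using Multiple Attribute Weights
Cite this article: Pang Ning, Zhang Jifu, Qin Xiao. A subspace clustering algorithm of categorical data using multiple attribute weights[J]. Acta Automatica Sinica, 2018, 44(3): 517-532.
Authors: Pang Ning, Zhang Jifu, Qin Xiao
Affiliation: 1. School of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024, China
Funding: National Natural Science Foundation of China (61572343)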
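Abstract: A subspace clustering algorithm for categorical data is proposed that uses multi-attribute frequency weights and a multi-objective cluster-set quality criterion. The algorithm uses equivalence classes from rough set theory to define a multi-attribute weight calculation method, which improves the ability of attributes to distinguish between clusters. Based on a multi-objective cluster-set quality function, it adopts a hierarchical agglomerative strategy that iteratively merges sub-clusters, so that clusters of various scales are measured effectively. Interval dispersion degree is used to avoid the parameter-setting problem that arises when noise points are removed with a threshold. The attribute-relevant subspace of each cluster is determined from the degree to which attributes are attached to that cluster, which improves the interpretability of the clusters. Finally, experiments on synthetic, UCI, and stellar spectral data sets verify the feasibility and effectiveness of the algorithm.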

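Keywords: categorical data clustering; multiple attribute frequency; multi-objective cluster-set quality; attribute-relevant subspace; interval dispersion degree
Received: 2016-10-19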

A Subspace Clustering Algorithm of Categorical Data Using Multiple Attribute Weights
Affiliation: 1. School of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024, China; 2. Department of Computer Science and Software Engineering, Auburn University, Auburn 36849, USA
Abstract: In this paper, we propose a subspace clustering algorithm for categorical data that uses the frequencies of multiple attributes. An attribute-value weight is calculated from these frequencies on the basis of equivalence classes in rough set theory, which improves the ability of attributes to discriminate between clusters. The well-known parameter problem caused by using a threshold to delete noise points is solved by means of interval dispersion degrees. By adopting hierarchical agglomerative clustering to iteratively merge sub-clusters on the basis of a multi-objective cluster quality function, we effectively measure clusters of various scales. The attribute subspace of each cluster is determined from the relevance degree of each dimension to that cluster, which improves the cluster's interpretability. Finally, we validate the feasibility and effectiveness of our algorithm through extensive experiments on synthetic data as well as UCI and stellar spectral data sets.
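The abstract says the attribute weights are built on equivalence classes from rough set theory but does not give the formula. As a minimal sketch of the underlying idea only, the Python code below partitions the objects of a toy categorical data set into equivalence classes under a chosen attribute subset (the rough-set indiscernibility relation) and then reads off an illustrative multi-attribute frequency for each object. The function names, the toy data, and the "class size divided by n" weight are hypothetical assumptions for illustration; they are not the authors' weighting method.

```python
from collections import defaultdict


def equivalence_classes(rows, attrs):
    """Group row indices that share identical values on every attribute in `attrs`.

    This is the indiscernibility relation of rough set theory: two objects fall
    into the same equivalence class when they cannot be told apart using `attrs`.
    """
    classes = defaultdict(list)
    for i, row in enumerate(rows):
        key = tuple(row[a] for a in attrs)  # value combination on the attribute subset
        classes[key].append(i)
    return dict(classes)


def class_frequency_weights(rows, attrs):
    """Hypothetical weight: relative size of each object's equivalence class.

    NOT the paper's formula; it only shows how a multi-attribute frequency
    can be derived from the equivalence classes.
    """
    n = len(rows)
    weights = [0.0] * n
    for members in equivalence_classes(rows, attrs).values():
        for i in members:
            weights[i] = len(members) / n  # share of objects with the same value combination
    return weights


# Toy categorical data set (hypothetical).
data = [
    {"color": "red", "shape": "circle", "size": "small"},
    {"color": "red", "shape": "circle", "size": "large"},
    {"color": "blue", "shape": "square", "size": "small"},
    {"color": "red", "shape": "square", "size": "small"},
]

print(equivalence_classes(data, ["color", "shape"]))
# {('red', 'circle'): [0, 1], ('blue', 'square'): [2], ('red', 'square'): [3]}
print(class_frequency_weights(data, ["color", "shape"]))
# [0.5, 0.5, 0.25, 0.25]
```

Using several attributes jointly (here `color` and `shape`) yields finer equivalence classes than any single attribute does, which is the sense in which multi-attribute frequencies can separate objects better than single-attribute counts.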