A Subspace Clustering Algorithm of Categorical Data Using Multiple Attribute Weights |
| |
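Cite this article: | Pang Ning, Zhang Jifu, Qin Xiao. A Subspace Clustering Algorithm of Categorical Data Using Multiple Attribute Weights [J]. Acta Automatica Sinica, 2018, 44(3): 517-532. |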
| |
Authors: | Pang Ning (庞宁), Zhang Jifu (张继福), Qin Xiao (秦啸) |
| |
Affiliation: | 1. School of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024, China |
| |
Funding: | National Natural Science Foundation of China (61572343) |
| |
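Abstract: | Using multi-attribute frequency weights and a multi-objective cluster-set quality criterion, this paper proposes a subspace clustering algorithm for categorical data. The algorithm uses equivalence classes from rough set theory to define a multi-attribute weight calculation method, which effectively improves the ability of attributes to discriminate between clusters. Based on a multi-objective cluster-set quality function, it adopts a hierarchical agglomerative strategy that iteratively merges sub-clusters and effectively measures clusters of various scales. It uses interval dispersion degrees to resolve the parameter problem caused by deleting noise points with a threshold. It also uses the degree to which each attribute is attached to a cluster to determine that cluster's attribute-relevant subspace, which improves the interpretability of the clusters. Finally, experiments on synthetic, UCI, and stellar spectral data sets verify the feasibility and effectiveness of the algorithm.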
|
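Keywords: | categorical data clustering; multi-attribute frequency; multi-objective cluster-set quality; attribute-relevant subspace; interval dispersion degree |
Received: | 2016-10-19 |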
A Subspace Clustering Algorithm of Categorical Data Using Multiple Attribute Weights |
| |
Affiliation: | 1. School of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024, China; 2. Department of Computer Science and Software Engineering, Auburn University, Auburn 36849, USA |
| |
Abstract: | We propose a subspace clustering algorithm for categorical data that uses the frequencies of multiple attributes. An attribute-value weight is calculated from these frequencies on the basis of equivalence classes in rough set theory, which improves the ability of attributes to discriminate between clusters. The well-known parameter problem caused by using a threshold to delete noise points is solved by virtue of interval dispersion degrees. By adopting a hierarchical clustering method that iteratively merges sub-clusters, we effectively measure clusters of various scales on the basis of a multi-objective cluster quality function. An attribute subspace is determined from the relevance degree of each dimension to a cluster, which improves the cluster's interpretability. Finally, we validate the feasibility and effectiveness of our algorithm through extensive experiments on synthetic data as well as UCI and stellar spectral data sets. |
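
The abstract refers to equivalence classes from rough set theory computed over multiple attributes. As a minimal sketch of that idea only, the Python below partitions objects that share the same values on a chosen attribute subset and reports each class's share of the data. The function names and the toy data are hypothetical, and the relative frequency is a stand-in for illustration: the abstract does not state the paper's actual weight formula.

```python
from collections import defaultdict


def equivalence_classes(rows, attrs):
    """Partition row indices by their values on the chosen attributes.

    Rows with identical values on every attribute in `attrs` are
    indiscernible and therefore fall into the same equivalence class.
    """
    classes = defaultdict(list)
    for idx, row in enumerate(rows):
        key = tuple(row[a] for a in attrs)
        classes[key].append(idx)
    return dict(classes)


def relative_frequencies(rows, attrs):
    """Share of all rows that falls into each equivalence class.

    Illustrative stand-in only, not the weight defined in the paper.
    """
    classes = equivalence_classes(rows, attrs)
    return {key: len(members) / len(rows) for key, members in classes.items()}


# Hypothetical toy data set with three categorical attributes.
rows = [
    {"color": "red", "shape": "round", "size": "small"},
    {"color": "red", "shape": "round", "size": "large"},
    {"color": "blue", "shape": "square", "size": "small"},
    {"color": "red", "shape": "square", "size": "small"},
]

single = equivalence_classes(rows, ["color"])          # one attribute
joint = equivalence_classes(rows, ["color", "shape"])  # two attributes jointly
freq = relative_frequencies(rows, ["color", "shape"])
```

Partitioning on two attributes jointly splits the `red` objects into finer classes than partitioning on `color` alone, which is the sense in which combining attributes can sharpen the distinction between groups of objects.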
| |