A Subspace Clustering Algorithm of Categorical Data Using Multiple Attribute Weights
Cite this article: PANG Ning, ZHANG Ji-Fu, QIN Xiao. A Subspace Clustering Algorithm of Categorical Data Using Multiple Attribute Weights. ACTA AUTOMATICA SINICA, 2018, 44(3): 517-532. doi: 10.16383/j.aas.2018.c160726
Authors: PANG Ning, ZHANG Ji-Fu, QIN Xiao
Affiliations: 1. School of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024, China; 2. Department of Computer Science and Software Engineering, Auburn University, Auburn 36849, USA
Funding: National Natural Science Foundation of China (61572343)
Received: 2016-10-19

Abstract: This paper proposes a subspace clustering algorithm for categorical data that combines multi-attribute frequency weights with a multi-objective cluster-quality criterion. Using equivalence classes from rough set theory, the algorithm defines a multi-attribute weight calculation that improves the ability of attributes to discriminate between clusters. Building on a multi-objective cluster-quality function, it adopts a hierarchical agglomerative strategy that iteratively merges sub-clusters, so that clusters of various scales are measured effectively. Interval dispersion degrees replace the threshold normally used to delete noise points, which avoids the associated parameter-selection problem. The degree to which each attribute is relevant to a cluster determines that cluster's attribute-related subspace, which improves the interpretability of the clusters. Experiments on synthetic, UCI, and stellar spectral data sets validate the feasibility and effectiveness of the algorithm.

Keywords: categorical data clustering, multiple attribute frequency, multi-objective cluster quality, attribute-related subspace, interval dispersion degree
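
The abstract does not give the weight formula, so the following is only a minimal sketch of the underlying idea: objects that share the same values on a set of attributes form an equivalence class (in the rough-set sense), and the sizes of such classes over several attributes can be turned into a weight. The function names and the weighting rule in `multi_attribute_weight` are hypothetical illustrations, not the authors' definitions.

```python
from collections import defaultdict


def equivalence_classes(rows, attrs):
    """Partition row indices by their values on the attribute indices in attrs."""
    classes = defaultdict(list)
    for idx, row in enumerate(rows):
        key = tuple(row[a] for a in attrs)
        classes[key].append(idx)
    return dict(classes)


def class_size(rows, idx, attrs):
    """Size of the equivalence class that contains object idx with respect to attrs."""
    key = tuple(rows[idx][a] for a in attrs)
    return sum(1 for row in rows if tuple(row[a] for a in attrs) == key)


def multi_attribute_weight(rows, idx, a):
    """Hypothetical weight of object idx's value on attribute a.

    Assumption (not from the paper): for every other attribute b, take the
    fraction of objects sharing idx's value on b that also share its value
    on a, then average these fractions over all b.
    """
    others = [b for b in range(len(rows[idx])) if b != a]
    if not others:
        return 1.0
    ratios = [
        class_size(rows, idx, (a, b)) / class_size(rows, idx, (b,))
        for b in others
    ]
    return sum(ratios) / len(ratios)


# Toy categorical data set: (colour, shape, size)
rows = [
    ("red", "round", "small"),
    ("red", "round", "large"),
    ("red", "square", "small"),
    ("blue", "square", "large"),
]

print(equivalence_classes(rows, (0,)))
print(multi_attribute_weight(rows, 0, 0))
print(multi_attribute_weight(rows, 3, 0))
```

In this toy data, "red" co-occurs consistently with the other values of the first object, so it receives a higher weight than "blue" does for the last object. A real implementation would follow the definitions in the full paper.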