
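Subspace Clustering Algorithm for High-Dimensional Data Using Attribute Clustering
Cite this article:NIU Kun, ZHANG Shu-bo, CHEN Jun-liang. Subspace Clustering through Attribute Clustering[J]. Journal of Beijing University of Posts and Telecommunications, 2007, 30(3): 1-5
Authors:NIU Kun  ZHANG Shu-bo  CHEN Jun-liang
Affiliation:State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China; Dept. of Strategy Research, China Telecom Beijing Research Institute, Beijing 100035, China
Funding:National Basic Research Program of China (973 Program); National Natural Science Foundation of China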
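Abstract:To address the high time complexity and the sensitivity to input parameters of existing subspace clustering algorithms, an efficient subspace clustering algorithm based on attribute clustering is proposed. The algorithm first computes the Gini value of each attribute to filter out redundant attributes. It then builds a relation matrix over the non-redundant attributes, using a relation function based on the two-dimensional joint Gini value, to measure the correlation between any two non-redundant attributes. An overlapping clustering algorithm is then applied to the relation matrix, and the resulting clusters form the candidate set of all interesting subspaces. Finally, a clustering algorithm is run to obtain all clusters that exist in these subspaces. Experiments on synthetic and real datasets show that the new algorithm performs well in both time complexity and the ability to find subspace clusters, and that it is not very sensitive to the values of its input parameters.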

Keywords:subspace clustering  high-dimensional data  attribute clustering
Article ID:1007-5321(2007)03-0001-05
Received:2006-08-07
Revised:2006-08-07

Subspace Clustering through Attribute Clustering
NIU Kun,ZHANG Shu-bo,CHEN Jun-liang. Subspace Clustering through Attribute Clustering[J]. Journal of Beijing University of Posts and Telecommunications, 2007, 30(3): 1-5
Authors:NIU Kun  ZHANG Shu-bo  CHEN Jun-liang
Affiliation:1. State Key Laboratory of Networking and Switching Technology, Beijing 100876, China;
2. Dept. of Strategy Research, China Telecom Beijing Research Institute, Beijing 100035, China
Abstract:Many recently proposed subspace clustering methods suffer from two severe problems. First, the algorithms typically scale exponentially with the data dimensionality or the subspace dimensionality of clusters. Second, the clustering results are often sensitive to input parameters. A fast subspace clustering algorithm based on attribute clustering is proposed to overcome these limitations. The algorithm first filters out redundant attributes by computing the Gini coefficient of each attribute. To evaluate the correlation between each pair of non-redundant attributes, a relation matrix is constructed using a relation function based on two-dimensional joint Gini coefficients. Applying an overlapping clustering algorithm to the relation matrix yields the candidate set of all interesting subspaces. Finally, all subspace clusters are obtained by clustering within the interesting subspaces. Experiments on both synthetic and real datasets show that the new algorithm achieves a significant gain in runtime and in the quality of the subspace clusters found, and that it is insensitive to input parameters.
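
The first two steps described in the abstract (filtering attributes by a Gini value, then building a pairwise relation matrix from two-dimensional Gini values) might be sketched as follows. The abstract does not give the formulas, so everything specific here is an assumption for illustration: the Gini value is taken to be the histogram impurity 1 - sum(p_i^2), an attribute is treated as redundant when its Gini value is close to that of a uniform distribution, and the relation function is a hypothetical stand-in that measures how much more concentrated the joint histogram is than it would be if the two attributes were independent. The names `gini_1d`, `gini_2d`, `filter_redundant`, `relation_matrix`, and the parameters `bins` and `ratio` are not from the paper. The overlapping clustering step is not covered.

```python
import numpy as np


def concentration(counts):
    """Return sum(p_i^2) of a histogram; the Gini value used here is 1 minus this."""
    p = counts / counts.sum()
    return float(np.sum(p ** 2))


def gini_1d(column, bins=10):
    """Gini value of one attribute, computed on an equal-width histogram (assumed form)."""
    counts, _ = np.histogram(column, bins=bins)
    return 1.0 - concentration(counts)


def gini_2d(col_a, col_b, bins=10):
    """Joint Gini value of two attributes on a bins x bins grid (assumed form)."""
    counts, _, _ = np.histogram2d(col_a, col_b, bins=bins)
    return 1.0 - concentration(counts)


def filter_redundant(data, bins=10, ratio=0.95):
    """Return indices of attributes whose Gini value is clearly below the uniform maximum.

    Assumption: a near-uniform attribute carries no cluster structure and is redundant.
    """
    max_gini = 1.0 - 1.0 / bins  # Gini value of a perfectly uniform histogram
    ginis = np.array([gini_1d(data[:, j], bins) for j in range(data.shape[1])])
    return np.flatnonzero(ginis < ratio * max_gini)


def relation_matrix(data, kept, bins=10):
    """Hypothetical relation function: joint concentration minus the product of marginals.

    The value is zero when the binned attributes are exactly independent and grows
    as the joint histogram becomes more concentrated than independence predicts.
    """
    k = len(kept)
    rel = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            a, b = data[:, kept[i]], data[:, kept[j]]
            independent = (1.0 - gini_1d(a, bins)) * (1.0 - gini_1d(b, bins))
            joint = 1.0 - gini_2d(a, b, bins)
            rel[i, j] = rel[j, i] = joint - independent
    return rel


# Toy data: attributes 0 and 1 share two tight clusters, attribute 2 is uniform noise.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=1000)
centers = np.where(labels == 0, 0.2, 0.8)
data = np.column_stack([
    centers + rng.normal(0, 0.02, 1000),
    centers + rng.normal(0, 0.02, 1000),
    rng.uniform(0, 1, 1000),
])
kept = filter_redundant(data)
rel = relation_matrix(data, kept)
print("non-redundant attributes:", kept)
print("relation matrix:\n", rel)
```

In this sketch the relation matrix would then be passed to an overlapping clustering algorithm to propose candidate subspaces. The bin count and the redundancy ratio are tuning choices of the sketch, not parameters reported by the authors.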
Keywords:subspace clustering   high-dimensional data   attribute clustering