Soft subspace clustering algorithm for imbalanced data (不平衡数据的软子空间聚类算法)
Citation: CHENG Lingfang, YANG Tianpeng, CHEN Lifei. Soft subspace clustering algorithm for imbalanced data[J]. Journal of Computer Applications, 2017, 37(10): 2952-2957.
Authors: CHENG Lingfang, YANG Tianpeng, CHEN Lifei
Affiliation: 1. Jinshan College, Fujian Agriculture and Forestry University, Fuzhou, Fujian 350002, China; 2. School of Mathematics and Computer Science, Fujian Normal University, Fuzhou, Fujian 350117, China
Funding: National Natural Science Foundation of China (61672157); Natural Science Foundation of Fujian Province (2015J01238).
Received: 2017-05-15
Revised: 2017-07-10

Abstract: Because of the uniform effect (the tendency of K-means to produce clusters of similar sizes), current K-means-type soft subspace clustering algorithms cannot cluster imbalanced data effectively. To address this problem, a new partition-based soft subspace clustering algorithm for imbalanced data is proposed. First, a bi-weighting method is proposed that assigns a feature weight to each attribute and, at the same time, a cluster weight to each cluster reflecting its importance. Second, a new distance measure for mixed-type data is proposed to balance the differences between attributes of different types and between categorical attributes with different numbers of categories. Third, an objective function for subspace clustering of imbalanced data is defined on the basis of the bi-weighting method, and expressions for optimizing both the cluster weights and the feature weights are derived. A series of experiments on real-world data sets shows that the bi-weighting method enables the new algorithm to learn more accurate soft subspaces for the clusters in imbalanced data. Compared with existing K-means-type soft subspace clustering algorithms, the proposed algorithm improves clustering accuracy on imbalanced data, with improvements of nearly 50% on the bioinformatics data used in the experiments.
Keywords: soft subspace clustering; imbalanced data; feature weight; cluster weight
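The abstract gives no formulas, so the sketch below is only a point of reference for what "K-means-type soft subspace clustering" with per-cluster feature weights looks like. It implements the well-known entropy-weighted k-means (EWKM) update rules on numeric data. It is not the authors' algorithm: it has no cluster weights and no mixed-type distance, which are exactly the additions the abstract describes. The function names `ewkm` and `weighted_sq_dist`, the parameter `gamma`, the fixed iteration count and the deterministic farthest-first initialisation are all assumptions of this sketch.

```python
import math


def weighted_sq_dist(x, center, w):
    """Feature-weighted squared Euclidean distance (hypothetical helper)."""
    return sum(wj * (xj - cj) ** 2 for xj, cj, wj in zip(x, center, w))


def ewkm(points, k, gamma=1.0, n_iter=20):
    """EWKM-style soft subspace k-means on numeric data (illustrative sketch only).

    Returns (labels, centers, weights); weights[ci][j] is the weight of
    feature j in cluster ci, and each row of weights sums to 1.
    """
    d = len(points[0])
    # Deterministic farthest-first initialisation (an assumption of this sketch).
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(weighted_sq_dist(p, c, [1.0] * d) for c in centers))
        centers.append(list(far))
    weights = [[1.0 / d] * d for _ in range(k)]  # start from uniform feature weights
    labels = [0] * len(points)

    for _ in range(n_iter):
        # 1) Assign each point to the cluster with the smallest weighted distance,
        #    using that cluster's own feature weights (its "soft subspace").
        labels = [min(range(k), key=lambda ci: weighted_sq_dist(x, centers[ci], weights[ci]))
                  for x in points]
        for ci in range(k):
            members = [x for x, lab in zip(points, labels) if lab == ci]
            # 2) Recompute the center as the mean of its members (kept if the cluster is empty).
            if members:
                centers[ci] = [sum(m[j] for m in members) / len(members) for j in range(d)]
            # 3) Entropy-regularised weight update:
            #    w_cj = exp(-D_cj / gamma) / sum_t exp(-D_ct / gamma),
            #    where D_cj is cluster ci's within-cluster dispersion on feature j.
            disp = [sum((m[j] - centers[ci][j]) ** 2 for m in members) for j in range(d)]
            shift = min(disp)  # subtracting the minimum avoids underflow; the ratios are unchanged
            exps = [math.exp(-(dj - shift) / gamma) for dj in disp]
            total = sum(exps)
            weights[ci] = [e / total for e in exps]
    return labels, centers, weights
```

For example, with two groups that are separated along the first feature but noisy along the second, such as `[(0.0, 0.0), (0.1, 5.0), (-0.1, -5.0), (0.05, 2.0), (10.0, 0.0), (10.1, 4.0), (9.9, -4.0), (10.05, 1.0)]`, `ewkm(points, k=2)` puts almost all of each cluster's weight on the first feature. Larger `gamma` values spread the weights more evenly. Because every cluster contributes equally to the objective, a scheme like this inherits the uniform effect mentioned in the abstract, which motivates the cluster weights the paper adds.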