
Research on Unbalanced Data Labeling Technology Based on Clustering Analysis (基于聚类分析的不均衡数据标注技术研究)

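Cite this article:	ZHAO Jun-jie, HUANG Si-niu, WU Zheng-wu, WANG Shuai. Research on Unbalanced Data Labeling Technology Based on Clustering Analysis[J]. Computer Simulation (计算机仿真), 2020, 0(2): 476-480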

Authors:	ZHAO Jun-jie (赵俊杰), HUANG Si-niu (黄四牛), WU Zheng-wu (吴正午), WANG Shuai (王帅)

Affiliation:	Beijing Institute of Control and Electronic Technology (北京控制与电子技术研究所)

Funding:	Supported by the National Defense Science and Technology Innovation Special Zone project (国防科技创新特区项目).

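Abstract:	When unevenly distributed data are labeled by conventional clustering analysis, the clustering result tends to favor the classes with more samples, which introduces labeling errors. To address this problem, a labeling method based on an improved clustering algorithm with a balance constraint is proposed; it raises the accuracy of cluster-based labeling on imbalanced data fairly effectively. The method comprises four steps: initial clustering, expert-knowledge adjustment, data balancing, and balance-constrained clustering. Initial clustering assigns initial class labels to the imbalanced data; expert-knowledge adjustment corrects the labels of the part of the data that was labeled wrongly; balancing turns the data into a balanced data set; and balance-constrained clustering makes the final, precise label assignment on the balanced data. Simulation shows that the method improves the labeling accuracy for imbalanced data fairly effectively.
Keywords:	imbalanced data; data labeling; clustering analysis; balancing; simulation verification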
Research on Unbalanced Data Labeling Technology Based on Clustering Analysis

ZHAO Jun-jie, HUANG Si-niu, WU Zheng-wu, WANG Shuai. Research on Unbalanced Data Labeling Technology Based on Clustering Analysis[J]. Computer Simulation, 2020, 0(2): 476-480

Authors:	ZHAO Jun-jie, HUANG Si-niu, WU Zheng-wu, WANG Shuai

Affiliation:	Science and Technology on Information System Engineering Laboratory, Beijing 100038, China

Abstract:	When unbalanced datasets are labeled by clustering analysis, the clustering result tends to favor the ‘big’ clusters, which causes labeling errors. To address this problem, we propose a labeling method based on a new clustering algorithm. The method comprises initial clustering, correction of errors using expert knowledge, balancing of the unbalanced datasets, and re-clustering on the balanced datasets. Initial clustering yields the initial clusters. Errors in part of the data are then corrected under the guidance of expert knowledge. After the unbalanced data have been balanced, a new clustering algorithm with a balancing constraint is applied and the data are re-labeled according to its result, which improves the accuracy of the final labels. Simulation shows that the proposed method can improve the accuracy of clustering and labeling.

Keywords:	Imbalanced data; Data labeling; Clustering analysis; Balance processing; Simulation verification
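
The abstract names four steps but gives no algorithmic detail, so the following is only a minimal sketch of how the last two steps could look, not the authors' algorithm. Everything in it is an assumption made for illustration: balancing is done by random oversampling, the balance constraint is a per-cluster capacity of ceil(n/k) enforced by a greedy assignment, the names `oversample`, `balanced_assign` and `balanced_kmeans` are hypothetical, and the toy labels stand in for the output of initial clustering plus expert correction.

```python
import math
import random
from collections import Counter


def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroids(points, labels, k, previous):
    """Mean of each cluster; an empty cluster keeps its previous centre."""
    centres = []
    for j in range(k):
        members = [p for p, l in zip(points, labels) if l == j]
        if members:
            centres.append(tuple(sum(c) / len(members) for c in zip(*members)))
        else:
            centres.append(previous[j])
    return centres


def oversample(points, labels, rng):
    """Assumed balancing step: duplicate random points of the smaller
    classes until every class is as large as the largest one."""
    by_label = {}
    for p, l in zip(points, labels):
        by_label.setdefault(l, []).append(p)
    target = max(len(v) for v in by_label.values())
    out_points, out_labels = [], []
    for l, pts in sorted(by_label.items()):
        extra = [rng.choice(pts) for _ in range(target - len(pts))]
        out_points.extend(pts + extra)
        out_labels.extend([l] * target)
    return out_points, out_labels


def balanced_assign(points, centres):
    """Assumed balance constraint: visit (point, centre) pairs from closest
    to farthest and let each cluster take at most ceil(n / k) points."""
    k = len(centres)
    cap = math.ceil(len(points) / k)
    pairs = sorted((dist2(p, c), i, j)
                   for i, p in enumerate(points)
                   for j, c in enumerate(centres))
    labels = [None] * len(points)
    sizes = [0] * k
    for _, i, j in pairs:
        if labels[i] is None and sizes[j] < cap:
            labels[i] = j
            sizes[j] += 1
    return labels


def balanced_kmeans(points, init_labels, iters=10):
    """k-means-style loop using the capacity-constrained assignment.
    Assumes labels are 0..k-1 and each occurs at least once."""
    k = max(init_labels) + 1
    labels = list(init_labels)
    centres = [points[labels.index(j)] for j in range(k)]
    for _ in range(iters):
        centres = centroids(points, labels, k, centres)
        new_labels = balanced_assign(points, centres)
        if new_labels == labels:
            break
        labels = new_labels
    return labels


# Toy data: six "majority" points and two "minority" points.
majority = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (0.2, 0.8)]
minority = [(10, 10), (10, 11)]
points = majority + minority
corrected = [0] * 6 + [1] * 2  # stands in for steps 1 and 2 of the abstract

bal_points, bal_labels = oversample(points, corrected, random.Random(0))
final = balanced_kmeans(bal_points, bal_labels)
print(Counter(final))
```

The order of the steps matters in this sketch: calling `balanced_assign` on the eight original points would cap each cluster at four points and push two majority points into the other cluster, which is why the data are balanced before the constrained clustering runs.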
This article is indexed in VIP (维普) and other databases.
