An Imbalanced Data Mining Method Based on a Fusion Mechanism of One-Sided Selection Link and Sample Distribution Density

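Cite this article:	ZHAI Yun, WANG Shu-peng, MA Nan, YANG Bing-ru, ZHANG De-zheng. A Data Mining Method for Imbalanced Datasets Based on One-Sided Link and Distribution Density of Instances[J]. Acta Electronica Sinica, 2014, 42(7): 1311-1319. DOI: 10.3969/j.issn.0372-2112.2014.07.011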

Authors:	ZHAI Yun (翟云), WANG Shu-peng (王树鹏), MA Nan (马楠), YANG Bing-ru (杨炳儒), ZHANG De-zheng (张德政)

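Affiliations:	1. E-Government Research Center, Chinese Academy of Governance, Beijing 100089; 2. School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083; 3. Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093; 4. College of Information Technology, Beijing Union University, Beijing 100101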

Funding:	National Natural Science Foundation of China (No. 61300078, No. 61271275); Research Tender Project of the Chinese Academy of Governance

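Abstract:	Classifying imbalanced datasets is a major challenge in machine learning. The Synthetic Minority Over-Sampling Technique (SMOTE) has become a powerful and widely adopted means of addressing it. When generating new samples, however, SMOTE synthesizes from all minority-class samples, which creates an over-fitting bottleneck. To better address this problem, a new imbalanced data mining method based on the one-sided selection link and sample distribution density is proposed (One-Sided Link & Distribution Density-SMOTE, OSLDD-SMOTE). OSLDD-SMOTE uses the one-sided selection link to pick out the minority-class samples that lie on the classification boundary, and generates new samples according to the dynamic distribution density of these samples. The effect of the sample synthesis degree on the number of nodes and on minority-class accuracy is then analyzed, and the classification performance of OSLDD-SMOTE is compared with that of other similar methods on three metrics: G-mean, F-measure, and AUC. Experimental results show that OSLDD-SMOTE effectively improves the classification accuracy of minority-class samples.
Keywords:	imbalanced data classification; one-sided selection link; distribution density; resampling
Received:	2013-11-03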
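
The interpolation step that SMOTE relies on can be illustrated with a minimal sketch. This is plain SMOTE-style synthesis only: the one-sided link selection and the density-based generation of OSLDD-SMOTE are not specified in the abstract and are not reproduced here. The function name `smote_sketch` and the parameters `n_new`, `k`, and `seed` are assumptions made for illustration.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points by plain SMOTE-style interpolation.

    minority: list of equal-length numeric tuples (minority-class samples).
    n_new, k and seed are illustrative parameters, not values from the paper.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base sample (excluding itself)
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # New point lies on the segment between base and the chosen neighbour
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic
```

Because every base sample is drawn from the whole minority class, this sketch shows the behavior the abstract criticizes; a boundary-focused variant would restrict `minority` to the selected borderline samples before interpolating.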
A Data Mining Method for Imbalanced Datasets Based on One-Sided Link and Distribution Density of Instances

ZHAI Yun, WANG Shu-peng, MA Nan, YANG Bing-ru, ZHANG De-zheng. A Data Mining Method for Imbalanced Datasets Based on One-Sided Link and Distribution Density of Instances[J]. Acta Electronica Sinica, 2014, 42(7): 1311-1319. DOI: 10.3969/j.issn.0372-2112.2014.07.011

Authors:	ZHAI Yun, WANG Shu-peng, MA Nan, YANG Bing-ru, ZHANG De-zheng

Affiliation:	1. E-Government Research Center, Chinese Academy of Governance, Beijing 100089, China; 2. School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China; 3. Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China; 4. College of Information Technology, Beijing Union University, Beijing 100101, China

Abstract:	Classification of imbalanced datasets poses a great challenge to the field of machine learning, where the synthetic minority over-sampling technique (SMOTE) has become a powerful and widely adopted method. When generating new instances, however, SMOTE uses all instances in the minority class, which leads to over-generalization. To better solve this problem, a data mining method for imbalanced datasets based on the one-sided link and the distribution density of the minority class (OSLDD-SMOTE) is proposed in this paper. OSLDD-SMOTE first selects the minority instances near the classification boundary using the one-sided link, then generates new instances with SMOTE based on the dynamic distribution density of these instances. The effects of the synthetic degree on the newly generated instances and on minority-class accuracy are compared for OSLDD-SMOTE, SMOTE, Borderline-SMOTE, and Surrounding-SMOTE. Furthermore, in simulation results on 8 UCI datasets, the proposed method shows the most accurate and robust performance on the G-mean, F-measure, and AUC metrics.

Keywords:	classification in imbalanced datasets; one-sided link; distribution density; resample
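
The three metrics named in the abstract can be sketched from their commonly used definitions. The abstract does not give the paper's exact formulas, so the following is an assumption: G-mean as the geometric mean of sensitivity and specificity, F-measure as the balanced (beta = 1) harmonic mean of precision and recall, and AUC as the fraction of correctly ranked positive/negative pairs. All function names are illustrative.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fn, fp, tn), treating `positive` as the minority class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fn, fp, tn


def g_mean(tp, fn, fp, tn):
    """Geometric mean of sensitivity (minority recall) and specificity."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


def f_measure(tp, fn, fp, tn):
    """Balanced F-measure (beta = 1) for the minority class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def auc(y_true, scores, positive=1):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))
```

Unlike plain accuracy, none of these scores can be made high by predicting the majority class for every instance, which is why they are the usual choice for imbalanced data.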
This article is indexed in CNKI and other databases.