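Im-IG: A new feature selection algorithm for imbalanced problems
Cite this article: YOU Ming-yu, CHEN Yan, LI Guo-zheng. Im-IG: A new feature selection algorithm for imbalanced problems[J]. Journal of Shandong University (Engineering Science), 2010, 40(5): 123-128.
Authors: YOU Ming-yu  CHEN Yan  LI Guo-zheng
Affiliation: Department of Control Science and Engineering, College of Electronics and Information Engineering, Tongji University, Shanghai 201804, China
Funding: National Natural Science Foundation of China; Shanghai Rising-Star Program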
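Abstract: Unequal numbers of samples across classes is a common and widely studied imbalance problem in machine learning. The information gain (IG) algorithm is widely used for feature selection, yet its behavior on such imbalanced problems has rarely been examined. After discussing the performance of IG on data sets with different degrees of balance, this paper proposes Im-IG (imbalanced information gain), a new feature selection algorithm for imbalanced problems. Im-IG raises the weight of the minority-class distribution in the entropy calculation, so that features that help separate the minority class correctly are selected first. While improving overall classification performance, it focuses on raising the accuracy on the minority class. Experimental results on several imbalanced data sets show that Im-IG largely overcomes the poor fit of IG to imbalanced problems and is an effective feature selection algorithm for them.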

Keywords: Im-IG algorithm  imbalanced problem  feature selection
Received: 2010-05-10

Im-IG: A novel feature selection method for imbalanced problems
YOU Ming-yu, CHEN Yan, LI Guo-zheng. Im-IG: A novel feature selection method for imbalanced problems[J]. Journal of Shandong University (Engineering Science), 2010, 40(5): 123-128.
Authors:YOU Ming-yu  CHEN Yan  LI Guo-zheng
Affiliation: Department of Control Science and Engineering, College of Electronics and Information Engineering, Tongji University, Shanghai 201804, China
Abstract: Class imbalance, in which the classes contain unequal numbers of samples, is a ubiquitous problem in machine learning and has attracted much attention. The information gain (IG) method is widely used for feature selection, but its behavior on imbalanced problems has seldom been studied. Based on a discussion of the performance of IG on data sets with different degrees of imbalance, a new feature selection method, Im-IG (imbalanced information gain), is proposed for imbalanced problems. Im-IG increases the weight of the minority-class distribution in the entropy calculation, so that features that help separate the minority class correctly are selected first. It focuses on improving the classification accuracy of the minority class while also improving performance on the whole data set. Experimental results on several imbalanced data sets show that Im-IG largely overcomes the difficulties IG encounters on imbalanced problems and is an effective feature selection method for them.
Keywords:Im-IG method  imbalance problem  feature selection
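
The abstract says that Im-IG raises the weight of the minority-class distribution in the entropy calculation, but it does not give the weighting formula. The following is only a minimal sketch of that general idea, not the authors' method: it assumes each class count is multiplied by a user-chosen weight before the entropies are computed. The function names `weighted_entropy` and `weighted_information_gain` and the `class_weights` dictionary are hypothetical. With all weights equal to 1 the sketch reduces to ordinary information gain for a discrete feature.

```python
import math
from collections import Counter, defaultdict


def weighted_entropy(class_counts, class_weights):
    """Entropy (in bits) of a class distribution after scaling each
    class count by its weight. ASSUMPTION: this multiplicative scheme is
    an illustration; the paper's exact weighting is not in the abstract."""
    scaled = [n * class_weights.get(c, 1.0) for c, n in class_counts.items()]
    total = sum(scaled)
    if total == 0:
        return 0.0
    return -sum((v / total) * math.log2(v / total) for v in scaled if v > 0)


def weighted_information_gain(feature_values, labels, class_weights):
    """Information gain of a discrete feature on weight-scaled class counts."""
    overall = Counter(labels)
    by_value = defaultdict(Counter)  # feature value -> class counts
    for x, y in zip(feature_values, labels):
        by_value[x][y] += 1

    def mass(counts):
        # Total weighted sample mass of a group of samples.
        return sum(n * class_weights.get(c, 1.0) for c, n in counts.items())

    total_mass = mass(overall)
    conditional = sum(
        mass(counts) / total_mass * weighted_entropy(counts, class_weights)
        for counts in by_value.values()
    )
    return weighted_entropy(overall, class_weights) - conditional


# Toy data: 8 majority samples (class 0) and 2 minority samples (class 1).
labels = [0] * 8 + [1] * 2
separates_minority = [0] * 8 + [1] * 2  # isolates the minority class

plain = weighted_information_gain(separates_minority, labels, {})
boosted = weighted_information_gain(separates_minority, labels, {1: 4.0})
print(plain, boosted)
```

In this toy case, raising the minority weight to 4 balances the weighted class masses, so a feature that isolates the minority class scores higher than it does under unweighted IG. How the weight should be chosen for a given degree of imbalance is left open here.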
This article is indexed in CNKI, Wanfang Data, and other databases.