
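Posterior-probability-based Feature Selection Algorithm for Imbalanced Datasets
Cite this article: CAO Su-qun, WANG Shi-tong, CHEN Xiao-feng. Posterior-probability-based Feature Selection Algorithm for Imbalanced Datasets[J]. Computer Engineering, 2008, 34(19): 1-3.
Authors: CAO Su-qun  WANG Shi-tong  CHEN Xiao-feng
Affiliations: 1. School of Information, Jiangnan University, Wuxi 214122; Department of Mechanical Engineering, Huaiyin Institute of Technology, Huaian 223001
2. School of Information, Jiangnan University, Wuxi 214122
Funding: National ministry-funded project; Ministry of Education Science and Technology Research Project; Ministry of Education Outstanding Talent Support Program fund; State Key Laboratory fund; Open Project fund of the State Key Laboratory for Novel Software Technology, Nanjing University
Abstract: A posterior-probability-based feature selection algorithm is proposed for imbalanced datasets. The algorithm introduces an imbalance factor estimated with the Parzen-window method and iterates from the midpoints of Tomek links to find decision-boundary points at which the posterior probabilities are equal. The weight of each feature is then obtained by a projection computation on the normal vectors at these points. Experiments show that, on imbalanced datasets and without degrading the overall performance of the classifier, the algorithm effectively reduces dimensionality and saves computational cost, and it also avoids the tendency of conventional feature selection algorithms to ignore the minority class when applied to imbalanced data.

Keywords: imbalanced datasets  feature selection  posterior probability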
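
The abstract names the midpoints of Tomek links as the starting points of the iteration but does not show how they are found. The following is a minimal sketch, assuming the standard definition of a Tomek link (two samples from different classes that are each other's nearest neighbour) and Euclidean distance. The function name `tomek_link_midpoints` is hypothetical and is not taken from the paper.

```python
import numpy as np

def tomek_link_midpoints(X, y):
    """Return the midpoints of all Tomek links in (X, y).

    Assumption: a Tomek link is a pair of samples from different classes
    that are each other's nearest neighbour under Euclidean distance.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise squared distances; the diagonal is masked so that a sample
    # is never reported as its own nearest neighbour.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)
    nn = d2.argmin(axis=1)  # index of each sample's nearest neighbour
    mids = []
    for i, j in enumerate(nn):
        # Keep each mutual, cross-class pair exactly once (i < j).
        if i < j and nn[j] == i and y[i] != y[j]:
            mids.append((X[i] + X[j]) / 2.0)
    return np.array(mids).reshape(-1, X.shape[1])
```

The pairwise distance matrix costs O(n^2) memory, which is acceptable for a sketch; a nearest-neighbour index would be the usual choice for larger datasets.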

Posterior-probability-based Feature Selection Algorithm for Imbalanced Datasets
CAO Su-qun,WANG Shi-tong,CHEN Xiao-feng.Posterior-probability-based Feature Selection Algorithm for Imbalanced Datasets[J].Computer Engineering,2008,34(19):1-3.
Authors:CAO Su-qun  WANG Shi-tong  CHEN Xiao-feng
Affiliation:(1. School of Information, Jiangnan University, Wuxi 214122; 2. Department of Mechanical Engineering, Huaiyin Institute of Technology, Huaian 223001)
Abstract: In this paper, a posterior-probability-based feature selection algorithm is proposed for imbalanced datasets. The algorithm introduces an imbalance factor computed by Parzen-window estimation. The midpoint of each Tomek link is chosen as an initial point, and the algorithm iterates from it to find boundary points at which the posterior probabilities are equal. A projection computation on the normal vectors at these points yields the weight of each feature, which indicates that feature's importance. Experimental results on three real-world datasets demonstrate that the proposed algorithm not only reduces computational cost but also overcomes a shortcoming of conventional feature selection algorithms, which may detect the majority class well while ignoring the minority class.
Keywords:imbalanced datasets  feature selection  posterior probability
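
The pipeline described in the abstract (Parzen-window estimation, a search for points of equal posterior probability, and feature weights derived from boundary normals) can be illustrated with a rough sketch. Everything specific here is an assumption rather than the authors' method: a Gaussian kernel with bandwidth `h`, a hypothetical `prior_min` parameter standing in for the paper's imbalance factor (whose definition the abstract does not give), a Newton-style update along the gradient, and weights taken as the normalised mean absolute components of the unit normals.

```python
import numpy as np

def parzen_density_and_grad(x, samples, h):
    """Gaussian Parzen-window density estimate at x and its gradient."""
    d = samples.shape[1]
    diff = samples - x                                   # shape (n, d)
    k = np.exp(-0.5 * (diff ** 2).sum(axis=1) / h ** 2)  # kernel values
    norm = (2 * np.pi) ** (d / 2) * h ** d
    density = k.mean() / norm
    grad = (k[:, None] * diff).mean(axis=0) / (h ** 2 * norm)
    return density, grad

def gap_and_grad(x, X_min, X_maj, h, prior_min=0.5):
    """g(x) = P(min) p(x|min) - P(maj) p(x|maj) and its gradient.

    g(x) = 0 exactly where the two posteriors are equal. prior_min is a
    hypothetical stand-in for the paper's imbalance factor.
    """
    p_min, g_min = parzen_density_and_grad(x, X_min, h)
    p_maj, g_maj = parzen_density_and_grad(x, X_maj, h)
    w_min, w_maj = prior_min, 1.0 - prior_min
    return w_min * p_min - w_maj * p_maj, w_min * g_min - w_maj * g_maj

def find_boundary_point(x0, X_min, X_maj, h, prior_min=0.5,
                        n_iter=50, tol=1e-12):
    """Move x0 along the gradient of g until g(x) is (nearly) zero."""
    x = np.array(x0, dtype=float)
    for _ in range(n_iter):
        g, grad = gap_and_grad(x, X_min, X_maj, h, prior_min)
        sq = grad @ grad
        if abs(g) < tol or sq == 0.0:
            break
        x = x - g * grad / sq          # Newton-style step along the normal
    return x

def feature_weights(boundary_points, X_min, X_maj, h, prior_min=0.5):
    """Average the absolute unit normals at the boundary points."""
    normals = []
    for b in boundary_points:
        _, grad = gap_and_grad(b, X_min, X_maj, h, prior_min)
        normals.append(np.abs(grad) / np.linalg.norm(grad))
    w = np.mean(normals, axis=0)
    return w / w.sum()                 # weights sum to 1

# Toy data: the classes differ only along feature 0.
X_min = np.array([[0.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
X_maj = np.array([[4.0, 0.0], [4.0, 1.0], [4.0, -1.0],
                  [4.0, 0.5], [4.0, -0.5]])
b = find_boundary_point([2.0, 0.0], X_min, X_maj, h=1.0)
weights = feature_weights([b], X_min, X_maj, h=1.0)
```

On this toy data the boundary normal points almost entirely along feature 0, so that feature receives nearly all of the weight. In a fuller pipeline the starting points would be the Tomek-link midpoints rather than a hand-picked point.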
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.