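Relief Feature Selection Algorithm on Unbalanced Datasets (不平衡数据集上的Relief特征选择算法)
Cite this article: Jian Xiaoyan, Han Suqing, Cui Caixia. Relief Feature Selection Algorithm on Unbalanced Datasets[J]. 数据采集与处理 (Journal of Data Acquisition & Processing), 2016, 31(4): 838-844
Authors: Jian Xiaoyan (菅小艳), Han Suqing (韩素青), Cui Caixia (崔彩霞)
Affiliation: Department of Computer Science, Taiyuan Normal University, Jinzhong 030619, China
Abstract: Relief is a family of feature selection methods comprising the originally proposed Relief algorithm and its later extension, ReliefF. Its core idea is to assign larger weights to features that contribute more to classification. Because the algorithm is simple and runs efficiently, it is widely used. However, applying Relief directly to noisy datasets or to unbalanced datasets gives unsatisfactory results. Based on Relief, a feature selection algorithm for noisy data, called the threshold-Relief algorithm, is proposed; it effectively eliminates the influence of noisy data on classification results. Combined with the K-means algorithm, two feature selection algorithms for unbalanced datasets are proposed, called the K-means-ReliefF algorithm and the K-means-Relief sampling algorithm, which effectively compensate for the shortcomings that Relief shows on unbalanced datasets. Experiments demonstrate the effectiveness of the proposed algorithms.

Keywords: feature selection; Relief algorithm; ReliefF algorithm; unbalanced datasets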

Relief Feature Selection Algorithm on Unbalanced Datasets
Jian Xiaoyan, Han Suqing, Cui Caixia. Relief Feature Selection Algorithm on Unbalanced Datasets[J]. Journal of Data Acquisition & Processing, 2016, 31(4): 838-844
Authors: Jian Xiaoyan, Han Suqing, Cui Caixia
Affiliation: Department of Computer Science, Taiyuan Normal University, Jinzhong 030619, China
Abstract: Relief is a family of feature selection methods that includes the original Relief algorithm and its later extension, the ReliefF algorithm. Its core idea is to assign larger weights to features that contribute more to classification. The algorithm is simple and efficient, and is therefore widely used. However, its performance is unsatisfactory when it is applied directly to noisy or unbalanced datasets. Building on the Relief algorithm, this paper proposes a feature selection method for noisy data, called the threshold-Relief algorithm, which eliminates the influence of noisy data on classification results. Combined with the K-means algorithm, two feature selection methods for unbalanced datasets are proposed, called the K-means-ReliefF algorithm and the K-means-Relief sampling algorithm, which compensate for the poor performance of the Relief algorithm on unbalanced datasets. Experiments show the effectiveness of the proposed algorithms.
Keywords: feature selection; Relief algorithm; ReliefF algorithm; unbalanced datasets
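
The abstract's core idea, that features contributing more to classification receive larger weights, can be illustrated with a minimal sketch of the classic two-class Relief weight update as it is commonly described. This is not the paper's threshold-Relief or K-means-based variants, whose details are not given in the abstract. The function name `relief_weights`, its parameters, the min-max scaling, and the Manhattan distance are assumptions made for this illustration, and the sketch assumes numeric features and at least two samples per class.

```python
import numpy as np

def relief_weights(X, y, n_iter=None, seed=0):
    """Classic Relief sketch (hypothetical helper, not the paper's code).

    For each sampled instance, find its nearest neighbour of the same class
    (the "hit") and of the other class (the "miss"). A feature gains weight
    when it differs on the miss and loses weight when it differs on the hit.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, d = X.shape

    # Scale every feature to [0, 1] so per-feature differences are comparable.
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0  # constant features: avoid division by zero
    Xs = (X - X.min(axis=0)) / span

    # Visit every instance once by default, otherwise sample at random.
    if n_iter is None:
        indices = np.arange(n)
    else:
        indices = np.random.default_rng(seed).integers(0, n, size=n_iter)
    m = len(indices)

    w = np.zeros(d)
    for i in indices:
        dist = np.abs(Xs - Xs[i]).sum(axis=1)  # Manhattan distance to all
        dist[i] = np.inf                       # never pick the instance itself
        same = (y == y[i])
        hit = np.argmin(np.where(same, dist, np.inf))
        miss = np.argmin(np.where(~same, dist, np.inf))
        w += (np.abs(Xs[i] - Xs[miss]) - np.abs(Xs[i] - Xs[hit])) / m
    return w
```

Calling `relief_weights(X, y)` returns one weight per feature; ranking features by descending weight and keeping the top few is the usual way such weights are turned into a feature selection. On an unbalanced dataset most sampled instances come from the majority class, which is one intuitive reason the plain update can underserve the minority class.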