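Stepwise Optimized Feature Selection Algorithm Based on Discernibility Matrix and mRMR
Cite this article: FAN Xin (樊鑫), CHEN Hongmei (陈红梅). Stepwise Optimized Feature Selection Algorithm Based on Discernibility Matrix and mRMR[J]. Computer Science (计算机科学), 2020, 47(1): 87-95.
Authors: FAN Xin (樊鑫), CHEN Hongmei (陈红梅)
Affiliation: School of Information Science and Technology, Southwest Jiaotong University, Chengdu 611756; Key Laboratory of Cloud Computing and Intelligent Technology, Southwest Jiaotong University, Chengdu 611756
Abstract: Classification problems are common in modern industrial production. Using feature selection to screen for useful information before a classification task can effectively improve classification efficiency and accuracy. The minimal-redundancy-maximal-relevance (mRMR) algorithm maximizes the relevance between features and the class while minimizing the redundancy among features, and can effectively select a feature subset; however, it suffers from large deviations in feature importance in its middle and later stages, and it cannot directly output a feature subset. To address these problems, this paper proposes a feature selection algorithm that combines the discernibility matrix of neighborhood rough sets with the mRMR principle. Following the maximal-relevance and minimal-redundancy principles, feature importance is defined using neighborhood entropy and neighborhood mutual information so as to better handle mixed data types. A dynamic discernibility set is defined based on the discernibility matrix; its dynamic evolution is used to remove redundant attributes, narrow the search range, and optimize the feature subset, and the stopping condition of the iteration is determined from the discernibility matrix. SVM, J48, KNN and MLP are used as classifiers to evaluate the performance of the algorithm. Experimental results on public datasets show that, compared with existing algorithms, the proposed algorithm improves average classification accuracy by about 2% and effectively shortens feature selection time on datasets with many features. The proposed algorithm inherits the advantages of the discernibility matrix and mRMR and can effectively handle feature selection problems.

Keywords: Feature selection  Neighborhood rough set  Discernibility matrix  mRMR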
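
The maximal-relevance, minimal-redundancy criterion mentioned in the abstract can be illustrated with a minimal sketch of classic greedy mRMR (difference form). This is not the paper's algorithm: it uses plain Shannon mutual information on discrete features rather than neighborhood entropy and neighborhood mutual information, and the function names, feature names, and toy data are assumptions made for illustration only.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mrmr_rank(columns, labels, k):
    """Greedy mRMR ranking: relevance to the class minus mean redundancy
    with the features already selected (classic difference form)."""
    relevance = {
        name: mutual_information(col, labels) for name, col in columns.items()
    }
    selected = []
    remaining = set(columns)
    while remaining and len(selected) < k:

        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(
                mutual_information(columns[name], columns[s]) for s in selected
            ) / len(selected)
            return relevance[name] - redundancy

        # Iterate in sorted order so that ties are broken deterministically.
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (hypothetical): x1_dup is an exact copy of x1, x2 is weaker
# but carries information that x1 does not.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = {
    "x1": [0, 0, 0, 1, 1, 1, 1, 1],
    "x1_dup": [0, 0, 0, 1, 1, 1, 1, 1],
    "x2": [0, 0, 1, 0, 1, 1, 0, 1],
}
ranking = mrmr_rank(columns, labels, k=2)
print(ranking)
```

On this toy data the duplicate feature is as relevant as `x1` but is penalized for redundancy once `x1` has been chosen, so the weaker but complementary `x2` is ranked second. Note that the caller still has to supply `k`; the ranking alone does not say where to stop, which is the limitation of mRMR that the abstract points out.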

Stepwise Optimized Feature Selection Algorithm Based on Discernibility Matrix and mRMR
FAN Xin, CHEN Hong-mei. Stepwise Optimized Feature Selection Algorithm Based on Discernibility Matrix and mRMR[J]. Computer Science, 2020, 47(1): 87-95.
Authors: FAN Xin, CHEN Hong-mei
Affiliation: School of Information Science and Technology, Southwest Jiaotong University, Chengdu 611756, China; Key Laboratory of Cloud Computing and Intelligent Technology, Southwest Jiaotong University, Chengdu 611756, China
Abstract: Classification is a common problem in modern industrial production. Using feature selection to filter useful information before a classification task can effectively improve classification efficiency and accuracy. The minimal-redundancy-maximal-relevance (mRMR) algorithm maximizes the relevance between features and the class while minimizing the redundancy among features, and can effectively select a feature subset. However, mRMR has two problems: the importance of features deviates considerably in its middle and later stages, and it does not directly output a feature subset. To solve these problems, a novel algorithm based on the mRMR principle and the discernibility matrix of neighborhood rough sets is proposed. Feature significance is defined using neighborhood entropy and neighborhood mutual information, following the principle of minimal redundancy and maximal relevance, so that mixed-type data can be handled better. A dynamic discernibility set is defined based on the discernibility matrix. The dynamic evolution of the discernibility set is used to delete redundant features and narrow the search range. The optimized feature subset is output when the iteration is terminated by the stopping condition derived from the discernibility matrix. SVM, J48, KNN and MLP were selected as classifiers to evaluate the performance of the feature selection algorithm. Experimental results on public datasets show that the average classification accuracy of the proposed algorithm is about 2% higher than that of existing algorithms, and that the proposed algorithm effectively shortens feature selection time on datasets with many features. The proposed algorithm therefore inherits the advantages of the discernibility matrix and mRMR, and can effectively deal with feature selection problems.
Keywords:Feature selection  Neighborhood rough set  Discernibility matrix  mRMR
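
The idea of using a discernibility matrix both to narrow the candidate features and to decide when to stop can be sketched as follows. The abstract does not give the paper's definition of the dynamic discernibility set, so everything here is an assumption: the entry definition is the usual neighborhood-style one (a feature discerns two samples of different classes if their values differ by more than a radius `delta`), entries are dropped once a selected feature covers them, and the `importance` scores are hypothetical stand-ins for the paper's mRMR-based significance.

```python
from itertools import combinations


def discernibility_entries(rows, labels, delta):
    """For each pair of samples with different labels, collect the indices of
    features whose values differ by more than delta (assumed radius)."""
    entries = []
    for i, j in combinations(range(len(rows)), 2):
        if labels[i] != labels[j]:
            feats = frozenset(
                f
                for f in range(len(rows[i]))
                if abs(rows[i][f] - rows[j][f]) > delta
            )
            if feats:  # an empty entry means no feature can discern the pair
                entries.append(feats)
    return entries


def select_until_discerned(entries, importance):
    """Pick features until every entry contains at least one selected feature.
    Only features that still occur in an uncovered entry are candidates."""
    selected = []
    remaining = list(entries)
    while remaining:  # stop condition: nothing left to discern
        candidates = set().union(*remaining)
        best = max(sorted(candidates), key=importance)
        selected.append(best)
        # Entries containing the chosen feature are now covered; drop them.
        remaining = [e for e in remaining if best not in e]
    return selected


# Toy numeric data (hypothetical), three features scaled to [0, 1].
rows = [
    [0.10, 0.2, 0.5],
    [0.20, 0.3, 0.5],
    [0.90, 0.3, 0.5],
    [0.80, 0.9, 0.6],
    [0.15, 0.9, 0.5],
]
labels = [0, 0, 1, 1, 1]
entries = discernibility_entries(rows, labels, delta=0.3)

# Hypothetical importance scores; feature 2 scores high but discerns nothing.
scores = {0: 0.9, 1: 0.6, 2: 0.8}
selected = select_until_discerned(entries, scores.__getitem__)
print(selected)
```

In this toy run feature 2 never appears in any entry, so it is never considered despite its high score, and the loop ends on its own once features 0 and 1 cover every pair, without a preset subset size.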
This article is indexed in databases including VIP (维普) and Wanfang Data (万方数据).