
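S2R2: Semi-supervised Feature Selection Based on Analysis of Relevance and Redundancy
Cite this article: ZHANG Dong-fang, CHEN Hai-yan, YUAN Li-gang. S2R2: Semi-supervised Feature Selection Based on Analysis of Relevance and Redundancy[J]. Computer and Modernization, 2021, 0(9): 113-120.
Authors: ZHANG Dong-fang  CHEN Hai-yan  YUAN Li-gang
Affiliations: College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, Jiangsu, China; College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, Jiangsu, China; Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing 210093, Jiangsu, China; College of Civil Aviation, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, Jiangsu, China
Funding: National Natural Science Foundation of China (61501229); Fundamental Research Funds for the Central Universities (NS2019054, NS2020045)
Abstract: Feature selection is one of the key problems in pattern recognition and data mining; it removes redundant and irrelevant features from a dataset to improve learning performance. Based on the max-relevance and min-redundancy criterion, this paper proposes a new semi-supervised feature selection method based on relevance and redundancy analysis (S2R2). S2R2 is independent of any classification learning algorithm. The method first analyzes and extends an unsupervised relevance information measure, then combines it with information gain to design semi-supervised measures of feature relevance and redundancy that can effectively identify and remove irrelevant and redundant features. Finally, it uses an incremental search to build the feature subset greedily, which avoids searching an exponentially large solution space and improves running efficiency. The paper also proposes FS2R2, a fast filter version of S2R2, to better handle large-scale feature selection problems. Experimental results on several standard datasets demonstrate the effectiveness and superiority of the proposed methods.

Keywords: semi-supervised learning  feature selection  information theory  max-relevance and min-redundancy
Received: 2021-09-14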

S2R2: Semi-supervised Feature Selection Based on Analysis of Relevance and Redundancy
ZHANG Dong-fang, CHEN Hai-yan, YUAN Li-gang. S2R2: Semi-supervised Feature Selection Based on Analysis of Relevance and Redundancy[J]. Computer and Modernization, 2021, 0(9): 113-120.
Authors:ZHANG Dong-fang  CHEN Hai-yan  YUAN Li-gang
Abstract: Feature selection is one of the key problems in pattern recognition and data mining; it removes redundant and irrelevant features from a dataset to improve learning performance. Based on the max-relevance and min-redundancy criterion, a novel semi-supervised feature selection method based on relevance and redundancy analysis (S2R2) is proposed. The method is independent of any classification learning algorithm. First, an unsupervised relevance measure is analyzed and extended. It is then combined with information gain to form semi-supervised measures of feature relevance and redundancy, which can effectively identify and remove irrelevant and redundant features. Finally, an incremental forward search constructs the feature subset greedily, which avoids searching an exponentially large solution space and improves the efficiency of the algorithm. This article also proposes FS2R2, a fast version of S2R2, to deal with large-scale problems. Experimental results on standard datasets illustrate the effectiveness and superiority of the proposed approaches.
Keywords:semi-supervised learning  feature selection  information theory  max-relevance and min-redundancy  
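
The abstract does not give the S2R2 relevance and redundancy measures, so the following Python sketch illustrates only the general idea of greedy incremental search under a max-relevance and min-redundancy criterion. Everything in it is an assumption made for illustration: the function names (`entropy`, `mutual_info`, `greedy_mrmr`), the restriction to discrete features, the use of plain mutual information with the labels as the relevance term (estimated on labeled samples only), and the use of average pairwise mutual information over labeled plus unlabeled samples as the redundancy term. It is not the authors' method.

```python
import numpy as np


def entropy(*columns):
    """Shannon entropy in bits of the joint distribution of discrete columns."""
    joint = np.stack(columns, axis=1)
    _, counts = np.unique(joint, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def mutual_info(a, b):
    """Mutual information I(a; b) = H(a) + H(b) - H(a, b) for discrete arrays."""
    return entropy(a) + entropy(b) - entropy(a, b)


def greedy_mrmr(X_labeled, y, X_unlabeled, k):
    """Greedily pick k feature indices, one at a time.

    Assumed stand-in measures (not taken from the paper):
      relevance  = I(feature; label), estimated on labeled samples only
      redundancy = mean I(feature; already selected feature),
                   estimated on labeled and unlabeled samples together
    """
    X_all = np.vstack([X_labeled, X_unlabeled])
    n_features = X_all.shape[1]
    relevance = [mutual_info(X_labeled[:, j], y) for j in range(n_features)]

    selected = []
    remaining = list(range(n_features))
    while remaining and len(selected) < k:

        def score(j):
            if not selected:
                return relevance[j]
            redundancy = np.mean(
                [mutual_info(X_all[:, j], X_all[:, s]) for s in selected]
            )
            return relevance[j] - redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: the label has four classes; feature 0 encodes its high bit,
# feature 1 duplicates feature 0, feature 2 encodes the low bit,
# and feature 3 is unrelated to the label.
X_labeled = np.array(
    [[0, 0, 0, 0], [0, 0, 1, 0], [1, 1, 0, 0], [1, 1, 1, 0],
     [0, 0, 0, 1], [0, 0, 1, 1], [1, 1, 0, 1], [1, 1, 1, 1]]
)
y = np.array([0, 1, 2, 3, 0, 1, 2, 3])
X_unlabeled = X_labeled.copy()  # unlabeled samples from the same distribution

print(greedy_mrmr(X_labeled, y, X_unlabeled, k=2))
```

Each step evaluates only the remaining candidate features against the ones already chosen, so selecting k of d features costs on the order of k·d score evaluations rather than an enumeration of all feature subsets. On the toy data the duplicated column is penalized by the redundancy term once its twin has been chosen.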