
Research on Optimal Support Vector Classifier Model Integrating Feature Selection
Cite this article: ZHAO Yu, CHEN Rui, LIU Wei. Research on Optimal Support Vector Classifier Model Integrating Feature Selection[J]. Computer Science, 2016, 43(8): 177-182, 215
Authors: ZHAO Yu  CHEN Rui  LIU Wei
Affiliation: Institute of Policy and Management, Chinese Academy of Sciences, Beijing 100190, China (ZHAO Yu, CHEN Rui, LIU Wei); University of Chinese Academy of Sciences, Beijing 100049, China (LIU Wei)
Funding: This work was supported by the 2013 Quality Inspection Public Welfare Industry Research Project (201310118), the 2015 National Quality Inspection Public Welfare Industry Research Project (201510041), and a Major Task Project of the Chinese Academy of Sciences (Y201161Z04).
Abstract: To integrate feature selection into the support vector machine classifier, this paper proposes an optimal support vector classifier with embedded feature selection, FS-SDP-SVM (Feature Selection in Semi-definite Program for Support Vector Machine). The model maps each feature into the kernel space separately and combines the resulting sub-kernels, weighted by model parameters, into a new kernel matrix, so that feature selection and classification are unified under a single optimization objective and optimized simultaneously. For feature screening, two quantities derived from the model parameters, the feature support and the feature contribution, are introduced; by controlling their upper and lower bounds, one can flexibly trade off between the best classification and the fewest features. In the empirical study, the two integrated feature selection variants, FS-SDP-SVM1 (best classification) and FS-SDP-SVM2 (fewest features), are compared with the Relief-F, SFS and SBS algorithms on UCI machine learning datasets and synthetic data. The results show that the proposed FS-SDP-SVM algorithms achieve the highest classification accuracy or the smallest number of features on most of the experimental datasets while maintaining good generalization, and on the synthetic data the method accurately selects the true features and removes the noise features.
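The abstract leaves the exact form of the combined kernel implicit. One consistent reading, in the style of multiple-kernel learning (the weight notation below is an illustrative assumption, not taken from the paper), is

K(x, z) = \sum_{j=1}^{d} \mu_j \, k_j(x_j, z_j), \qquad \mu_j \ge 0,

where k_j is the sub-kernel induced by feature j alone and \mu_j = 0 removes feature j from both the kernel and the classifier; the feature support and feature contribution thresholds mentioned above would then act on quantities derived from the \mu_j.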

Keywords: Feature selection  Ensemble method  Support vector classifier  Feature sub-kernel space  Semi-definite programming
Received: 2015-07-21
Revised: 2015-10-16

Research on Optimal Support Vector Classifier Model Integrating Feature Selection
ZHAO Yu, CHEN Rui, LIU Wei. Research on Optimal Support Vector Classifier Model Integrating Feature Selection[J]. Computer Science, 2016, 43(8): 177-182, 215
Authors: ZHAO Yu  CHEN Rui  LIU Wei
Affiliation: Institute of Policy and Management, Chinese Academy of Sciences, Beijing 100190, China (ZHAO Yu, CHEN Rui, LIU Wei); University of Chinese Academy of Sciences, Beijing 100049, China (LIU Wei)
Abstract: To integrate the feature selection process into the support vector machine classifier, a new model called Feature Selection in Semi-definite Program for Support Vector Machine (FS-SDP-SVM) is proposed, which unifies the objectives of feature selection and classification. The key idea is to map each feature into its own kernel subspace; a new kernel matrix is then constructed as a linear combination of these sub-kernels and optimized jointly with the support vector classifier by semi-definite programming. Two parameters for feature screening are introduced, namely the feature support and the feature contribution, which can be adjusted flexibly to either maximize classification accuracy (FS-SDP-SVM1) or minimize the number of features (FS-SDP-SVM2). The empirical study compares the two model variants with the Relief-F, SFS and SBS feature selection algorithms on UCI machine learning datasets and synthetic data. The results show that FS-SDP-SVM achieves the highest accuracy or the smallest number of features on most of the UCI datasets while maintaining good generalization, and on the synthetic data it accurately preserves the true features and removes the noise features.
Keywords:Feature selection  Ensemble method  Support vector classifier  Sub-kernel space  Semi-definite programming
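For readers who want to experiment with the sub-kernel idea, the sketch below builds one RBF sub-kernel per feature, combines the sub-kernels with fixed weights, and trains a standard SVM on the precomputed kernel using scikit-learn. It is only an illustration under assumed choices (RBF sub-kernels, uniform placeholder weights, the load_breast_cancer dataset); the SDP-based joint optimization of the weights, which is the paper's actual contribution, is not reproduced here.

# Illustrative sketch only, not the paper's FS-SDP-SVM implementation:
# one RBF sub-kernel per feature, combined with fixed weights mu and fed
# to an SVM with a precomputed kernel. In the paper the weights would come
# from the semi-definite program; here uniform weights are used for clarity.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def sub_kernels(XA, XB, gamma=1.0):
    """One sub-kernel per feature: K_j(x, z) = exp(-gamma * (x_j - z_j)**2)."""
    return [rbf_kernel(XA[:, [j]], XB[:, [j]], gamma=gamma)
            for j in range(XA.shape[1])]

def combine(kernels, mu):
    """Weighted sum of sub-kernels; mu[j] = 0 removes feature j entirely."""
    return sum(m * K for m, K in zip(mu, kernels))

X, y = load_breast_cancer(return_X_y=True)
X = (X - X.mean(axis=0)) / X.std(axis=0)        # standardize each feature
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

d = X.shape[1]
mu = np.full(d, 1.0 / d)                        # placeholder weights (uniform)

K_train = combine(sub_kernels(Xtr, Xtr), mu)    # shape (n_train, n_train)
K_test = combine(sub_kernels(Xte, Xtr), mu)     # shape (n_test, n_train)

clf = SVC(kernel="precomputed", C=1.0).fit(K_train, ytr)
print("test accuracy, all sub-kernels:", clf.score(K_test, yte))

# Crude "selection" for illustration: zero out all but the first ten weights.
mu_sel = mu * (np.arange(d) < 10)
clf_sel = SVC(kernel="precomputed", C=1.0).fit(combine(sub_kernels(Xtr, Xtr), mu_sel), ytr)
print("test accuracy, 10 sub-kernels:", clf_sel.score(combine(sub_kernels(Xte, Xtr), mu_sel), yte))

In the full FS-SDP-SVM model the weights and the classifier are optimized jointly under a single objective, so dropping a feature (driving its weight to zero) and maximizing the margin are traded off inside one problem rather than in a separate wrapper loop.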