Feature selection for data classification based on PLS supervised feature extraction and false nearest neighbors

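Cite this article:	Yan Kesheng, Li Taifu, Wei Zhengyuan, Su Yingying, Yao Lizhong. Feature selection for data classification based on PLS supervised feature extraction and false nearest neighbors[J]. Computers and Applied Chemistry, 2012, 0(7): 817-821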

Authors:	Yan Kesheng, Li Taifu, Wei Zhengyuan, Su Yingying, Yao Lizhong

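Affiliations:	School of Mathematics and Statistics, Chongqing University of Technology; Chongqing University of Science and Technology; School of Electronic Engineering, Xi'an Shiyou University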

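Funding:	National Natural Science Foundation of China (61174015, 51075418); Natural Science Foundation of Chongqing (CSTC2010BB2285); Science and Technology Project of the Chongqing Municipal Education Commission (KJ111417)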

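Abstract:	In high-dimensional data classification, multicollinearity, redundant features, and noise tend to lower classifier recognition accuracy and raise time and space costs. To address this, a feature selection method is proposed that combines supervised feature extraction by partial least squares (PLS) with false nearest neighbors (FNN). First, PLS extracts principal components from the high-dimensional data, which eliminates the multicollinearity among the features and yields a space of independent principal components that carries the supervision information. Next, by computing the correlation in this space before and after each feature is selected, an FNN-based feature similarity measure is established, which ranks the original features by how strongly they explain the class variable. Finally, the features with weak explanatory ability are removed one at a time to construct a series of classification models; with the support vector machine (SVM) recognition rate as the model evaluation criterion, the search finds the classification model that has the highest recognition rate with the fewest features, and the features in that model form the best feature subset. Simulation results on three data sets all show that the best feature subset selected by this method matches the essential classification features of each data set, which indicates that the method has good feature selection ability and offers a new approach to feature selection for data classification.
Keywords:	partial least squares; false nearest neighbors; similarity measure; feature selection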
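
The first two steps of the abstract (PLS extraction, then an FNN-style ranking) can be sketched roughly as follows. This is only one plausible reading, not the authors' implementation: the abstract does not give the exact similarity measure, so the sketch assumes that a feature's importance is the mean distance its samples move in the PLS score space when that feature is zeroed. The function names `pls_rotations` and `rank_features`, the NIPALS variant of PLS, and the default of two components are all assumptions made for illustration.

```python
import numpy as np

def pls_rotations(X, y, n_components=2):
    """PLS1 via NIPALS on centered X and y; returns R so that scores T = X @ R."""
    Xk, yk = X.astype(float).copy(), y.astype(float).copy()
    W, P = [], []
    for _ in range(n_components):
        w = Xk.T @ yk                  # direction of maximal covariance with y
        w /= np.linalg.norm(w)
        t = Xk @ w                     # score vector of this component
        p = Xk.T @ t / (t @ t)         # X loading
        q = (yk @ t) / (t @ t)         # y loading
        Xk = Xk - np.outer(t, p)       # deflate X
        yk = yk - q * t                # deflate y
        W.append(w)
        P.append(p)
    W, P = np.array(W).T, np.array(P).T
    return W @ np.linalg.inv(P.T @ W)

def rank_features(X, y, n_components=2):
    """Return (order, shift): feature indices, most important first, and raw scores.

    Assumed measure (not taken from the paper): the mean Euclidean distance
    between each sample's PLS scores with and without feature j.
    """
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)   # assumes no constant columns
    yc = y - y.mean()
    R = pls_rotations(Xs, yc, n_components)
    T = Xs @ R                                  # scores with all features present
    shift = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        Xj = Xs.copy()
        Xj[:, j] = 0.0                          # remove feature j
        shift[j] = np.linalg.norm(T - Xj @ R, axis=1).mean()
    return np.argsort(-shift), shift
```

Calling `rank_features(X, y)` with a numeric matrix `X` and numerically coded class labels `y` returns the feature indices sorted from strongest to weakest under this assumed measure. A feature whose removal barely moves the samples in the supervised score space is treated as weak.
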
Feature selection for data classification based on PLS supervised feature extraction and false nearest neighbors

Yan Kesheng, Li Taifu, Wei Zhengyuan, Su Yingying, Yao Lizhong. Feature selection for data classification based on PLS supervised feature extraction and false nearest neighbors[J]. Computers and Applied Chemistry, 2012, 0(7): 817-821

Authors:	Yan Kesheng, Li Taifu, Wei Zhengyuan, Su Yingying, Yao Lizhong

Affiliation:	(1. School of Mathematics and Statistics, Chongqing University of Technology, Chongqing 400054, China; 2. School of Electric and Information Engineering, Chongqing University of Science and Technology, Chongqing 401331, China; 3. School of Electronic Engineering, Xi'an Shiyou University, Xi'an 710065, Shaanxi, China)

Abstract:	In the classification of high-dimensional data, multicollinearity, redundant features, and noise often cause low recognition accuracy and high time and space overhead. A feature selection method based on partial least squares (PLS) and false nearest neighbors (FNN) is proposed. First, PLS is used to extract the principal components of the high-dimensional data, which overcomes the multicollinearity among the original features and yields an independent principal component space that carries supervision information. Then, a similarity measure based on FNN is established by calculating the correlation in this space before and after each feature selection, which gives a ranking of the original features by their ability to explain the dependent variable. Finally, the features with weak explanatory ability are removed in turn to construct a series of classification models, and the recognition rate of a support vector machine (SVM) is used as the evaluation criterion to search for the classification model that has the highest recognition rate while containing the fewest features; the features in that model form the best feature subset. Experiments were conducted on several data sets. The simulation results show that the method selects a best feature subset that is consistent with the essential classification features of each data set. The research therefore provides a new approach to feature selection for data classification.
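
The final search step described in the abstract might look something like the sketch below. It assumes a feature ranking is already available (most important first, however it was obtained) and uses scikit-learn's `SVC` with cross-validated accuracy as a stand-in for the "recognition rate". The function name `search_best_subset`, the RBF kernel, the standardization step, and 5-fold cross-validation are illustrative assumptions, not details given in the source.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def search_best_subset(X, y, ranking, cv=5):
    """Backward elimination along a fixed ranking.

    ranking: feature indices, most important first (assumed to be given).
    Returns (best_subset, best_accuracy).
    """
    ranking = [int(j) for j in ranking]
    best_subset, best_acc = None, -1.0
    # Drop the weakest remaining feature at each step: keep the top k, k = p .. 1.
    for k in range(len(ranking), 0, -1):
        subset = ranking[:k]
        model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
        acc = cross_val_score(model, X[:, subset], y, cv=cv).mean()
        # ">=" lets a smaller subset win when accuracies tie.
        if acc >= best_acc:
            best_subset, best_acc = subset, acc
    return best_subset, best_acc
```

Because the loop runs from the full feature set down to a single feature and accepts ties, the returned subset is the smallest prefix of the ranking that reaches the best cross-validated accuracy, which mirrors the "highest recognition rate, fewest features" criterion.
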

Keywords:	partial least squares; false nearest neighbors; similarity measure; feature selection
This article is indexed in CNKI and other databases.
