Similar Literature
20 similar documents retrieved.
1.
A new improved forward floating selection (IFFS) algorithm for selecting a subset of features is presented. Our proposed algorithm improves on the state-of-the-art sequential forward floating selection (SFFS) algorithm by adding a search step called "replacing the weak feature", which checks at each sequential step whether removing any feature from the currently selected subset and adding a new one improves that subset. Our method provides optimal or quasi-optimal (close to optimal) solutions for many selected subsets while requiring significantly less computation than optimal feature selection algorithms. Our experimental results on four different databases demonstrate that our algorithm consistently selects better subsets than other suboptimal feature selection algorithms, especially when the original number of features in the database is large.
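The "replacing the weak feature" step can be pictured as a simple swap search. A minimal Python sketch is given below, assuming a caller-supplied subset criterion `J` (for example, cross-validated classification accuracy); `J` is an illustrative placeholder, not the paper's exact evaluation function.

```python
def replace_weak_feature(selected, remaining, J):
    """One IFFS-style swap step: look for a (selected, candidate) pair whose
    exchange improves the subset criterion J, and apply the best such swap.

    `J(subset)` is assumed to return a score where higher is better; it stands
    in for whatever subset evaluation the search is built on.
    """
    best_score = J(selected)
    best_swap = None
    for out_f in selected:
        base = [f for f in selected if f != out_f]
        for in_f in remaining:
            score = J(base + [in_f])
            if score > best_score:
                best_score, best_swap = score, (out_f, in_f)
    if best_swap is not None:
        out_f, in_f = best_swap
        selected = [f for f in selected if f != out_f] + [in_f]
        remaining = [f for f in remaining if f != in_f] + [out_f]
    return selected, remaining
```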

2.
Pattern recognition generally requires that objects be described in terms of a set of measurable features. The selection and quality of the features representing each pattern affect the success of subsequent classification. Feature extraction is the process of deriving new features from original features to reduce the cost of feature measurement, increase classifier efficiency, and allow higher accuracy. Many feature extraction techniques involve linear transformations of the original pattern vectors to new vectors of lower dimensionality. While this is useful for data visualization and classification efficiency, it does not necessarily reduce the number of features to be measured, since each new feature may be a linear combination of all of the features in the original pattern vector. Here, we present a new approach to feature extraction in which feature selection, feature extraction, and classifier training are performed simultaneously using a genetic algorithm. The genetic algorithm optimizes a feature weight vector used to scale the individual features in the original pattern vectors. A masking vector is also employed for simultaneous selection of a feature subset. We employ this technique in combination with the k-nearest-neighbor classification rule, and compare the results with classical feature selection and extraction techniques, including sequential floating forward feature selection and linear discriminant analysis. We also present results for the identification of favorable water-binding sites on protein surfaces.
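As a rough illustration of how a single GA individual (a feature-weight vector plus a feature mask) might be scored with a k-nearest-neighbor classifier, here is a hedged sketch using scikit-learn; the classifier settings and the 3-fold cross-validation are illustrative choices, not the paper's protocol.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def fitness(weights, mask, X, y, k=5):
    """Score one GA individual: a feature-weight vector plus a 0/1 mask.

    Features are scaled by `weights` and switched off by `mask`; the scaled
    subset is evaluated with a k-NN classifier (illustrative choices only).
    """
    active = mask.astype(bool)
    if not active.any():
        return 0.0                      # empty subsets get the worst fitness
    Xw = X[:, active] * weights[active]
    clf = KNeighborsClassifier(n_neighbors=k)
    return cross_val_score(clf, Xw, y, cv=3).mean()
```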

3.
Traditional spectral feature selection algorithms consider only the importance of individual features. To address this, the statistical correlation between features is introduced into traditional spectral analysis, and a spectral feature selection model based on feature correlation is constructed. The Laplacian Score is first used to identify the single most important feature as the initial selected feature; a new objective function measuring the discriminative power of a feature group is then designed, and a forward greedy search strategy evaluates the candidate features in turn, adding to the selected set the candidate that minimizes the objective function. The algorithm thus accounts for both feature importance and the correlation between features. Experiments with two different classifiers on eight UCI datasets show that the algorithm improves the classification performance of the selected feature subset and needs fewer features to reach high classification accuracy.
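For reference, the Laplacian Score used to pick the initial feature can be computed as below. The sketch uses a dense RBF affinity over all sample pairs for brevity (a k-nearest-neighbor graph would normally be used), and the paper's feature-group objective is not reproduced here.

```python
import numpy as np

def laplacian_scores(X, sigma=1.0):
    """Laplacian Score of each feature (lower = more locality-preserving)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    S = np.exp(-d2 / (2 * sigma ** 2))                     # dense RBF affinity
    D = np.diag(S.sum(1))
    L = D - S                                              # graph Laplacian
    ones = np.ones(X.shape[0])
    scores = []
    for r in range(X.shape[1]):
        f = X[:, r]
        f = f - (f @ D @ ones) / (ones @ D @ ones) * ones  # weighted mean removal
        scores.append((f @ L @ f) / (f @ D @ f + 1e-12))
    return np.array(scores)
```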

4.
A Correlation-Based Feature Selection Algorithm
Feature selection algorithms commonly used in text classification consider only the correlation between features and classes, paying insufficient attention to the correlation among the features themselves. A new feature selection algorithm based on correlation analysis is proposed. It takes information-theoretic measures as its basic tool and jointly considers computational cost and the objectivity of feature evaluation. The algorithm retains class-relevant features while identifying and discarding redundant ones, achieving good dimensionality reduction.
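One plausible instantiation of an information-theoretic criterion that rewards class relevance while penalizing redundancy among features is sketched below, as a greedy rule over discrete features; it is not claimed to be the paper's exact algorithm.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def select_mi(X, y, k):
    """Greedy selection over discrete-valued features: at each step pick the
    candidate whose mutual information with the class most exceeds its worst
    redundancy (MI) with the already-selected features."""
    n_feat = X.shape[1]
    selected = []
    for _ in range(min(k, n_feat)):
        best, best_gain = None, -np.inf
        for j in range(n_feat):
            if j in selected:
                continue
            rel = mutual_info_score(X[:, j], y)
            red = max((mutual_info_score(X[:, j], X[:, s]) for s in selected),
                      default=0.0)
            if rel - red > best_gain:
                best, best_gain = j, rel - red
        selected.append(best)
    return selected
```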

5.
针对标签排序问题的特点,提出一种面向标签排序数据集的特征选择算法(Label Ranking Based Feature Selection, LRFS)。该算法首先基于邻域粗糙集定义了新的邻域信息测度,能直接度量连续型、离散型以及排序型特征间的相关性、冗余性和关联性。然后,在此基础上提出基于邻域关联权重因子的标签排序特征选择算法。实验结果表明,LRFS算法能够在不降低排序准确率的前提下,有效剔除标签排序数据集中的无关特征或冗余特征。  相似文献   

6.
A Feature Selection Algorithm for Text Clustering Based on Class Information
Text clustering is an unsupervised learning method; because class information is unavailable, supervised feature selection methods are difficult to apply directly. A feature selection algorithm based on class information is therefore proposed: taking the result of a density-based clustering algorithm as pseudo-class labels, it applies information-gain feature selection to re-select the most discriminative features. Experiments verify the feasibility and effectiveness of the algorithm.
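The core scoring step, the information gain of a feature with respect to cluster labels treated as pseudo-classes, might look like the following sketch (discrete feature values assumed).

```python
import numpy as np

def information_gain(feature, labels):
    """Information gain of one discrete feature (e.g. term presence) with
    respect to cluster labels used as pseudo-classes."""
    def entropy(y):
        _, counts = np.unique(y, return_counts=True)
        p = counts / counts.sum()
        return -(p * np.log2(p)).sum()

    h = entropy(labels)
    cond = 0.0
    for v in np.unique(feature):
        idx = feature == v
        cond += idx.mean() * entropy(labels[idx])   # weighted conditional entropy
    return h - cond
```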

7.
In classification, feature selection is an important data pre-processing technique, but it is a difficult problem due mainly to the large search space. Particle swarm optimisation (PSO) is an efficient evolutionary computation technique. However, the traditional personal best and global best updating mechanism in PSO limits its performance for feature selection, and the potential of PSO for feature selection has not been fully investigated. This paper proposes three new initialisation strategies and three new personal best and global best updating mechanisms in PSO to develop novel feature selection approaches with the goals of maximising the classification performance, minimising the number of features and reducing the computational time. The proposed initialisation strategies and updating mechanisms are compared with the traditional initialisation and the traditional updating mechanism. The most promising initialisation strategy and updating mechanism are then combined to form a new approach, PSO(4-2), for feature selection, which is compared with two traditional feature selection methods and two PSO-based methods. Experiments on twenty benchmark datasets show that PSO with the new initialisation strategies and/or the new updating mechanisms can automatically evolve a feature subset with fewer features and higher classification performance than using all features. PSO(4-2) outperforms the two traditional methods and the two PSO-based algorithms in terms of computational time, number of features and classification performance. Its superior performance is due mainly to the proposed initialisation strategy, which draws on both forward and backward selection to reduce the number of features and the computational time, and to the new updating mechanism, which overcomes the limitations of traditional updating mechanisms by taking the number of features into account.
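As a hedged illustration of an updating rule that "takes the number of features into account", the comparison below prefers lower classification error and breaks near-ties in favor of smaller subsets; the tolerance `tol` and the tie-breaking rule are assumptions, not the published PSO(4-2) mechanism.

```python
def better(candidate, incumbent, tol=1e-3):
    """Decide whether a candidate should replace the personal/global best.

    Each solution is a (error_rate, n_features) pair: prefer lower error,
    and break near-ties (within `tol`) by the smaller feature subset.
    Simplified illustration only.
    """
    cand_err, cand_n = candidate
    inc_err, inc_n = incumbent
    if cand_err < inc_err - tol:
        return True
    if abs(cand_err - inc_err) <= tol and cand_n < inc_n:
        return True
    return False
```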

8.
Current text feature generation algorithms generally use a weighted vector space model in which the TF-IDF function computes the weight of each individual feature; the features generated this way tend to be highly redundant. To address this problem, a clustering-weighted text feature generation algorithm is adopted: the candidate feature set is first given initial weights, the features are then further weighted using semantics and information entropy, and finally feature clustering is used to remove redundant features. Experiments show that the algorithm improves average classification accuracy by about 5% over the traditional TF-IDF algorithm.
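For context, the TF-IDF baseline that the clustering-weighted scheme refines can be written in a few lines; this is the standard formulation, not the paper's refined weighting.

```python
import numpy as np

def tf_idf(counts):
    """TF-IDF weights from a documents-by-terms count matrix."""
    tf = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1)  # term frequency
    df = (counts > 0).sum(axis=0)                                   # document frequency
    idf = np.log(counts.shape[0] / np.maximum(df, 1))               # inverse doc. freq.
    return tf * idf
```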

9.
This paper proposes a new method to weight subspaces in feature groups and individual features for clustering high-dimensional data. In this method, the features of high-dimensional data are divided into feature groups based on their natural characteristics. Two types of weights are introduced into the clustering process to simultaneously identify the importance of feature groups and of individual features in each cluster. A new optimization model defines the clustering process, and a new clustering algorithm, FG-k-means, is proposed to solve it. The new algorithm extends k-means with two additional steps that automatically calculate the two types of subspace weights. A new data generation method is presented to generate high-dimensional data with clusters in subspaces of both feature groups and individual features. Experimental results on synthetic and real-life data show that FG-k-means significantly outperformed four k-means-type algorithms, i.e., k-means, W-k-means, LAC and EWKM, in almost all experiments. The new algorithm is robust to noise and missing values, which commonly occur in high-dimensional data.
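Conceptually, the two weight types enter the k-means dissimilarity as per-group and per-feature scale factors. The sketch below shows that idea for one object and one center; the weight exponents of the full FG-k-means model are omitted, so this is only a schematic, not the published objective.

```python
import numpy as np

def weighted_distance(x, center, v_group, w_feat, groups):
    """Schematic FG-k-means-style dissimilarity: each squared difference is
    scaled by its feature weight and by the weight of the group the feature
    belongs to.  `groups[j]` maps feature j to its group index."""
    d = (x - center) ** 2
    return float(np.sum(v_group[groups] * w_feat * d))
```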

10.
A Feature Selection Algorithm Based on Classification Margin
For the two-class feature selection problem, the linear separability of the feature space is first discussed and a criterion for judging it is given. Next, drawing on the principles of support vector machines, the basic properties of the feature separability criterion are analyzed. Finally, a feature efficiency measure is defined according to each feature's contribution to the classification margin and is used for feature selection and dimensionality reduction of the feature space. Experiments on measured data and on the public UCI (University of California, Irvine) repository show that, compared with the classical Relief feature selection algorithm, the algorithm clearly improves recognition performance and generalization ability.
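A simple proxy for ranking features by their contribution to the classification margin is the weight magnitude of a trained linear SVM, as sketched below for the two-class case; the paper's feature-efficiency definition is not reproduced exactly.

```python
import numpy as np
from sklearn.svm import LinearSVC

def margin_contributions(X, y):
    """Rank features by |w_j| of a linear SVM (two-class case), a simple
    proxy for each feature's contribution to the margin."""
    clf = LinearSVC(C=1.0, dual=False).fit(X, y)
    w = np.abs(clf.coef_).ravel()
    return w / w.sum()          # normalized contributions, one per feature
```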

11.
A new digit recognition method based on combined features and a PSO-BP (particle swarm optimization-backpropagation) algorithm is proposed. Grid features, projection features, and structural features represented by the Euler number are combined with different feature weight coefficients into a combined feature vector for the digit image, which is then recognized by a PSO-BP neural network, exploiting both the global optimization ability of particle swarm optimization and the local search strength of backpropagation. Experiments show that the method achieves a high recognition rate, fast network convergence, and high accuracy.

12.
A new algorithm for ranking the input features and obtaining the best feature subset is developed and illustrated in this paper. The asymptotic formula for mutual information and the expectation maximisation (EM) algorithm are used to develop the feature selection algorithm. We not only consider the dependence between the features and the class, but also measure the dependence among the features. Even for noisy data, this algorithm still works well. An empirical study is carried out to compare the proposed algorithm with existing algorithms. The proposed algorithm is illustrated by application to a variety of problems.

13.
The concept of the class-discriminating power of features and feature spaces is introduced, and the discriminating power of the feature spaces extracted by PCA, ICA, and LDA is evaluated. The features extracted by ICA are then taken as the initial solution set, the discriminating power of the feature space serves as the evaluation criterion, and a genetic algorithm performs face feature selection; the genetic algorithm is improved to avoid overtraining. With this feature extraction method, a feature subspace can be obtained that is high-order independent (or approximately so) while minimizing within-class variation and maximizing between-class variation. Experimental results show that the method performs well.

14.
Face Recognition Based on Multi-Feature Fusion and a Boosting RBF Neural Network
A face recognition method based on multi-feature information fusion is proposed. Zernike moments and non-negative matrix factorization (NMF) are used to extract rotation-invariant geometric features and subspace projection coefficient features of the face, respectively; these two complementary features are fused serially into a single feature with stronger discriminative power. An RBF neural network then performs face recognition, and Boosting is used for network ensembling to improve classification accuracy and generalization. Experimental results show that the proposed algorithm recognizes faces quickly using relatively few training samples.

15.
This paper proposes three feature selection algorithms with a feature weight scheme and dynamic dimension reduction for the text document clustering problem. Text document clustering is a new trend in text mining in which text documents are separated into several coherent clusters according to carefully selected informative features, using a proper evaluation function that usually depends on term frequency. Informative features in each document are selected using feature selection methods built on the genetic algorithm (GA), the harmony search (HS) algorithm, and the particle swarm optimization (PSO) algorithm, each using a novel weighting scheme, namely length feature weight (LFW), which depends on term frequency and on the appearance of features in other documents. A new dynamic dimension reduction (DDR) method is also provided to reduce the number of features used in clustering and thus improve the performance of the algorithms. Finally, k-means, a popular clustering method, is used to cluster the set of text documents based on the terms (or features) obtained by dynamic reduction. Seven text mining benchmark text datasets of different sizes and complexities are evaluated. Analysis with k-means shows that particle swarm optimization with length feature weight and dynamic reduction produces the best outcomes for almost all datasets tested. This paper provides new alternatives for the text mining community to cluster text documents using cohesive and informative features.
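A rough pipeline in the spirit of the paper (score terms, keep a top fraction as a stand-in for dynamic dimension reduction, then cluster with k-means) is sketched below with scikit-learn; the LFW weighting itself is not reproduced, and `keep_ratio` is an illustrative parameter.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

def cluster_with_reduced_terms(docs, n_clusters, keep_ratio=0.2):
    """Illustrative pipeline: vectorize documents, score terms with a simple
    aggregate weight, keep only the top fraction (stand-in for the DDR step),
    and cluster the reduced matrix with k-means."""
    X = TfidfVectorizer().fit_transform(docs)
    scores = np.asarray(X.sum(axis=0)).ravel()                 # simple term score
    keep = np.argsort(scores)[::-1][: max(1, int(keep_ratio * X.shape[1]))]
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X[:, keep])
```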

16.
Adaptive Selection of Face Features
How to select and extract stable and reliable face features is a pressing problem in face recognition. Based on a detailed analysis of the strengths and weaknesses of existing feature extraction methods, an algorithmic framework for adaptive selection of face features is proposed, and the criteria for building and extracting adaptively selected face features are discussed in detail. The algorithm performed well in experiments on more than 100 face images.

17.
Reducing the dimensionality of candidate features plays an important role in machine learning tasks such as classification and clustering. Most existing methods select features based on the dependence of individual features on the target Y, or on the correlation, complementarity, and redundancy of each feature's influence on Y. However, these methods almost never consider combined features: attributes A and B may individually carry very little information about Y, or even be completely independent of it, yet A&B may provide a large amount of information about Y, or even determine it completely. Based on this observation, a feature selection algorithm is proposed that mines both combined features and individual features from the feature set. Insignificant features are first combined, and new candidate features are generated according to their conditional probability distribution tables; individual and combined features are then selected using a maximum-relevance, minimum-redundancy criterion. Experiments on synthetic and real datasets show that the algorithm mines the combined-feature information in a dataset well and improves the accuracy of the corresponding machine learning algorithms to a certain extent.
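The A&B example can be checked directly: form the joint value of two discrete features as a single symbol and compare its mutual information with Y against the individual features', as in this sketch (the paper's conditional-probability-table construction and its selection procedure are not shown).

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def combination_gain(a, b, y):
    """Extra information the combined feature A&B carries about Y beyond the
    better of A and B alone (discrete features; XOR-like targets are the
    classic case where this gain is large)."""
    ab = np.char.add(np.char.add(a.astype(str), "|"), b.astype(str))  # joint symbol
    return mutual_info_score(ab, y) - max(mutual_info_score(a, y),
                                          mutual_info_score(b, y))
```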

18.
Algorithms for feature selection in predictive data mining for classification problems attempt to select features that are relevant and not redundant for the classification task. A relevant feature is defined as one that is highly correlated with the target function. One problem with this definition is that there is no universally accepted definition of what it means for a feature to be 'highly correlated with the target function or highly correlated with the other features'. A new feature selection algorithm that incorporates domain-specific definitions of high, medium and low correlations is proposed in this paper. The proposed algorithm conducts a heuristic search for the most relevant features for the prediction task.
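A minimal sketch of how domain-specific "high" and "medium" correlation cut-offs could drive a heuristic search is given below, using Pearson correlation; the threshold values and the redundancy rule are assumptions standing in for the domain definitions, not the paper's algorithm.

```python
import numpy as np

def select_by_thresholds(X, y, high=0.6, medium=0.3):
    """Keep features whose |correlation| with the target is at least `high`,
    skipping any candidate that is at least `medium`-correlated with a
    feature already kept (illustrative thresholds)."""
    corr_y = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
    kept = []
    for j in np.argsort(corr_y)[::-1]:          # most target-correlated first
        if corr_y[j] < high:
            break
        if all(abs(np.corrcoef(X[:, j], X[:, s])[0, 1]) < medium for s in kept):
            kept.append(j)
    return kept
```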

19.
A Feature Selection Method Based on PCA and ReliefF
Reducing training and testing time while improving classification accuracy is an important goal in the study of effective feature selection methods. A feature selection algorithm combining PCA and ReliefF is proposed. The algorithm selects the most representative features to form an effective feature subset, achieving dimensionality reduction, and it is simpler and faster than the PCA-GA method. Experiments on standard datasets show that the algorithm is feasible and effective, providing a new approach to feature compression for pattern recognition.
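As background, the Relief-style weight update that ReliefF generalizes is sketched below for the two-class case; one way to combine it with PCA in the spirit of the paper would be to run it on the PCA-reduced data, but that combination is an assumption, not the paper's exact scheme.

```python
import numpy as np

def relief_weights(X, y, n_iter=100, rng=None):
    """Basic two-class Relief weights: reward features that separate a sample
    from its nearest miss and penalise those that differ from its nearest hit."""
    rng = rng or np.random.default_rng(0)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        i = rng.integers(n)
        dist = np.abs(X - X[i]).sum(axis=1)
        dist[i] = np.inf                         # exclude the sample itself
        same, diff = y == y[i], y != y[i]
        hit = np.argmin(np.where(same, dist, np.inf))
        miss = np.argmin(np.where(diff, dist, np.inf))
        w += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return w / n_iter
```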

20.
A New Robust Multi-Feature Fusion Algorithm for Object Tracking
Relying on a single target feature is a major reason why most tracking algorithms lack robustness. A new multi-feature fusion tracking algorithm is proposed in which the target's color, texture, edge, and motion features are all described with histogram models to reduce the impact of target deformation and partial occlusion. Within an auxiliary particle filter framework, all feature observations are fused probabilistically to sharpen the peak of the state posterior at the target's true state, effectively suppressing interference from complex backgrounds; an effective method for computing the fusion coefficients is also given, making the fusion result more accurate and reliable. Experimental results show that the algorithm handles both rigid and non-rigid targets, has a clear advantage over single-feature trackers, and is highly robust when tracking against complex backgrounds. Comparisons with existing multi-feature fusion algorithms also confirm its effectiveness.
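A hedged sketch of fusing per-feature observations for one particle is shown below: each feature histogram (color, texture, edge, motion) contributes a Bhattacharyya-based likelihood, and fusion coefficients weight the features through a product rule. The coefficient estimation method from the paper is not reproduced, and `sigma` is an assumed bandwidth.

```python
import numpy as np

def fused_likelihood(cand_hists, ref_hists, alphas, sigma=0.1):
    """Fuse per-feature observation likelihoods for one particle.

    `cand_hists` / `ref_hists` are lists of L1-normalised histograms, one per
    feature; `alphas` are the fusion coefficients (product-rule weighting,
    shown for illustration only).
    """
    like = 1.0
    for h, r, a in zip(cand_hists, ref_hists, alphas):
        bc = np.sum(np.sqrt(h * r))              # Bhattacharyya coefficient
        d2 = max(1.0 - bc, 0.0)                  # squared Bhattacharyya distance
        like *= np.exp(-d2 / (2 * sigma ** 2)) ** a
    return like
```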
