Similar articles
 20 similar articles found (search time: 31 ms)
1.
Pixel-based texture classifiers and segmenters are typically based on the combination of texture feature extraction methods that belong to a single family (e.g., Gabor filters). However, combining texture methods from different families has proven to produce better classification results both quantitatively and qualitatively. Given a set of multiple texture feature extraction methods from different families, this paper presents a new texture feature selection scheme that automatically determines a reduced subset of methods whose integration produces classification results comparable to those obtained when all the available methods are integrated, but with a significantly lower computational cost. Experiments with both Brodatz and real outdoor images show that the proposed selection scheme is more advantageous than well-known general purpose feature selection algorithms applied to the same problem.

2.
He Jinrong, Bi Yingzhou, Ding Lixin, Li Zhaokui, Wang Shenwen 《Neural Computing & Applications》2017,28(10):3047-3059

Feature selection has attracted much attention from researchers because it can overcome the curse of dimensionality, reduce computational cost, improve the performance of the subsequent classification algorithm, and produce more interpretable results. To remove redundant and noisy features from the original feature set, we define a local density and a discriminant distance for each feature vector: local density measures the representative ability of each feature vector, and discriminant distance measures the redundancy and similarity between features. Based on these two quantities, the decision graph score is proposed as the evaluation criterion for unsupervised feature selection. The method is intuitive and simple, and its performance is evaluated in data classification experiments. Statistical tests on the classification accuracies averaged over 16 real-life datasets show that the proposed method achieves better or comparable discriminant feature selection ability in 98% of the cases, compared with state-of-the-art methods.


3.
The evaluation of feature selection methods for text classification with small sample datasets must consider classification performance, stability, and efficiency. It is thus a multiple criteria decision-making (MCDM) problem. Yet there has been little research on feature selection evaluation using MCDM methods that consider multiple criteria. Therefore, we use MCDM-based methods to evaluate feature selection methods for text classification with small sample datasets. An experimental study is designed to compare five MCDM methods and validate the proposed approach with 10 feature selection methods, nine evaluation measures for binary classification, seven evaluation measures for multi-class classification, and three classifiers on 10 small datasets. Based on the ranked results of the five MCDM methods, we make recommendations concerning feature selection methods. The results demonstrate the effectiveness of the MCDM-based method in evaluating feature selection methods.

4.
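To address feature selection in supervised classification, a wrapper feature selection method based on a quantum evolutionary algorithm is proposed. First, existing subset evaluation methods are shown to over-favor classification accuracy, and two subset evaluation methods are proposed, one based on a fixed threshold and one based on statistical testing. Then the evolution strategy of the quantum evolutionary algorithm is improved: the evolution process is divided into two stages, which use the individual best and the global best, respectively, as the evolution target of the population. On this basis, a feature selection algorithm is designed following the general framework of wrapper feature selection. Finally, experiments on 15 UCI datasets verify the effectiveness of the subset evaluation methods and the evolution strategy, as well as the superiority of the new method over six other feature selection methods. The results show that the new method achieves similar or better classification accuracy on more than 80% of the datasets and selects subsets with fewer features on 86.67% of the datasets.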

5.
Wu Yan, Yang Yang 《计算机应用》 (Journal of Computer Applications) 2006,26(2):433-435
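To obtain an important feature set, a feature selection method based on discriminant analysis and neural networks is proposed. The neural network is trained by minimizing an augmented cross-entropy error function; using this error function reduces the derivatives of the network's transfer functions and lowers the output sensitivity. The method first uses discriminant analysis to obtain an ordered feature queue and then selects features with the regularized neural network. Selection is based on the change in classification error on a validation set caused by removing a single feature. In a comparison with four other methods based on different principles, experimental results show that a network trained with this algorithm achieves high classification accuracy.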

6.
Feature selection is crucial, particularly for processing high-dimensional data. Existing selection methods generally compute a discriminant value for a feature with respect to the class variable to indicate its classification ability. However, a scalar value can hardly reveal the multifaceted classification abilities of a feature for the different subproblems of a complicated multiclass problem. In view of this, we propose to select features based on discrimination structure complementarity. To this end, the classification abilities of a feature for different subproblems are evaluated individually. Consequently, a discrimination structure vector can be obtained to indicate whether the feature is discriminative for each subproblem. Based on discrimination structure, indispensable and dispensable features (ID-features for short) are defined. In the selection process, the ID-features that are complementary in discrimination structure to the already selected ones are selected. The proposed method tries to treat all subproblems equally and hence avoids the pitfall in multi-class classification that discriminative features for difficult subproblems are prone to be covered by features for easy ones. Two algorithms are developed and compared with several feature selection methods using some open data sets. Experimental results demonstrate the effectiveness of the proposed method.

7.
Feature selection has always been a critical step in pattern recognition, in which evolutionary algorithms, such as the genetic algorithm (GA), are most commonly used. However, the individual encoding scheme used in various GAs would either pose a bias on the solution or require a pre-specified number of features, and hence may lead to less accurate results. In this paper, a tribe competition-based genetic algorithm (TCbGA) is proposed for feature selection in pattern classification. The population of individuals is divided into multiple tribes, and the initialization and evolutionary operations are modified to ensure that the number of selected features in each tribe follows a Gaussian distribution. Thus each tribe focuses on exploring a specific part of the solution space. Meanwhile, tribe competition is introduced to the evolution process, which allows the winning tribes, which produce better individuals, to enlarge their sizes, i.e., to have more individuals searching their parts of the solution space. This algorithm therefore avoids the bias on solutions and the requirement for a pre-specified number of features. We have evaluated our algorithm against several state-of-the-art feature selection approaches on 20 benchmark datasets. Our results suggest that the proposed TCbGA algorithm can identify the optimal feature subset more effectively and produce more accurate pattern classification.

8.
A new fast feature selection and data classification method   Total citations: 1, self-citations: 0, citations by others: 1
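A new, efficient feature selection and rule extraction method is proposed for data classification. First, the Chi-Merge discretization method is improved by reducing the number of initial intervals, and the improved Chi-Merge is used to discretize continuous feature variables. After discretization, a frequency table of the sample data is computed under the partition induced by each feature subset, and the data inconsistency rate is calculated from the frequency table. A sequential forward optimal search then quickly determines the best feature subset for each subset size, from small to large. The optimal subset with the fewest features is chosen according to the principle of minimizing the difference in inconsistency rate between feature subsets. Classification rules can be extracted directly from the frequency table of the optimal feature subset. Experiments show that the rapidly extracted rules achieve good classification results. Based on this feature selection method, a fast classification model for distributed homogeneous data is proposed, which not only classifies well but also supports privacy protection of the sample data content.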
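The inconsistency-rate criterion and the sequential forward search described in this abstract can be sketched as follows. This is only an illustration under assumptions: the features are taken to be already discretized, the stopping rule (a `tolerance` on the gap to the full feature set's inconsistency rate) is a guess at the "minimal difference" principle, and all function names are hypothetical rather than taken from the paper.

```python
from collections import Counter, defaultdict


def inconsistency_rate(rows, labels, subset):
    """Fraction of samples that disagree with the majority label of their group.

    A group is the set of samples sharing the same values on the features
    in `subset` (one cell of the frequency table).
    """
    groups = defaultdict(Counter)
    for row, label in zip(rows, labels):
        key = tuple(row[i] for i in subset)
        groups[key][label] += 1
    inconsistent = sum(sum(c.values()) - max(c.values()) for c in groups.values())
    return inconsistent / len(rows)


def forward_select(rows, labels, tolerance=0.0):
    """Greedy sequential forward search (assumed stopping rule, see lead-in).

    Repeatedly adds the feature that gives the lowest inconsistency rate and
    stops once the rate is within `tolerance` of the full feature set's rate.
    """
    n_features = len(rows[0])
    target = inconsistency_rate(rows, labels, range(n_features))
    selected = []
    remaining = list(range(n_features))
    while remaining:
        best = min(
            remaining,
            key=lambda f: inconsistency_rate(rows, labels, selected + [f]),
        )
        selected.append(best)
        remaining.remove(best)
        if inconsistency_rate(rows, labels, selected) - target <= tolerance:
            break
    return selected
```

Because every cell of the frequency table already records which label dominates it, a rule of the form "feature values -> majority label" could be read off the same `groups` structure, which is presumably what the abstract means by extracting rules directly from the table.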

9.
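The shortcomings of the traditional feature selection methods chi-square statistic and information gain in text classification are analyzed, leading to the conclusion that the key to feature selection in text classification is to select feature words that are concentrated in one class of documents and that occur frequently and are evenly distributed within that class. Therefore, taking into account the document frequency and term frequency of feature words as well as their inter-class concentration and intra-class dispersion, a feature selection evaluation function based on intra-class and inter-class document frequency and term frequency statistics is proposed. The function is used to select a fixed proportion of feature words from each class of the training set to form that class's feature lexicon; the feature lexicon of the training set is the union of the per-class lexicons. SVM-based Chinese text classification experiments show that, compared with the traditional chi-square statistic and information gain, the method improves text classification performance to some extent.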
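For context on the baseline this abstract criticizes, the chi-square statistic of a term with respect to a class is computed from a 2x2 table of document counts. The sketch below is the textbook formula, not the evaluation function proposed in the paper; the function names and the set-of-tokens document representation are assumptions made for illustration.

```python
def chi_square(a, b, c, d):
    """Chi-square statistic of a term/class contingency table.

    a: documents of the class that contain the term
    b: documents of other classes that contain the term
    c: documents of the class that lack the term
    d: documents of other classes that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # an empty row or column carries no evidence either way
    return n * (a * d - c * b) ** 2 / denom


def term_class_chi2(docs, labels, term, cls):
    """Build the 2x2 table for `term` and `cls`, then score it.

    `docs` is a list of token sets, so only presence is counted.
    """
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        in_class = label == cls
        if has_term and in_class:
            a += 1
        elif has_term:
            b += 1
        elif in_class:
            c += 1
        else:
            d += 1
    return chi_square(a, b, c, d)
```

Note that this statistic only sees whether a term occurs in a document, not how often it occurs, which is consistent with the abstract's decision to bring term frequency into the evaluation function.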

10.
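The traditional Laplacian score feature selection algorithm is only suited to single-label learning and cannot be applied directly to multi-label learning. To address this, a Laplacian score feature selection algorithm for multi-label tasks is proposed. First, the sample similarity matrix is rebuilt by considering the correlation of samples being jointly associated and jointly unassociated across the whole label space. Then, judgments of correlation and redundancy between features are introduced into the Laplacian score algorithm, and a forward greedy search strategy evaluates in turn the joint effect of each candidate feature with the already selected features, which is used to assess feature importance. Finally, experiments are conducted on six multi-label datasets with five evaluation metrics. The results show that, compared with multi-label dimensionality reduction based on maximum dependence (MDDM), the multi-label feature selection algorithm based on a Bayesian classifier (MLNB), and the multi-label classification feature selection algorithm based on multivariate mutual information (PMU), the proposed algorithm not only achieves the best classification performance but is also significantly superior in up to 65% of cases.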
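As background, the classic single-label Laplacian score that this abstract extends can be sketched in a few lines. The sketch is not the multi-label algorithm of the paper: it builds the similarity matrix from feature distances with a heat kernel on a fully connected graph (an assumption; k-nearest-neighbor graphs are also common), whereas the paper rebuilds that matrix from label information. The function name and the kernel width `t` are hypothetical.

```python
import math


def laplacian_scores(X, t=1.0):
    """Classic Laplacian score per feature; a lower score means the feature
    better preserves the local structure of the similarity graph."""
    n, m = len(X), len(X[0])

    # Heat-kernel similarity between every pair of distinct samples.
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
                S[i][j] = math.exp(-d2 / t)

    deg = [sum(row) for row in S]  # diagonal of the degree matrix D
    total = sum(deg)

    scores = []
    for r in range(m):
        f = [x[r] for x in X]
        # Remove the degree-weighted mean so constant features score poorly.
        mean = sum(fi * di for fi, di in zip(f, deg)) / total
        fc = [fi - mean for fi in f]
        # f^T L f with L = D - S equals half the weighted squared differences.
        num = 0.5 * sum(
            S[i][j] * (fc[i] - fc[j]) ** 2 for i in range(n) for j in range(n)
        )
        den = sum(di * v * v for di, v in zip(deg, fc))  # f^T D f
        scores.append(num / den if den > 0 else float("inf"))
    return scores
```

A multi-label variant along the lines of the abstract would replace the construction of `S` with a similarity derived from shared and jointly absent labels, and would add a redundancy check during a forward greedy search; neither step is shown here because the abstract does not give their formulas.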

11.
The success of automatic classification is intricately linked with effective feature selection. Previous studies on the use of genetic programming (GP) to solve classification problems have highlighted its benefits, principally its inherent feature selection (a process that is often performed independently of a learning method). In this paper, the problem of classification is recast as a feature generation problem, where GP is used to evolve programs that allow non-linear combinations of features to create superFeatures, from which classification tasks can be achieved fairly easily. In order to generate superFeatures robustly, the binary string fitness characterization along with the comparative partner selection strategy is introduced with the aim of promoting optimal convergence. The techniques introduced are applied first to two illustrative problems and then to the real-world problem of audio source classification, with competitive results.

12.
Detailed land use/land cover classification at ecotope level is important for environmental evaluation. In this study, we investigate the possibility of using airborne hyperspectral imagery for the classification of ecotopes. In particular, we assess two tree-based ensemble classification algorithms, Adaboost and Random Forest, based on standard classification accuracy, training time, and classification stability. Our results show that Adaboost and Random Forest attain almost the same overall accuracy (close to 70%) with less than 1% difference, and both outperform a neural network classifier (63.7%). Random Forest, however, is faster in training and more stable. Both ensemble classifiers are considered effective in dealing with hyperspectral data. Furthermore, two feature selection methods are applied: the out-of-bag strategy and a wrapper approach to feature subset selection using the best-first search method. A majority of the bands chosen by both methods concentrate between 1.4 and 1.8 μm in the early shortwave infrared region. Our band subset analyses also include the 22 optimal bands between 0.4 and 2.5 μm suggested in Thenkabail et al. [Thenkabail, P.S., Enclona, E.A., Ashton, M.S., and Van Der Meer, B. (2004). Accuracy assessments of hyperspectral waveband performance for vegetation analysis applications. Remote Sensing of Environment, 91, 354-376.] due to similarity of the target classes. All three band subsets considered in this study work well with both classifiers, as in most cases the overall accuracy dropped by less than 1%. A subset of 53 bands is created by combining all feature subsets; compared with using the entire set, the overall accuracy is the same with Adaboost and improves by 0.2% with Random Forest. The strategy of using a basket of band selection methods works better. Ecotopes belonging to the tree classes are in general classified better than the grass classes. Small adaptations of the classification scheme are recommended to improve the applicability of remote sensing methods for detailed ecotope mapping.

13.
In this paper, we tackle the problem of model selection when misclassification costs are unknown and/or may evolve. Unlike traditional approaches based on a scalar optimization, we propose a generic multi-model selection framework based on a multi-objective approach. The idea is to automatically train a pool of classifiers instead of one single classifier, each classifier in the pool optimizing a particular trade-off between the objectives. Within the context of two-class classification problems, we introduce the “ROC front concept” as an alternative to the ROC curve representation. This strategy is applied to the multi-model selection of SVM classifiers using an evolutionary multi-objective optimization algorithm. The comparison with a traditional scalar optimization technique based on an AUC criterion shows promising results on UCI datasets as well as on a real-world classification problem.
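One way to read the "ROC front" idea is as the set of classifiers that are not dominated in the (false positive rate, true positive rate) plane. The sketch below shows only that Pareto filtering step, under the assumption that each classifier in the pool has already been reduced to one operating point; it does not reproduce the evolutionary optimization in the paper, and `roc_front` is a hypothetical name.

```python
def roc_front(points):
    """Return the non-dominated (fpr, tpr) operating points, sorted by fpr.

    A point is dominated when another point has a false positive rate that is
    no higher and a true positive rate that is no lower, with at least one of
    the two strictly better.
    """
    front = []
    for i, (fpr, tpr) in enumerate(points):
        dominated = any(
            (f2 <= fpr and t2 >= tpr) and (f2 < fpr or t2 > tpr)
            for j, (f2, t2) in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append((fpr, tpr))
    return sorted(set(front))
```

Keeping the whole front rather than a single best point is what lets the choice of classifier be deferred until the misclassification costs are known.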

14.
This paper proposes two psychophysiological-data-driven classification frameworks with stable generalization ability for operator functional state (OFS) assessment in safety-critical human-machine systems. Recursive feature elimination (RFE) and the least squares support vector machine (LSSVM) are combined and used for binary and multiclass feature selection. Besides typical binary LSSVM classifiers for two-class OFS assessment, two multiclass classifiers based on multiclass LSSVM-RFE and the decision directed acyclic graph (DDAG) scheme are developed, one used for recognizing the high mental workload and fatigued states and the other for differentiating overloaded and baseline states from the normal states. Feature selection results reveal that different dimensions of OFS can be characterized by specific sets of psychophysiological features. Performance comparison studies show that reasonably high and stable classification accuracy can be achieved with both classification frameworks if the RFE procedure is properly implemented and utilized.

15.
Gender recognition has been playing a very important role in various applications such as human–computer interaction, surveillance, and security. Nonlinear support vector machines (SVMs) were investigated for the identification of gender using the Face Recognition Technology (FERET) image face database. It was shown that SVM classifiers outperform the traditional pattern classifiers (linear, quadratic, Fisher linear discriminant, and nearest neighbour). In this context, this paper aims to improve the SVM classification accuracy in the gender classification system and propose new models for a better performance. We have evaluated different SVM learning algorithms; the SVM‐radial basis function with a 5% outlier fraction outperformed other SVM classifiers. We have examined the effectiveness of different feature selection methods. AdaBoost performs better than the other feature selection methods in selecting the most discriminating features. We have proposed two classification methods that focus on training subsets of images among the training images. Method 1 combines the outcome of different classifiers based on different image subsets, whereas method 2 is based on clustering the training data and building a classifier for each cluster. Experimental results showed that both methods have increased the classification accuracy.

16.
This study presents a new intelligent diagnosis system for classifying different machine conditions using data obtained from infrared thermography. In the first stage of the proposed system, a two-dimensional discrete wavelet transform is used to decompose the thermal image. However, the data obtained from this stage are ordinarily high-dimensional, which reduces performance. To overcome this problem, a feature selection tool based on Mahalanobis distance and the relief algorithm is employed in the second stage to select the salient features that characterize the machine conditions and so enhance classification accuracy. The data from the second stage are subsequently fed to the intelligent diagnosis system, in which support vector machines and linear discriminant analysis are used as classifiers. The results of the proposed system can assist in diagnosing different machine conditions.

17.
Satisfactory feature selection and its application   Total citations: 2, self-citations: 0, citations by others: 2
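Feature selection in practical applications is a satisfactory optimization problem. Because existing feature selection methods rarely consider the cost of acquiring features or the automatic determination of feature set dimensionality, a satisfactory feature selection method (SFSM) is proposed that jointly considers sample classification performance, feature set dimensionality, feature extraction complexity, and other factors. Definitions of feature satisfaction degree and feature set satisfaction degree are given, a satisfaction function is designed, an evaluation criterion for satisfactory feature sets is derived, and the feature selection algorithm is described in detail. Experimental results on feature selection and recognition of radar emitter signals show that SFSM clearly outperforms the sequential forward method, the new feature selection method, and the multi-objective genetic algorithm in computational efficiency and in the quality of the selected features, confirming the effectiveness and practicality of SFSM.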

18.
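In the current era of big data, blogs and forums on the Internet generate massive amounts of subjective comments that express people's various emotions and sentiment orientations. Classifying and processing this volume of online comments manually is impractical, so efficiently mining the large amount of online information that carries positive or negative opinions has become an urgent problem. Research on sentiment polarity classification of Chinese text is one way to address it. This article introduces commonly used text feature selection algorithms and analyzes the shortcomings of the document frequency and mutual information algorithms. By comparing and studying the two algorithms, and combining the relevance between text features and text types with the occurrence probability of positive and negative text features, an improved text feature selection algorithm (MIDF) is proposed. Experimental results show that MIDF is effective for sentiment polarity classification of text.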

19.
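Feature selection is an important text preprocessing technique in text classification that can effectively improve classifier accuracy and efficiency. The key to feature selection in text classification is finding an effective feature evaluation metric. In general, the same metric performs differently with different classifiers, so a good metric should take the classifier's characteristics into account. Because the naive Bayes classifier is simple, efficient, and very sensitive to feature selection, studying feature selection methods for this classifier is of great significance. Accordingly, an effective multi-class text feature evaluation metric for Bayesian classifiers, CDM, is proposed. Experiments were carried out with a Bayes classifier on two multi-class text datasets. The results show that the proposed CDM metric yields better feature selection than other feature evaluation metrics.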

20.
A conventional discriminant problem is to determine a discriminant function, which maps a point in a multi-dimensional feature space to a point in a one-dimensional decision space, using a set of labeled (known classification) samples. In many cases, the attribute values of each sample are not constant but fluctuate with time. In this paper, we represent the fluctuating attribute values of each sample by an interval vector in the feature space, and propose a discriminant method for a set of interval vectors. The proposed method is based on a linear interval model which maps an interval vector in the feature space to an interval in the decision space. A mathematical programming problem is formulated to determine the coefficients of this model. We also propose a set of discriminant rules to discriminate unknown samples. The proposed method is applied to a smell sensing problem.
