Similar articles
1.
In this paper we present a new method for joint feature selection and classifier learning using a sparse Bayesian approach. These tasks are performed by optimizing a global loss function that includes a term associated with the empirical loss and another term representing a feature selection and regularization constraint on the parameters. To minimize this function we use a recently proposed technique, the Boosted Lasso algorithm, which follows the regularization path of the empirical risk associated with our loss function. We develop the algorithm for a well-known nonparametric classification method, the relevance vector machine, and perform experiments using a synthetic data set and three databases from the UCI Machine Learning Repository. The results show that our method is able to select the relevant features and that, in some cases, classification accuracy increases when feature selection is performed.

2.
This paper presents an online feature selection algorithm using genetic programming (GP). The proposed GP methodology simultaneously selects a good subset of features and constructs a classifier using the selected features. For a c-class problem, it provides a classifier having c trees. In this context, we introduce two new crossover operations to suit the feature selection process. As a byproduct, our algorithm produces a feature ranking scheme. We tested our method on several data sets with dimensionality ranging from 4 to 7129. We compared the performance of our method with results available in the literature and found that the proposed method produces consistently good results. To demonstrate the robustness of the scheme, we studied its effectiveness on data sets with known (synthetically added) redundant/bad features.

3.
An efficient procedure that integrates feature selection and binary decision tree construction is presented. The nonparametric approach is based on the Kolmogorov-Smirnov criterion, which yields an optimal classification decision at each node. By combining the feature selection with the design of the classifier, only the most informative features are retained for classification.
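
The node-level decision described in this abstract can be illustrated with a small sketch. It assumes one common reading of the Kolmogorov-Smirnov criterion for two classes: at each node, pick the feature and threshold where the empirical class-conditional distribution functions differ most. The function names `ks_split` and `best_feature` are hypothetical and are not taken from the paper.

```python
from bisect import bisect_right


def ks_split(values_a, values_b):
    """Return (threshold, distance) maximizing |F_a(t) - F_b(t)|.

    F_a and F_b are the empirical CDFs of one feature in class A and class B.
    """
    a, b = sorted(values_a), sorted(values_b)
    best_t, best_d = None, -1.0
    for t in sorted(set(a + b)):
        # Fraction of each class with feature value <= t.
        fa = bisect_right(a, t) / len(a)
        fb = bisect_right(b, t) / len(b)
        d = abs(fa - fb)
        if d > best_d:
            best_t, best_d = t, d
    return best_t, best_d


def best_feature(samples_a, samples_b):
    """Pick the feature whose best KS threshold separates the two classes most.

    Returns (feature_index, threshold, ks_distance).
    """
    scored = []
    for j in range(len(samples_a[0])):
        t, d = ks_split([s[j] for s in samples_a], [s[j] for s in samples_b])
        scored.append((d, j, t))
    d, j, t = max(scored, key=lambda s: s[0])
    return j, t, d
```

Applying `best_feature` recursively to the samples on each side of the threshold would grow a binary tree, and features that never win a node are simply never used, which is how selection and tree design end up combined in this reading.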

4.
The design of tree classifiers is considered from the statistical point of view. The procedure for calculating the a posteriori probabilities is decomposed into a sequence of steps. In every step the a posteriori probabilities for a certain subtask of the given pattern recognition task are calculated. The resulting tree classifier realizes a soft-decision strategy in contrast to the hard-decision strategy of the conventional decision tree. At the different nonterminal nodes, mean square polynomial classifiers are applied having the property of estimating the desired a posteriori probabilities together with an integrated feature selection capability.

5.
The process of placing a separating hyperplane for data classification is normally disconnected from the process of selecting the features to use. An approach for feature selection that is conceptually simple but computationally explosive is to apply the hyperplane placement process to all possible subsets of features, selecting the smallest set of features that provides reasonable classification accuracy. Two ways to speed up this process are (i) to use a faster filtering criterion instead of a complete hyperplane placement, and (ii) to use a greedy forward or backward sequential selection method. This paper introduces a new filtering criterion that is very fast: maximizing the drop in the sum of infeasibilities in a linear-programming transformation of the problem. It also shows how the linear programming transformation can be applied to reduce the number of features after a separating hyperplane has already been placed while maintaining the separation that was originally induced by the hyperplane. Finally, a new and highly effective integrated method that simultaneously selects features while placing the separating hyperplane is introduced.
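
The greedy forward sequential selection mentioned in (ii) can be sketched as follows. This is a generic illustration only: the `score` callback is a hypothetical stand-in for whatever evaluation is used (a full hyperplane placement or a faster filtering criterion), and the paper's linear-programming criterion is not reproduced here.

```python
def forward_select(n_features, score, min_gain=1e-9):
    """Greedy forward selection.

    `score(subset)` is a caller-supplied (hypothetical) function returning a
    quality value for a list of feature indices; higher is better.
    Returns (selected_indices, final_score).
    """
    selected = []
    best = score(selected)
    remaining = list(range(n_features))
    while remaining:
        # Evaluate adding each remaining feature to the current subset.
        trials = [(score(selected + [j]), j) for j in remaining]
        top_score, top_j = max(trials, key=lambda t: t[0])
        if top_score - best <= min_gain:
            break  # no remaining feature improves the score enough
        selected.append(top_j)
        remaining.remove(top_j)
        best = top_score
    return selected, best
```

Each round costs one evaluation per remaining feature, so the total is at most on the order of n² evaluations rather than the 2ⁿ needed to test every subset, which is the speed-up the abstract refers to.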

6.
The optimum finite set of linear observables for discriminating two Gaussian stochastic processes is derived using classical methods and distribution function theory. The results offer a new, accurate information-theoretic strategy and are superior to well-known conventional methods using statistical distance measures.

7.
Multimedia Tools and Applications - In this paper, a novel technique for image classification is proposed with three main contributions. First, we give the texture extraction technique for each...

8.
《Pattern recognition letters》1999,20(11-13):1149-1156
Nearest neighbor classifiers demand significant computational resources (time and memory). Editing of the reference set and feature selection are two different approaches to this problem. Here we encode the two approaches within the same genetic algorithm (GA) and simultaneously select features and reference cases. Two data sets were used: the SATIMAGE data and a generated data set. The GA was found to be an expedient solution compared to editing followed by feature selection, feature selection followed by editing, and the individual results from feature selection and editing.

9.
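To meet the practical need for accurate, multi-level automatic classification and management of large volumes of electronic documents, a hierarchical classification method based on multiple feature selection and multi-classifier fusion is proposed. A confidence function is introduced to evaluate the output of a single classifier, and auxiliary classifiers are brought in when appropriate to cast votes on documents that are hard to classify. Experimental results show that, compared with a single classifier, the method achieves better classification accuracy on both flat and hierarchical classification corpora, has good time complexity, and shows good prospects for practical application.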

10.
Feature selection is the process of choosing the relevant subset of features from a high-dimensional dataset to enhance the performance of the classifier. Much research has been carried out on feature selection. Algorithms such as Naïve Bayes (NB), decision tree, and genetic algorithm are applied to the high-dimensional dataset to select the relevant features and also to increase the computational speed. The proposed model presents a solution for the selection of features using ensemble classifier algorithms. The proposed algorithm is a combination of minimum redundancy and maximum relevance (mRMR) and the forest optimization algorithm (FOA). Ensemble-based algorithms such as support vector machine (SVM), K-nearest neighbor (KNN), and NB are further used to enhance the performance of the classifier algorithm. The mRMR-FOA is used to select the relevant features from the various datasets, and a 21% to 24% improvement is recorded in the feature selection. The ensemble classifier algorithms further improve the performance of the algorithm and provide an accuracy of 96%.
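
The mRMR component named in this abstract can be illustrated with a minimal sketch. It assumes discrete features and the common "relevance minus mean redundancy" (difference) form of the criterion, with mutual information estimated from counts. The forest optimization step and the ensemble classifiers are not sketched, and the function names are hypothetical rather than taken from the paper.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())


def mrmr(columns, labels, k):
    """Greedily select k feature indices by relevance minus mean redundancy.

    `columns[j]` holds the values of feature j for every sample.
    """
    relevance = [mutual_information(col, labels) for col in columns]
    # Start with the single most relevant feature.
    selected = [max(range(len(columns)), key=relevance.__getitem__)]
    while len(selected) < k:
        def score(j):
            # Mean mutual information with the features already chosen.
            redundancy = sum(mutual_information(columns[j], columns[s])
                             for s in selected) / len(selected)
            return relevance[j] - redundancy

        rest = [j for j in range(len(columns)) if j not in selected]
        selected.append(max(rest, key=score))
    return selected
```

With a feature that duplicates an already selected one, the redundancy term cancels its relevance, so a weaker but complementary feature is preferred. A ranking by relevance alone would keep both duplicates.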

11.
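To address the shortcomings of intrusion detection systems in real-time detection and adaptability, an improved Bayesian classifier is proposed. A sliding window technique is introduced to improve the real-time performance of intrusion detection. In addition, a purpose-designed performance regulator dynamically sets the parameters of the Bayesian classifier, which makes the intrusion detection system adaptive. The improved Bayesian classifier effectively achieves real-time, proactive, and adaptive intrusion detection.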

12.
In this paper, we propose a two-stage multiobjective-simulated annealing (MOSA)-based technique for named entity recognition (NER). At first, MOSA is used for feature selection under two statistical classifiers, viz. conditional random field (CRF) and support vector machine (SVM). Each solution on the final Pareto optimal front provides a different classifier. These classifiers are then combined by using a new classifier ensemble technique based on MOSA. Several different versions of the objective functions are exploited. We hypothesize that the reliability of prediction of each classifier differs among the various output classes. Thus, in an ensemble system, it is necessary to find the appropriate weight of vote for each output class in each classifier. We propose a MOSA-based technique to determine the weights for votes automatically. The proposed two-stage technique is evaluated for NER in Bengali, a resource-poor language, as well as for English. Evaluation results yield the highest recall, precision and F-measure values of 93.95%, 95.15% and 94.55%, respectively, for Bengali and 89.01%, 89.35% and 89.18%, respectively, for English. Experiments also suggest that the classifier ensemble identified by the proposed MOO-based approach optimizing the F-measure values of named entity (NE) boundary detection outperforms all the individual classifiers and four conventional baseline models.
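
The idea of giving each classifier a separate vote weight per output class can be illustrated with a short sketch. Only the combination step is shown: in the abstract the weights are found by MOSA, whereas here they are assumed to be given, and the function name, labels, and numbers used with it are hypothetical.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Combine one label per classifier using class-specific vote weights.

    predictions[i] is the label output by classifier i; weights[i][label] is
    how much classifier i is trusted when it predicts that label.
    """
    totals = defaultdict(float)
    for i, label in enumerate(predictions):
        totals[label] += weights[i].get(label, 0.0)
    # The label with the largest accumulated weight wins.
    return max(totals, key=totals.get)
```

For example, with weights `[{"PER": 0.9, "LOC": 0.4}, {"PER": 0.6, "LOC": 0.5}, {"PER": 0.5, "LOC": 0.3}]` and predictions `["PER", "LOC", "LOC"]`, the single "PER" vote (0.9) outweighs the two "LOC" votes (0.5 + 0.3), so the result differs from a plain majority vote.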

13.
A dynamic classifier ensemble selection approach for noise data   Cited by: 2 (self-citations: 0, citations by others: 2)
Dynamic classifier ensemble selection (DCES) plays a strategic role in the field of multiple classifier systems. The real data to be classified often include a large amount of noise, so it is important to study the noise immunity of various DCES strategies. This paper introduces the group method of data handling (GMDH) to DCES and proposes a novel dynamic classifier ensemble selection strategy, GDES-AD. It considers both accuracy and diversity in the process of ensemble selection. We experimentally test GDES-AD and six other ensemble strategies over 30 UCI data sets in three cases: the data sets do not include artificial noise, include class noise, and include attribute noise. Statistical analysis results show that GDES-AD has stronger noise immunity than the other strategies. In addition, we find that Random Subspace is more suitable for GDES-AD than Bagging. Further, the bias-variance decomposition experiments for the classification errors of various strategies show that the stronger noise immunity of GDES-AD is mainly due to the fact that it reduces the bias in classification error more effectively.

14.
A novel feature selection approach: Combining feature wrappers and filters   Cited by: 2 (self-citations: 0, citations by others: 2)
Feature selection is one of the most important issues in research fields such as system modelling, data mining and pattern recognition. In this study, a new feature selection algorithm that combines feature wrapper and feature filter approaches is proposed in order to identify the significant input variables in systems with continuous domains. The proposed method utilizes the functional dependency concept, correlation coefficients and the K-nearest neighbour (KNN) method to implement the feature filter and feature wrappers. Four feature selection methods independently select the significant input variables, and the input variable combination that yields the best result with respect to its corresponding evaluation function is selected as the winner. This is similar to the basic information fusion notion of integrating the information collected from different sources. All four feature selection methods are performed in two stages: (i) pre-selection, (ii) selection. Two of the four feature selection methods utilize the KNN method for evaluating the candidates. These two methods use sequential forward and sequential backward search mechanisms, respectively, in the pre-selection stage, whereas the third feature selection method uses correlation coefficients in the pre-selection stage. It is common to have outliers and noise in real-life data. In order to make the proposed feature selection algorithm resistant to noise and outliers, approximate functional dependencies are used by utilizing membership values that inherently cope with uncertainty in the data. Thus, the fourth feature selection method makes use of approximate functional dependencies to evaluate candidates in the pre-selection stage. All four methods apply the KNN method with an exhaustive search strategy in order to find the most suitable input variable combination with respect to a performance measure.

15.
Many learning problems require handling high dimensional datasets with a relatively small number of instances. Learning algorithms are thus confronted with the curse of dimensionality, and need to address it in order to be effective. Examples of these types of data include the bag-of-words representation in text classification problems and gene expression data for tumor detection/classification. Usually, among the high number of features characterizing the instances, many may be irrelevant (or even detrimental) for the learning tasks. It is thus clear that there is a need for adequate techniques for feature representation, reduction, and selection, to improve both the classification accuracy and the memory requirements. In this paper, we propose combined unsupervised feature discretization and feature selection techniques, suitable for medium and high-dimensional datasets. The experimental results on several standard datasets, with both sparse and dense features, show the efficiency of the proposed techniques as well as improvements over previous related techniques.

16.
A novel facial expression classification (FEC) method is presented and evaluated. The classification process is decomposed into multiple two-class classification problems, a choice that is analytically justified, and unique sets of features are extracted for each classification problem. Specifically, for each two-class problem, an iterative feature selection process that utilizes a class separability measure is employed to create salient feature vectors (SFVs), where each SFV is composed of a selected feature subset. Subsequently, two-class discriminant analysis is applied on the SFVs to produce salient discriminant hyper-planes (SDHs), which are used to train the corresponding two-class classifiers. To properly integrate the two-class classification results and produce the FEC decision, a computationally efficient and fast classification scheme is developed. During each step of this scheme, the most reliable classifier is identified and utilized, so that a more accurate final classification decision is produced. The JAFFE and the MMI databases are used to evaluate the performance of the proposed salient-feature-and-reliable-classifier selection (SFRCS) methodology. Classification rates of 96.71% and 93.61% are achieved under the leave-one-sample-out evaluation strategy, and 85.92% under the leave-one-subject-out evaluation strategy.

17.
The choice of packaging type is important to the process of researching and developing an integrated circuit (IC). Indeed, for an IC chip designer, the importance can be compared to an architect’s choice of construction design. Since there are considerable variations in characteristics and in the types of products available, collecting information about packaging technologies and products can be difficult and time-consuming. Therefore, finding the means to provide packaging information to designers quickly and efficiently is necessary and important, as this will not only help designers accurately decide on design methods for an IC, but also significantly reduce processing risks. In this study, existing product information, such as the dimensions, characteristics, and design and application criteria of a product, was analyzed. One of the biggest issues when data from multi-dimensional measurements are represented as a feature vector is that the feature space of the raw data often has very large dimensions. This study explores the use of rough set attribute reduction (RSAR) to reduce the attributes of the IC package family dataset, and of artificial neural networks to construct an efficient IC package type classifier model. The experimental results show that the features produced by RSAR improve generalization accuracy: the training and testing set classification accuracy rates were 96.9% and 98.2%, respectively.

18.
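A text classification system based on a category-based feature selection algorithm   Cited by: 1 (self-citations: 0, citations by others: 1)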
蒋伟贞  陶宏才 《计算机应用》2005,25(11):2658-2660
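Most current index-term selection algorithms are based on term frequency and do not use the category information in the training samples. A new category-based feature selection algorithm is therefore proposed. The algorithm determines a term's power to discriminate between documents from the difference in within-category document similarity caused by the term's presence or absence in a document, and uses this discriminative power as the importance measure for selecting keywords. Based on this algorithm, an automatic classification system for English text was designed, and the system was tested and its results analyzed.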

19.
Throughout its history, human–machine systems design has had a technological bias, in the sense that design for technology came first and design for humans a distant second. Over the years this situation became untenable because growing system complexity made a decomposition approach to design inadequate. Seeing that technology-centered design had failed, the pendulum swung to the other side, taking the human as the center of things. Yet human-centered design is just as inadequate as machine-centered design, since it implies a dichotomy where one part of the system is seen as opposed to the other. This applies not least to the case of automotive environments, where the interaction has a clear purpose, namely to negotiate traffic safely. Design should therefore embrace a function-centered view where the focus is the joint driver-vehicle system. Design should serve to further the purposes or goals of the joint system, i.e., to be in control vis-à-vis the dynamic traffic environment, by taking the relative strengths and limitations of the components into account and by describing the system on multiple levels.

20.