Similar Literature
20 similar records found (search time: 57 ms)
1.
OFFSS: optimal fuzzy-valued feature subset selection
Feature subset selection is a well-known pattern recognition problem, which aims to reduce the number of features used in classification or recognition. This reduction is expected to improve the performance of classification algorithms in terms of speed, accuracy and simplicity. Most existing feature selection investigations focus on the case in which the feature values are real or nominal; very little research addresses fuzzy-valued feature subset selection and its computational complexity. This paper focuses on a problem called optimal fuzzy-valued feature subset selection (OFFSS), in which the quality measure of a subset of features is defined by both the overall overlapping degree between two classes of examples and the size of the feature subset. The main contributions of this paper are that: 1) the concept of a fuzzy extension matrix is introduced; 2) the computational complexity of OFFSS is proved to be NP-hard; 3) a simple but powerful heuristic algorithm for OFFSS is given; and 4) the feasibility and simplicity of the proposed algorithm are demonstrated by applications of OFFSS to fuzzy decision tree induction and by comparisons with three different feature selection techniques developed recently.
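The overlapping-degree idea behind the OFFSS quality measure can be illustrated with a minimal sketch. The paper's exact measure is not reproduced here; the min/max ratio below is an illustrative assumption, treating each class as a fuzzy membership vector over a shared discrete domain:

```python
def overlap_degree(class_a, class_b):
    # Illustrative overlap between two classes given as fuzzy membership
    # vectors over the same discrete domain: shared (min) mass divided by
    # total (max) mass. This is an assumed stand-in, not the paper's
    # exact quality measure.
    num = sum(min(a, b) for a, b in zip(class_a, class_b))
    den = sum(max(a, b) for a, b in zip(class_a, class_b))
    return num / den if den else 0.0
```

Identical membership vectors give an overlap of 1, disjoint supports give 0; a good feature subset would drive this value down between the two classes.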

2.
A Feature Selection Method Based on Genetic Algorithms
Feature extraction is widely used in pattern recognition, knowledge discovery, machine learning, and many other fields, and has attracted increasing attention [1]. For a given pattern to be classified, feature extraction requires selecting an optimal feature subset from a large number of candidate features to represent the pattern. This paper proposes a genetic algorithm-based solution to this combinatorial and multi-objective optimization problem: the genetic algorithm serves as the "front end" of a recognition or classification system, finding the optimal feature subset that represents the problem space, which greatly reduces the classification system's search space and thus improves search efficiency.
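As an illustration of using a genetic algorithm as such a "front end", the sketch below evolves bitmask individuals over the feature set. The wrapper fitness (leave-one-out 1-NN accuracy minus a size penalty) and all parameter values are illustrative assumptions, not the paper's exact setup:

```python
import random

def fitness(mask, data, labels, alpha=0.05):
    # Stand-in wrapper fitness: leave-one-out 1-NN accuracy on the
    # selected features, minus a small penalty per selected feature.
    selected = [i for i, b in enumerate(mask) if b]
    if not selected:
        return 0.0
    def dist(a, b):
        return sum((a[i] - b[i]) ** 2 for i in selected)
    correct = 0
    for i, x in enumerate(data):
        j = min((k for k in range(len(data)) if k != i),
                key=lambda k: dist(x, data[k]))
        correct += labels[j] == labels[i]
    return correct / len(data) - alpha * len(selected)

def ga_select(data, labels, n_feat, pop=12, gens=10, pmut=0.1, seed=0):
    rng = random.Random(seed)
    popn = [[rng.randint(0, 1) for _ in range(n_feat)] for _ in range(pop)]
    for _ in range(gens):
        popn.sort(key=lambda m: fitness(m, data, labels), reverse=True)
        elite = popn[: pop // 2]                    # truncation selection
        children = []
        while len(elite) + len(children) < pop:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n_feat)          # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < pmut else g for g in child]
            children.append(child)
        popn = elite + children
    return max(popn, key=lambda m: fitness(m, data, labels))
```

On a toy dataset with one informative and one noisy feature, the evolved mask concentrates on the informative feature, shrinking the classifier's search space exactly as the abstract describes.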

3.
Pattern recognition generally requires that objects be described in terms of a set of measurable features. The selection and quality of the features representing each pattern affect the success of subsequent classification. Feature extraction is the process of deriving new features from original features to reduce the cost of feature measurement, increase classifier efficiency, and allow higher accuracy. Many feature extraction techniques involve linear transformations of the original pattern vectors to new vectors of lower dimensionality. While this is useful for data visualization and classification efficiency, it does not necessarily reduce the number of features to be measured, since each new feature may be a linear combination of all of the features in the original pattern vector. Here, we present a new approach to feature extraction in which feature selection, feature extraction, and classifier training are performed simultaneously using a genetic algorithm. The genetic algorithm optimizes a feature weight vector used to scale the individual features in the original pattern vectors. A masking vector is also employed for simultaneous selection of a feature subset. We employ this technique in combination with the k nearest neighbor classification rule, and compare the results with classical feature selection and extraction techniques, including sequential floating forward feature selection and linear discriminant analysis. We also present results for the identification of favorable water-binding sites on protein surfaces.
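The weight-and-mask representation described above can be sketched as follows; the elementwise scaling scheme comes from the abstract, while the specific toy k-NN implementation around it is an illustrative assumption:

```python
from collections import Counter

def transform(x, weights, mask):
    # Elementwise scaling of a pattern vector: the binary mask deselects
    # features, the weight vector rescales the surviving ones.
    return [xi * w * m for xi, w, m in zip(x, weights, mask)]

def knn_predict(query, data, labels, weights, mask, k=3):
    # k-NN in the scaled space defined by (weights, mask); in the paper
    # both vectors are optimized by the genetic algorithm.
    q = transform(query, weights, mask)
    scaled = [transform(x, weights, mask) for x in data]
    order = sorted(range(len(data)),
                   key=lambda i: sum((q[j] - scaled[i][j]) ** 2
                                     for j in range(len(q))))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]
```

Because deselected features are multiplied by zero, they contribute nothing to the distance, so selection and extraction are expressed in one representation.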

4.
Feature selection (attribute reduction) from large-scale incomplete data is a challenging problem in areas such as pattern recognition, machine learning and data mining. In rough set theory, feature selection from incomplete data aims to retain the discriminatory power of the original features. To address this issue, many feature selection algorithms have been proposed; however, these algorithms are often computationally time-consuming. To overcome this shortcoming, we introduce in this paper a theoretic framework based on rough set theory, called positive approximation, which can be used to accelerate a heuristic process for feature selection from incomplete data. As an application of the proposed accelerator, a general feature selection algorithm is designed. By integrating the accelerator into a heuristic algorithm, we obtain several modified representative heuristic feature selection algorithms in rough set theory. Experiments show that these modified algorithms outperform their original counterparts. It is worth noting that the performance gain of the modified algorithms becomes more visible when dealing with larger data sets.

5.
Feature selection is a challenging problem in areas such as pattern recognition, machine learning and data mining. Considering a consistency measure introduced in rough set theory, the problem of feature selection, also called attribute reduction, aims to retain the discriminatory power of the original features. Many heuristic attribute reduction algorithms have been proposed; however, quite often, these methods are computationally time-consuming. To overcome this shortcoming, we introduce a theoretic framework based on rough set theory, called positive approximation, which can be used to accelerate a heuristic process of attribute reduction. Based on the proposed accelerator, a general attribute reduction algorithm is designed. Through the use of the accelerator, several representative heuristic attribute reduction algorithms in rough set theory have been enhanced. Note that each of the modified algorithms chooses the same attribute reduct as its original version, and hence possesses the same classification accuracy. Experiments show that these modified algorithms outperform their original counterparts. It is worth noting that the performance gain of the modified algorithms becomes more visible when dealing with larger data sets.
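The positive region and dependency degree that underlie such rough-set reducts can be sketched for a discrete decision table as below. The positive-approximation accelerator itself (which progressively shrinks the universe) is not shown; this is only the classical computation it speeds up:

```python
def partition(objects, attrs):
    # Equivalence classes of the indiscernibility relation: group objects
    # by their values on the given condition attributes.
    blocks = {}
    for i, obj in enumerate(objects):
        key = tuple(obj[a] for a in attrs)
        blocks.setdefault(key, set()).add(i)
    return list(blocks.values())

def positive_region(objects, decisions, attrs):
    # Union of the condition classes that fall entirely inside a single
    # decision class (the lower approximation of the decision).
    pos = set()
    for block in partition(objects, attrs):
        if len({decisions[i] for i in block}) == 1:
            pos |= block
    return pos

def dependency(objects, decisions, attrs):
    # Degree to which the decision depends on the chosen attributes.
    return len(positive_region(objects, decisions, attrs)) / len(objects)
```

A heuristic reduction algorithm keeps adding (or removing) attributes while this dependency value is preserved, which is exactly the quantity the accelerator avoids recomputing over the full universe.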

6.
Feature subset selection is a substantial problem in data classification tasks. Its purpose is to find an efficient subset of the original features that increases both the efficiency and the accuracy of classification while reducing its cost. High-dimensional datasets with a very large number of predictive attributes but few instances require dedicated techniques for selecting an optimal feature subset. In this paper, a hybrid method is proposed for efficient subset selection in high-dimensional datasets. The proposed algorithm runs filter and wrapper algorithms in two phases. In the filter phase, the symmetrical uncertainty (SU) criterion is exploited to weight features according to how well they discriminate the classes. In the wrapper phase, both FICA (fuzzy imperialist competitive algorithm) and IWSSr (incremental wrapper subset selection with replacement) are executed in the weighted feature space to find relevant attributes. The new scheme is successfully applied to 10 standard high-dimensional datasets, especially from the fields of biosciences and medicine, where the number of features is large compared to the number of samples, inducing a severe curse-of-dimensionality problem. The comparison between the results of our method and other algorithms confirms that our method achieves the highest accuracy and is also able to find an efficient, compact subset.
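The symmetrical uncertainty criterion used in the filter phase has a standard definition, SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)), normalized to [0, 1]. A minimal sketch for discrete variables:

```python
from collections import Counter
from math import log2

def entropy(values):
    # Shannon entropy of a discrete sample, in bits.
    n = len(values)
    return -sum(c / n * log2(c / n) for c in Counter(values).values())

def symmetrical_uncertainty(x, y):
    # SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)); 1 means X determines Y
    # (and vice versa), 0 means they are independent in the sample.
    hx, hy = entropy(x), entropy(y)
    mi = hx + hy - entropy(list(zip(x, y)))   # mutual information
    return 2 * mi / (hx + hy) if hx + hy else 0.0
```

In the filter phase each feature would be scored by SU against the class variable, and the scores used as weights for the subsequent wrapper search.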

7.
8.
Facial expression recognition generally requires that faces be described in terms of a set of measurable features. The selection and quality of the features representing each face have a considerable bearing on the success of subsequent facial expression classification. Feature selection is the process of choosing a subset of features in order to increase classifier efficiency and allow higher classification accuracy. Many current dimensionality reduction techniques used for facial expression recognition involve linear transformations of the original pattern vectors to new vectors of lower dimensionality. In this paper, we present a methodology for feature selection that uses the nondominated sorting genetic algorithm II (NSGA-II), one of the latest genetic algorithms developed for resolving multiobjective problems with high accuracy. In the proposed feature selection process, NSGA-II optimizes a vector of feature weights, which increases discrimination by means of class separation. The proposed methodology is evaluated using the BU-3DFE 3D facial expression database. Classification results validate the effectiveness and flexibility of the proposed approach when compared with results reported in the literature using the same experimental settings.
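NSGA-II's core building block is nondominated sorting. A minimal sketch of extracting the first Pareto front (for maximization objectives) is shown below; this illustrates only the dominance test, not the full NSGA-II crowding-distance and selection machinery:

```python
def dominates(a, b):
    # Pareto dominance for maximization: a is at least as good as b on
    # every objective and strictly better on at least one.
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def nondominated_front(points):
    # First front: the points dominated by no other point.
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

In the feature selection setting of this abstract, each point would be an objective vector for a candidate weight vector, e.g. (class separation, negated subset size).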

9.
For high-dimensional data classification, multicollinearity, redundant features, and noise easily lead to low classifier recognition accuracy and large time and space overheads. To address this, a feature selection method is proposed that combines supervised feature extraction by partial least squares (PLS) with false nearest neighbors (FNN). First, PLS extracts principal components from the high-dimensional data, eliminating multicollinearity among features and yielding an independent principal component space that carries supervisory information. Then, by computing the correlation in this space before and after each feature is removed, an FNN-based feature similarity measure is established, ranking the original features by the strength of their power to explain the class variable. Finally, the features with weak explanatory power are removed one by one, various classification models are constructed, and with the support vector machine (SVM) recognition rate as the model evaluation criterion, the model with the highest recognition rate and the fewest features is found; the features contained in this model form the optimal feature subset. Simulation results on three data sets all show that the optimal subsets selected by this method coincide with each data set's essential discriminative features, indicating that the method has good feature selection ability and provides a new route for feature selection in data classification.

10.
Feature selection methods mainly comprise filter methods and wrapper methods. To exploit the computational simplicity of filter methods and the high accuracy of wrapper methods, a new feature selection method combining the two is proposed. The method first uses a mutual-information-based filter to obtain a subset that meets a given accuracy requirement, and then applies a wrapper method to find the final optimized feature subset. Given the successful application of genetic algorithms to combinatorial optimization problems, a genetic algorithm is adopted for the feature subset search. In numerical simulations and in bearing fault feature selection, the new method saves a large amount of selection time while maintaining diagnostic accuracy. The combined feature selection method has good ability to search for optimal feature subsets and saves selection time, offering the dual advantages of high efficiency and high accuracy.

11.
Feature selection is a key issue in pattern recognition, especially when prior knowledge of the most discriminant features is not available. Moreover, in order to perform the classification task with reduced complexity and acceptable performance, features that are irrelevant, redundant, or noisy are usually excluded from the problem representation. This work presents a multi-objective wrapper, based on genetic algorithms, to select the most relevant set of features for face recognition tasks. The proposed strategy explores the space of multiple feasible selections in order to minimize the cardinality of the feature subset and, at the same time, to maximize its discriminative capacity. Experimental results show that, in comparison with other state-of-the-art approaches, the proposed approach improves the classification performance while reducing the representation dimensionality.

12.
Ultrasound imaging is the most suitable method for early detection of prostate cancer. It is very difficult to distinguish the benign and malignant nature of the affliction in the early stage of cancer. This is reflected in the high percentage of unnecessary biopsies that are performed and many deaths caused by late detection or misdiagnosis. A computer-based classification system can provide a second opinion to radiologists. In pattern recognition, objects are generally described in terms of a set of measurable features. The selection and quality of the features representing each pattern have a considerable bearing on the success of subsequent pattern classification. Feature selection is the process of selecting the most relevant or dominant features from the original feature set in order to reduce the cost of data visualization and to increase classification efficiency and accuracy. The region of interest (ROI) is identified from transrectal ultrasound (TRUS) images using DBSCAN clustering with morphological operators after image enhancement using an M3-filter. Then 22 grey-level co-occurrence matrix features are extracted from the ROIs. Soft-computing feature selection algorithms, namely genetic algorithm (GA), ant colony optimization (ACO) and QR, are studied. In this paper, QR-ACO (a hybrid of rough-set-based QR and ACO) and GA-ACO (a hybrid of GA and ACO) are proposed for reducing the feature set in order to increase the accuracy and efficiency of prostate cancer classification. The selected features may have the best discriminatory power for classifying prostate cancer based on TRUS images. A support vector machine is tailored for evaluating the proposed feature selection methods through classification, and a comparative analysis is performed among these methods. Experimental results show that the proposed QR-ACO method produces significant results: the number of features it selects is minimal, and it achieves high detection accuracy.

13.
黄莉莉, 汤进, 孙登第, 罗斌. 《计算机应用》2012, 32(10): 2888-2890
To address the limitation of traditional feature selection algorithms to single-label data, a multi-label feature selection algorithm, multi-label ReliefF, is proposed. Based on the co-occurrence of classes in multi-label data, the algorithm assumes that each of a sample's labels contributes equally, combines three methods of computing the contribution values, and improves the feature weight update formula, finally obtaining effective discriminative features. Classification experiments show that, with the same feature dimensionality, the classification accuracy of multi-label ReliefF is clearly higher than that of traditional feature selection algorithms.
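For reference, the classic single-label Relief weight update that the multi-label variant generalizes can be sketched as follows; the multi-label contribution-splitting step described in the abstract is not reproduced here:

```python
import random

def relief(data, labels, n_iter=60, seed=0):
    # Classic (single-label) Relief sketch: for a sampled instance,
    # reward features that separate it from its nearest miss (different
    # class) and penalize features that separate it from its nearest hit
    # (same class). Parameters are illustrative.
    rng = random.Random(seed)
    n_feat = len(data[0])
    w = [0.0] * n_feat
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    for _ in range(n_iter):
        i = rng.randrange(len(data))
        hits = [j for j in range(len(data)) if j != i and labels[j] == labels[i]]
        misses = [j for j in range(len(data)) if labels[j] != labels[i]]
        h = min(hits, key=lambda j: dist(data[i], data[j]))
        m = min(misses, key=lambda j: dist(data[i], data[j]))
        for f in range(n_feat):
            w[f] += abs(data[i][f] - data[m][f]) - abs(data[i][f] - data[h][f])
    return w
```

Features with large positive weights are retained; the multi-label version changes how the hit/miss contributions are computed across co-occurring labels.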

14.
The problem of selecting a subset of relevant features is classic and found in many branches of science, including pattern recognition. In this paper, we propose a new feature selection criterion based on low-loss nearest neighbor classification and a novel feature selection algorithm that optimizes the margin of nearest neighbor classification by minimizing its loss function. Theoretical analysis based on an energy-based model is presented, and experiments are conducted on several benchmark real-world data sets and on facial data sets for gender classification, showing that the proposed feature selection method outperforms other classic ones.

15.
Feature selection plays a vital role in many areas of pattern recognition and data mining. Effective computation of feature selection is important for improving classification performance. In rough set theory, many feature selection algorithms have been proposed to process static incomplete data. However, feature values in an incomplete data set may vary dynamically in real-world applications. For such dynamic incomplete data, a classic (non-incremental) approach to feature selection is usually computationally time-consuming. To overcome this disadvantage, we propose an incremental approach for feature selection, which can accelerate the feature selection process on dynamic incomplete data. We first employ an incremental manner to compute the new positive region when feature values with respect to an object set vary dynamically. Based on the calculated positive region, two efficient incremental feature selection algorithms are developed, for a single object and for multiple objects with varying feature values, respectively. We then conduct a series of experiments with 12 UCI real data sets to evaluate the efficiency and effectiveness of the proposed algorithms. The experimental results show that the proposed algorithms compare favorably with the existing non-incremental methods.

16.
The curse of dimensionality is a common problem in machine learning tasks; feature selection algorithms can pick out an optimal feature subset from the original data and thereby reduce feature dimensionality. A hybrid feature selection algorithm is proposed: first, the chi-square test and a filter method select an important feature subset, which is then standardized and scaled; next, SBS-SVM, sequential backward selection (SBS) wrapped with a support vector machine (SVM), selects the optimal feature subset, maximizing classification performance while effectively reducing the number of features. In the experiments, the wrapper-stage SBS-SVM was tested against two other algorithms on three classic data sets; the results show that SBS-SVM performs well in both classification performance and generalization ability.
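The wrapper stage's sequential backward selection can be sketched generically, with the wrapped classifier abstracted as a scoring callback. The paper wraps an SVM; the callback here is a stand-in for SVM cross-validation accuracy:

```python
def sbs(features, score, k_min=1):
    # Sequential backward selection: start from the full feature set and
    # greedily drop the feature whose removal best preserves (or
    # improves) the wrapped classifier's score. `score` is a stand-in
    # for SVM cross-validation accuracy on the candidate subset.
    current = list(features)
    best_subset, best_score = list(current), score(current)
    while len(current) > k_min:
        candidates = [[f for f in current if f != g] for g in current]
        current = max(candidates, key=score)
        s = score(current)
        if s >= best_score:
            best_subset, best_score = list(current), s
    return best_subset
```

The search is greedy, so it evaluates O(n^2) subsets instead of 2^n, which is what makes the SVM wrapper affordable after the filter stage has already pruned the feature space.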

17.
To address the blindness of the relief feature selection algorithm when training individual attribute weights, a new algorithm based on adaptive partitioning of the instance set, Q-relief, is proposed. It corrects the blind attribute selection of the original algorithm and selects the feature subset that best expresses the image information for pattern recognition. The algorithm was applied to fault recognition in the train operation fault dynamic image monitoring system (TFDS); experiments verify that, compared with other algorithms, Q-relief clearly improves the accuracy of fault image recognition.

18.
Finding an optimal subset of features that maximizes classification accuracy is still an open problem. In this paper, we exploit the speed of the Harmony Search algorithm and the Optimum-Path Forest classifier in order to propose a new fast and accurate approach for feature selection. Comparisons with several other pattern recognition and feature selection techniques showed that the proposed hybrid algorithm for feature selection outperformed them. The experiments were carried out in the context of identifying non-technical losses in power distribution systems.

19.
Past work on object detection has emphasized the issues of feature extraction and classification; however, relatively little attention has been given to the critical issue of feature selection. The main trend in feature extraction has been representing the data in a lower-dimensional space, for example, using principal component analysis (PCA). Without an effective scheme to select an appropriate set of features in this space, however, these methods rely mostly on powerful classification algorithms to deal with redundant and irrelevant features. In this paper, we argue that feature selection is an important problem in object detection and demonstrate that genetic algorithms (GAs) provide a simple, general, and powerful framework for selecting good subsets of features, leading to improved detection rates. As a case study, we have considered PCA for feature extraction and support vector machines (SVMs) for classification. The goal is to search the PCA space using GAs to select a subset of eigenvectors encoding important information about the target concept of interest. This is in contrast to traditional methods that select some percentage of the top eigenvectors to represent the target concept, independently of the classification task. We have tested the proposed framework on two challenging applications: vehicle detection and face detection. Our experimental results illustrate significant performance improvements in both cases.

20.
Neighborhood rough set based heterogeneous feature subset selection
Feature subset selection is viewed as an important preprocessing step for pattern recognition, machine learning and data mining. Most research focuses on homogeneous feature selection, namely on purely numerical or purely categorical features. In this paper, we introduce a neighborhood rough set model to deal with the problem of heterogeneous feature subset selection. As the classical rough set model can only evaluate categorical features, we generalize it with neighborhood relations and introduce a neighborhood rough set model. The proposed model degrades to the classical one if the neighborhood size is set to zero. The neighborhood model is used to reduce numerical and categorical features by assigning different thresholds to different kinds of attributes. In this model, the sizes of the neighborhood lower and upper approximations of decisions reflect the discriminating capability of feature subsets. The size of the lower approximation is computed as the dependency between decision and condition attributes. We use the neighborhood dependency to evaluate the significance of a subset of heterogeneous features and construct forward feature subset selection algorithms. The proposed algorithms are compared with some classical techniques. Experimental results show that the neighborhood-model-based method is more flexible in dealing with heterogeneous data.
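A minimal sketch of the neighborhood lower approximation and the resulting dependency degree for numerical attributes, using the Chebyshev (max) distance as an assumed neighborhood metric:

```python
def neighborhood(data, i, attrs, delta):
    # δ-neighborhood of object i under numerical attributes `attrs`,
    # using the Chebyshev (max) distance as an assumed metric. With
    # delta = 0 this collapses to exact-match classes, mirroring the
    # claim that the model degrades to the classical one.
    return {j for j in range(len(data))
            if max(abs(data[i][a] - data[j][a]) for a in attrs) <= delta}

def neighborhood_dependency(data, decisions, attrs, delta):
    # Fraction of objects whose whole neighborhood shares their decision:
    # the size of the neighborhood lower approximation of the decision,
    # normalized by the universe size.
    pos = [i for i in range(len(data))
           if all(decisions[j] == decisions[i]
                  for j in neighborhood(data, i, attrs, delta))]
    return len(pos) / len(data)
```

A forward selection algorithm would add, at each step, the attribute that most increases this dependency for the chosen thresholds.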


Copyright©北京勤云科技发展有限公司  京ICP备09084417号