首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
In feature selection problems, strong relevant features may be misjudged as redundant by the approximate Markov blanket. To avoid this, a new concept called strong approximate Markov blanket is proposed. It is theoretically proved that no strong relevant feature will be misjudged as redundant by the proposed concept. To reduce computation time, we propose the concept of modified strong approximate Markov blanket, which still performs better than the approximate Markov blanket in avoiding misjudgment of strong relevant features. A new filter-based feature selection method that is applicable to high-dimensional datasets is further developed. It first groups features to remove redundant features, and then uses a sequential forward selection method to remove irrelevant features. Numerical results on four benchmark and seven real datasets suggest that it is a competitive feature selection method with high classification accuracy, moderate number of selected features, and above-average robustness.  相似文献   

2.
Today, feature selection is an active research in machine learning. The main idea of feature selection is to choose a subset of available features, by eliminating features with little or no predictive information, as well as redundant features that are strongly correlated. There are a lot of approaches for feature selection, but most of them can only work with crisp data. Until now there have not been many different approaches which can directly work with both crisp and low quality (imprecise and uncertain) data. That is why, we propose a new method of feature selection which can handle both crisp and low quality data. The proposed approach is based on a Fuzzy Random Forest and it integrates filter and wrapper methods into a sequential search procedure with improved classification accuracy of the features selected. This approach consists of the following main steps: (1) scaling and discretization process of the feature set; and feature pre-selection using the discretization process (filter); (2) ranking process of the feature pre-selection using the Fuzzy Decision Trees of a Fuzzy Random Forest ensemble; and (3) wrapper feature selection using a Fuzzy Random Forest ensemble based on cross-validation. The efficiency and effectiveness of this approach is proved through several experiments using both high dimensional and low quality datasets. The approach shows a good performance (not only classification accuracy, but also with respect to the number of features selected) and good behavior both with high dimensional datasets (microarray datasets) and with low quality datasets.  相似文献   

3.
This work proposes a novel method for template matching in the wild. Different from the previous methods to search the matching position, our method obtains further ability on the angle location by dynamically updating corresponding feature pairs and rigid transformation parameters, which result in mutual enhancement of both feature extraction and template location. We propose a robust objective function with a valid feature selection for template matching against to noise disturbance, background changing, object deformation and partial occlusion. A hierarchical search strategy is used by adjusting the size of feature patch to improve the matching effectiveness. In addition, we extend the proposed method to match image sequences. It is beneficial to propagate reasonable feature pairs to a sequential object as initialization, recalling a stable tracking result. Extensive experiments are tested on the public database with challenging images and sequences. Experimental results demonstrate the merits of the proposed method compared to some state-of-the-art matching methods.  相似文献   

4.
Financially distressed prediction (FDP) has been a widely and continually studied topic in the field of corporate finance. One of the core problems to FDP is to design effective feature selection algorithms. In contrast to existing approaches, we propose an integrated approach to feature selection for the FDP problem that embeds expert knowledge with the wrapper method. The financial features are categorized into seven classes according to their financial semantics based on experts’ domain knowledge surveyed from literature. We then apply the wrapper method to search for “good” feature subsets consisting of top candidates from each feature class. For concept verification, we compare several scholars’ models as well as leading feature selection methods with the proposed method. Our empirical experiment indicates that the prediction model based on the feature set selected by the proposed method outperforms those models based on traditional feature selection methods in terms of prediction accuracy.  相似文献   

5.
A new improved forward floating selection (IFFS) algorithm for selecting a subset of features is presented. Our proposed algorithm improves the state-of-the-art sequential forward floating selection algorithm. The improvement is to add an additional search step called “replacing the weak feature” to check whether removing any feature in the currently selected feature subset and adding a new one at each sequential step can improve the current feature subset. Our method provides the optimal or quasi-optimal (close to optimal) solutions for many selected subsets and requires significantly less computational load than optimal feature selection algorithms. Our experimental results for four different databases demonstrate that our algorithm consistently selects better subsets than other suboptimal feature selection algorithms do, especially when the original number of features of the database is large.  相似文献   

6.
Power quality (PQ) issues have become more important than before due to increased use of sensitive electrical loads. In this paper, a new hybrid algorithm is presented for PQ disturbances detection in electrical power systems. The proposed method is constructed based on four main steps: simulation of PQ events, extraction of features, selection of dominant features, and classification of selected features. By using two powerful signal processing tools, i.e. variational mode decomposition (VMD) and S-transform (ST), some potential features are extracted from different PQ events. VMD as a new tool decomposes signals into different modes and ST also analyzes signals in both time and frequency domains. In order to avoid large dimension of feature vector and obtain a detection scheme with optimum structure, sequential forward selection (SFS) and sequential backward selection (SBS) as wrapper based methods and Gram–Schmidt orthogonalization (GSO) based feature selection method as filter based method are used for elimination of redundant features. In the next step, PQ events are discriminated by support vector machines (SVMs) as classifier core. Obtained results of the extensive tests prove the satisfactory performance of the proposed method in terms of speed and accuracy even in noisy conditions. Moreover, the start and end points of PQ events can be detected with high precision.  相似文献   

7.
Feature Fusion plays an important role in speech emotion recognition to improve the classification accuracy by combining the most popular acoustic features for speech emotion recognition like energy, pitch and mel frequency cepstral coefficients. However the performance of the system is not optimal because of the computational complexity of the system, which occurs due to high dimensional correlated feature set after feature fusion. In this paper, a two stage feature selection method is proposed. In first stage feature selection, appropriate features are selected and fused together for speech emotion recognition. In second stage feature selection, optimal feature subset selection techniques [sequential forward selection (SFS) and sequential floating forward selection (SFFS)] are used to eliminate the curse of dimensionality problem due to high dimensional feature vector after feature fusion. Finally the emotions are classified by using several classifiers like Linear Discriminant Analysis (LDA), Regularized Discriminant Analysis (RDA), Support Vector Machine (SVM) and K Nearest Neighbor (KNN). The performance of overall emotion recognition system is validated over Berlin and Spanish databases by considering classification rate. An optimal uncorrelated feature set is obtained by using SFS and SFFS individually. Results reveal that SFFS is a better choice as a feature subset selection method because SFS suffers from nesting problem i.e it is difficult to discard a feature after it is retained into the set. SFFS eliminates this nesting problem by making the set not to be fixed at any stage but floating up and down during the selection based on the objective function. Experimental results showed that the efficiency of the classifier is improved by 15–20 % with two stage feature selection method when compared with performance of the classifier with feature fusion.  相似文献   

8.
With the advent of technology in various scientific fields, high dimensional data are becoming abundant. A general approach to tackle the resulting challenges is to reduce data dimensionality through feature selection. Traditional feature selection approaches concentrate on selecting relevant features and ignoring irrelevant or redundant ones. However, most of these approaches neglect feature interactions. On the other hand, some datasets have imbalanced classes, which may result in biases towards the majority class. The main goal of this paper is to propose a novel feature selection method based on the interaction information (II) to provide higher level interaction analysis and improve the search procedure in the feature space. In this regard, an evolutionary feature subset selection algorithm based on interaction information is proposed, which consists of three stages. At the first stage, candidate features and candidate feature pairs are identified using traditional feature weighting approaches such as symmetric uncertainty (SU) and bivariate interaction information. In the second phase, candidate feature subsets are formed and evaluated using multivariate interaction information. Finally, the best candidate feature subsets are selected using dominant/dominated relationships. The proposed algorithm is compared with some other feature selection algorithms including mRMR, WJMI, IWFS, IGFS, DCSF, IWFS, K_OFSD, WFLNS, Information Gain and ReliefF in terms of the number of selected features, classification accuracy, F-measure and algorithm stability using three different classifiers, namely KNN, NB, and CART. The results justify the improvement of classification accuracy and the robustness of the proposed method in comparison with the other approaches.  相似文献   

9.
In this paper, a new feature selection method based on Association Rules (AR) and Neural Network (NN) is presented for the diagnosis of erythemato-squamous diseases. AR is used for reducing the dimension of erythemato-squamous diseases dataset and NN is used for efficient classification. The proposed AR+NN system performance is compared with that of other feature selection algorithms+NN. The dimension of input feature space is reduced from thirty four to twenty four by using AR. In test stage, 3-fold cross validation method is applied to the erythemato-squamous diseases dataset to evaluate the proposed system performances. The correct classification rate of proposed system is 98.61%. This research demonstrated that the AR can be used for reducing the dimension of feature space and proposed AR+NN model can be used to obtain fast automatic diagnostic systems for other diseases.  相似文献   

10.
为获取文本中的较优特征子集,剔除干扰和冗余特征,提出了一种结合过滤式算法和群智能算法的混合特征寻优算法。首先计算每个特征词的信息增益值,选取较优的特征作为预选特征集合,再利用正余弦算法对预选特征进行寻优,获取精选特征集合。为较好地平衡正余弦算法中的全局搜索和局部开发能力,加入了自适应惯性权重;为更精确地评价特征子集,引入以特征数量和准确率进行加权的适应度函数,并提出了新的位置更新机制。在KNN和贝叶斯分类器上的实验结果表明,该特征选择算法与其它特征选择算法及改进前的算法相比,分类准确率得到了一定的提升。  相似文献   

11.
An intelligent identification system for mixed anuran vocalizations is developed in this work to provide the public to easily consult online. The raw mixed anuran vocalization samples are first filtered by noise removal, high frequency compensation, and discrete wavelet transform techniques in order. An adaptive end-point detection segmentation algorithm is proposed to effectively separate the individual syllables from the noise. Six features, including spectral centroid, signal bandwidth, spectral roll-off, threshold-crossing rate, spectral flatness, and average energy, are extracted and served as the input parameters of the classifier. Meanwhile, a decision tree is constructed based on several parameters obtained during sample collection in order to narrow the scope of identification targets. Then fast learning neural-networks are employed to classify the anuran species based on feature set chosen by wrapper feature selection method. A series of experiments were conducted to measure the outcome performance of the proposed work. Experimental results exhibit that the recognition rate of the proposed identification system can achieve up to 93.4%. The effectiveness of the proposed identification system for anuran vocalizations is thus verified.  相似文献   

12.
In this paper, we present an improved feature reduction method in input and feature spaces for classification using support vector machines (SVMs). In the input space, we select a subset of input features by ranking their contributions to the decision function. In the feature space, features are ranked according to the weighted support vector in each dimension. By applying feature reduction in both input and feature spaces, we develop a fast non-linear SVM without a significant loss in performance. We have tested the proposed method on the detection of face, person, and car. Subsets of features are chosen from pixel values for face detection and from Haar wavelet features for person and car detection. The experimental results show that the proposed feature reduction method works successfully. In fact, our method performs better than the methods of using all the features and the Fisher's features in the detection of person and car. We also gain the advantage of speed.  相似文献   

13.
特征选择通过移除不相关和冗余的特征来提高学习算法的性能。基于进化算法在求解优化问题时表现出的优越性能,提出FSSAC特征选择方法。新的初始化策略和评估函数使得SAC能将特征选择作为离散空间搜索问题来解决,利用特征子集的准确率指导SAC的采样阶段。在实验阶段,FSSAC结合SVM,J48和KNN分类器,通过UCI数据集完成验证,并与FSFOA,HGAFS,PSO等算法进行了比较。实验结果表明,FSSAC可以提高分类器的分类准确率,且具有良好的泛化性能。除此之外,对FSSAC和其他算法在特征空间维度缩减情况方面做了对比。  相似文献   

14.
Selecting an optimal subset from original large feature set in the design of pattern classifier is an important and difficult problem. In this paper, we use tabu search to solve this feature selection problem and compare it with classic algorithms, such as sequential methods, branch and bound method, etc., and most other suboptimal methods proposed recently, such as genetic algorithm and sequential forward (backward) floating search methods. Based on the results of experiments, tabu search is shown to be a promising tool for feature selection in respect of the quality of obtained feature subset and computation efficiency. The effects of parameters in tabu search are also analyzed by experiments.  相似文献   

15.
雷军程  黄同成  柳小文 《计算机科学》2012,39(7):250-252,275
在分析比较几种常用的特征选择方法的基础上,提出了一种引入文本类区分加权频率的特征选择方法TFIDF_Ci。它将具体类的文档出现频率引入TFIDF函数,提高了特征项所在文档所属类区分其他类的能力。实验中采用KNN分类算法对该方法和其他特征选择方法进行了比较测试。结果表明,TFIDF_Ci方法较其他方法在不同的训练集规模情况下具有更高的分类精度和稳定性。  相似文献   

16.
Feature selection has become an increasingly important field of research. It aims at finding optimal feature subsets that can achieve better generalization on unseen data. However, this can be a very challenging task, especially when dealing with large feature sets. Hence, a search strategy is needed to explore a relatively small portion of the search space in order to find “semi-optimal” subsets. Many search strategies have been proposed in the literature, however most of them do not take into consideration relationships between features. Due to the fact that features usually have different degrees of dependency among each other, we propose in this paper a new search strategy that utilizes dependency between feature pairs to guide the search in the feature space. When compared to other well-known search strategies, the proposed method prevailed.  相似文献   

17.
Most feature selection algorithms based on information-theoretic learning (ITL) adopt ranking process or greedy search as their searching strategies. The former selects features individually so that it ignores feature interaction and dependencies. The latter heavily relies on the search paths, as only one path will be explored with no possible back-track. In addition, both strategies typically lead to heuristic algorithms. To cope with these problems, this article proposes a novel feature selection framework based on correntropy in ITL, namely correntropy based feature selection using binary projection (BPFS). Our framework selects features by projecting the original high-dimensional data to a low-dimensional space through a special binary projection matrix. The formulated objective function aims at maximizing the correntropy between selected features and class labels. And this function can be efficiently optimized via standard mathematical tools. We apply the half-quadratic method to optimize the objective function in an iterative manner, where each iteration reduces to an assignment subproblem which can be highly efficiently solved with some off-the-shelf toolboxes. Comparative experiments on six real-world datasets indicate that our framework is effective and efficient.  相似文献   

18.
This paper presents two new techniques, viz., DWT Dual-subband Frequency-domain Feature Extraction (DDFFE) and Threshold-Based Binary Particle Swarm Optimization (ThBPSO) feature selection, to improve the performance of a face recognition system. DDFFE uses a unique combination of DWT, DFT, and DCT, and is used for efficient extraction of pose, translation and illumination invariant features. The DWT stage selectively utilizes the approximation coefficients along with the horizontal detail coefficients of the 2-dimensional DWT of a face image, whilst retaining the spatial correlation of pixels. The translation variance problem of the DWT is compensated in the following DFT stage, which also exploits the frequency characteristics of the image. Then, all the low frequency components present at the center of the DFT spectrum are extracted by drawing a quadruple ellipse mask around the spectrum center. Finally, DCT is used to lay the ground for BPSO based feature selection. The second proposed technique, ThBPSO, is a novel feature selection algorithm, based on the recurrence of selected features, and is used to search the feature space to obtain a feature subset for recognition. Experimental results obtained by applying the proposed algorithm on seven benchmark databases, namely, Cambridge ORL, UMIST, Extended Yale B, CMUPIE, Color FERET, FEI, and HP, show that the proposed system outperforms other FR systems. A significant increase in the recognition rate and a substantial reduction in the number of features required for recognition are observed. The experimental results indicate that the minimum feature reduction obtained is 98.2% for all seven databases.  相似文献   

19.
The aim of this paper is to propose a new hybrid data mining model based on combination of various feature selection and ensemble learning classification algorithms, in order to support decision making process. The model is built through several stages. In the first stage, initial dataset is preprocessed and apart of applying different preprocessing techniques, we paid a great attention to the feature selection. Five different feature selection algorithms were applied and their results, based on ROC and accuracy measures of logistic regression algorithm, were combined based on different voting types. We also proposed a new voting method, called if_any, that outperformed all other voting methods, as well as a single feature selection algorithm's results. In the next stage, a four different classification algorithms, including generalized linear model, support vector machine, naive Bayes and decision tree, were performed based on dataset obtained in the feature selection process. These classifiers were combined in eight different ensemble models using soft voting method. Using the real dataset, the experimental results show that hybrid model that is based on features selected by if_any voting method and ensemble GLM + DT model performs the highest performance and outperforms all other ensemble and single classifier models.  相似文献   

20.
An efficient filter feature selection (FS) method is proposed in this paper, the SVM-FuzCoC approach, achieving a satisfactory trade-off between classification accuracy and dimensionality reduction. Additionally, the method has reasonably low computational requirements, even in high-dimensional feature spaces. To assess the quality of features, we introduce a local fuzzy evaluation measure with respect to patterns that embraces fuzzy membership degrees of every pattern in their classes. Accordingly, the above measure reveals the adequacy of data coverage provided by each feature. The required membership grades are determined via a novel fuzzy output kernel-based support vector machine, applied on single features. Based on a fuzzy complementary criterion (FuzCoC), the FS procedure iteratively selects features with maximum additional contribution in regard to the information content provided by previously selected features. This search strategy leads to small subsets of powerful and complementary features, alleviating the feature redundancy problem. We also devise different SVM-FuzCoC variants by employing seven other methods to derive fuzzy degrees from SVM outputs, based on probabilistic or fuzzy criteria. Our method is compared with a set of existing FS methods, in terms of performance capability, dimensionality reduction, and computational speed, via a comprehensive experimental setup, including synthetic and real-world datasets.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号