Similar Articles
20 similar articles found.
1.
M.E. ElAlami 《Knowledge》2009,22(5):356-362
This paper describes a novel feature subset selection algorithm that uses a genetic algorithm (GA) to optimize the output nodes of a trained artificial neural network (ANN). The new algorithm neither depends on the ANN training algorithm nor modifies the training results. After training the ANN on a given database, the two groups of weights, between the input-hidden and hidden-output layers, are extracted. A general formula for each output node (class) of the ANN is then generated. Because the two weight groups are constant, this formula depends only on the input features; the dependency is represented by a non-linear exponential function. The GA is then used to find the optimal relevant features that maximize the output function for each class. The features dominant across all classes form the feature subset to be selected from the input feature set.
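As a sketch, with fixed input-hidden weights $v_{ij}$, fixed hidden-output weights $w_{jk}$, and a sigmoid activation (the abstract only states that the dependency is a non-linear exponential function, so the exact activation and any bias terms are assumptions), the per-class output function the GA maximizes over the inputs $x_i$ takes the form

$$
O_k(\mathbf{x}) = \sigma\Big(\sum_{j} w_{jk}\,\sigma\Big(\sum_{i} v_{ij}\,x_i\Big)\Big),
\qquad \sigma(z) = \frac{1}{1+e^{-z}}.
$$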

2.
Feature selection is used to choose a subset of relevant features for effective classification of data. In high-dimensional data classification, the performance of a classifier often depends on the feature subset used. In this paper, we introduce a greedy feature selection method using mutual information. The method combines feature-feature mutual information and feature-class mutual information to find an optimal subset of features that minimizes redundancy and maximizes relevance. The effectiveness of the selected subset is evaluated using multiple classifiers on multiple datasets. In terms of both classification accuracy and execution time, the method performs significantly well on twelve real-life datasets of varied dimensionality and instance counts when compared with several competing feature selection techniques.
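A minimal greedy sketch in this spirit (an mRMR-style criterion; the authors' exact scoring and discretization choices are assumptions):

```python
# Greedy selection: feature-class MI (relevance) minus the mean
# feature-feature MI with already-selected features (redundancy).
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.metrics import mutual_info_score

def greedy_mi_selection(X, y, k):
    n = X.shape[1]
    relevance = mutual_info_classif(X, y)        # feature-class MI
    # Quartile-discretize each column so feature-feature MI is well defined.
    Xd = np.column_stack(
        [np.digitize(X[:, j], np.quantile(X[:, j], [0.25, 0.5, 0.75]))
         for j in range(n)])
    selected, remaining = [], list(range(n))
    while remaining and len(selected) < k:
        def score(j):
            red = (np.mean([mutual_info_score(Xd[:, j], Xd[:, s])
                            for s in selected]) if selected else 0.0)
            return relevance[j] - red
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```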

3.
Feature selection chooses a feature subset from the original feature set, reducing feature dimensionality and redundant information so as to improve classification accuracy. To this end, a new feature selection algorithm is proposed. The algorithm searches for feature subsets with a discretized, enhanced fireworks algorithm, incorporates the feature subset together with penalty-factor-weighted constraints into the objective function, then trains and tests a kNN classifier on the data restricted to the candidate subset, and finally verifies classification accuracy with ten-fold cross-validation. Simulation experiments on UCI data show that the proposed algorithm outperforms the guided fireworks algorithm, the fireworks algorithm, the bat algorithm, the crow search algorithm, and adaptive particle swarm optimization in overall performance.
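A minimal sketch of the penalized wrapper objective described above, assuming a penalty weight `lam` (the paper's exact penalty factor and constraint form are not given in the abstract):

```python
# Fitness of a candidate subset (boolean mask): ten-fold cross-validated
# kNN accuracy minus a penalty proportional to subset size.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def fitness(mask, X, y, lam=0.05):
    if not mask.any():                      # empty subsets are infeasible
        return -np.inf
    acc = cross_val_score(KNeighborsClassifier(n_neighbors=5),
                          X[:, mask], y, cv=10).mean()
    return acc - lam * mask.sum() / mask.size
```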

4.
Traditional mutual-information-based feature selection methods rarely consider the interactions among features, and their complexity grows excessively as the number of features increases. A new mutual-information-based evaluation function for feature subsets is therefore proposed. The method fully accounts for how features cooperate, selects a better feature subset, and improves classification accuracy at a bounded computational cost. Experimental results show that, compared with the traditional MIFS method, classification accuracy improves by 3%-5% and the error reduction rate improves by 25%-30%.

5.
Feature selection (FS) is one of the most important fields in pattern recognition; it aims to pick a subset of relevant and informative features from an original feature set. Depending on the availability of class-label information, FS algorithms fall into two kinds: supervised and unsupervised. Supervised approaches utilize the class labels of the dataset during feature selection, whereas unsupervised algorithms act in the absence of class labels, which makes their task harder. In this paper, we propose unsupervised probabilistic feature selection using ant colony optimization (UPFS). The algorithm looks for the optimal feature subset in an iterative process, using inter-feature information that reflects the similarity between features to decrease redundancy in the final set. In each step of the ACO algorithm, to select the next potential feature, we calculate the redundancy between the current feature and all those selected so far. In addition, a matrix holds the ants' pheromone, recording the rate of co-presence of every pair of features in solutions. Features are then ranked by a probability function extracted from this matrix, and the top-m features are returned as the final solution. We compare the performance of UPFS with 15 well-known supervised and unsupervised feature selection methods using different classifiers (support vector machine, naive Bayes, and k-nearest neighbor) on 10 well-known datasets. The experimental results show the efficiency of the proposed method compared to previous related methods.
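A sketch of the final ranking step described above, under the assumption that each feature is scored by its total pheromone mass in the pairwise co-presence matrix (the paper's exact probability function may differ):

```python
# T[i, j] reflects how often features i and j co-occur in good solutions.
import numpy as np

def rank_features(T, m):
    score = T.sum(axis=1)        # aggregate co-presence pheromone per feature
    prob = score / score.sum()   # selection probability for each feature
    return np.argsort(prob)[::-1][:m]   # indices of the top-m features
```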

6.
Searching for an optimal feature subset in a high-dimensional feature space is an NP-complete problem; traditional optimization algorithms are therefore inefficient on large-scale feature selection problems, and meta-heuristic algorithms are extensively adopted instead. This study proposes a regression-based particle swarm optimization for the feature selection problem. The proposed algorithm increases population diversity and avoids local-optimum trapping by improving the jump ability of the flying particles. Data sets from the UCI machine learning repository are used to evaluate the approach, with classification accuracy as the criterion of classifier performance. Results show that the proposed approach outperforms both genetic algorithms and sequential search algorithms.
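For context, the canonical PSO update that such variants build on (the regression-based jump mechanism itself is not specified in the abstract):

$$
v_{id}^{t+1} = \omega\, v_{id}^{t} + c_1 r_1 \big(p_{id} - x_{id}^{t}\big) + c_2 r_2 \big(g_{d} - x_{id}^{t}\big),
\qquad x_{id}^{t+1} = x_{id}^{t} + v_{id}^{t+1},
$$

where $p_{id}$ is particle $i$'s personal best, $g_d$ the global best, and $r_1, r_2 \sim U(0,1)$.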

7.
Most widely used pattern classification algorithms, such as support vector machines (SVM), are sensitive to the presence of irrelevant or redundant features in the training data. Automatic feature selection algorithms aim at selecting a subset of the features present in a given dataset so that the accuracy of the subsequent classifier is maximized. Feature selection algorithms are generally categorized into two broad groups: algorithms that do not take the subsequent classifier into account (filter approaches) and algorithms that evaluate the subsequent classifier for each considered feature subset (wrapper approaches). Filter approaches are typically faster, but wrapper approaches deliver higher performance. In this paper, we present Predictive Forward Selection, an algorithm based on the widely used forward-selection wrapper approach. Using ideas from meta-learning, the number of required evaluations of the target classifier is reduced by exploiting knowledge gained during past feature selection runs on other datasets. We evaluated our approach on 59 real-world datasets with SVM as the target classifier, comparing against state-of-the-art wrapper and filter approaches as well as one embedded method for SVM in terms of accuracy and run time. The results show that the presented method reaches the accuracy of traditional wrapper approaches while requiring significantly fewer evaluations of the target algorithm; moreover, it achieves statistically significantly better results than the filter approaches and the embedded method.
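A generic forward-selection wrapper, the baseline the paper accelerates (the meta-learning shortcut itself is not sketched; the stopping rule is an assumption):

```python
# Greedily add the feature that most improves cross-validated accuracy.
import numpy as np
from sklearn.model_selection import cross_val_score

def forward_selection(estimator, X, y, max_features):
    selected, remaining = [], list(range(X.shape[1]))
    best_score = -np.inf
    while remaining and len(selected) < max_features:
        scores = {j: cross_val_score(estimator, X[:, selected + [j]], y,
                                     cv=5).mean()
                  for j in remaining}
        j_best = max(scores, key=scores.get)
        if scores[j_best] <= best_score:    # stop when no candidate helps
            break
        best_score = scores[j_best]
        selected.append(j_best)
        remaining.remove(j_best)
    return selected
```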

8.
In practice, classifiers in an ensemble are not independent. This paper continues our previous work on ensemble subset selection [A. Ulaş, M. Semerci, O.T. Yıldız, E. Alpaydın, Incremental construction of classifier and discriminant ensembles, Information Sciences 179 (9) (2009) 1298-1318] and has two parts. First, we investigate the effect of four factors on correlation: (i) the algorithms used for training, (ii) the hyperparameters of the algorithms, (iii) resampled training sets, and (iv) input feature subsets. Simulations using 14 classifiers on 38 data sets indicate that hyperparameters and overlapping training sets have a higher effect on positive correlation than features and algorithms. Second, we propose postprocessing before fusing, using principal component analysis (PCA) to form uncorrelated eigenclassifiers from a set of correlated experts. Combining the information from all classifiers may be better than subset selection, where some base classifiers are pruned before combination, because using all of them retains useful redundancy.
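A minimal sketch of the postprocessing idea, assuming base-classifier scores for one class are stacked column-wise (the fusion rule applied afterwards is not specified here):

```python
# Decorrelate base-classifier outputs into "eigenclassifier" scores.
from sklearn.decomposition import PCA

def eigenclassifier_scores(P, n_components=None):
    """P: (n_samples, n_classifiers) matrix of correlated expert scores.
    Returns uncorrelated component scores for subsequent fusion."""
    pca = PCA(n_components=n_components)
    return pca.fit_transform(P)
```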

9.
Feature selection finds a feature subset that carries the most discriminative information of the original feature set. In practice, since the classifier to be used after feature selection is unknown, it is desirable to find a feature subset that is universally effective for any classifier; such an attempt is called classifier-independent feature selection. In this study, we propose a novel classifier-independent feature selection method based on estimating the Bayes discrimination boundary. Experimental results on 12 real-world datasets show the fundamental effectiveness of the proposed method.

10.
Image annotation can be formulated as a classification problem. Recently, Adaboost learning with feature selection has been used to create accurate ensemble classifiers. We propose dynamic Adaboost learning with feature selection based on a parallel genetic algorithm for image annotation under the MPEG-7 standard. In each iteration of Adaboost learning, a genetic algorithm (GA) dynamically generates and optimizes a set of feature subsets on which the weak classifiers are constructed, so that an ensemble member is selected. We investigate two GA feature selection methods: a binary-coded chromosome GA that performs optimal feature subset selection, and a bi-coded chromosome GA that performs optimal-weighted feature subset selection, i.e., simultaneous selection of an optimal feature subset and its corresponding optimal weights. To improve the computational efficiency of our approach, a master-slave parallel GA is implemented. A k-nearest-neighbor classifier is used as the base classifier. Experiments over 2000 classified Corel images validate the performance of the approaches.
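An illustration of the two chromosome codings mentioned above (shapes and decoding are assumptions, not the paper's exact encoding): a binary chromosome only selects features, while a bi-coded chromosome additionally carries a weight per feature.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features = 10
binary = rng.integers(0, 2, n_features)                # on/off mask only
bicoded = np.vstack([binary, rng.random(n_features)])  # mask + weights

def decode(bicoded_chrom, X):
    mask = bicoded_chrom[0].astype(bool)
    return X[:, mask] * bicoded_chrom[1, mask]         # weighted feature subset
```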

11.
Computed tomographic (CT) colonography is a promising alternative to traditional invasive colonoscopic methods for detecting and removing cancerous growths, or polyps, in the colon. Existing computer-aided diagnosis (CAD) algorithms for CT colonography typically employ a classifier to discriminate between the true and false positives generated by a polyp-candidate detection system, based on a set of features extracted from the candidates. However, these classifiers often suffer from a phenomenon termed the curse of dimensionality: classifier performance degrades markedly as the number of features increases, while additional features also increase computational complexity and storage demands. This paper investigates the benefits of feature selection on a polyp-candidate database, with the aim of increasing specificity while preserving sensitivity. Two new mutual-information methods for feature selection are proposed in order to select a feature subset for optimum performance. Initial results show that the widely used support vector machine (SVM) classifier indeed performs better with a small set of features, with area under the receiver operating characteristic curve (AUC) measures reaching 0.78-0.88.

12.
Feature selection and feature weighting are useful techniques for improving the classification accuracy of the K-nearest-neighbor (K-NN) rule. Feature selection refers to algorithms that select the best subset of the input feature set; in feature weighting, each feature is multiplied by a weight proportional to its ability to distinguish pattern classes. In this paper, a novel hybrid approach based on the tabu search (TS) heuristic is proposed for simultaneous feature selection and feature weighting of the K-NN rule. The proposed TS heuristic combined with the K-NN classifier is compared with several classifiers on various available data sets, and the results indicate a significant improvement in classification accuracy. The proposed TS heuristic is also compared with various feature selection algorithms; experiments reveal that the hybrid TS heuristic is superior to both simple TS and sequential search algorithms. We also present results for the classification of prostate cancer using multispectral images, an important problem in biomedicine.
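A compact tabu-search skeleton for joint selection and weighting (a sketch under assumed neighborhood, evaluation, and tenure choices, not the paper's exact heuristic; a weight near zero effectively switches a feature off):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def tabu_search(X, y, iters=50, tenure=7, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    w = rng.random(n)                      # current feature-weight vector
    best_w, best_acc, tabu = w.copy(), -np.inf, []
    for _ in range(iters):
        moves = []
        for j in range(n):
            if j in tabu:                  # recently changed weights are tabu
                continue
            cand = w.copy()
            cand[j] = rng.random()         # perturb one weight
            acc = cross_val_score(KNeighborsClassifier(),
                                  X * cand, y, cv=5).mean()
            moves.append((acc, j, cand))
        acc, j, w = max(moves, key=lambda m: m[0])
        tabu = (tabu + [j])[-tenure:]      # fixed-length tabu list
        if acc > best_acc:
            best_acc, best_w = acc, w.copy()
    return best_w, best_acc
```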

13.
Feature selection is the process of choosing the relevant subset of features from a high-dimensional dataset to enhance classifier performance, and much research has been devoted to it. Algorithms such as naive Bayes (NB), decision trees, and genetic algorithms have been applied to high-dimensional datasets to select relevant features and to increase computational speed. The proposed model presents a solution for feature selection using ensemble classifier algorithms. The proposed algorithm combines minimum redundancy and maximum relevance (mRMR) with the forest optimization algorithm (FOA). Ensemble classifiers based on the support vector machine (SVM), K-nearest neighbor (KNN), and NB are further used to enhance performance. The mRMR-FOA selects the relevant features from the various datasets, with a 21% to 24% improvement recorded in feature selection; the ensemble classifiers further improve performance and provide an accuracy of 96%.
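For reference, the standard mRMR criterion that the first stage builds on: the next feature $f^{*}$ is chosen by

$$
f^{*} = \arg\max_{f_j \notin S} \Big[\, I(f_j; c) - \frac{1}{|S|} \sum_{f_i \in S} I(f_j; f_i) \,\Big],
$$

where $I(\cdot\,;\cdot)$ is mutual information, $c$ the class, and $S$ the set of features already selected.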

14.
In fault detection systems, a massive amount of data gathered over the life cycle of equipment is often used to learn models or classifiers that aim at diagnosing different kinds of errors or failures. Within this huge quantity of information, some features (or sets of features) are more correlated with one kind of failure than another, and the presence of irrelevant features can hurt classifier performance; feature selection is hence a key step for improving a detection system. We propose in this paper an algorithm named STRASS, which detects features relevant for classification purposes. When there is a strong correlation between some features and the associated class, conventional feature selection algorithms can fail to select the most relevant features; to cope with this problem, the STRASS algorithm uses the k-way correlation between features and the class. To assess the performance of STRASS, we apply it to simulated data collected from the Tennessee Eastman chemical plant simulator. The Tennessee Eastman process (TEP) has been used in many fault detection studies, and three specific faults are not well discriminated by conventional algorithms. The results obtained by STRASS are compared to those of reference feature selection algorithms. We show that the features selected by STRASS always improve classifier performance compared to the whole set of original features, and that the resulting classification is better than with most other feature selection algorithms.

15.
Classification is a key problem in machine learning and data mining. Classification algorithms predict the class of a new instance after being trained on data representing past classification experience; however, a large number of features in the training data can hurt the classification capacity of a machine learning algorithm. The feature selection problem involves discovering a subset of features such that a classifier built only with this subset attains predictive accuracy no worse than a classifier built from the entire feature set. Several algorithms have been proposed to solve this problem. In this paper, we discuss how parallelism can improve the performance of feature selection algorithms. In particular, we present, discuss, and evaluate a coarse-grained parallel version of the feature selection algorithm FortalFS, which performs well compared with other solutions and has characteristics that make it a good candidate for parallelization. Our parallel design is based on the master-slave design pattern. Promising results show that this approach achieves near-optimal speedups in the context of Amdahl's law.
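For reference, Amdahl's law bounds the speedup $S$ achievable with $p$ workers when a fraction $f$ of the work parallelizes:

$$
S(p) = \frac{1}{(1-f) + f/p}, \qquad \lim_{p \to \infty} S(p) = \frac{1}{1-f}.
$$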

16.
Selecting an optimal subset from the original large feature set is an important and difficult problem in the design of a pattern classifier. In this paper, we use tabu search to solve this feature selection problem and compare it with classic algorithms, such as sequential methods and the branch-and-bound method, and with most other recently proposed suboptimal methods, such as genetic algorithms and sequential forward (backward) floating search. Based on the experimental results, tabu search proves to be a promising tool for feature selection with respect to both the quality of the obtained feature subset and computational efficiency. The effects of the tabu search parameters are also analyzed experimentally.

17.
李鲜, 王艳, 罗勇, 周激流. Journal of Computer Applications (《计算机应用》), 2019, 39(5): 1485-1489
To address the low gray-level contrast and blurred organ and tissue boundaries in medical images, a new random forest (RF) feature selection algorithm is proposed for segmenting nasopharyngeal tumors in MR images. First, gray-level, texture, and geometric features are extracted from the images to build an initial random forest classifier. Then, an improved feature selection method based on the random forest feature importance measure is applied to the original hand-crafted feature set. Finally, a new random forest classifier is built on the resulting optimal feature subset to segment the test images. Experimental results show that the algorithm segments nasopharyngeal tumors with a Dice coefficient of 79.197%, accuracy (Acc) of 97.702%, sensitivity (Sen) of 72.191%, and specificity (Sp) of 99.502%. Comparison with segmentation algorithms based on the traditional random forest and on deep convolutional neural networks (DCNN) shows that the proposed feature selection algorithm effectively extracts useful information from nasopharyngeal tumor MR images and considerably improves segmentation accuracy under small-sample conditions.
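A minimal sketch of importance-based selection with a random forest (the keep ratio and hyperparameters are assumptions; the paper's improved importance measure is not reproduced here):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def rf_select(X, y, keep_ratio=0.5):
    # Train an initial forest, rank features by impurity-based importance.
    rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
    order = np.argsort(rf.feature_importances_)[::-1]
    kept = order[: max(1, int(keep_ratio * X.shape[1]))]
    # Retrain a new forest on the selected subset only.
    rf2 = RandomForestClassifier(n_estimators=200, random_state=0).fit(X[:, kept], y)
    return kept, rf2
```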

18.
19.
Feature selection is a key preprocessing technique for machine learning and data mining tasks. Traditional greedy feature selection methods consider only the best feature in the current round, so the obtained subset is merely a local optimum and cannot reach an optimal or near-optimal feature set. Evolutionary search explores the feature space effectively, but individual evolutionary algorithms have their own limitations during the search. This paper draws on the respective strengths of the genetic algorithm (GA) and particle swarm optimization (PSO), uses an information-entropy measure as the evaluation criterion, and obtains the final feature subset through co-evolution. A bit-rate crossover operator and an information exchange strategy tailored to the feature selection problem are also proposed. Experimental results show that GA-PSO co-evolution outperforms either evolutionary search alone, both in its ability to search for feature subsets and on concrete classification tasks, and that the combinatorial judgment provided by evolutionary search is superior to greedy feature selection.
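One common entropy-based subset score (the abstract does not specify its exact criterion, so this form is an assumption) is the conditional entropy of the class $C$ given the selected subset $S$, which a good subset should minimize:

$$
H(C \mid S) = -\sum_{s} p(s) \sum_{c} p(c \mid s)\,\log_2 p(c \mid s).
$$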

20.
Properly determining the discriminative features that characterize the inherent behavior of electroencephalography (EEG) signals remains a great challenge for epileptic seizure detection. In the present study, a novel feature selection scheme based on discrete wavelet packet decomposition and the cuckoo search algorithm (CSA) is proposed. Normal as well as epileptic EEG recordings are first decomposed into various frequency bands by wavelet packet decomposition, and statistical features are then derived at all developed nodes of the wavelet packet decomposition tree. Instead of using the complete set of extracted features to construct a wavelet-neural-network-based classifier, the CSA selects an optimal feature subset that maximizes the predictive competence of the classifier. Experimental results on publicly available benchmarks demonstrate that the proposed scheme achieves promising recognition accuracies of 98.43-100%, statistically significant by z-test with p < 0.0001.
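A sketch of the feature-extraction stage using PyWavelets (the wavelet family, decomposition depth, and choice of statistics are assumptions; the paper derives features at all developed nodes, while this sketch uses the leaf nodes only):

```python
import numpy as np
import pywt

def wpd_features(signal, wavelet="db4", maxlevel=4):
    # Wavelet packet decomposition of one EEG segment.
    wp = pywt.WaveletPacket(data=signal, wavelet=wavelet, maxlevel=maxlevel)
    feats = []
    for node in wp.get_level(maxlevel, order="natural"):
        c = node.data
        feats += [c.mean(), c.std(), np.abs(c).max()]  # simple statistics per node
    return np.array(feats)
```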
