Similar literature
 Found 20 similar documents; search took 31 ms
1.
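As bio-inspired evolutionary algorithms mature, feature selection methods based on evolutionary techniques and their hybrids continue to emerge. To address feature selection for high-dimensional, small-sample security data, a memetic algorithm (MA) is combined with the least squares support vector machine (LS-SVM) to design a wrapper feature selection method (MA-LSSVM). The method exploits the fact that the LS-SVM is easy to solve to build the classifier, and uses classification accuracy as the main component of the fitness function in the memetic algorithm's search. Experiments show that MA-LSSVM efficiently and stably identifies the features that contribute most to classification, reducing data dimensionality and improving classification efficiency.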
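
A wrapper fitness function whose main component is classification accuracy can be sketched as follows. This is a minimal illustration under stated assumptions, not the MA-LSSVM implementation: a leave-one-out 1-nearest-neighbour classifier stands in for the LS-SVM, and the weight `alpha` and the feature-count term are hypothetical choices that the abstract does not specify.

```python
def loo_1nn_accuracy(X, y, mask):
    """Leave-one-out accuracy of a 1-NN classifier restricted to masked features.

    The 1-NN classifier is a stand-in for the LS-SVM (assumption).
    """
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # no features selected: nothing to classify with
    correct = 0
    for i in range(len(X)):
        best_dist, best_label = float("inf"), None
        for k in range(len(X)):
            if k == i:
                continue  # leave sample i out
            dist = sum((X[i][j] - X[k][j]) ** 2 for j in cols)
            if dist < best_dist:
                best_dist, best_label = dist, y[k]
        correct += best_label == y[i]
    return correct / len(X)


def wrapper_fitness(X, y, mask, alpha=0.9):
    """Accuracy is the main term; a small bonus rewards dropping features.

    alpha is a hypothetical weight, not a value from the paper.
    """
    accuracy = loo_1nn_accuracy(X, y, mask)
    fraction_dropped = 1.0 - sum(mask) / len(mask)
    return alpha * accuracy + (1.0 - alpha) * fraction_dropped
```

A memetic search would call `wrapper_fitness` on every candidate mask, so the cost of training the wrapped classifier dominates the run time; this is the reason the abstract stresses that the LS-SVM is easy to solve.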

2.
We propose a multivariate feature selection method that uses proximity graphs for assessing the quality of feature subsets. Initially, a complete graph is built, where nodes are the samples, and edge weights are calculated considering only the selected features. Next, a proximity graph is constructed on the basis of these weights, and different fitness functions, calculated over the proximity graph, are used to evaluate the quality of the selected feature set. We propose an iterative methodology on the basis of a memetic algorithm for exploring the space of possible feature subsets aimed at maximizing a quality score. We designed multiple local search strategies, and we used an adaptive strategy for automatic balancing between the global and local search components of the memetic algorithm. The computational experiments were carried out using four well-known data sets. We investigate the suitability of three different proximity graphs (minimum spanning tree, k-nearest neighbors, and relative neighborhood graph) for the proposed approach. The selected features have been evaluated using a total of 49 classification methods from an open-source data mining and machine learning package (WEKA). The computational results show that the proposed adaptive memetic algorithm can perform better than traditional genetic algorithms in finding more useful feature sets. Finally, we establish the competitiveness of our approach by comparing it with other well-known feature selection methods.

3.
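Feature selection improves the performance of learning algorithms by removing irrelevant and redundant features; in essence it is a combinatorial optimization problem. The black widow optimization algorithm is a metaheuristic that simulates the life cycle of black widow spiders and has advantages in convergence speed and fitness optimization. Because the original algorithm cannot be applied to feature selection directly, five strategies are designed: a binary strategy, an "OR gate" strategy, a population restriction strategy, a fast procreation strategy, and a fitness priority strategy. On this basis, the black widow optimization feature selection algorithm (BWOFS) and the procreation controlled black widow optimization feature selection algorithm (PCBWOFS) are proposed to search the feature space for effective feature subsets. The new methods are validated on several public classification and regression datasets. The results show that, compared with the other methods (full feature set, AMB, SFS, SFFS, FSFOA), BWOFS and PCBWOFS find feature subsets with higher prediction accuracy and give competitive, promising results; compared with BWOFS, PCBWOFS requires less computation and performs better.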

4.
This correspondence presents a novel hybrid wrapper and filter feature selection algorithm for a classification problem using a memetic framework. It incorporates a filter ranking method in the traditional genetic algorithm to improve classification performance and accelerate the search in identifying the core feature subsets. Particularly, the method adds or deletes a feature from a candidate feature subset based on the univariate feature ranking information. This empirical study on commonly used data sets from the University of California, Irvine repository and microarray data sets shows that the proposed method outperforms existing methods in terms of classification accuracy, number of selected features, and computational efficiency. Furthermore, we investigate several major issues of the memetic algorithm (MA) to identify a good balance between local search and genetic search so as to maximize search quality and efficiency in the hybrid filter and wrapper MA.
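
The add/delete move guided by a univariate ranking can be sketched as a single local-search step. This is an illustrative reading of the abstract, not the authors' operators: the function name `local_search_step`, the rule of adding the highest-ranked excluded feature and deleting the lowest-ranked included one, and the accept-if-better policy are all assumptions.

```python
def local_search_step(mask, scores, fitness):
    """Try one ranking-guided 'add' and one 'delete' move; keep the best.

    mask    -- list of 0/1 flags, one per feature
    scores  -- univariate filter scores (higher means more relevant)
    fitness -- callable mapping a mask to a number to maximise (wrapper part)
    """
    best_mask, best_fit = list(mask), fitness(mask)
    excluded = [j for j, m in enumerate(mask) if not m]
    included = [j for j, m in enumerate(mask) if m]

    candidates = []
    if excluded:  # add: bring in the highest-ranked feature not yet selected
        j = max(excluded, key=lambda idx: scores[idx])
        cand = list(mask)
        cand[j] = 1
        candidates.append(cand)
    if included:  # delete: drop the lowest-ranked feature currently selected
        j = min(included, key=lambda idx: scores[idx])
        cand = list(mask)
        cand[j] = 0
        candidates.append(cand)

    for cand in candidates:
        fit = fitness(cand)
        if fit > best_fit:  # accept only strict improvements
            best_mask, best_fit = cand, fit
    return best_mask
```

In a memetic algorithm this step would be applied to individuals produced by the genetic operators, so the filter ranking only proposes moves while the wrapper fitness decides whether they are kept.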

6.
Financial distress prediction (FDP) has been a widely and continually studied topic in the field of corporate finance. One of the core problems in FDP is to design effective feature selection algorithms. In contrast to existing approaches, we propose an integrated approach to feature selection for the FDP problem that embeds expert knowledge with the wrapper method. The financial features are categorized into seven classes according to their financial semantics based on experts' domain knowledge surveyed from the literature. We then apply the wrapper method to search for "good" feature subsets consisting of top candidates from each feature class. For concept verification, we compare several scholars' models as well as leading feature selection methods with the proposed method. Our empirical experiment indicates that the prediction model based on the feature set selected by the proposed method outperforms those models based on traditional feature selection methods in terms of prediction accuracy.

7.
In classification, every feature of the data set is an important contributor towards prediction accuracy and affects the model building cost. To extract the priority features for prediction, a suitable feature selector is designed. This paper proposes a novel memetic-based feature selection model named Shapely Value Embedded Genetic Algorithm (SVEGA). The relevance of each feature towards prediction is measured by combining genetic algorithms with Shapley value measures retrieved from SVEGA. The obtained results are then evaluated using Support Vector Machine (SVM) with different kernel configurations on 11 + 11 benchmark datasets (both binary class and multi class). Finally, a comparative analysis is done between SVEGA-SVM and other existing feature selection models. The experimental results with the proposed setup provide robust outcomes, showing it to be an efficient approach for discovering knowledge via feature selection with improved classification accuracy compared to conventional methods.

8.
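PSO-based feature selection and parameter optimization algorithm for LS-SVM   Citations: 1 total (0 self-citations, 1 by others)
To address feature selection and parameter optimization for the least squares support vector machine, an algorithm based on PSO is proposed that optimizes LS-SVM features and parameters simultaneously. Several populations (feature subsets) are first generated, and the PSO algorithm is then used to optimize the features and parameters. Simulation experiments on UCI benchmark datasets show that the algorithm effectively finds suitable feature subsets and LS-SVM parameters, and achieves better classification results than the genetic-algorithm-based LS-SVM (GALS-SVM) and the conventional LS-SVM.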

9.
A genetic algorithm-based method for feature subset selection   Citations: 5 total (2 self-citations, 3 by others)
As a commonly used technique in data preprocessing, feature selection selects a subset of informative attributes or variables to build models describing data. By removing redundant and irrelevant or noise features, feature selection can improve the predictive accuracy and the comprehensibility of the predictors or classifiers. Many feature selection algorithms with different selection criteria have been introduced by researchers. However, it is discovered that no single criterion is best for all applications. In this paper, we propose a framework based on a genetic algorithm (GA) for feature subset selection that combines various existing feature selection methods. The advantages of this approach include the ability to accommodate multiple feature selection criteria and find small subsets of features that perform well for a particular inductive learning algorithm of interest to build the classifier. We conducted experiments using three data sets and three existing feature selection methods. The experimental results demonstrate that our approach is a robust and effective approach to find subsets of features with higher classification accuracy and/or smaller size compared to each individual feature selection algorithm.

10.
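Sun Lin, Zhao Jing, Xu Jiucheng, Wang Xinya. Journal of Computer Applications (《计算机应用》), 2022, 42(5): 1355-1366
The classical monarch butterfly optimization (MBO) algorithm does not handle continuous data well, and rough set models cope poorly with large-scale, high-dimensional, complex data. To address these problems, a feature selection algorithm based on the neighborhood rough set (NRS) and MBO is proposed. First, local disturbance and group division strategies are combined with the MBO algorithm, and a transfer mechanism is constructed to form a binary MBO (BMBO) algorithm. Second, a mutation operator is introduced to strengthen the algorithm's exploration ability, giving a BMBO algorithm based on the mutation operator (BMBOM). Then, a fitness function is constructed from the neighborhood degree of NRS, and the fitness values of the initialized feature subsets are evaluated and ranked. Finally, the BMBOM algorithm searches for the optimal feature subset through repeated iteration, yielding a metaheuristic feature selection algorithm. The optimization performance of BMBOM is evaluated on benchmark functions, and the classification ability of the proposed feature selection algorithm is evaluated on UCI datasets. The experimental results show that, on 5 benchmark functions, the best value, worst value, mean, and standard deviation of BMBOM are clearly better than those of MBO and particle swarm optimization (PSO). On the UCI datasets, compared with rough-set-based optimization feature selection algorithms, feature selection algorithms combining rough sets with optimization algorithms, feature selection algorithms combining NRS with optimization algorithms, and a feature selection algorithm based on binary grey wolf optimization, the proposed algorithm performs well on three metrics (classification accuracy, number of selected features, and fitness value) and selects optimal feature subsets with few features and high classification accuracy.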
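
A transfer mechanism that turns a continuous position into a binary feature mask can be sketched as follows. The abstract does not say which transfer function BMBO uses; the sigmoid (S-shaped) function below is a common choice in binary metaheuristics and is assumed here purely for illustration.

```python
import math
import random


def sigmoid(v: float) -> float:
    """Numerically stable logistic function mapping a real value into (0, 1)."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def binarize(position, rng):
    """Map each continuous coordinate to 0/1: bit j is 1 with probability sigmoid(position[j]).

    The sigmoid transfer function is an assumption, not taken from the paper.
    """
    return [1 if rng.random() < sigmoid(v) else 0 for v in position]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
mask = binarize([2.5, -3.0, 0.0, 4.0], rng)  # 1 = feature selected
```

Large positive coordinates are almost always mapped to 1 and large negative ones to 0, while values near zero behave like a coin flip, which keeps some exploration in the binary search space.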

11.
This paper presents a novel wrapper feature selection algorithm for classification problems, namely hybrid genetic algorithm (GA)- and extreme learning machine (ELM)-based feature selection algorithm (HGEFS). It utilizes GA to wrap ELM to search for the optimum subsets in the huge feature space, and then, a set of subsets is selected to form an ensemble to improve the final prediction accuracy. To prevent GA from being trapped in the local optimum, we propose a novel and efficient mechanism specifically designed for feature selection problems to maintain GA's diversity. To measure each subset's quality fairly and efficiently, we adopt a modified ELM called error-minimized extreme learning machine (EM-ELM) which automatically determines an appropriate network architecture for each feature subset. Moreover, EM-ELM has good generalization ability and extreme learning speed which allows us to perform wrapper feature selection processes in an affordable time. In other words, we simultaneously optimize feature subset and classifiers' parameters. After finishing the search process of GA, to further promote the prediction accuracy and get a stable result, we select a set of EM-ELMs from the obtained population to make the final ensemble according to a specific ranking and selecting strategy. To verify the performance of HGEFS, empirical comparisons are carried out on different feature selection methods and HGEFS with benchmark datasets. The results reveal that HGEFS is a useful method for feature selection problems and always outperforms other algorithms in comparison.

12.
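To address the curse of dimensionality and class imbalance in web spam detection, a binary classifier algorithm based on immune clonal feature selection and an under-sampling (US) ensemble is proposed. First, under-sampling splits the majority class of the training set into several sample sets, each about the size of the minority class, and each is merged with the minority-class samples to form several balanced training subsets. Then, an immune clonal algorithm is designed to select several optimal feature subsets. The balanced subsets are projected onto the optimal feature subsets to generate multiple views of the balanced data. Finally, random forest (RF) classifiers classify the test samples, and simple voting determines each test sample's final class. Experiments on the WEBSPAM UK-2006 dataset show that, for web spam detection, the ensemble improves accuracy, F1 measure, AUC, and other metrics by more than 11% compared with the random forest algorithm and its Bagging and AdaBoost ensembles; compared with the best previously reported results, it improves the F1 measure by 2% and achieves the best AUC.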
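
The under-sampling and voting steps can be sketched as follows. This is a minimal illustration, assuming a non-empty minority class; the function names, the round-robin split, and the fixed seed are hypothetical, and the immune clonal feature selection and random forest training are left out.

```python
import random


def balanced_subsets(majority, minority, seed=0):
    """Split the majority class into chunks about the size of the minority
    class and merge each chunk with all minority samples.

    Assumes `minority` is non-empty. The round-robin split is an
    illustrative choice, not the paper's sampling scheme.
    """
    rng = random.Random(seed)
    shuffled = list(majority)
    rng.shuffle(shuffled)
    n_subsets = max(1, round(len(shuffled) / len(minority)))
    # shuffled[i::n_subsets] deals the majority samples out like cards
    return [shuffled[i::n_subsets] + list(minority) for i in range(n_subsets)]


def majority_vote(predictions):
    """Simple voting: return the label predicted most often."""
    return max(set(predictions), key=predictions.count)
```

Each balanced subset would train one base classifier on its own feature view; `majority_vote` then combines the per-classifier predictions for a test sample. Ties are broken arbitrarily in this sketch.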

13.
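A new parallel convex hull algorithm based on two-group, two-domain, four-direction wrapping with minimized horizontal inclination angle   Citations: 3 total (3 self-citations, 0 by others)
To address the low efficiency of current convex hull algorithms (serial algorithms such as gift wrapping and the Graham scan, and parallel algorithms such as halving divide-and-conquer and quickhull), this paper draws on the fundamental theorem of isomorphic convex hull construction and the advantages of workstation clusters to propose a more efficient parallel convex hull algorithm. The algorithm wraps the hull by minimizing the horizontal inclination angle using two groups (the cluster is divided into two sub-clusters), two domains (the data distribution domain is divided into two sub-domains), and four directions (within each sub-domain, hull vertices are searched in both the clockwise and counterclockwise directions).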

14.
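As a data preprocessing step, feature selection plays an important role in data mining, pattern recognition, and machine learning. It can reduce problem complexity and improve the prediction accuracy, robustness, and interpretability of learning algorithms. This paper introduces a framework for feature selection methods, focusing on two processes: feature subset generation and evaluation criteria. It classifies feature selection algorithms by how they are combined with the learning algorithm and analyzes the strengths and weaknesses of each class. It then discusses the problems of existing feature selection algorithms and proposes some research difficulties and directions.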

15.
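Morphological features extracted by software from raw pathological images are high-dimensional, and samples in the medical field are scarce. To address this, the ReliefF-HEPSO feature selection algorithm for head and neck cancer pathological images is proposed. The algorithm builds a multi-level dimensionality reduction framework. First, the ReliefF algorithm assigns feature weights according to the correlation between features and classes, achieving preliminary dimensionality reduction. Second, an evolutionary neural strategy (ENS) is used to enrich the population diversity of binary particle swarm optimization (BPSO), yielding a hybrid binary evolutionary particle swarm optimization algorithm (HEPSO) that automatically searches the candidate features for the best feature subset. Comparative experiments against seven feature selection algorithms show that the algorithm more effectively selects highly relevant morphological features, reduces dimensionality quickly, and achieves higher classification performance with fewer features.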

16.
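A regularized mutual information feature selection method based on quadratic Renyi's entropy is proposed. The method estimates mutual information efficiently, which greatly reduces computational complexity. The regularized mutual information method is also combined with an embedded method to obtain a two-stage feature selection algorithm that can find more representative feature subsets. Experiments compare the efficiency and classification accuracy of the method with other mutual-information-based feature selection algorithms, and the results show that the method effectively reduces computational complexity.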
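
One reason quadratic Renyi entropy is cheap to estimate is that, with Gaussian Parzen windows, it reduces to a double sum over sample pairs with no numerical integration. The sketch below shows that standard estimator for one-dimensional data; it is not the paper's mutual information estimator, and the kernel width `sigma` is an assumed parameter.

```python
import math


def renyi2_entropy(samples, sigma=1.0):
    """Parzen-window estimate of quadratic Renyi entropy for 1-D samples.

    H2 = -log( (1/N^2) * sum_i sum_j G(x_i - x_j; 2*sigma^2) ),
    where G(.; v) is a zero-mean Gaussian density with variance v.
    sigma is the kernel width (an assumed tuning parameter).
    """
    n = len(samples)
    var = 2.0 * sigma ** 2  # convolving two kernels doubles the variance
    norm = 1.0 / math.sqrt(2.0 * math.pi * var)
    potential = sum(
        norm * math.exp(-((a - b) ** 2) / (2.0 * var))
        for a in samples
        for b in samples
    ) / n ** 2
    return -math.log(potential)


tight = renyi2_entropy([0.0, 0.1, 0.2, 0.1])     # concentrated sample
spread = renyi2_entropy([0.0, 5.0, 10.0, 15.0])  # dispersed sample
```

The dispersed sample yields the larger entropy estimate, as expected. A mutual information estimate would combine such entropy terms for the feature, the class, and their joint distribution.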

17.
Feature selection is an important method of data preprocessing in data mining. In this paper, a novel feature selection method based on multi-fractal dimension and harmony search algorithm is proposed. Multi-fractal dimension is adopted as the evaluation criterion of feature subset, which can determine the number of selected features. An improved harmony search algorithm is used as the search strategy to improve the efficiency of feature selection. The performance of the proposed method is compared with that of other feature selection algorithms on UCI data-sets. Besides, the proposed method is also used to predict the daily average concentration of PM2.5 in China. Experimental results show that the proposed method can obtain competitive results in terms of both prediction accuracy and the number of selected features.

18.
A new improved forward floating selection (IFFS) algorithm for selecting a subset of features is presented. Our proposed algorithm improves the state-of-the-art sequential forward floating selection algorithm. The improvement is to add an additional search step called "replacing the weak feature" to check whether removing any feature in the currently selected feature subset and adding a new one at each sequential step can improve the current feature subset. Our method provides the optimal or quasi-optimal (close to optimal) solutions for many selected subsets and requires significantly less computational load than optimal feature selection algorithms. Our experimental results for four different databases demonstrate that our algorithm consistently selects better subsets than other suboptimal feature selection algorithms do, especially when the original number of features of the database is large.
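
The "replacing the weak feature" step described above can be sketched as an exhaustive swap search. This is an illustration of the idea as stated in the abstract, not the authors' code: the function name, the `criterion` callable, and the strict-improvement rule are assumptions.

```python
from itertools import product


def replace_weak_feature(selected, all_features, criterion):
    """Try swapping each selected feature for each unselected one and
    return the best resulting subset, or the original if no swap helps.

    criterion -- callable scoring a set of features (higher is better);
                 in a wrapper setting this would be classifier accuracy.
    """
    selected = set(selected)
    best_subset, best_score = selected, criterion(selected)
    unselected = set(all_features) - selected
    for weak, new in product(sorted(selected), sorted(unselected)):
        candidate = (selected - {weak}) | {new}
        score = criterion(candidate)
        if score > best_score:  # keep only strict improvements
            best_subset, best_score = candidate, score
    return best_subset
```

The subset size is unchanged by this step, so it complements the forward (add) and backward (remove) moves of floating selection at a cost of one criterion evaluation per (selected, unselected) pair.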

19.
Several studies have demonstrated the superior performance of ensemble classification algorithms, whereby multiple member classifiers are combined into one aggregated and powerful classification model, over single models. In this paper, two rotation-based ensemble classifiers are proposed as modeling techniques for customer churn prediction. In Rotation Forests, feature extraction is applied to feature subsets in order to rotate the input data for training base classifiers, while RotBoost combines Rotation Forest with AdaBoost. In an experimental validation based on data sets from four real-life customer churn prediction projects, Rotation Forest and RotBoost are compared to a set of well-known benchmark classifiers. Moreover, variations of Rotation Forest and RotBoost are compared, implementing three alternative feature extraction algorithms: principal component analysis (PCA), independent component analysis (ICA) and sparse random projections (SRP). The performance of rotation-based ensemble classifiers is found to depend upon: (i) the performance criterion used to measure classification performance, and (ii) the implemented feature extraction algorithm. In terms of accuracy, RotBoost outperforms Rotation Forest, but none of the considered variations offers a clear advantage over the benchmark algorithms. However, in terms of AUC and top-decile lift, results clearly demonstrate the competitive performance of Rotation Forests compared to the benchmark algorithms. Moreover, ICA-based Rotation Forests outperform all other considered classifiers and are therefore recommended as a well-suited alternative classification technique for the prediction of customer churn that allows for improved marketing decision making.

20.
Real estate is an important industry in most countries. However, the analysis of the real estate market is very challenging as the data are high dimensional and have complex spatial and temporal patterns. In this paper, we present a novel Web-based visual analytics system, which integrates state-of-the-art interactive visualizations to enable end users to create their own visualizations and gain insight into the real estate market. The system is implemented using the new features in HTML5, which are natively supported in current browsers. We adopt a coordinated view design in our system consisting of four major components: a map view to show the geographical information of houses, a stacked graph view to show the evolution of house sales over time, a pixel-bar view to visualize multiple attributes of houses, and a treemap view to present the hierarchical structure of the data. Novel clutter reduction methods and rich user interactions are further proposed to enhance the flexibility and analytical ability of the whole system. We have applied our system to real property market data and obtained some interesting findings. Moreover, feedback from the end users of our system is very positive.
