Similar literature
1.
Classification of intrusion attacks and normal network traffic is a challenging and critical problem in pattern recognition and network security. In this paper, we present a novel intrusion detection approach that extracts both accurate and interpretable fuzzy IF-THEN rules from network traffic data for classification. The proposed fuzzy rule-based system is evolved from an agent-based evolutionary framework and multi-objective optimization. In addition, the proposed system can act as a genetic feature selection wrapper that searches for an optimal feature subset for dimensionality reduction. To evaluate the classification and feature selection performance of our approach, it is compared with some well-known classifiers as well as feature selection filters and wrappers. Extensive experimental results on the KDD-Cup99 intrusion detection benchmark data set demonstrate that the proposed approach produces interpretable fuzzy systems and outperforms other classifiers and wrappers, providing the highest detection accuracy for intrusion attacks and a low false alarm rate for normal network traffic with a minimized number of features.

2.
Software cost estimation is one of the most crucial activities in the software development process. In the past decades, many methods have been proposed for cost estimation. Case based reasoning (CBR) is one of these techniques. Feature selection is an important preprocessing stage of case based reasoning. Most existing feature selection methods for case based reasoning are 'wrappers', which usually yield high fitting accuracy at the cost of high computational complexity and poor explainability of the selected features. In our study, a mutual information based feature selection method (MICBR) is proposed. This approach hybridizes the 'wrapper' and 'filter' mechanisms. Filters are another kind of feature selector with much lower complexity than wrappers, and the features they select are likely to generalize to other conditions. The MICBR is then compared with popular feature selectors and published works. The results show that the MICBR is an effective feature selector for case based reasoning, overcoming some of the limitations and computational complexities of other feature selection techniques in the field.

3.
Feature selection is an important filtering method for data analysis, pattern classification, data mining, and so on. Feature selection reduces the number of features by removing irrelevant and redundant data. In this paper, we propose a hybrid filter–wrapper feature subset selection algorithm called the maximum Spearman minimum covariance cuckoo search (MSMCCS). First, based on Spearman correlation and covariance, a filter algorithm called maximum Spearman minimum covariance (MSMC) is proposed. Second, three parameters are introduced in MSMC to adjust the weights of correlation and redundancy, improve the relevance of feature subsets, and reduce redundancy. Third, in the improved cuckoo search algorithm, a weighted combination strategy is used to select candidate feature subsets, a crossover mutation concept is used to adjust the candidate feature subsets, and finally the filtered features are selected into optimal feature subsets. The MSMCCS therefore combines the efficiency of filters with the greater accuracy of wrappers. Experimental results on eight common data sets from the University of California at Irvine Machine Learning Repository showed that the MSMCCS algorithm had better classification accuracy than seven wrapper methods, one filter method, and two hybrid methods. Furthermore, the proposed algorithm performed favorably on the Wilcoxon signed-rank test and the sensitivity–specificity test.

4.
Gene selection is a significant preprocessing step in the discriminant analysis of microarray data. The classical gene selection methods can be classified into three categories: filters, wrappers, and embedded methods. In this paper, a novel hybrid gene selection method (HGSM) is proposed that exploits both the mutual information criterion (filters) and the leave-one-out-error criterion (wrappers) under the framework of an improved ant algorithm. Extensive experiments are conducted on three benchmark datasets, and the results confirm the effectiveness and efficiency of HGSM.

5.
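Feature selection methods mainly comprise filter methods and wrapper methods. To exploit both the computational simplicity of filters and the high accuracy of wrappers, a new feature selection method that combines the two is proposed. The method first uses a filter based on the mutual information criterion to obtain a subset that meets a given accuracy requirement, and then applies a wrapper to find the final optimized feature subset. Because genetic algorithms have been applied successfully to combinatorial optimization problems, a genetic algorithm is used to search for the feature subset. In numerical simulations and in feature selection for bearing fault diagnosis, the new method saves a large amount of selection time while maintaining diagnostic accuracy. The combined feature selection method searches for feature subsets well, saves selection time, and offers both high efficiency and high accuracy.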

6.
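Genome-wide association studies (GWAS) are an effective means of studying the genetic effects of complex diseases and traits. Existing association analyses mainly rely on marginal statistical tests, which ignore correlations between features and suffer from unstable threshold selection. Taking cardiovascular and cerebrovascular diseases as the object of study, this paper proposes a new genome-wide association analysis method based on multi-step screening. The method can be summarized in two steps: first, the Gini index is used for initial feature screening to obtain a candidate subset of single nucleotide polymorphisms (SNPs); then random-forest-based recursive cluster elimination is used to discover associated SNPs within that subset. Experimental results show that multi-step screening works better than single-step feature selection, and that the SNP subset selected by Gini-index screening followed by random-forest-based recursive cluster elimination is more strongly associated with the disease.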
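
The first screening step described above can be sketched as follows. This is a minimal illustration only: the abstract does not say how the Gini index is computed, so the sketch assumes the usual decrease in Gini impurity when samples are grouped by genotype, and the function names and SNP identifiers are hypothetical.

```python
from collections import Counter, defaultdict


def gini_impurity(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_gain(genotypes, labels):
    """Decrease in Gini impurity after grouping samples by genotype value."""
    groups = defaultdict(list)
    for genotype, label in zip(genotypes, labels):
        groups[genotype].append(label)
    n = len(labels)
    weighted = sum(len(ys) / n * gini_impurity(ys) for ys in groups.values())
    return gini_impurity(labels) - weighted


def screen_snps(snps, labels, top_k):
    """Keep the top_k SNPs ranked by Gini gain (assumed screening rule)."""
    ranked = sorted(snps, key=lambda name: gini_gain(snps[name], labels), reverse=True)
    return ranked[:top_k]


# Toy data: genotypes coded 0/1/2, labels are case (1) / control (0).
labels = [0, 0, 1, 1]
snps = {
    "snp_a": [0, 0, 2, 2],  # separates cases from controls perfectly
    "snp_b": [0, 1, 0, 1],  # carries no information about the label
    "snp_c": [0, 1, 1, 2],  # partially informative
}
candidates = screen_snps(snps, labels, top_k=2)
```

The surviving candidates would then be passed to the second, random-forest-based stage, which is not sketched here.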

7.
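In data mining, how effectively an optimal feature subset is selected from a high-dimensional feature space largely determines a model's predictive results. On this basis, this paper proposes an adaptive genetic algorithm with a composite fitness function and multi-feature combination search. The algorithm first filters the original features according to statistical principles to build a candidate feature set, and uses the cross-validation results of a fusion of multiple models as the fitness function to raise the fitness value in each round of evolution. Roulette-wheel selection, fixed-length gene-segment crossover, and random gene-locus mutation serve as the selection, crossover, and mutation operators, respectively. Comparative experiments show that the genetic algorithm is reasonably stable and effective and can heuristically select an optimal feature subset from the original feature space, thereby improving the accuracy of numerical prediction.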
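
The three operators named above can be sketched as follows, assuming each individual is a list of 0/1 flags marking which candidate features are kept. The function names, the segment length, and the mutation rate are illustrative assumptions; the abstract does not give the paper's actual settings or its fitness function.

```python
import random


def roulette_select(population, fitnesses, rng):
    """Pick one individual with probability proportional to its fitness."""
    pick = rng.uniform(0, sum(fitnesses))
    running = 0.0
    for individual, fitness in zip(population, fitnesses):
        running += fitness
        if running >= pick:
            return individual
    return population[-1]  # guard against floating-point round-off


def segment_crossover(parent_a, parent_b, segment_len, rng):
    """Swap one fixed-length gene segment between two equal-length parents."""
    start = rng.randrange(0, len(parent_a) - segment_len + 1)
    end = start + segment_len
    child_a = parent_a[:start] + parent_b[start:end] + parent_a[end:]
    child_b = parent_b[:start] + parent_a[start:end] + parent_b[end:]
    return child_a, child_b


def point_mutation(individual, rate, rng):
    """Flip each gene independently with probability `rate`."""
    return [1 - gene if rng.random() < rate else gene for gene in individual]


rng = random.Random(0)
child_a, child_b = segment_crossover([0] * 6, [1] * 6, segment_len=2, rng=rng)
mutated = point_mutation(child_a, rate=0.1, rng=rng)
```

In a full run, the fitness passed to `roulette_select` would come from the cross-validated model fusion the abstract describes.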

8.
A review of feature selection methods on synthetic data   Total citations: 2 (self-citations: 1; citations by others: 1)
With the advent of high dimensionality, adequate identification of the relevant features of the data has become indispensable in real-world scenarios. In this context, the importance of feature selection is beyond doubt, and different methods have been developed. However, with such a vast body of algorithms available, choosing an adequate feature selection method is not easy, and it is necessary to check their effectiveness in different situations. The assessment of relevant features is difficult in real datasets, so an interesting option is to use artificial data. In this paper, several synthetic datasets are employed for this purpose, aiming at reviewing the performance of feature selection methods in the presence of a growing number of irrelevant features, noise in the data, redundancy and interaction between attributes, as well as a small ratio between the number of samples and the number of features. Seven filters, two embedded methods, and two wrappers are applied over eleven synthetic datasets and tested by four classifiers, so as to be able to choose a robust method, paving the way for its application to real datasets.

9.
This paper presents a hybrid filter-wrapper feature subset selection algorithm based on particle swarm optimization (PSO) for support vector machine (SVM) classification. The filter model is based on mutual information and is a composite measure of feature relevance and redundancy with respect to the feature subset selected. The wrapper model is a modified discrete PSO algorithm. This hybrid algorithm, called maximum relevance minimum redundancy PSO (mr2PSO), is novel in the sense that it uses the mutual information available from the filter model to weight the bit selection probabilities in the discrete PSO. Hence, mr2PSO uniquely brings together the efficiency of filters and the greater accuracy of wrappers. The proposed algorithm is tested on several well-known benchmarking datasets. Its performance is also compared with a recent hybrid filter-wrapper algorithm based on a genetic algorithm and a wrapper algorithm based on PSO. The results show that the mr2PSO algorithm is competitive in terms of both classification accuracy and computational performance.

10.

This paper is about enhancing the smart grid by proposing a new hybrid feature-selection method called feature selection-based ranking (FSBR). In general, feature selection excludes non-promising features from the data collected at the Fog. This can be achieved using filter methods, wrapper methods, or a hybrid. Our proposed method consists of two phases: a filter phase and a wrapper phase. In the filter phase, the whole data set goes through different ranking techniques (i.e., relative weight ranking, effectiveness ranking, and information gain ranking). The results of these rankings are sent to a fuzzy inference engine to generate the final ranks. In the wrapper phase, data is selected based on the final ranks and passed to three different classifiers (i.e., Naive Bayes, Support Vector Machine, and neural network) to select the best set of features based on the performance of the classifiers. This process can enhance the smart grid by reducing the amount of data sent to the cloud, decreasing computation time, and decreasing data complexity. Thus, the FSBR methodology enables user load forecasting (ULF) to make fast decisions, react quickly in short-term load forecasting, and provide high prediction accuracy. The authors explain the suggested approach via numerical examples. Two datasets are used in the experiments. On the first dataset, the proposed method was compared with six other methods and achieved the best accuracy, 91%. On the second dataset, the generalization dataset, the proposed method achieved 90% accuracy in a comparison with fourteen different methods.


11.
The image mining technique deals with the extraction of implicit knowledge, image-data relationships, or other patterns not explicitly stored in the images. It is an extension of data mining to the image domain. The main objective of this paper is to apply image mining to breast mammograms to classify and detect cancerous tissue. Mammogram images can be classified into normal, benign, and malignant classes. A total of 26 features, including histogram intensity features and gray-level co-occurrence matrix features, are extracted from mammogram images. A hybrid approach to feature selection is proposed, which removes approximately 75% of the features, and a new decision tree is used for classification. Most notably, the branch and bound algorithm used for feature selection provides the best features, and it has not previously been applied to gray-level co-occurrence matrix feature selection from mammograms. Experiments were conducted on a data set of 300 images of different types taken from MIAS, with the aim of improving accuracy by generating a minimum number of rules to cover more patterns. The accuracy obtained by this method is approximately 97.7%, which is highly encouraging.

12.
Malware replicates itself and produces offspring with the same characteristics but different signatures by using code obfuscation techniques. Current generation Anti-Virus (AV) engines employ a signature-template type detection approach, so malware can easily evade the existing signatures in the database. This reduces the capability of current AV engines to detect malware. In this paper we propose a hybrid framework for malware detection that combines a Support Vector Machine wrapper with Maximum-Relevance–Minimum-Redundancy filter heuristics, where Application Program Interface (API) call statistics are used as malware features. The novelty of our hybrid framework is that it injects the filter's ranking score into the wrapper selection process and combines the properties of both wrappers and filters with API call statistics, which can detect malware based on the nature of infectious actions instead of signatures. To the best of our knowledge, this kind of hybrid approach has not yet been explored in the literature in the context of feature selection and malware detection. Knowledge about the intrinsic characteristics of malicious activities is determined by the API call statistics, which are injected as a filter score into the wrapper's backward elimination process in order to find the most significant APIs. Using the most significant APIs in the wrapper classification on both obfuscated and benign types of malware datasets, the results show that the proposed hybrid framework clearly surpasses the existing models, including the independent filters and wrappers, using only a very compact set of significant APIs. The performance of the proposed and existing models has further been compared using binary logistic regression. Various goodness-of-fit comparison criteria, such as Chi Square, Akaike's Information Criterion (AIC), and the Receiver Operating Characteristic (ROC) curve, are deployed to identify the best performing models. Experimental outcomes based on the above criteria also show that the proposed hybrid framework outperforms other existing models of signature types, including independent wrapper and filter approaches, in identifying malware.

13.
Rough set theory (RS) has been a topic of general interest in the field of knowledge discovery and pattern recognition. Machine learning algorithms are known to degrade in performance when faced with many features (sometimes called attributes) that are not necessary for rule discovery. Many methods for selecting a subset of features have been proposed. However, no single method can handle a complex system with many attributes or features, so a hybrid mechanism integrating rough sets with an artificial neural network (Rough-ANN) is proposed for feature selection in pattern recognition. RS-based attribute reduction as the preprocessor can decrease the inputs of the NN and improve the speed of training. In this way the sensitivity of rough sets to noise can be avoided and the system's robustness improved. An RS-based heuristic algorithm is proposed for feature selection. The approach can select an optimal subset of features quickly and effectively from a large database with many features. Moreover, the validity of the proposed hybrid recognizer and solution is verified through practical experiments and fault diagnosis in an industrial process.

14.
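To obtain a better feature subset from text and remove interfering and redundant features, a hybrid feature optimization algorithm combining a filter algorithm with a swarm intelligence algorithm is proposed. First, the information gain of each feature term is computed and the better features are chosen as a preselected feature set; the sine cosine algorithm is then used to optimize over the preselected features and obtain a refined feature set. To better balance global exploration and local exploitation in the sine cosine algorithm, an adaptive inertia weight is added. To evaluate feature subsets more precisely, a fitness function weighted by the number of features and the accuracy is introduced, and a new position update mechanism is proposed. Experimental results with KNN and Bayesian classifiers show that, compared with other feature selection algorithms and with the algorithm before improvement, the proposed feature selection algorithm improves classification accuracy to some extent.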

15.
Feature selection is an important problem for pattern classification systems. We study how to select good features according to the maximal statistical dependency criterion based on mutual information. Because of the difficulty in directly implementing the maximal dependency condition, we first derive an equivalent form, called the minimal-redundancy-maximal-relevance criterion (mRMR), for first-order incremental feature selection. Then, we present a two-stage feature selection algorithm that combines mRMR with other more sophisticated feature selectors (e.g., wrappers). This allows us to select a compact set of superior features at very low cost. We perform an extensive experimental comparison of our algorithm and other methods using three different classifiers (naive Bayes, support vector machine, and linear discriminant analysis) and four different data sets (handwritten digits, arrhythmia, NCI cancer cell lines, and lymphoma tissues). The results confirm that mRMR leads to promising improvement in feature selection and classification accuracy.
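
The first-order incremental selection described above can be sketched as a greedy loop. This is a minimal illustration, assuming discrete feature values and the commonly used difference form of the criterion (relevance minus mean redundancy). The abstract does not specify the exact form, and the function names and toy data are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information, in bits, between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def mrmr(features, target, k):
    """Greedily pick k features maximizing relevance minus mean redundancy."""
    relevance = {name: mutual_information(col, target) for name, col in features.items()}
    selected = []

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = sum(
            mutual_information(features[name], features[s]) for s in selected
        ) / len(selected)
        return relevance[name] - redundancy

    while len(selected) < min(k, len(features)):
        remaining = [name for name in features if name not in selected]
        selected.append(max(remaining, key=score))
    return selected


target = [0, 0, 0, 1, 1, 1, 1, 1]
features = {
    "a": [0, 0, 0, 0, 1, 1, 1, 1],       # strongly relevant
    "a_copy": [0, 0, 0, 0, 1, 1, 1, 1],  # equally relevant but fully redundant with "a"
    "b": [0, 0, 1, 1, 0, 0, 1, 1],       # weakly relevant, independent of "a"
}
chosen = mrmr(features, target, k=2)
```

On this toy data the duplicate of the first feature is passed over in favor of the weaker but non-redundant one, which is the behavior the criterion is meant to produce.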

16.
Feature selection has long been an active research topic in machine learning. Beginning with an empty set of features, it selects the features most necessary for learning a target concept. Feature elimination, a newer technique, starts out with the full set of features and eliminates those most unnecessary for learning the target concept. Feature elimination tends to be more effective, can capture interacting features more easily, and suffers less from feature interaction than feature selection. Because the most unnecessary features are eliminated from the beginning, they will not mislead the induction process in terms of efficiency or accuracy. Induction-algorithm-oriented feature elimination (IAOFE), with particular parameter configurations, can achieve higher predictive accuracy than existing popular feature selection approaches. We propose two sets of well-tuned parameters based on empirical analysis. To understand how to achieve the best performance possible from IAOFE, we conducted a comprehensive analysis of IAOFE parameter tuning.
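
The contrast drawn above, starting from the full set and removing features rather than starting empty and adding them, can be sketched generically. This is not the IAOFE procedure itself, whose parameters the abstract does not describe. It is a plain greedy backward elimination in which `evaluate` is a hypothetical callback standing in for whatever induction algorithm scores a feature subset.

```python
def backward_elimination(all_features, evaluate, min_features=1):
    """Start from the full set and repeatedly drop the feature whose removal
    gives the best score, stopping once every removal would lower the score."""
    current = list(all_features)
    best_score = evaluate(current)
    while len(current) > min_features:
        candidates = [[f for f in current if f != drop] for drop in current]
        top_score, top_subset = max(
            ((evaluate(subset), subset) for subset in candidates),
            key=lambda pair: pair[0],
        )
        if top_score < best_score:
            break
        best_score, current = top_score, top_subset
    return current, best_score


# Toy scorer (assumption): rewards the two useful features, penalizes subset size.
useful = {"a", "b"}


def toy_score(subset):
    return len(useful & set(subset)) - 0.1 * len(subset)


kept, score = backward_elimination(["a", "b", "c", "d"], toy_score)
```

In practice the scorer would be a cross-validated accuracy of the chosen learner, which is what makes the approach expensive compared with filters.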

17.
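Motor imagery electroencephalography (EEG) is a multi-channel, high-dimensional signal. Feature selection can reduce feature dimensionality and pick more discriminative features, thereby effectively improving EEG decoding performance. Existing feature selection methods mainly comprise filter, wrapper, and embedded methods, each with its own strengths and weaknesses. To combine the advantages of these categories, two hybrid feature selection methods are proposed. The first uses the least absolute shrinkage and selection operator (LASSO) for feature selection; after the weights of the LASSO model are obtained, a series of weight thresholds is set for a second round of feature screening. The second scores the features with the Fisher score and then sets a series of weight thresholds for a second round of feature screening. Fisher linear discriminant analysis (FLDA) is used to classify the feature subsets selected by the two methods. Experiments on two brain-computer interface (BCI) competition data sets and one data set collected in the laboratory gave highest average classification accuracies of 77.47%, 76.11%, and 71.30%, respectively. The results show that the proposed methods classify better than existing feature selection methods and also have a considerable advantage in feature selection time.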
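
The second method above, scoring with the Fisher score and then screening by threshold, can be sketched as follows. The sketch assumes the common definition of the Fisher score (between-class scatter divided by within-class scatter, per feature); the abstract does not state which variant is used, and the threshold values and function names are illustrative.

```python
from statistics import fmean, pvariance


def fisher_scores(rows, labels):
    """Per-feature Fisher score: between-class scatter / within-class scatter."""
    classes = sorted(set(labels))
    scores = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        overall_mean = fmean(column)
        between = within = 0.0
        for c in classes:
            values = [v for v, y in zip(column, labels) if y == c]
            between += len(values) * (fmean(values) - overall_mean) ** 2
            within += len(values) * pvariance(values)
        scores.append(between / within if within > 0 else float("inf"))
    return scores


def select_by_threshold(scores, threshold):
    """Indices of the features whose score reaches the threshold."""
    return [j for j, s in enumerate(scores) if s >= threshold]


rows = [[1.0, 5.0], [2.0, 1.0], [8.0, 5.0], [9.0, 1.0]]
labels = [0, 0, 1, 1]
scores = fisher_scores(rows, labels)

# Sweep a series of thresholds; each yields a candidate subset to classify.
subsets = {t: select_by_threshold(scores, t) for t in (0.0, 1.0, 100.0)}
```

Each candidate subset would then be evaluated with the classifier (FLDA in the abstract) to pick the threshold that works best.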

18.
This paper presents a novel wrapper feature selection algorithm for classification problems, namely the hybrid genetic algorithm (GA)- and extreme learning machine (ELM)-based feature selection algorithm (HGEFS). It uses GA to wrap ELM to search for the optimum subsets in the huge feature space, and then a set of subsets is selected to form an ensemble that improves the final prediction accuracy. To prevent GA from being trapped in a local optimum, we propose a novel and efficient mechanism, specifically designed for feature selection problems, to maintain GA's diversity. To measure each subset's quality fairly and efficiently, we adopt a modified ELM called the error-minimized extreme learning machine (EM-ELM), which automatically determines an appropriate network architecture for each feature subset. Moreover, EM-ELM has good generalization ability and extremely fast learning speed, which allows us to perform wrapper feature selection in an affordable time. In other words, we simultaneously optimize the feature subset and the classifiers' parameters. After the GA search finishes, to further improve prediction accuracy and obtain a stable result, we select a set of EM-ELMs from the obtained population to form the final ensemble according to a specific ranking and selection strategy. To verify the performance of HGEFS, empirical comparisons between HGEFS and different feature selection methods are carried out on benchmark datasets. The results reveal that HGEFS is a useful method for feature selection problems and always outperforms the other algorithms in the comparison.

19.
Guo Na, Liu Cong, Li Caihong, Lu Ting, Wen Lijie, Zeng Qingtian. Journal of Software (《软件学报》), 2024, 35(3): 1341-1356
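Predicting the remaining time of a business process is of great value for preventing and intervening in business anomalies. Existing remaining-time prediction methods achieve higher accuracy through deep learning, but most deep models have complex structures that make their predictions hard to explain, i.e., the unexplainability problem. In addition, besides the key attribute of activity, remaining-time prediction selects several other attributes as input features of the prediction model based on domain knowledge; the lack of a general feature selection method affects both prediction accuracy and model explainability. To address these problems, a process remaining-time prediction framework based on an explainable feature-based hierarchical model (EFH model) is proposed. Specifically, a feature self-selection strategy is first proposed: through priority-based backward feature deletion and feature-importance-based forward feature selection, attributes with a positive effect on the prediction task are obtained as model inputs. An explainable feature-based hierarchical model architecture is then proposed, which obtains a prediction at each layer by adding different features layer by layer, explaining the intrinsic relation between feature values and predictions. The method is instantiated with LightGBM (light gradient boosting machine) and LSTM (long short-term memory); the framework is general and not limited to these algorithms. Finally, it is compared with state-of-the-art methods on eight real-life event logs. Experimental results show that the proposed method selects effective features, improves prediction accuracy, and explains the predictions.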

20.
Cropland classification using optical and full polarimetric synthetic aperture radar (PolSAR) images is a topic of considerable interest in the remote-sensing community. These two data sources can provide a diverse set of temporal, spectral, textural and polarimetric features which can be invaluable for cropland classification. However, some optical features or some radar features may have a relatively high correlation with other features. Hence, it is necessary to choose the optimum features in order to reduce the dimensions of the data and to improve cropland classification accuracy. This article proposes a strategic feature selection method for a feature set of bitemporal RapidEye and Uninhabited Aerial Vehicle synthetic aperture radar (UAVSAR) images. The proposed method is designed to select the most relevant features and to remove redundant features based on the two concepts of separability and dependency. The proposed method is therefore referred to as maximum separability and minimum dependency (MSMD). To evaluate efficiency, MSMD and some well-known filter and wrapper feature selection methods are compared using a random forest classifier. Experimental tests confirmed that the classification results obtained from the MSMD feature selection method were more accurate than those achieved by filter methods. Moreover, their accuracy was comparable to that of the results from the wrapper method. Furthermore, with regard to running time, MSMD operated as fast as the filter methods. It had a more straightforward structure than the wrapper method, and as a result was faster than that method.
