首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 20 毫秒
1.
适合于入侵检测的分步特征选择算法   总被引:1,自引:0,他引:1       下载免费PDF全文
针对入侵检测数据集维数高,导致检测算法处理速度慢,而其中包含许多对检测效果影响不大的特征的问题,提出了一种分步特征选择算法。它通过对相关特征和冗余特征的定义,以互信息为准则,首先删除不相关特征,然后删除冗余特征。该算法的时间复杂性低,且独立于检测算法,可以通过调整阈值平衡检测精度和特征的数量。以权威数据集KDD-99为实验数据集,对多种检测算法进行了实验。结果表明,该算法能有效地选择特征向量,保证检测精度,提高检测速度。  相似文献   

2.
针对在模式分类问题中,数据往往存在不相关的或冗余的特征,从而影响分类的准确性的问题,提出一种融合Shapley值和粒子群优化算法的混合特征选择算法,以利用最少的特征获得最佳分类效果。在粒子群优化算法的局部搜索中引入博弈论的Shapley值,首先计算粒子(特征子集)中每个特征对分类效果的贡献值(Shapley值),然后逐步删除Shapley值最低的特征以优化特征子集,进而更新粒子,同时也增强了算法的全局搜索能力,最后将改进后的粒子群优化算法运用于特征选择,以支持向量机分类器的分类性能和选择的特征数目作为特征子集评价标准,对UCI机器学习数据集和基因表达数据集的17个具有不同特征数量的医疗数据集进行分类实验。实验结果表明所提算法能有效地删除数据集中55%以上不相关的或冗余的特征,尤其对于中大型数据集能删减80%以上,并且所选择的特征子集也具有较好的分类能力,分类准确率能提高2至23个百分点。  相似文献   

3.
高维数据集包含了成千上万可用于数据分析和预测的特征,然而这些数据集存在许多不相关或冗余特征,影响了数据分析和预测的准确性。现有分类技术难以准确地识别最佳特征子集。针对该问题,提出了一种基于wrapper模式的特征选择方法AB-CRO,该方法结合了人工蜂群算法(ABC)和改进的化学反应算法(CRO)的优点进行特征选择。针对迭代过程中较优的个体可能在化学反应过程中被消耗掉的现象,适当地加入精英策略来保持种群的优良性。实验结果表明,AB-CRO算法在最佳特征子集的识别和分类精度方面相对于基准算法ABC,CRO以及基于GA,PSO和混合蛙跳算法都所有改进。  相似文献   

4.
The degree of malignancy in brain glioma is assessed based on magnetic resonance imaging (MRI) findings and clinical data before operation. These data contain irrelevant features, while uncertainties and missing values also exist. Rough set theory can deal with vagueness and uncertainty in data analysis, and can efficiently remove redundant information. In this paper, a rough set method is applied to predict the degree of malignancy. As feature selection can improve the classification accuracy effectively, rough set feature selection algorithms are employed to select features. The selected feature subsets are used to generate decision rules for the classification task. A rough set attribute reduction algorithm that employs a search method based on particle swarm optimization (PSO) is proposed in this paper and compared with other rough set reduction algorithms. Experimental results show that reducts found by the proposed algorithm are more efficient and can generate decision rules with better classification performance. The rough set rule-based method can achieve higher classification accuracy than other intelligent analysis methods such as neural networks, decision trees and a fuzzy rule extraction algorithm based on Fuzzy Min-Max Neural Networks (FRE-FMMNN). Moreover, the decision rules induced by rough set rule induction algorithm can reveal regular and interpretable patterns of the relations between glioma MRI features and the degree of malignancy, which are helpful for medical experts.  相似文献   

5.
特征选择通过移除不相关和冗余的特征来提高学习算法的性能。基于进化算法在求解优化问题时表现出的优越性能,提出FSSAC特征选择方法。新的初始化策略和评估函数使得SAC能将特征选择作为离散空间搜索问题来解决,利用特征子集的准确率指导SAC的采样阶段。在实验阶段,FSSAC结合SVM,J48和KNN分类器,通过UCI数据集完成验证,并与FSFOA,HGAFS,PSO等算法进行了比较。实验结果表明,FSSAC可以提高分类器的分类准确率,且具有良好的泛化性能。除此之外,对FSSAC和其他算法在特征空间维度缩减情况方面做了对比。  相似文献   

6.

对于包含大量特征的数据集, 特征选择已成为一个研究热点, 能剔除无关和冗余特征, 将会有效改善分类准确性. 对此, 在分析已有文献的基础上, 提出一种基于属性关系的特征选择算法(NCMIPV), 获取优化特征子集, 并在UCI 数据集上对NCMIPV 算法进行性能评估. 实验结果表明, 与原始特征子集相比, 该算法能有效降低特征空间维数, 运行时间也相对较短, 分类差错率可与其他算法相比, 在某些场合下性能明显优于其他算法.

  相似文献   

7.
多标记数据有很多的冗余特征和数据,为了解决多标记数据中冗余和无关特征,提高多标记学习算法的泛化能力。提出一个基于模拟退火的卷积式特征选择方法——SAML(simulated annealing based feature selection for multi-label data),已有的算法只是使用了遗传算法来进行优化,新算法采用模拟退火来寻找最优子集,其效果在已有的工作中表现出比前者遗传算法更好的效果。在用于公开评测的Yahoo网页分类数据集上的实验结果表明,SAML算法的性能优于新近提出的一些流行的多标记特征选择方法。  相似文献   

8.
孔莉芳  张虹 《控制与决策》2012,27(7):967-974
针对大量无关或冗余的特征通常会降低模式分类中分类器性能的问题,提出一种基于异步并行微粒群优化的特征子集选择方法(AP-PSO).该方法采用二进制微粒群优化搜索特征子集,利用异步并行方式提高算法的运算效率;为有效协调种群的全局探索和局部开发能力,充分利用混沌运动的遍历性和随机性,提出一种一致混沌变异算子.与已知4种特征子集选择方法进行比较,所得结果验证了该算法的有效性.  相似文献   

9.
黄君毅  吴静  张晖 《计算机工程》2010,36(16):68-70
基于流的特征并使用机器学习技术进行网络流量分类是目前网络流量分类的主流技术。由于许多流的特征可用于流分类,其中有许多是不相关和冗余的特征,因此特征选择对算法性能的优化具有重要的作用。将基于过滤的特征选择方法应用于C4.5、Bayesnet、NBD、NBK等分类算法,实验结果表明该方法在无损于分类准确性的同时能够改进计算性能。  相似文献   

10.
特征选择是模式识别与数据挖掘的关键问题之一,它可以移除数据集中的冗余和不相关特征以提升学习性能。基于最大相关最小冗余准则,提出一种新的基于相关性与冗余性分析的半监督特征选择方法(S2R2),S2R2方法独立于任何分类学习算法。该方法首先对无监督相关度信息度量进行分析与扩充,然后结合信息增益,设计一种半监督特征相关性与冗余性度量,可以有效识别与移除不相关和冗余特征,最后采用增量搜索技术贪婪地构建特征子集,避免搜索指数级大小的解空间,提高算法的运行效率。本文还提出S2R2方法的快速过滤版本,FS2R2,以更好地应对大规模特征选择问题。多个标准数据集上的实验结果表明了所提方法的有效性和优越性。  相似文献   

11.
《Pattern recognition letters》2001,22(6-7):799-811
Feature selection is used to improve the efficiency of learning algorithms by finding an optimal subset of features. However, most feature selection techniques can handle only certain types of data. Additional limitations of existing methods include intensive computational requirements and inability to identify redundant variables. In this paper, we present a novel, information-theoretic algorithm for feature selection, which finds an optimal set of attributes by removing both irrelevant and redundant features. The algorithm has a polynomial computational complexity and is applicable to datasets of a mixed nature. The method performance is evaluated on several benchmark datasets by using a standard classifier (C4.5).  相似文献   

12.
黄琴    钱文彬    王映龙  吴兵龙 《智能系统学报》2019,14(5):929-938
在多标记学习中,特征选择是提升多标记学习分类性能的有效手段。针对多标记特征选择算法计算复杂度较大且未考虑到现实应用中数据的获取往往需要花费代价,本文提出了一种面向代价敏感数据的多标记特征选择算法。该算法利用信息熵分析特征与标记之间的相关性,重新定义了一种基于测试代价的特征重要度准则,并根据服从正态分布的特征重要度和特征代价的标准差,给出一种合理的阈值选择方法,同时通过阈值剔除冗余和不相关特征,得到低总代价的特征子集。通过在多标记数据的实验对比和分析,表明该方法的有效性和可行性。  相似文献   

13.
网络故障诊断中大量无关或冗余的特征会降低诊断的精度,需要对初始特征进行选择。Wrapper模式特征选择方法分类算法计算量大,为了降低计算量,本文提出了基于支持向量的二进制粒子群(SVB-BPSO)的故障特征选择方法。该算法以SVM为分类器,首先通过对所有样本的SVM训练选出SV集,在封装的分类训练中仅使用SV集,然后采用异类支持向量之间的平均距离作为SVM的参数进行训练,最后根据分类结果,利用BPSO在特征空间中进行全局搜索选出最优特征集。在DARPA数据集上的实验表明本文提出的方法能够降低封装模式特征选择的计算量且获得了较高的分类精度以及较明显的降维效果。  相似文献   

14.
孙林  赵婧  徐久成  王欣雅 《计算机应用》2022,42(5):1355-1366
针对经典的帝王蝶优化(MBO)算法不能很好地处理连续型数据,以及粗糙集模型对于大规模、高维复杂的数据处理能力不足等问题,提出了基于邻域粗糙集(NRS)和MBO的特征选择算法。首先,将局部扰动和群体划分策略与MBO算法结合,并构建传输机制以形成一种二进制MBO(BMBO)算法;其次,引入突变算子增强算法的探索能力,设计了基于突变算子的BMBO(BMBOM)算法;然后,基于NRS的邻域度构造适应度函数,并对初始化的特征子集的适应度值进行评估并排序;最后,使用BMBOM算法通过不断迭代搜索出最优特征子集,并设计了一种元启发式特征选择算法。在基准函数上评估BMBOM算法的优化性能,并在UCI数据集上评价所提出的特征选择算法的分类能力。实验结果表明,在5个基准函数上,BMBOM算法的最优值、最差值、平均值以及标准差明显优于MBO和粒子群优化(PSO)算法;在UCI数据集上,与基于粗糙集的优化特征选择算法、结合粗糙集与优化算法的特征选择算法、结合NRS与优化算法的特征选择算法、基于二进制灰狼优化的特征选择算法相比,所提特征选择算法在分类精度、所选特征数和适应度值这3个指标上表现良好,能够选择特征数少且分类精度高的最优特征子集。  相似文献   

15.
In classification, feature selection is an important data pre-processing technique, but it is a difficult problem due mainly to the large search space. Particle swarm optimisation (PSO) is an efficient evolutionary computation technique. However, the traditional personal best and global best updating mechanism in PSO limits its performance for feature selection and the potential of PSO for feature selection has not been fully investigated. This paper proposes three new initialisation strategies and three new personal best and global best updating mechanisms in PSO to develop novel feature selection approaches with the goals of maximising the classification performance, minimising the number of features and reducing the computational time. The proposed initialisation strategies and updating mechanisms are compared with the traditional initialisation and the traditional updating mechanism. Meanwhile, the most promising initialisation strategy and updating mechanism are combined to form a new approach (PSO(4-2)) to address feature selection problems and it is compared with two traditional feature selection methods and two PSO based methods. Experiments on twenty benchmark datasets show that PSO with the new initialisation strategies and/or the new updating mechanisms can automatically evolve a feature subset with a smaller number of features and higher classification performance than using all features. PSO(4-2) outperforms the two traditional methods and two PSO based algorithm in terms of the computational time, the number of features and the classification performance. The superior performance of this algorithm is due mainly to both the proposed initialisation strategy, which aims to take the advantages of both the forward selection and backward selection to decrease the number of features and the computational time, and the new updating mechanism, which can overcome the limitations of traditional updating mechanisms by taking the number of features into account, which reduces the number of features and the computational time.  相似文献   

16.
针对高维数入侵检测数据集中信息冗余导致入侵检测算法处理速度慢的问题,提出了一种基于粒子群优化的入侵特征选择算法,通过分析网络入侵数据特征之间的相关性,可使粒子群优化算法在所有特征空间中优化搜索,自主选择有效特征子集,降低数据维度。实验结果表明该算法能够有效去除冗余特征,减少特征选择时间,在保证检测准确率的前提下,有效地提高了系统的检测速度。  相似文献   

17.
开放动态环境下的机器学习任务面临着数据特征空间的高维性和动态性。目前已有在线流特征选择算法基本仅考虑特征的重要性和冗余性,忽略了特征的交互性。特征交互是指那些本身与标签单独统计时呈现无关或弱相关,但与其他特征结合时却能与标签呈强相关的特征。基于此,提出一种基于邻域信息交互的在线流特征选择算法,该算法分为在线交互特征选择和在线冗余特征剔除两个阶段,即直接计算新到特征与整个已选特征子集的交互强弱程度,以及利用成对比较机制剔除冗余特征。在10个数据集上的实验结果表明了所提算法的有效性。  相似文献   

18.
Biological data often consist of redundant and irrelevant features. These features can lead to misleading in modeling the algorithms and overfitting problem. Without a feature selection method, it is difficult for the existing models to accurately capture the patterns on data. The aim of feature selection is to choose a small number of relevant or significant features to enhance the performance of the classification. Existing feature selection methods suffer from the problems such as becoming stuck in local optima and being computationally expensive. To solve these problems, an efficient global search technique is needed.Black Hole Algorithm (BHA) is an efficient and new global search technique, inspired by the behavior of black hole, which is being applied to solve several optimization problems. However, the potential of BHA for feature selection has not been investigated yet. This paper proposes a Binary version of Black Hole Algorithm called BBHA for solving feature selection problem in biological data. The BBHA is an extension of existing BHA through appropriate binarization. Moreover, the performances of six well-known decision tree classifiers (Random Forest (RF), Bagging, C5.0, C4.5, Boosted C5.0, and CART) are compared in this study to employ the best one as an evaluator of proposed algorithm.The performance of the proposed algorithm is tested upon eight publicly available biological datasets and is compared with Particle Swarm Optimization (PSO), Genetic Algorithm (GA), Simulated Annealing (SA), and Correlation based Feature Selection (CFS) in terms of accuracy, sensitivity, specificity, Matthews’ Correlation Coefficient (MCC), and Area Under the receiver operating characteristic (ROC) Curve (AUC). In order to verify the applicability and generality of the BBHA, it was integrated with Naive Bayes (NB) classifier and applied on further datasets on the text and image domains.The experimental results confirm that the performance of RF is better than the other decision tree algorithms and the proposed BBHA wrapper based feature selection method is superior to BPSO, GA, SA, and CFS in terms of all criteria. BBHA gives significantly better performance than the BPSO and GA in terms of CPU Time, the number of parameters for configuring the model, and the number of chosen optimized features. Also, BBHA has competitive or better performance than the other methods in the literature.  相似文献   

19.
面向入侵检测的基于多目标遗传算法的特征选择   总被引:5,自引:0,他引:5  
俞研  黄皓 《计算机科学》2007,34(3):197-200
针对刻画网络行为的特征集中存在着不相关或冗余特征,从而导致入侵检测性能下降的问题,本文提出了一种基于多目标遗传算法的特征选择方法,将入侵检测中的特征选择问题视为多目标优化问题来处理。实验结果表明,该方法能够实现检测精度与检测算法复杂性的均衡优化,在显著提高检测算法效率的同时,检测精度也有所提高。  相似文献   

20.
高维数据流包含大量的无关信息和冗余信息,这些信息可能极大地降低学习算法的性能。利用属性相关性可以有效地去除数据流中的不相关属性和冗余属性,提高学习算法的效率。分析现有的属性相关性计算方法在应用中的局限性,提出基于曲线拟合的属性相关性特征选择算法FSCFFR(Feature Selection based on Curve-Fitting Feature Relevance)。理论分析和实验表明,FSCFFR在特征选择过程中具有较高的实时性和有效性。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号