Similar literature
 20 similar documents found (search time: 15 ms)
1.
There is significant interest in the network management and industrial security community in identifying the “best” and most relevant features of network traffic in order to properly characterize user behaviour and predict future traffic. Eliminating redundant features is an important Machine Learning (ML) task because it helps to identify the best features, which improves classification accuracy and reduces the computational complexity of constructing the classifier. In practice, feature selection (FS) techniques can be used as a preprocessing step to eliminate irrelevant features and as a knowledge discovery tool to reveal the “best” features in many soft computing applications. In this paper, we investigate the advantages and disadvantages of such FS techniques with newly proposed metrics (namely goodness, stability and similarity). We continue our efforts toward developing an integrated FS technique that is built on the key strengths of existing FS techniques. A novel approach is proposed to identify the “best” features efficiently and accurately by first combining the results of some well-known FS techniques to find consistent features, and then using the proposed concept of support to select the smallest set of features that still covers data optimality. The empirical study over ten high-dimensional network traffic data sets demonstrates a significant gain in accuracy and improved run-time performance of a classifier compared to the individual results produced by some well-known FS techniques.
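
The combination step described above can be illustrated with a minimal sketch. It assumes each FS technique returns a set of feature names and treats "support" simply as the number of techniques that selected a feature; the abstract does not define support, so this vote count is an assumption, and the function name, threshold and feature names are hypothetical.

```python
from collections import Counter

def consensus_features(selections, min_support):
    """Return features chosen by at least `min_support` of the selectors.

    `selections` is a list of sets, one per feature selection technique.
    Counting votes is an assumed stand-in for the abstract's "support".
    """
    votes = Counter(f for selected in selections for f in set(selected))
    # Sort by descending vote count, then by name, for a stable order.
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, count in ranked if count >= min_support]

# Hypothetical outputs of three feature selection techniques.
selections = [
    {"pkt_size_mean", "duration", "src_port"},
    {"pkt_size_mean", "duration", "iat_var"},
    {"pkt_size_mean", "iat_var", "dst_port"},
]
print(consensus_features(selections, min_support=2))
# → ['pkt_size_mean', 'duration', 'iat_var']
```

Raising `min_support` shrinks the result toward the features every technique agrees on, which is one simple way to trade coverage for consistency.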

2.
Classification of intrusion attacks and normal network traffic is a challenging and critical problem in pattern recognition and network security. In this paper, we present a novel intrusion detection approach that extracts both accurate and interpretable fuzzy IF-THEN rules from network traffic data for classification. The proposed fuzzy rule-based system is evolved from an agent-based evolutionary framework and multi-objective optimization. In addition, the proposed system can act as a genetic feature selection wrapper that searches for an optimal feature subset for dimensionality reduction. To evaluate its classification and feature selection performance, our approach is compared with some well-known classifiers as well as feature selection filters and wrappers. Extensive experimental results on the KDD-Cup99 intrusion detection benchmark data set demonstrate that the proposed approach produces interpretable fuzzy systems and outperforms other classifiers and wrappers by providing the highest detection accuracy for intrusion attacks and a low false alarm rate for normal network traffic with a minimized number of features.

3.
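To address the problem of optimally selecting network traffic feature attributes, a network traffic feature selection method combining rough sets and tabu search (RS-TS) is proposed. The method uses a rough set algorithm to reduce the network traffic feature attributes, takes the resulting feature subset as the initial solution for tabu search, and then applies tabu search to obtain the optimal feature subset. Experiments show that RS-TS outperforms GA-based and IG-based feature selection methods, effectively removes redundant feature attributes from network traffic, and improves network traffic classification accuracy.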

4.
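Identification of peer-to-peer (P2P) traffic is currently a hot topic in network management research. Methods based on the support vector machine (SVM) are among the commonly used P2P traffic identification methods. However, SVM performance depends mainly on its parameters and on the features it uses, and traditional approaches handle SVM parameter optimization and feature selection separately, which makes it difficult to obtain an SVM classifier with the best overall performance. This paper proposes a P2P traffic identification method that combines an optimal artificial bee colony algorithm with SVM: the artificial bee colony algorithm treats SVM parameter tuning and feature selection as one optimization problem and solves them simultaneously, yielding the parameters and feature subset with the best overall performance. Experimental results on real P2P data show that the proposed method has good adaptability and classification accuracy, obtains optimal solutions for the feature subset and the SVM parameters at the same time, and improves the overall performance of the SVM classifier.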

5.
Lin Rongqiang (林荣强), Li Ou (李鸥), Li Qing (李青), Li Linlin (李林林). Journal of Computer Applications (《计算机应用》), 2014, 34(11): 3206-3209
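To address the sample labeling bottleneck in network traffic feature selection, and the inability of existing semi-supervised methods to select strongly relevant features, a multi-class semi-supervised feature selection algorithm based on class label extension (SFSEL) is proposed. The algorithm first starts from a small number of labeled samples and extends class labels to the unlabeled samples using the K-means algorithm; it then performs feature selection on multi-class data with a doubly regularized support vector machine (MDrSVM) algorithm. In comparative experiments on the Moore data set against the semi-supervised feature selection algorithms Spectral, PCFRSC and SEFR, SFSEL achieved clearly higher classification accuracy and recall than the other algorithms and selected clearly fewer features. The experimental results show that SFSEL effectively improves the relevance of the selected features and achieves better network traffic classification performance.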
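
The label extension step can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how clusters are mapped to classes, so the sketch seeds K-means with the first `k` labeled points and gives each unlabeled sample the majority label of the labeled samples in its cluster. The function names and the toy flow features are hypothetical, and the MDrSVM stage is not shown.

```python
from collections import Counter

def dist2(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, seeds, iterations=10):
    """Plain Lloyd's algorithm; returns the cluster index of each point."""
    centroids = list(seeds)
    assign = []
    for _ in range(iterations):
        assign = [min(range(len(centroids)), key=lambda k: dist2(p, centroids[k]))
                  for p in points]
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assign) if a == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return assign

def extend_labels(labeled, unlabeled, k):
    """Give each unlabeled point the majority label of its cluster."""
    points = [x for x, _ in labeled] + list(unlabeled)
    # Assumption: seed the centroids with the first k labeled points.
    assign = kmeans(points, [x for x, _ in labeled[:k]])
    votes = {}
    for (_, label), cluster in zip(labeled, assign):
        votes.setdefault(cluster, Counter())[label] += 1
    extended = []
    for x, cluster in zip(unlabeled, assign[len(labeled):]):
        if cluster in votes:  # clusters with no labeled member stay unlabeled
            extended.append((x, votes[cluster].most_common(1)[0][0]))
    return extended

# Hypothetical two-feature flows: two labeled, four unlabeled.
labeled = [((1.0, 1.0), "web"), ((8.0, 8.0), "p2p")]
unlabeled = [(1.2, 0.9), (0.8, 1.1), (7.9, 8.2), (8.1, 7.7)]
extended = extend_labels(labeled, unlabeled, k=2)
print([label for _, label in extended])
# → ['web', 'web', 'p2p', 'p2p']
```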

6.
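To address the bottleneck of traditional machine learning algorithms in traffic classification, an application traffic classification algorithm based on a one-dimensional convolutional neural network (1D CNN) model is proposed. The network traffic data set is preprocessed to remove irrelevant data fields and to make the data meet the input requirements of a convolutional neural network. A new 1D CNN model is designed, and an optimal classification model is constructed by considering the network structure, the hyperparameter space and parameter optimization. The model learns data features autonomously through its convolutional layers, which resolves the feature selection problem in traditional machine-learning-based traffic classification algorithms. Tested on a public network data set, the designed model improves classification accuracy by 16.4% and reduces total classification time by 71.48% compared with a traditional 1D CNN model. It also shows good improvements in per-class precision, recall and F1 score.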

7.
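In research on passive network device identification based on network traffic analysis, the traffic data are often high-dimensional, and some features contribute little to device identification or even seriously degrade classification results and performance. To address this problem, this paper proposes FSSA, a network traffic feature selection algorithm that combines the Filter and Wrapper approaches and is based on symmetric uncertainty (SU) and the approximate Markov blanket (AMB). The proposed method...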

8.

We propose a hybrid grasshopper optimizer to reduce the size of the feature set in the steganalysis process using information theory and other stochastic optimization techniques. This work is motivated by the grasshopper algorithm's stagnation in local minima and its slow convergence rate in optimization problems. Therefore, we enhance the grasshopper optimization algorithm (GOA) with chaotic maps to obtain Chaotic GOA (CGOA). Then, we combine CGOA with adaptive particle swarm optimization (APSO) to obtain the Chaotic Particle-Swarm Grasshopper Optimization Algorithm (CPGOA). Next, we use the proposed optimizer with entropy to find the best feature subset of the original Subtractive Pixel Adjacency Model (SPAM) and Spatial Rich Model (SRM) feature sets. Finally, the proposed technique is applied to detect spatial domain steganography with different embedding rates on the BOSSbase 1.01 grayscale image database. The results show that the proposed hybrid optimizer improves on the original GOA and other state-of-the-art feature selection methods in steganalysis.


9.
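Feature subset selection based on particle swarm optimization and correlation analysis   Total citations: 3, self-citations: 0, citations by others: 3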
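Feature selection is one of the important problems in fields such as pattern recognition and data mining. To address it, a feature subset selection algorithm based on discrete particle swarm optimization and correlation analysis is proposed. The algorithm adopts a filter-mode feature selection approach: it analyzes the correlations among all features in network intrusion data and uses discrete particle swarm optimization to search the full feature space, automatically selecting an effective feature subset to reduce data dimensionality. Experimental results on the IDS data set from the 1999 KDD Cup Data demonstrate the effectiveness of the proposed algorithm.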

10.
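To address dimensionality reduction of high-dimensional data in anomalous traffic detection, a feature selection algorithm based on information entropy theory is proposed. First, the importance coefficient of each feature is computed, and features whose importance coefficient falls below a given threshold are removed, yielding the set of important features. Next, the redundancy coefficients between features are computed and redundant features are removed, yielding a reduced feature set. Finally, the reduced feature set is validated with the ID3 algorithm, and the results show that the feature selection algorithm is effective.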
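
A rough sketch of the two-stage filter is given below. The abstract does not define the importance or redundancy coefficients, so this sketch assumes information gain with respect to the class label for importance, and mutual information normalized by the smaller feature entropy for redundancy. All names, thresholds and data are hypothetical, and the ID3 validation step is not shown.

```python
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two discrete sequences."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

def redundancy(xs, ys):
    """Assumed redundancy coefficient: I(X;Y) / min(H(X), H(Y))."""
    h = min(entropy(xs), entropy(ys))
    return mutual_information(xs, ys) / h if h > 0 else 0.0

def select_features(columns, labels, min_importance, max_redundancy):
    """Drop unimportant features first, then redundant ones."""
    # Stage 1: importance is the information gain w.r.t. the class label.
    importance = {n: mutual_information(col, labels) for n, col in columns.items()}
    important = [n for n in columns if importance[n] >= min_importance]
    # Stage 2: visit features from most to least important and keep one
    # only if it is not too redundant with any feature already kept.
    kept = []
    for name in sorted(important, key=lambda n: -importance[n]):
        if all(redundancy(columns[name], columns[k]) <= max_redundancy for k in kept):
            kept.append(name)
    return kept

# Hypothetical discretized traffic features; 1 marks anomalous traffic.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = {
    "flag":      [0, 0, 0, 0, 1, 1, 1, 1],  # fully informative
    "flag_copy": [1, 1, 1, 1, 0, 0, 0, 0],  # redundant with "flag"
    "noise":     [0, 1, 0, 1, 0, 1, 0, 1],  # independent of the label
    "proto":     [0, 0, 0, 1, 1, 1, 1, 0],  # weakly informative
}
print(select_features(columns, labels, min_importance=0.1, max_redundancy=0.9))
# → ['flag', 'proto']
```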

11.
Data preprocessing is widely recognized as an important stage in anomaly detection. This paper reviews the data preprocessing techniques used by anomaly-based network intrusion detection systems (NIDS), concentrating on which aspects of the network traffic are analyzed, and what feature construction and selection methods have been used. Motivation for the paper comes from the large impact data preprocessing has on the accuracy and capability of anomaly-based NIDS. The review finds that many NIDS limit their view of network traffic to the TCP/IP packet headers. Time-based statistics can be derived from these headers to detect network scans, network worm behavior, and denial of service attacks. A number of other NIDS perform deeper inspection of request packets to detect attacks against network services and network applications. More recent approaches analyze full service responses to detect attacks targeting clients. The review covers a wide range of NIDS, highlighting which classes of attack are detectable by each of these approaches. Data preprocessing is found to rely predominantly on expert domain knowledge for identifying the most relevant parts of network traffic and for constructing the initial candidate set of traffic features. On the other hand, automated methods have been widely used for feature extraction to reduce data dimensionality, and for feature selection to find the most relevant subset of features from this candidate set. The review shows a trend toward deeper packet inspection to construct more relevant features through targeted content parsing. These context-sensitive features are required to detect current attacks.

12.

In recent years, Botnets have been adopted as a popular method to carry and spread malicious code on the Internet. This malicious code paves the way for many fraudulent activities, including spam mail, distributed denial-of-service attacks and click fraud. While many Botnets are set up using a centralized communication architecture, peer-to-peer (P2P) Botnets can adopt a decentralized architecture that uses an overlay network to exchange command and control data, making their detection even more difficult. This work presents a method of P2P Bot detection based on an adaptive multilayer feed-forward neural network in cooperation with decision trees. A classification and regression tree is applied as a feature selection technique to select relevant features. With these features, a multilayer feed-forward neural network training model is created using a resilient back-propagation learning algorithm. A comparison of feature set selection based on the decision tree, principal component analysis and the ReliefF algorithm indicated that the neural network model with feature selection based on the decision tree has better identification accuracy along with lower false positive rates. The usefulness of the proposed approach is demonstrated by conducting experiments on real network traffic datasets. In these experiments, an average detection rate of 99.08% with a false positive rate of 0.75% was observed.


13.
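To address the multi-class imbalance problem in network traffic classification, a hybrid feature selection method based on relative uncertainty and symmetric uncertainty is proposed. First, relative uncertainty is used to select a candidate feature set for each class; then, the features with high symmetric uncertainty in each candidate set are retained and the others are removed; finally, a wrapper feature selection method based on the C4.5 decision tree determines the optimal feature subset. Experimental results on real network traffic data sets show that, compared with traditional methods, the method achieves higher overall accuracy, minority-class recall and g-mean, and can thus mitigate the adverse effects of multi-class imbalance.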
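
Two of the quantities named above have standard definitions that are easy to state in code. The sketch below assumes discrete feature values, computes symmetric uncertainty as SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)), and computes g-mean as the geometric mean of per-class recall. The relative-uncertainty stage and the C4.5 wrapper are not shown, and the sample labels are hypothetical.

```python
from collections import Counter
from math import log2, prod

def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def symmetric_uncertainty(xs, ys):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(xs), entropy(ys)
    if hx + hy == 0:
        return 0.0  # both sequences are constant
    mutual_info = hx + hy - entropy(list(zip(xs, ys)))
    return 2 * mutual_info / (hx + hy)

def g_mean(y_true, y_pred):
    """Geometric mean of the recall obtained on each class."""
    recalls = []
    for cls in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == cls)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        recalls.append(hits / total)
    return prod(recalls) ** (1 / len(recalls))

# Hypothetical imbalanced labels: class 0 is the majority class.
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 0, 1, 0, 2, 2]
print(symmetric_uncertainty(y_true, y_true))        # identical sequences → 1.0
print(symmetric_uncertainty([0, 0, 1, 1], [0, 1, 0, 1]))  # independent → 0.0
print(round(g_mean(y_true, y_pred), 4))             # → 0.7937
```

Because g-mean multiplies the recalls, a classifier that ignores a minority class entirely scores 0 regardless of its overall accuracy, which is why it is a common metric for imbalanced problems.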

14.
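Network traffic is becoming increasingly complex, which creates many difficulties for network management. To accurately identify the various kinds of traffic in a network, this paper uses a support vector machine as the classifier and the statistical features of flows as the basis for classification, and proposes a combined feature selection algorithm. The algorithm first quickly removes features that are irrelevant to classification; for the remaining features, it uses a genetic algorithm to guide both feature selection and the search for optimal SVM model parameters, finally obtaining the optimal feature set and the best SVM classification model. Experiments verify that a traffic identification method based on this algorithm achieves higher classification accuracy with fewer features when identifying P2P traffic.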

15.
Feature subset selection is essentially an optimization problem of choosing the most important features from various alternatives in order to facilitate classification or mining problems. Although many algorithms have been developed so far, none is considered the best for all situations, and researchers are still trying to come up with better solutions. In this work, a flexible and user-guided feature subset selection algorithm named FCTFS (Feature Cluster Taxonomy based Feature Selection) is proposed for selecting a suitable feature subset from a large feature set. The proposed algorithm belongs to the family of clustering-based feature selection techniques, in which features are initially clustered according to their intrinsic characteristics following the filter approach. In the second step, the most suitable feature is selected from each cluster to form the final subset following a wrapper approach. The two-stage hybrid process lowers the computational cost of subset selection, especially for large feature data sets. One of the main novelties of the proposed approach lies in the process of determining the optimal number of feature clusters. Unlike currently available methods, which mostly employ a trial-and-error approach, the proposed method characterises and quantifies the feature clusters according to the quality of the features inside the clusters and defines a taxonomy of the feature clusters. The selection of individual features from a feature cluster can be done judiciously, considering both relevancy and redundancy according to the user’s intention and requirements. The algorithm has been verified by simulation experiments on different benchmark data sets containing from 10 to more than 800 features, and compared with other currently used feature selection algorithms. The simulation results indicate the superiority of our proposal in terms of model performance, flexibility of use in practical problems and extendibility to large feature sets. 
Though the current proposal is verified in the domain of unsupervised classification, it can easily be used for supervised classification.

16.

The effective modelling of high-dimensional data with hundreds to thousands of features remains a challenging task in the field of machine learning. The process is manually intensive and requires skilled data scientists to apply exploratory data analysis techniques and statistical methods when pre-processing datasets for meaningful analysis with machine learning methods. However, the massive growth of data has brought about the need for fully automated data analysis methods. One of the key challenges is the accurate selection of a set of relevant features, which can be buried in high-dimensional data along with irrelevant noisy features, by choosing a subset of the complete set of input features that predicts the output with accuracy comparable to, or higher than, that of the complete input set. Kohonen’s self-organising neural network map has been utilised in various ways for this task, such as in the weighted self-organising map (WSOM) approach, and this method is reviewed for its efficacy. The study demonstrates that the WSOM approach can produce different results on different runs on a given dataset because of the inappropriate use of the steepest descent optimisation method to minimise the weighted SOM’s cost function. An alternative feature weighting approach based on analysis of the SOM after training is presented; the proposed approach allows the SOM to converge before analysing the input relevance, unlike the WSOM, which applies weighting to the inputs during training. This distorts the SOM’s cost function and results in multiple local minima, meaning the SOM does not consistently converge to the same state. We demonstrate the superiority of the proposed method over the WSOM and a standard SOM in feature selection with improved clustering analysis.


17.
Zhai Junhai (翟俊海), Liu Bo (刘博), Zhang Sufang (张素芳). CAAI Transactions on Intelligent Systems (《智能系统学报》), 2017, 12(3): 397-404
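Feature selection is the process of selecting a feature subset from the full set of initial features according to given rules, and is an important preprocessing step in data mining. By eliminating redundant attributes, it reduces algorithm complexity and improves algorithm performance. For the problem of selecting discrete-valued features, a feature selection method that combines rough-set relative classification information entropy with particle swarm optimization is proposed. The method relies on particle swarm optimization and uses relative classification information entropy as the fitness function. It is compared experimentally with other feature selection methods based on evolutionary algorithms, and the results show that the proposed method has certain advantages.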

18.
Chen Hong (陈红), Liu Guangyuan (刘光远), Lai Xiangwei (赖祥伟). Computer Science (《计算机科学》), 2012, 39(4): 250-253, 274
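For emotion recognition from pulse signals, a method combining correlation analysis with the max-min ant colony algorithm is proposed to find a stable feature subset that performs well for building emotion recognition models. First, the original features are ranked using sequential backward selection (SBS); then, the linear correlation coefficient is used to compute the correlation between features, and some of the highly correlated features are removed according to the ranking; finally, the max-min ant colony algorithm performs feature selection on the filtered feature subset, and a Fisher classifier is used to classify six emotions: joy, surprise, disgust, sadness, anger and fear. Experimental results show that the method can find a more stable and effective feature subset within the original feature set and thereby build an effective emotion recognition model.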
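
The correlation-based pruning step can be sketched as below. The sketch assumes the SBS ranking is already available as a best-first list of feature names and that the linear correlation coefficient is Pearson's r. The threshold, function names and pulse-feature names are hypothetical, and the ant colony search and Fisher classifier are not shown.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson linear correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant feature has no defined correlation
    return cov / sqrt(var_x * var_y)

def prune_correlated(ranked_names, columns, threshold):
    """Walk the ranking best-first and drop any feature whose |r| with an
    already kept (higher-ranked) feature exceeds the threshold."""
    kept = []
    for name in ranked_names:
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical pulse-signal features, listed best-first as if ranked by SBS.
columns = {
    "mean_interval":   [1.0, 2.0, 3.0, 4.0, 5.0],
    "median_interval": [1.1, 2.0, 3.2, 3.9, 5.1],  # nearly a copy of the above
    "amplitude":       [3.0, 1.0, 4.0, 1.0, 5.0],
}
ranking = ["mean_interval", "median_interval", "amplitude"]
print(prune_correlated(ranking, columns, threshold=0.9))
# → ['mean_interval', 'amplitude']
```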

19.
Aggregating the outputs of multiple classifiers into a committee decision is one of the most important techniques for improving classification accuracy. Selecting an optimal subset of relevant features also plays an important role in the successful design of a pattern recognition system. In this paper, we present a neural network based approach for identifying salient features for classification in neural network committees. Feature selection is based on two criteria, namely the reaction of the cross-validation data set classification error to the removal of individual features and the diversity of the neural networks comprising the committee. The algorithm developed removed a large number of features from the original data sets without reducing the classification accuracy of the committees. The accuracy of the committees utilizing the reduced feature sets was higher than that of the committees exploiting all the original features.

20.
An Intrusion Detection System (IDS) deals with a huge amount of network traffic and uses a large feature set to discriminate between normal and intrusive patterns. However, most existing systems lack the ability to process data for real-time anomaly detection. In this paper, we propose a 3-Tier Iterative Feature Selection Engine (IFSEng) for feature subspace selection. The Principal Component Analysis (PCA) technique is used for the pre-processing of data. A Mahalanobis Distance Map (MDM) is used to discover hidden correlations between the features and between the packets. We also propose a novel Real-time Payload-based Intrusion Detection System (RePIDS) that integrates the 3-Tier IFSEng and the MDM approach. The Mahalanobis Distance (MD) dissimilarity criterion is used to classify each packet as either a normal or an attack packet. The effectiveness of the proposed RePIDS is evaluated using the DARPA 99 dataset and the Georgia Institute of Technology attack dataset. Traffic for Web-based applications is considered for validating our model. The F-value is used as the criterion to evaluate the detection performance of RePIDS. Experimental results show that RePIDS achieves better performance (high F-values of 0.9958 for the DARPA 99 dataset and 0.976 for the Georgia Institute of Technology attack dataset, with only a 0.85% false alarm rate) and lower computational complexity when compared against two state-of-the-art payload-based intrusion detection systems. Additionally, it has 1.3 times higher throughput in comparison with a real scenario of a medium-sized enterprise network.
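
The Mahalanobis distance criterion can be illustrated with a small sketch. It is not the RePIDS pipeline: it assumes only two features per packet so that the covariance matrix can be inverted in closed form, it fits the normal profile from a handful of made-up samples, and the threshold of 3.0 is an arbitrary assumption. PCA, the MDM and the 3-Tier IFSEng are not shown.

```python
from math import sqrt

def fit_normal_profile(samples):
    """Estimate the mean and inverse covariance of 2-feature normal traffic."""
    n = len(samples)
    mx = sum(s[0] for s in samples) / n
    my = sum(s[1] for s in samples) / n
    sxx = sum((s[0] - mx) ** 2 for s in samples) / (n - 1)
    syy = sum((s[1] - my) ** 2 for s in samples) / (n - 1)
    sxy = sum((s[0] - mx) * (s[1] - my) for s in samples) / (n - 1)
    det = sxx * syy - sxy * sxy
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    # Closed-form inverse of the symmetric 2x2 covariance matrix.
    inv = ((syy / det, -sxy / det), (-sxy / det, sxx / det))
    return (mx, my), inv

def mahalanobis(point, mean, inv):
    """sqrt((p - mean)^T * inv * (p - mean)) for a 2-feature point."""
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    d2 = (dx * (inv[0][0] * dx + inv[0][1] * dy)
          + dy * (inv[1][0] * dx + inv[1][1] * dy))
    return sqrt(max(d2, 0.0))

def classify(point, mean, inv, threshold):
    """Label a packet as an attack if it lies far from the normal profile."""
    return "attack" if mahalanobis(point, mean, inv) > threshold else "normal"

# Hypothetical normal-traffic samples with two positively correlated features.
normal = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 5.0)]
mean, inv = fit_normal_profile(normal)
print(classify((4.0, 4.0), mean, inv, threshold=3.0))  # follows the correlation
print(classify((5.0, 1.0), mean, inv, threshold=3.0))  # breaks the correlation
```

Unlike Euclidean distance, the Mahalanobis distance accounts for the correlation between features, so a point that violates the usual relationship between the two features scores as far away even when each feature value is individually unremarkable.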
