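Botnets are one of the major security threats to the Internet. This paper describes the working mechanism of botnets in detail, analyzes the characteristics of botnet communication, and proposes a new detection scheme. It elaborates on the botnet communication process and the detection principle, and presents the design and implementation of the key techniques.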

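To address the problem that existing botnet research detects only a narrow range of stages in the botnet life cycle, an online botnet detection method based on ensemble learning is proposed. First, traffic from multiple botnet stages is labeled at a fine granularity to generate a botnet dataset. Second, several feature selection algorithms are combined to produce an important feature set of 23 features and a secondary feature set of 28 features; multiple deep learning models are integrated using Stacking ensemble learning, with different input feature sets supplied to different base classifiers, yielding an online botnet detection model. Finally, the model is deployed at the network ingress to detect multiple kinds of botnets online. Experiments show that the proposed method effectively detects botnet traffic across multiple stages, with a malicious-traffic detection rate of up to 96.47%.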

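When faced with the imbalanced data classification problems that are widespread in practice, most traditional classification algorithms assume that the class distribution is balanced, so their results are biased toward the majority class and perform poorly. To address this, an improved AdaBoost classification algorithm based on clustering-ensemble undersampling is proposed. The algorithm first performs a clustering ensemble and then, according to sample weights, draws a proportion of the majority-class samples from each cluster and combines them with all minority-class samples to form a balanced dataset. Within the AdaBoost framework, misclassifications of the majority and minority classes receive different weight adjustments, and the few base classifiers with the best classification performance are selectively ensembled. Experimental results show that the algorithm has certain advantages in imbalanced data classification.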

Over the last few years, the dimensionality of datasets involved in data mining applications has increased dramatically. In this situation, feature selection becomes indispensable as it allows for dimensionality reduction and relevance detection. The research proposed in this paper broadens the scope of feature selection by taking into consideration not only the relevance of the features but also their associated costs. A new general framework is proposed, which consists of adding a new term to the evaluation function of a filter feature selection method so that the cost is taken into account. Although the proposed methodology could be applied to any feature selection filter, in this paper the approach is applied to two representative filter methods: Correlation-based Feature Selection (CFS) and Minimal-Redundancy-Maximal-Relevance (mRMR), as an example of use. The behavior of the proposed framework is tested on 17 heterogeneous classification datasets, employing a Support Vector Machine (SVM) as a classifier. The results of the experimental study show that the approach is sound and that it allows the user to reduce the cost without compromising the classification error.
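The idea of adding a cost term to a filter's evaluation function can be sketched as follows. This is a minimal illustration, not the paper's formulation: the abstract does not give the exact form of the added term, so the linear penalty `lam * cost[f]`, the function name `select_features`, and the use of precomputed relevance and redundancy scores in an mRMR-style greedy search are all assumptions.

```python
def select_features(relevance, redundancy, cost, k, lam=0.0):
    """Greedy mRMR-style selection with an assumed linear cost penalty.

    relevance:  {feature: relevance to the class}
    redundancy: {feature: {other_feature: pairwise redundancy}}
    cost:       {feature: acquisition cost}
    lam:        weight of the cost term (0.0 ignores cost entirely)
    """
    selected = []
    remaining = set(relevance)
    while remaining and len(selected) < k:

        def score(f):
            # Mean redundancy with the features chosen so far.
            if selected:
                red = sum(redundancy[f][s] for s in selected) / len(selected)
            else:
                red = 0.0
            # Relevance minus redundancy minus the (assumed) cost term.
            return relevance[f] - red - lam * cost[f]

        # sorted() makes tie-breaking deterministic.
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    relevance = {"a": 0.9, "b": 0.85, "c": 0.3}
    redundancy = {
        "a": {"b": 0.1, "c": 0.1},
        "b": {"a": 0.1, "c": 0.1},
        "c": {"a": 0.1, "b": 0.1},
    }
    cost = {"a": 1.0, "b": 0.1, "c": 0.1}
    print(select_features(relevance, redundancy, cost, k=2, lam=0.0))
    print(select_features(relevance, redundancy, cost, k=2, lam=0.5))
```

With `lam=0.0` the expensive but slightly more relevant feature is picked first; raising `lam` shifts the ranking toward cheaper features, which is the trade-off the abstract describes.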

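To further improve network intrusion detection, a network intrusion detection algorithm (FAST-ABQGSA-SVM) is proposed that combines FAST feature selection with a support vector machine optimized by an adaptive binary quantum gravitational search algorithm. The FAST algorithm filters redundant and irrelevant features out of the original feature set to form a candidate feature subset; following a combinatorial optimization strategy, the adaptive binary quantum gravitational search algorithm (ABQGSA) then jointly optimizes the candidate feature subset and the SVM classifier parameters. During ABQGSA's iterative learning and search, a dynamic, adaptive, fluctuating adjustment strategy updates the quantum rotation angle to balance global and local search capability. To improve the algorithm's adaptive mutation capability, an adaptive mutation probability tied to the degree of evolution and to individual fitness values is designed, and when population evolution stagnates, a discrete crossover operation on quantum bits is introduced promptly to help the population escape local extrema. Finally, simulation experiments are carried out on the KDD CUP 99 intrusion detection data. The results show that the proposed ABQGSA-SVM algorithm has better robustness, learning accuracy, and detection performance than other detection algorithms of the same type.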

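Feature selection is a commonly used data preprocessing technique in data mining and machine learning. In the unsupervised learning setting, a measure of average feature correlation is defined, and on this basis a feature selection method based on feature clustering, FSFC, is proposed. The method uses a clustering algorithm to search for clusters in different subspaces so that features with strong dependencies (that is, redundant features) are grouped into the same cluster; a representative subset is then picked from each cluster, and together these form the feature subset, ultimately removing irrelevant and redundant features. Experiments on UCI datasets show that FSFC achieves feature reduction and classification performance comparable to several classical supervised feature selection methods.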

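To obtain good clustering results on datasets with only a small number of known labels, a semi-supervised clustering algorithm based on graph contraction is proposed. The data in the whole sample space are first represented as a weighted graph; the graph is then modified by edge contraction according to the given must-link constraints, which strengthens those constraints. On this basis, the graph Laplacian is introduced and, combined with the cannot-link constraints, the sample space is projected onto a feature subspace. Finally, cluster analysis is performed in the subspace. Experimental results show that the method not only improves clustering results on complex data but also performs well when the number of constraint pairs is small.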
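The edge-contraction step driven by must-link constraints can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how the weights of parallel edges are combined after contraction, so summing them is an assumption, and the function name `contract_must_links` and the edge-dictionary representation are hypothetical. The Laplacian projection and the cannot-link handling are not shown.

```python
def contract_must_links(weights, must_links):
    """Contract every must-link pair of an undirected weighted graph.

    weights:    {(u, v): w} with one entry per undirected edge
    must_links: iterable of (u, v) pairs that must share a cluster
    Returns the contracted graph as {(u, v): w} over super-nodes, where each
    super-node is named after its smallest member.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the endpoints of every must-link constraint.
    for u, v in must_links:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[max(ru, rv)] = min(ru, rv)

    contracted = {}
    for (u, v), w in weights.items():
        ru, rv = find(u), find(v)
        if ru == rv:
            continue  # the edge now lies inside one super-node
        key = (min(ru, rv), max(ru, rv))
        # Assumption: parallel edges created by the merge are summed.
        contracted[key] = contracted.get(key, 0.0) + w
    return contracted


if __name__ == "__main__":
    graph = {("a", "b"): 1.0, ("a", "c"): 0.5, ("b", "c"): 0.25, ("c", "d"): 2.0}
    print(contract_must_links(graph, [("a", "b")]))
```

Because the merged nodes become a single vertex, no later clustering step can separate them, which is one way to read "strengthening" the must-link constraints.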

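Existing attribute selection algorithms treat every sample equally and ignore the differences among samples, so the learned model cannot avoid the influence of noisy samples. To address this, an unsupervised attribute selection algorithm incorporating self-paced learning theory (UFS-SPL) is proposed. It first automatically selects a subset of important samples to train a robust initial attribute selection model, then gradually and automatically introduces less important samples to improve the model's generalization ability, finally obtaining an attribute selection model that avoids noise interference while being both robust and generalizable. On real datasets, compared with convex semi-supervised multi-label attribute selection (CSFS), regularized self-representation (RSR), and the coupled dictionary learning method for unsupervised attribute selection (CDLFS), UFS-SPL improves clustering accuracy, mutual information, and purity by 12.06%, 10.54%, and 10.5% on average, respectively. The experimental results show that UFS-SPL effectively reduces the influence of irrelevant information in the datasets.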

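Clustering is a research hotspot in machine learning, and weakly supervised learning is an important research direction within semi-supervised learning, with a wide range of application scenarios. In studying clustering and weakly supervised learning, a weakly supervised learning framework based on k labeled samples is proposed. The framework first expands the set of labeled samples using clustering and clustering confidence. Second, the energy function of the restricted Boltzmann machine is modified, and a restricted Boltzmann machine learning model based on k labeled samples is proposed. Finally, inference for the model is derived and the corresponding algorithms are designed. To evaluate the framework and the model, comparative experiments are run on public datasets; the results show that the weakly supervised learning framework based on k labeled samples performs well.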

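A heart failure mortality prediction system is proposed to predict the mortality of heart failure patients within 30 days after the current hospitalization. Based on heart failure patient information provided by Shanghai Shuguang Hospital, the raw data and features are first preprocessed. Because the features are redundant, the classical Relief feature selection algorithm is then used to select the important heart failure features, and finally the bp-SVM algorithm is used to predict mortality. Experimental results show that the mortality prediction system achieves high performance and, by providing decision information, assists doctors in treating patients. Doctors can adopt different treatments according to the predicted mortality level, improving clinical diagnosis outcomes and hospital resource allocation.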
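The Relief step can be illustrated with a minimal two-class sketch. It is not the system's implementation: it uses range-normalized absolute differences, visits every sample once instead of drawing a random subset, and the function name `relief` and the toy data are hypothetical. The preprocessing and the bp-SVM stage are not shown.

```python
def relief(X, y):
    """Basic two-class Relief: return one weight per feature.

    A feature gains weight when it differs on the nearest sample of the
    other class (nearest miss) and loses weight when it differs on the
    nearest sample of the same class (nearest hit).
    """
    n, d = len(X), len(X[0])
    # Feature ranges for normalization; constant features get range 1.0.
    span = [(max(r[j] for r in X) - min(r[j] for r in X)) or 1.0 for j in range(d)]

    def diff(a, b, j):
        return abs(a[j] - b[j]) / span[j]

    def dist(a, b):
        return sum(diff(a, b, j) for j in range(d))

    w = [0.0] * d
    for i in range(n):
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        if not hits or not misses:
            continue  # no neighbor of one kind, skip this sample
        hit = min(hits, key=lambda k: dist(X[i], X[k]))
        miss = min(misses, key=lambda k: dist(X[i], X[k]))
        for j in range(d):
            w[j] += (diff(X[i], X[miss], j) - diff(X[i], X[hit], j)) / n
    return w


if __name__ == "__main__":
    # Feature 0 separates the classes; feature 1 is noise.
    X = [[0.0, 0.3], [0.1, 0.9], [0.9, 0.2], [1.0, 0.8]]
    y = [0, 0, 1, 1]
    print(relief(X, y))
```

Features would then be ranked by weight and the top-scoring ones passed to the classifier.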


Many applications today use an encrypted channel to secure their communication and transactions. However, their security is often challenged by adversaries such as botnets. Botnets leverage the encrypted channel to launch attacks and amplify their impact. The number of botnet attacks over encrypted channels is increasing, and these attacks continue to cause great financial loss. This study proposes an encrypted botnet detection technique based on packet header analysis. The technique does not require deep packet inspection or intensive traffic analysis; it does, however, require analysis of the features taken from the packet header, which are essential for detection. The study endeavors to show that the features selected can significantly affect the classification of encrypted botnets, and the paper therefore focuses on the effects of feature selection on that classification. The researchers use different classification modes (full training and 10-fold cross-validation) with two feature sets: seven features (7-features) and three features (3-features). The seven features are those extracted from the packet header; after feature selection, only three of the seven have weight (value), so these three are the most significant of the extracted features. Different machine learning algorithms are used for the classification. In general, the results show that classification with the three most significant features yields higher true positives than the 7-features classification.


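At present, most botnet detection methods rely on analyzing either botnet communication activity or communication content. The former performs statistical analysis of data-flow features without examining flow content; it has a clear advantage for detecting encrypted types but lower accuracy. The latter relies on prior knowledge for detection; it is more accurate but less general. Therefore, communication similarity is defined in terms of the Jaccard similarity coefficient, and a method for computing communication similarity based on users' domain name system (DNS) requests is proposed for detecting botnet nodes from network traffic. Finally, the proposed method is validated experimentally on the Spark framework; the results show that it can be used effectively for botnet node detection.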
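A Jaccard-based similarity over the domains each host requests can be sketched as follows. The abstract does not give the paper's exact definition of communication similarity, so this uses the plain Jaccard coefficient on per-host domain sets; the function names, the threshold value, and the sample hosts and domains are hypothetical, and the Spark-based implementation is not reproduced.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity coefficient |A ∩ B| / |A ∪ B| (0.0 if both are empty)."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def similar_host_pairs(dns_queries, threshold=0.5):
    """Return (host1, host2, score) for host pairs at or above the threshold.

    dns_queries: {host: set of domain names the host requested}
    """
    pairs = []
    for h1, h2 in combinations(sorted(dns_queries), 2):
        score = jaccard(dns_queries[h1], dns_queries[h2])
        if score >= threshold:
            pairs.append((h1, h2, score))
    return pairs


if __name__ == "__main__":
    queries = {
        "10.0.0.1": {"a.example", "b.example", "c.example"},
        "10.0.0.2": {"a.example", "b.example", "c.example", "d.example"},
        "10.0.0.3": {"news.example"},
    }
    # Hosts .1 and .2 share 3 of 4 distinct domains, so their score is 0.75.
    print(similar_host_pairs(queries))
```

Hosts controlled by the same botnet tend to query the same command-and-control domains, so unusually similar query sets are one signal worth flagging; comparing all pairs is quadratic in the number of hosts, which is presumably why a distributed framework is used at scale.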

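To address the sample-labeling bottleneck in network traffic feature selection, and the inability of existing semi-supervised methods to select strongly relevant features, a multi-class semi-supervised feature selection algorithm based on class label extension (SFSEL) is proposed. Starting from a small number of labeled samples, the algorithm first extends class labels to unlabeled samples using the K-means algorithm; it then applies the doubly regularized support vector machine (MDrSVM) algorithm to perform feature selection on multi-class data. In comparative experiments on the Moore dataset against the semi-supervised feature selection algorithms Spectral, PCFRSC, and SEFR, SFSEL achieves clearly higher classification accuracy and recall than the other algorithms, while selecting clearly fewer features. The experimental results show that SFSEL effectively improves the relevance of the selected features and achieves better network traffic classification performance.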

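The strong isolation of virtualized systems provides a reliable environment for deploying security mechanisms, but it also introduces the semantic gap problem. Existing research commonly relies on software-architecture information, data structures, and control flow that are easily tampered with, and the detection algorithms used are inefficient at identifying guest state. To address these problems, feature construction and window labeling methods are designed to preprocess virtual machine data so that it meets the prerequisites for data mining, and a feature-selection-based model for bridging the semantic gap in virtualized systems is built, which constructs virtual machine execution patterns and performs security detection relying only on hardware-architecture data. Experimental results show that the designed system model can pick out the key virtual machine features and effectively identify abnormal guest behavior, improving the efficiency of bridging the semantic gap and offering a feasible solution to the semantic gap problem.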

Patient no-shows have significant adverse effects on healthcare systems. Therefore, predicting patients’ no-shows is necessary to use their appointment slots effectively. In the literature, filter feature selection methods have been prominently used for patient no-show prediction. However, filter methods are less effective than wrapper methods. This paper presents new wrapper methods based on three variants of the proposed algorithm, Opposition-based Self-Adaptive Cohort Intelligence (OSACI). The three variants of OSACI are referred to in this paper as OSACI-Init, OSACI-Update, and OSACI-Init_Update, which are formed by the integration of Self-Adaptive Cohort Intelligence (SACI) with three Opposition-based Learning (OBL) strategies; namely: OBL initialization, OBL update, and OBL initialization and update, respectively. The performance of the proposed algorithms was examined and compared with that of Genetic Algorithm (GA), Particle Swarm Optimization (PSO), Differential Evolution (DE), and SACI in terms of AUC, sensitivity, specificity, dimensionality reduction, and convergence speed. Patient no-show data of a primary care clinic in upstate New York was used in the numerical experiments. The results showed that the proposed algorithms outperformed the other compared algorithms by achieving higher dimensionality reduction and better convergence speed while achieving comparable AUC, sensitivity, and specificity scores.
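The OBL initialization strategy can be illustrated with the standard opposition-based learning rule, in which the opposite of a point x in [lo, hi] is lo + hi - x. The sketch below is generic and continuous-valued; how OSACI encodes feature subsets and couples OBL with SACI is not described in the abstract, so the function names, the minimization convention, and the toy fitness function are assumptions.

```python
import random


def opposite(x, lo, hi):
    """Opposite point of x inside the box [lo, hi], per dimension: lo + hi - x."""
    return [l + h - xi for xi, l, h in zip(x, lo, hi)]


def obl_initialize(fitness, lo, hi, pop_size, seed=0):
    """Opposition-based initialization (minimization assumed).

    Draw a random population, add the opposite of every candidate, and keep
    the pop_size candidates with the lowest fitness.
    """
    rng = random.Random(seed)
    population = [
        [rng.uniform(l, h) for l, h in zip(lo, hi)] for _ in range(pop_size)
    ]
    candidates = population + [opposite(x, lo, hi) for x in population]
    candidates.sort(key=fitness)
    return candidates[:pop_size]


if __name__ == "__main__":
    def sphere(x):
        # Toy objective, purely for illustration.
        return sum(v * v for v in x)

    start = obl_initialize(sphere, lo=[-5.0, -5.0], hi=[5.0, 5.0], pop_size=6)
    for candidate in start:
        print(candidate, sphere(candidate))
```

Evaluating each candidate together with its opposite doubles the initial fitness evaluations in exchange for a better starting population, which is consistent with the convergence-speed gains the abstract reports.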

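Regression models that perform attribute selection without considering the relationships among class labels yield unsatisfactory regression results. To address this, a new robust low-rank attribute selection algorithm is proposed. Specifically, within a linear regression framework, a low-rank constraint accounts for the correlations among class labels, and the l2,p-norm from sparse learning theory accounts for the correlation structure among attributes, thereby removing the influence of irrelevant and redundant attributes; the algorithm also embeds a subspace learning method, linear discriminant analysis (LDA), to adjust the attribute selection result. Experiments verify that the proposed attribute selection algorithm outperforms four comparison algorithms on six public datasets.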

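To address the noise and high dimensionality of data in fault diagnosis, a fast feature extraction method is used to reduce the dimensionality of fault data; the method uses the mean and variance of the feature signals as the basis for measuring their weights. Using the pattern classification capability of support vector machines, a multi-fault classifier based on feature extraction is constructed. A worked example shows that the method achieves dimensionality reduction and lowers computational complexity while maintaining diagnostic performance.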

The aim of this paper is to provide an efficient input feature selection algorithm for the modeling of systems, based on a modified definition of fuzzy-rough sets. Some of the critical issues concerning the complexity and convergence of the feature selection algorithm are discussed in detail. Based on some natural properties of fuzzy t-norm and t-conorm operators, the concept of fuzzy-rough sets on a compact computational domain is put forward, which is then utilized to construct an improved Fuzzy-Rough Feature Selection algorithm. Various mathematical properties of this new definition of fuzzy-rough sets are discussed from a pattern classification viewpoint. A speedup factor as high as 622 has been achieved with the proposed algorithm compared to the recently proposed FRSAR, with improved model performance on the selected set of features.

When gene expression datasets contain some labeled data samples, the labeled information should be incorporated into the clustering algorithm so that more reasonable clustering results can be achieved. In this paper, a novel semi-supervised clustering algorithm, Semi-supervised Iterative Visual Clustering Algorithm (Semi-IVCA), is presented to tackle such datasets. The new algorithm first constructs the visual sampling image of the dataset based on the visual theorem and obtains its attractors using the gradient learning rules, where each attractor denotes a cluster of the dataset. The new algorithm then introduces an iterative clustering procedure to realize the semi-supervised learning. It is a generalization of the current Visual Clustering Algorithm (VCA) presented by the authors. Besides the advantage that Semi-IVCA can effectively utilize the labeled data information in clustering, it is robust and insensitive to initialization, and it has strong parameter learning capability and good interpretability of the clustering results. When Semi-IVCA is applied to artificial and real gene expression datasets, the experimental results confirm these advantages.

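Multi-label feature selection has been widely applied in fields such as image classification and disease diagnosis. In practice, however, the label space of the data often has some labels missing, which damages the structure and correlation among labels and makes it hard for learning algorithms to select important features accurately. To address this, a multi-label feature selection algorithm based on label-specific features under missing labels (MFSLML) is proposed. First, a sparse learning method is used to obtain the label-specific features of each class label; at the same time, a mapping between label-specific features and labels is built on a linear regression model in order to recover the missing labels; finally, experiments are conducted on 7 datasets with 4 evaluation metrics. The results show that, compared with advanced multi-label feature selection algorithms such as the multi-label feature selection algorithm based on max-dependency and min-redundancy (MDMR) and the multi-label feature selection algorithm based on feature interaction (MFML), MFSLML improves average precision by 4.61 to 5.5 percentage points, indicating that MFSLML has better classification performance.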

