Similar Literature
1.
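To address the problem that new P2P services evade detection by encrypting payloads and disguising ports, a decision-tree-based P2P traffic identification method is proposed. The method applies decision trees to network traffic identification to meet the requirements of that task. It builds a classification model from the information entropy of the training data set and classifies unknown network flow samples with a simple lookup in the model. Experimental results show that, compared with the Naïve Bayes and Bayes Network algorithms, the C4.5 decision tree algorithm is relatively simple and computationally inexpensive, offers high data-processing efficiency and classification accuracy, and is better suited to P2P traffic identification.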
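The entropy-based model building mentioned in this abstract can be sketched as follows. This is a minimal illustration of the information-gain criterion that underlies ID3-style trees, not the paper's implementation (C4.5 itself normalizes the gain into a gain ratio); the flow records and the feature names `port_class` and `pkt_size` are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, feature):
    """Reduction in label entropy obtained by splitting the rows on one feature."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    # Weighted average entropy of the subsets produced by the split.
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Hypothetical, hand-made flow records (not data from the paper).
flows = [
    {"port_class": "high", "pkt_size": "large"},
    {"port_class": "high", "pkt_size": "small"},
    {"port_class": "low", "pkt_size": "large"},
    {"port_class": "low", "pkt_size": "small"},
]
labels = ["p2p", "p2p", "other", "other"]

# The feature with the largest gain becomes the split at the current node.
best_feature = max(["port_class", "pkt_size"],
                   key=lambda f: information_gain(flows, labels, f))
```

In this toy data `port_class` separates the two classes perfectly, so it is selected as the root split, while `pkt_size` carries no information about the label.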

2.
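With the widespread use of Internet applications, network applications now fall into many categories, and P2P application traffic in particular has surged. Traditional traffic classification and application identification methods can no longer deliver a stable, satisfactory identification rate. To improve the accuracy and stability of P2P application traffic classification and to support sound network management and planning, the WMFA (sliding-window multi-flow association) classification algorithm is proposed. It uses statistical features of P2P application traffic, reduces the dimensionality of the flow features, and reduces the number of packets examined in each flow. The C4.5 decision tree algorithm first classifies the mainstream P2P applications; the WMFA algorithm then mines the misidentified flows and applies multi-flow association for a second round of identification, which raises the classification accuracy. Experiments show that, with reduced feature dimensionality and fewer packets per flow, the WMFA algorithm achieves an online classification accuracy above 96% for mainstream P2P applications in China, an average improvement of 3% over existing methods.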

3.
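Research on and Improvement of Decision Tree Classification Algorithms in Data Mining   Total citations: 4 (self-citations: 0, citations by others: 4)
Decision tree classification is an important topic in data mining, and ID3 is an important and widely used decision tree classification algorithm. In practice, however, existing decision tree algorithms have many shortcomings, such as low computational efficiency and a bias toward multi-valued attributes. To solve these problems, a weighted simplified information entropy algorithm based on ID3 is proposed. It speeds up decision tree construction, reduces the running time of the algorithm, and overcomes ID3's tendency to select attributes with many distinct values as the test attribute. Moreover, the classification performance of the resulting decision tree improves as the data size grows.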
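The multi-value bias described above can be reproduced with a small sketch. The code below does not implement the paper's weighted simplified entropy; it only contrasts plain information gain with the gain ratio used by C4.5, a well-known alternative correction. The data and the column names `record_id` and `weather` are hypothetical.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())

def info_gain(column, labels):
    """Information gain of splitting the labels by the values of one attribute."""
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def gain_ratio(column, labels):
    """Information gain divided by the entropy of the attribute itself (split information)."""
    split_info = entropy(column)
    return info_gain(column, labels) / split_info if split_info > 0 else 0.0

labels = ["yes", "yes", "yes", "yes", "no", "no", "no", "no"]
record_id = list(range(8))  # unique per row: many values, no predictive meaning
weather = ["sun", "sun", "sun", "rain", "rain", "rain", "rain", "rain"]

# Plain information gain prefers the meaningless many-valued attribute ...
prefers_id = info_gain(record_id, labels) > info_gain(weather, labels)
# ... while the gain ratio penalizes it and prefers the two-valued attribute.
prefers_weather = gain_ratio(weather, labels) > gain_ratio(record_id, labels)
```

Because every `record_id` value isolates a single row, each subset is pure and the gain is maximal, which is exactly the bias the abstract refers to.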

4.
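As a new kind of network application, P2P technology is steering the development of the Internet, and managing P2P has become the hardest problem in today's Internet. By analyzing the characteristics of P2P traffic and the problems that arise in controlling it, and by comparing several current P2P traffic detection techniques, a multi-decision-tree classification method based on attribute criticality is proposed. A P2P traffic detection model based on the multi-decision-tree algorithm is designed, and its working principle is described. The effectiveness of the algorithm for P2P traffic detection is evaluated in terms of false alarm rate, missed alarm rate, and detection rate. Extensive experiments show that the method achieves a high detection rate, which confirms its effectiveness for P2P traffic detection.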

5.
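To address the ambiguity problem that arises in the traditional multi-class twin support vector machine classification algorithm (Multi-TWSVM), a multi-class classification algorithm based on a genetic-algorithm decision tree twin support vector machine (GA-DTTSVM) is proposed. GA-DTTSVM uses a genetic algorithm to build a decision tree over the feature data; the tree separates the ambiguous regions of the sample space and improves the recognition rate for samples in those regions. A twin support vector machine (TWSVM) classifier is trained at each node of the decision tree, and the trained classifiers are then used for classification and prediction. Experimental results show that, compared with the decision tree twin support vector machine (DTTSVM) multi-class classification algorithm and Multi-TWSVM, GA-DTTSVM achieves higher classification accuracy and faster training.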

6.
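As Internet of Things (IoT) technology is applied in more and more fields, the types of IoT services and their traffic keep growing, and the increase in service traffic poses great challenges for IoT management and quality assurance. In view of the heavy traffic, numerous feature attributes, and diverse services of the IoT, a decision-tree-based service flow classification model is proposed. Service flows in the IoT are classified through the SDN control plane, and a service flow scheduling strategy based on network-state QoS is then designed according to the different network requirements of the different service flows. The strategy selects the optimal transmission path for each service flow, which safeguards the data service quality of the different IoT services and improves the data transmission efficiency of the IoT.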

7.
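Zhang Kun, Mu Zhichun, Chang Xiaohui. 《控制工程》 (Control Engineering), 2008, 15(1): 103-106
Decision tree algorithms train quickly and produce results that are easy to interpret, but in practice their classification accuracy often falls short of business requirements. To improve accuracy, the C4.5 decision tree algorithm is improved by drawing on the strengths of the LogitBoost algorithm. LogitBoost is applied at the leaf nodes of the decision tree to build additive regression models, yielding a new model tree algorithm, the LCTree algorithm. Experiments on 11 UCI data sets, followed by analysis and comparison, show that LCTree is more effective than the other algorithms tested. When applied to modeling a telecom customer churn early-warning system, the algorithm analyzes customer characteristics effectively and predicts churning customers accurately.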

8.
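Current research on decision tree pruning focuses mainly on pre-pruning and post-pruning algorithms. These pruning algorithms, however, are usually applied to traditional decision tree classification algorithms, and little progress has been made on combining cost-sensitive learning with pruning optimization. Based on cost-benefit analysis from economics, the concepts of a cost-benefit matrix and unit cost-benefit are introduced. Class labels are assigned to the leaf nodes of the decision tree by maximizing the unit cost-benefit, and this rule is combined with a pre-pruning strategy to design a new decision tree pruning algorithm. Pruning the generated tree by unit cost-benefit makes it cost-sensitive and well suited to practical problems. Experimental results show that the algorithm generates smaller decision trees and classifies better than the REP and EBP algorithms.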

9.
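Efficiency and scalability are the most important issues in multi-relational data mining. The main bottleneck for algorithm efficiency lies in the hypothesis space, and user guidance on classification can go a long way toward helping the system complete the classification task and reducing the time it spends searching on its own. To address these issues, an improved multi-relational decision tree algorithm is proposed, which applies virtual join tuple propagation and the proposed background attribute propagation technique to the multi-relational decision tree algorithm. The improved algorithm is proved theoretically and compared experimentally with the original multi-relational decision tree algorithm. The experiments show that once the searched data items reach the background attribute propagation threshold, the improved multi-relational decision tree algorithm is relatively efficient, and the effect of an increase in the number of attributes (or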

10.
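Based on multi-class tumor gene expression profile data sets, and starting from the classification of tumor versus normal tissue, the selection of feature genes for tumor classification is analyzed and studied. The decision tree algorithm is applied to the classification of tumor gene expression profiles, and a genetic algorithm is introduced to optimize the decision tree classification rules. Experimental results show that, with a limited number of samples, this method achieves higher classification accuracy than a single decision tree.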

11.
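To forward network flows at line rate, high-performance switches commonly build their packet classification engines on ternary content addressable memory (TCAM). To counter TCAM's high power consumption, many low-power indexing schemes have appeared in recent years that selectively activate TCAM memory blocks. These schemes, however, are generally built with bottom-up local optimization algorithms, which cannot partition the flow table rules evenly and therefore seriously limit both TCAM storage efficiency and the power savings. A low-power TCAM indexing scheme based on decision tree mapping is proposed and implemented; it greatly reduces power consumption while improving TCAM storage efficiency. Exploiting the small-field property common among rules, the original rule set is divided into several rule subsets. A balanced decision tree is then built top-down for the feature field of each subset, and a greedy traversal of these trees finally yields the TCAM index list. Experiments show that, for a rule set of 100,000 rules, the algorithm reduces power consumption by 98.2% at the cost of only 1.3% additional storage.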

12.
Motivated by the desire to construct compact (in terms of expected length to be traversed to reach a decision) decision trees, we propose a new node splitting measure for decision tree construction. We show that the proposed measure is convex and cumulative and utilize this in the construction of decision trees for classification. Results obtained from several datasets from the UCI repository show that the proposed measure results in decision trees that are more compact with classification accuracy that is comparable to that obtained using popular node splitting measures such as Gain Ratio and the Gini Index.
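For readers unfamiliar with the baseline measures named in this abstract, the Gini Index can be sketched as below. This illustrates only the standard measure, not the new splitting measure the paper proposes; the function names and toy labels are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: the chance that two labels drawn at random (with replacement) differ."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

def split_gini(left, right):
    """Size-weighted Gini impurity of a binary split; lower means purer children."""
    total = len(left) + len(right)
    return len(left) / total * gini(left) + len(right) / total * gini(right)

parent = ["a", "a", "b", "b"]
perfect_split = split_gini(["a", "a"], ["b", "b"])   # children are pure
useless_split = split_gini(["a", "b"], ["a", "b"])   # children mirror the parent
```

A tree builder using this measure would pick the candidate split with the lowest weighted impurity at each node.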

13.
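Packet classification is used in many network services, and its performance largely determines the quality of those services. The RFC algorithm is a representative packet classification algorithm with fast classification speed, but its huge storage overhead raises the memory consumption and cost of an implementation. Building on RFC, this paper proposes a packet classification algorithm that uses hashing to reduce the storage overhead while remaining relatively fast.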

14.
Packet classification is implemented in modern network routers to provide differentiated services based on packet header information. Traditional packet classification reports only the single highest-priority matched rule for an incoming packet and takes an action accordingly. With the emergence of new Internet applications such as network intrusion detection systems, all matched rules need to be reported. This multi-match problem is more challenging and has attracted attention in recent years. Because of the stringent time budget on classification, architectural solutions using ternary content addressable memory (TCAM) are the preferred choice for backbone network routers. However, despite its advantage in search speed, TCAM is much more expensive than SRAM and is notorious for its extraordinarily high power consumption. These problems limit the application and scalability of TCAM-based solutions. This paper presents a tree-based multi-match packet classification technique that combines the benefits of both TCAMs and SRAMs. The experiments show that the proposed solution achieves significantly greater savings in both memory space and power consumption for packet matching than existing solutions.
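The difference between single-match and multi-match classification described above can be modeled in software. The sketch below is only a functional model of ternary (value/mask) matching, not the tree-based TCAM/SRAM architecture of the paper; the 8-bit keys, the rule table, and the action names are hypothetical.

```python
def ternary_match(key, value, mask):
    """True if key equals value on every bit where mask is 1 (mask bit 0 = don't care)."""
    return (key & mask) == (value & mask)

def multi_match(key, rules):
    """Actions of all rules that match the key, in priority (list) order."""
    return [action for value, mask, action in rules if ternary_match(key, value, mask)]

def single_match(key, rules):
    """Traditional classification: only the highest-priority match, or None."""
    hits = multi_match(key, rules)
    return hits[0] if hits else None

# Hypothetical 8-bit rule table, highest priority first: (value, mask, action).
RULES = [
    (0b10100000, 0b11110000, "deny"),     # matches 1010xxxx
    (0b10000000, 0b11000000, "log"),      # matches 10xxxxxx
    (0b00000000, 0b00000000, "default"),  # matches everything
]

all_hits = multi_match(0b10100101, RULES)
first_hit = single_match(0b10100101, RULES)
```

A real TCAM compares the key against all entries in parallel in hardware; the linear scan here only mirrors the matching semantics.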

15.
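To address the complex construction and limited classification accuracy of C4.5 decision trees, an improved decision tree construction algorithm based on variable precision rough sets is proposed. The algorithm uses the approximate classification quality as the heuristic function for selecting the attribute at each node. Compared with the information gain ratio, this criterion characterizes the overall contribution of an attribute to classification more accurately and also suppresses noise to some extent. In addition, for the special case in which two or more attributes have equal approximate classification quality, the paper describes how to select the optimal classification attribute...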

16.
As competition between mobile telecom operators intensifies, it becomes critical for operators to diversify their business areas. In particular, mobile operators are turning from traditional voice communication to mobile value-added services (VAS), which are new services intended to generate more average revenue per user (ARPU). That is, cross-selling is critical for mobile telecom operators to expand their revenues and profits. In this study, we propose a customer classification model that may be used to facilitate cross-selling in a mobile telecom market. Our model uses accumulated data on existing customers, including their demographic data and their usage patterns for existing products or services, to find new products and services with high sales potential. Various data mining techniques are applied in our proposed model in two steps. In the first step, several classification techniques such as logistic regression, artificial neural networks, and decision trees are applied independently to predict the purchase of new products, and each model outputs its prediction as a probability. In the second step, our model reconciles all these probabilities using a genetic algorithm (GA) and makes the final decision on whether a target customer would purchase a new product. To validate the usefulness of our model, we applied it to the case of a real-world mobile telecom company in Korea. We found that our model produced high-quality information for cross-selling and that the GA in the second step contributed significantly to improving the performance.
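The second step of this two-step model can be sketched under a strong assumption: that the probabilities are reconciled as a weighted average whose weights a GA would search for. The abstract does not state the combination rule, so the weighting scheme, the fixed weights, and the 0.5 threshold below are all hypothetical, and the GA search itself is omitted.

```python
def combine(probabilities, weights):
    """Weighted average of the purchase probabilities produced by the base models."""
    return sum(p * w for p, w in zip(probabilities, weights)) / sum(weights)

def will_purchase(probabilities, weights, threshold=0.5):
    """Final yes/no decision for one customer (the threshold is an assumed value)."""
    return combine(probabilities, weights) >= threshold

# Hypothetical outputs of logistic regression, a neural network, and a decision tree.
model_probs = [0.2, 0.6, 0.7]
# Hypothetical weights; in a GA-based setup these would be the evolved chromosome.
weights = [1.0, 1.0, 2.0]

score = combine(model_probs, weights)
decision = will_purchase(model_probs, weights)
```

With equal weights the same customer would fall exactly on the threshold, which is why the choice of weights matters for the final decision.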

17.
A decision forest is an ensemble classification method that combines multiple decision trees in a manner that results in more accurate classifications. By combining multiple heterogeneous decision trees, a decision forest is effective in mitigating the noise that is often prevalent in real-world classification tasks. This paper presents a new genetic algorithm for constructing a decision forest. Each decision tree classifier is trained using a disjoint set of attributes. Moreover, we examine the effectiveness of using a Vapnik–Chervonenkis dimension bound for evaluating the fitness function of a decision forest. The new algorithm was tested on various datasets. The obtained results have been compared to other methods, indicating the superiority of the proposed algorithm. © 2008 Wiley Periodicals, Inc.

18.
Basak J. Neural Computation, 2004, 16(9): 1959-1981
Decision trees and neural networks are widely used tools for pattern classification. Decision trees provide a highly localized representation, whereas neural networks provide a distributed but compact representation of the decision space. Decision trees cannot be induced in the online mode, and they do not adapt to a changing environment, whereas neural networks are inherently capable of online learning and adaptivity. Here we provide a classification scheme called online adaptive decision trees (OADT), which is a tree-structured network like decision trees and is capable of online learning like neural networks. A new objective measure is derived for supervised learning with OADT. Experimental results validate the effectiveness of the proposed classification scheme. Also, with certain real-life data sets, we find that OADT performs better than two widely used models: the hierarchical mixture of experts and the multilayer perceptron.

19.
Recent work in feature-based classification has focused on nonparametric techniques that can classify instances even when the underlying feature distributions are unknown. The inference algorithms for training these techniques, however, are designed to maximize the accuracy of the classifier, with all errors weighted equally. In many applications, certain errors are far more costly than others, and the need arises for nonparametric classification techniques that can be trained to optimize task-specific cost functions. This correspondence reviews the linear machine decision tree (LMDT) algorithm for inducing multivariate decision trees, and shows how LMDT can be altered to induce decision trees that minimize arbitrary misclassification cost functions (MCFs). Demonstrations of pixel classification in outdoor scenes show how MCFs can optimize the performance of embedded classifiers within the context of larger image understanding systems.
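The idea that unequal error costs change the decision can be shown with a generic sketch. The code below does not reproduce LMDT or its training procedure; it only illustrates minimum-expected-cost labeling of a single leaf, and the class names, counts, and cost values are hypothetical.

```python
def min_cost_label(class_counts, cost):
    """Label that minimizes the total misclassification cost over the samples in a leaf.

    class_counts maps each true class to its sample count in the leaf;
    cost[true][predicted] is the cost of predicting `predicted` for a `true` sample.
    """
    def total_cost(predicted):
        return sum(n * cost[true][predicted] for true, n in class_counts.items())
    return min(class_counts, key=total_cost)

# Hypothetical leaf from a pixel classifier: 6 road pixels and 4 obstacle pixels.
leaf = {"road": 6, "obstacle": 4}

# Equal costs reproduce the usual majority vote.
uniform = {"road": {"road": 0, "obstacle": 1},
           "obstacle": {"road": 1, "obstacle": 0}}
# Calling an obstacle "road" is assumed to be ten times worse than the reverse.
asymmetric = {"road": {"road": 0, "obstacle": 1},
              "obstacle": {"road": 10, "obstacle": 0}}

majority_label = min_cost_label(leaf, uniform)
cautious_label = min_cost_label(leaf, asymmetric)
```

Under the asymmetric costs the leaf is labeled with the minority class, because the few costly mistakes outweigh the many cheap ones.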

20.
A new decision tree method for application in data mining, machine learning, pattern recognition, and other areas is proposed in this paper. The new method incorporates a classical multivariate statistical method, the linear discriminant function, into the recursive partitioning process of decision trees. The proposed method considers not only the linear combination of all variables but also combinations of fewer variables. It uses a tabu search technique to find appropriate variable combinations within a reasonable length of time. For problems with more than two classes, the tabu search technique is also used to group the data into two superclasses before each split. The results of our experimental study indicate that the proposed algorithm appears to outperform some of the major classification algorithms in terms of classification accuracy; that it generates decision trees of relatively small size; and that it runs faster than most multivariate decision trees, with a computing time that increases linearly with data size, indicating that the algorithm is scalable to large datasets.
