Similar literature
18 similar articles found.
1.
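To address the classification of imbalanced nodes with graph neural networks (GNNs), a Bagging ensemble model is proposed that uses graph convolutional networks (GCNs) as base classifiers. In this model, several base classifiers are first trained in parallel, and their predictions are then combined by majority voting to complete the classification task. Experimental results show that the proposed model significantly outperforms existing baseline methods, verifying its effectiveness for imbalanced node classification.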
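
The two ensemble steps named in this abstract, bootstrap sampling and majority voting, can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the GCN base classifiers are not modeled, the function names `bootstrap_indices` and `majority_vote` are hypothetical, and the prediction lists are made-up values.

```python
import random
from collections import Counter

def bootstrap_indices(n_samples, rng):
    """Draw n_samples indices with replacement (one bootstrap replicate)."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]

def majority_vote(predictions):
    """Combine per-classifier label lists into one label per sample.

    predictions[i][j] is the label base classifier i assigns to sample j.
    On a tie, the label that appears first among the votes wins.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]

# Hypothetical predictions from three base classifiers on three nodes.
preds = [[0, 1, 1],
         [0, 1, 0],
         [1, 1, 0]]
print(majority_vote(preds))
```

In a full pipeline each base classifier would be trained on the nodes selected by its own `bootstrap_indices` call before voting.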

2.
周进登  王晓丹  权文  许燕  姚旭 《电子学报》2011,39(7):1514-1522
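As a general ensemble framework for multi-class classification, error-correcting output codes (ECOC) effectively decompose a multi-class problem into binary problems and thereby simplify it. However, when generating the base classifiers there is often a conflict between increasing the diversity among base classifiers and increasing the learning consistency between each base classifier and the ensemble classifier, referred to as the consistent-diverse trade-off problem. One starting point for resolving this trade-off is to reduce the classification error caused by learning inconsistency while preserving diversity. To this end, weighted decoding is used: the weighting coefficient matrix is relearned so as to reduce or eliminate the error caused by the learning inconsistency of the base classifiers. Experiments on artificial datasets and UCI datasets show that the optimal weighting coefficient matrix found by a genetic algorithm, whose fitness function is the classification error rate of the ensemble classifier, handles the consistent-diverse trade-off better than coefficient matrices produced by other methods.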
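
A rough sketch of weighted ECOC decoding may help make the role of the weighting coefficient matrix concrete. It assumes a weighted Hamming distance as the decoding rule, which the abstract does not specify; the function name `weighted_ecoc_decode`, the code matrix, and the weights are all hypothetical, and the genetic-algorithm search for the weights is not shown.

```python
def weighted_ecoc_decode(outputs, code_matrix, weights):
    """Return the class whose codeword is closest to the base classifier outputs.

    outputs[j]        : output (+1 or -1) of base classifier j
    code_matrix[c][j] : codeword bit of class c for base classifier j
    weights[c][j]     : weighting coefficient for that bit (assumed non-negative)
    Distance is a weighted Hamming distance; ties go to the lowest class index.
    """
    best_class, best_dist = None, float("inf")
    for cls, (codeword, w_row) in enumerate(zip(code_matrix, weights)):
        dist = sum(w for o, c, w in zip(outputs, codeword, w_row) if o != c)
        if dist < best_dist:
            best_class, best_dist = cls, dist
    return best_class

# Hypothetical 3-class code matrix with three binary base classifiers.
code_matrix = [[1, 1, -1],
               [-1, 1, 1],
               [1, -1, 1]]
uniform = [[1.0, 1.0, 1.0]] * 3
learned = [[1.0, 1.0, 2.0],   # penalize class 0 more for a mismatch on bit 2
           [0.5, 1.0, 1.0],   # trust bit 0 less when scoring class 1
           [1.0, 1.0, 1.0]]
outputs = [1, 1, 1]
print(weighted_ecoc_decode(outputs, code_matrix, uniform))
print(weighted_ecoc_decode(outputs, code_matrix, learned))
```

With uniform weights every class is at distance 1 from these outputs, so the tie-break decides; with the non-uniform weights the decision changes, which is the effect a relearned coefficient matrix is meant to exploit.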

3.
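Building on a comprehensive analysis of data stream classification algorithms and the basic theory of cloud computing, a data stream ensemble classification algorithm based on the Hadoop framework is proposed. The algorithm uses the MapReduce parallel programming model to improve the traditional dynamic-weight ensemble model and thereby raise classification efficiency. The analysis shows that, when processing massive, rapidly arriving data streams, the algorithm runs far more efficiently than traditional ensemble algorithms.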

4.
李晓旭  李睿凡  冯方向  曹洁  王小捷 《电子学报》2014,42(10):2040-2044
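This paper focuses on the classification of multi-view data. Considering that ensemble classification methods can combine multiple weak classifiers into a strong classifier, and that topic models can learn semantic representations of complex data, this paper attempts to introduce the idea of ensemble learning into topic models so as to learn classification rules and predictive semantic features of multi-view data simultaneously. Specifically, a supervised multi-view classification model is proposed by combining the probabilistic topic model LDA with the ensemble classification method of Softmax mixture models. A parameter estimation algorithm for the model is derived using the variational EM method. Experimental results on two real image datasets show that the proposed model achieves good classification performance.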

5.
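Intrusion detection can be modeled as a data stream classification problem. Traditional data stream classification algorithms require a large number of labeled training samples, which is costly and reduces their practicality. In PU learning, a classifier can be built by labeling only some of the positive samples. This paper therefore proposes an intrusion detection method based on dynamic ensemble PU learning for data stream classification, which needs only a small number of manually labeled positive samples to build a data stream classifier. Experiments on artificial and real datasets show that the method has good classification performance, outperforms three PU learning classification methods on skewed data streams, and achieves a high intrusion detection rate.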

6.
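A multi-classifier method for data streams based on active detection of concept drift   Total citations: 1, self-citations: 0, citations by others: 1
Concept drift detection is an important part of data stream classification research, yet the vast majority of current multi-classifier methods for data streams do not explicitly specify a concept drift detection method. Based on Chebyshev's inequality, a multi-classifier method with active concept drift detection is proposed. Experiments show that the method effectively improves adaptability to concept drift as well as the classification accuracy of multi-classifier methods for data streams.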
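
One way a Chebyshev-based drift test could look is sketched below. Chebyshev's inequality states that P(|X - mu| >= k*sigma) <= 1/k^2 for any distribution with finite variance, so choosing k = 1/sqrt(delta) bounds the false-alarm probability by delta. The monitored statistic (per-batch error rate), the function name `chebyshev_drift`, and the value of `delta` are assumptions for illustration; the abstract does not say which statistic the authors monitor.

```python
import math
import statistics

def chebyshev_drift(reference_errors, new_error, delta=0.05):
    """Flag drift when new_error lies at least k standard deviations from the
    reference mean, with k = 1/sqrt(delta).

    By Chebyshev's inequality, a deviation this large occurs with probability
    at most delta if the error distribution has not changed.
    """
    mu = statistics.mean(reference_errors)
    sigma = statistics.pstdev(reference_errors)
    if sigma == 0:
        # Degenerate reference window: any change counts as drift.
        return new_error != mu
    k = 1.0 / math.sqrt(delta)
    return abs(new_error - mu) >= k * sigma

# Hypothetical per-batch error rates of the current ensemble.
history = [0.10, 0.12, 0.11, 0.09, 0.10]
print(chebyshev_drift(history, 0.30))  # large jump in error
print(chebyshev_drift(history, 0.11))  # within normal variation
```

Because the bound is distribution-free it is conservative, so a test of this kind tends to react only to fairly large shifts.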

7.
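A classification method for imbalanced data   Total citations: 1, self-citations: 0, citations by others: 1
To address the data imbalance problem commonly encountered in data mining and machine learning, the imbalance of data and the main current strategies for handling it are analyzed, and a combination-based classification method for imbalanced data is proposed. The method combines data resampling with weight refinement to reduce the classifier's bias toward the majority class. Experimental results show that weight refinement compensates well for some shortcomings of resampling, and that the combined method effectively improves classification accuracy on imbalanced data.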

8.
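As a typical application of data stream classification, machine-learning-based network traffic identification places increasingly high demands on concept drift detection. To address this, two typical approaches to concept drift detection are first analyzed. Then, taking into account the class imbalance that often exists in real network environments, a concept drift detection algorithm, CF-CDD, is proposed, and its principle and statistical foundations are discussed in detail. Based on this detection algorithm, a weight-based ensemble classifier algorithm, TCEL-CF-CDD, is then constructed to achieve adaptive traffic identification. Finally, experiments verify the feasibility of the proposed concept drift detection algorithm.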

9.
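An incremental ensemble classification algorithm for data streams based on an information entropy diversity measure   Total citations: 2, self-citations: 0, citations by others: 2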
琚春华  邹江波 《电信科学》2015,31(2):92-102
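The diversity among classifiers is studied, and an incremental ensemble classification algorithm based on an information entropy diversity measure is proposed. The measure is incorporated into base classifier selection: the information entropy diversity of the base classification results on the training dataset is computed, a selection method based on iterative optimization is applied with optimal entropy diversity as the constraint objective, and the number of base classifiers is adjusted dynamically. This yields accurate and stable classification with reduced system overhead. Comparative experiments show that, when processing data streams, the algorithm has lower overhead and stronger adaptability than other algorithms.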
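
An entropy-based diversity score over base classifier outputs could be computed along the following lines. The abstract does not give the exact formula, so this sketch assumes a common choice: the Shannon entropy of the votes cast for each sample, averaged over samples. The function names `vote_entropy` and `ensemble_entropy_diversity` are hypothetical.

```python
import math
from collections import Counter

def vote_entropy(votes):
    """Shannon entropy (in bits) of the labels voted for a single sample."""
    n = len(votes)
    return -sum((c / n) * math.log2(c / n) for c in Counter(votes).values())

def ensemble_entropy_diversity(predictions):
    """Mean per-sample vote entropy.

    predictions[i][j] is the label base classifier i assigns to sample j.
    A value of 0 means all classifiers always agree; larger values mean
    more disagreement.
    """
    per_sample = [vote_entropy(votes) for votes in zip(*predictions)]
    return sum(per_sample) / len(per_sample)

# Two hypothetical classifiers: they agree on sample 0 and split on sample 1.
print(ensemble_entropy_diversity([[0, 0],
                                  [0, 1]]))
```

A selection loop of the kind the abstract describes would add or drop base classifiers and keep the subset whose score is best under its objective.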

10.
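An adaptive ensemble method based on random subspaces and AdaBoost   Total citations: 4, self-citations: 0, citations by others: 4
Constructing base classifiers that are both highly diverse and highly accurate is the key issue in ensemble learning. To this end, a new ensemble learning method is proposed: PSO is used to find the optimal feature weight distribution that minimizes the classification error rate on the datasets drawn by AdaBoost according to sample weights; features are then randomly sampled according to this optimal weight distribution to generate random subspaces, which are applied in the AdaBoost training process. This increases the diversity among classifiers while maintaining the accuracy of the base classifiers. Finally, the decisions of the base classifiers are fused by majority voting, and simulation experiments verify the effectiveness of the method.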
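
The subspace-generation step, sampling features according to a weight distribution, might be sketched as below. This is an illustration under assumptions: the PSO search that produces the weights and the AdaBoost training loop are not shown, the function name `sample_subspace` is hypothetical, and the feature weights are made-up values.

```python
import random

def sample_subspace(weights, k, rng=None):
    """Pick k distinct feature indices, each draw proportional to its weight.

    weights[i] is the non-negative weight of feature i. Sampling is without
    replacement; features with zero weight are not selected while any
    positive-weight feature remains (random.choices raises ValueError if
    only zero-weight features are left).
    """
    rng = rng or random.Random()
    remaining = list(range(len(weights)))
    chosen = []
    for _ in range(k):
        w = [weights[i] for i in remaining]
        pick = rng.choices(remaining, weights=w, k=1)[0]
        chosen.append(pick)
        remaining.remove(pick)
    return sorted(chosen)

# Hypothetical feature weights; feature 3 has weight 0.
feature_weights = [0.5, 0.3, 0.2, 0.0]
print(sample_subspace(feature_weights, 3, random.Random(42)))
```

Each boosting round could draw its own subspace this way, so that informative features appear often while the base classifiers still see different feature sets.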

11.
王磊  赵磊  郑宝玉 《信号处理》2017,33(4):528-532
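With the development of data mining, the combination rules used in traditional ensemble methods, such as the Max rule, Min rule, Product rule, and Sum rule, can no longer meet practical requirements on classification accuracy for two-class imbalanced data. This paper therefore proposes an ensemble method for two-class imbalanced data based on naive Bayes and Euclidean distance. The method uses naive Bayes as the base classifier, and its combination rule introduces the Euclidean distance between test data and training data, together with the relationship between the majority and minority classes in the training data, thereby strengthening the link between the final classification result and the original training data in terms of spatial distance. Experimental results show that, on two-class imbalanced data, the method achieves a significantly higher area under the ROC curve (AUC) than existing ensemble methods and thus better classification performance. The proposed method therefore has a clear advantage when handling two-class imbalanced data.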

12.
Jie WANG  Lili YANG  Min YANG 《通信学报》2018,39(10):155-165
To detect attacks in the current network big-data environment, where the attack model cannot be trained accurately because samples of some attack steps are missing, and to address the deficiencies of existing ensemble classifiers in building multi-level classifiers, a malicious network traffic detection method based on a multi-level distributed ensemble classifier is proposed. The dataset is first preprocessed and aggregated into clusters, noise processing is performed on each cluster, and a multi-level distributed ensemble classifier, MLDE, is then built to detect malicious network traffic. In the MLDE framework, base classifiers are used at the bottom level and different ensemble classifiers at the non-bottom levels. The framework is simple to build, processes large datasets concurrently, and adjusts the size of the ensemble classifier to the size of the dataset. The experimental results show that the AUC reaches 0.999 when MLDE uses random forest in the first layer, bagging in the second layer, and an AdaBoost classifier in the third layer.

13.
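In many real-world problems, data samples of different classes are often markedly imbalanced, that is, majority-class samples far outnumber minority-class samples. Learning from class-imbalanced samples is currently a research hotspot in data mining and machine learning. Previous research on imbalanced learning has mainly addressed binary classification. For multi-class problems, a multi-class imbalanced learning method based on an ensemble of HDDT decision trees is therefore proposed. Experiments show that the method can learn multi-class imbalanced problems effectively.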

14.
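Image steganalysis based on a Bayesian classifier   Total citations: 1, self-citations: 1, citations by others: 0
Ensemble classifiers are currently the mainstream classifiers for image steganalysis. To improve their detection accuracy, this work targets a shortcoming of existing ensembles: the base classifiers are combined too simply, which fails to reflect the intrinsic relationships among them and prevents the result from being judged as a whole. Based on the property that the projections of image features onto the classification hyperplanes of the ensemble classifier follow a multivariate normal distribution, an image steganalysis algorithm based on a Bayesian classifier is proposed. First, a number of base classifiers are generated using the random forest algorithm; then the class-conditional probability density functions and prior probabilities are computed and a Bayesian classifier is trained; finally, the trained Bayesian classifier replaces simple voting for the classification decision. The detection error rate of the algorithm is 1.6% lower on average than that of previous algorithms; its ROC curve lies closer to the upper-left corner than that of simple voting, indicating a higher detection rate; the AUC increases by about 2.12% on average; and the training time increases only slightly, by at most about 2.610 s. The algorithm can thus effectively improve the detection accuracy of ensemble classifiers.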

15.
As the risk of malware on the Android platform is increasing sharply, Android malware detection has become an important research topic. Existing work has demonstrated that the permissions required by Android applications are valuable for malware analysis, but how to exploit permission patterns for malware detection remains an open issue. In this paper, we introduce contrasting permission patterns to characterize the essential differences between malware and clean applications from the permission aspect. A framework based on contrasting permission patterns is then presented for Android malware detection. Following the proposed framework, an ensemble classifier, Enclamald, is developed to detect whether an application is potentially malicious. Every contrasting permission pattern acts as a weak classifier in Enclamald, and the weighted predictions of the weak classifiers involved are aggregated into the final result. Experiments on real-world applications show that the proposed Enclamald classifier outperforms commonly used classifiers for Android malware detection.
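
The idea of treating each permission pattern as a weak classifier and aggregating weighted predictions can be illustrated with a small sketch. It is not the Enclamald algorithm itself: the scoring rule (sum of signed weights of matched patterns, thresholded at zero), the function names `pattern_score` and `is_potentially_malicious`, and the example patterns and weights are all assumptions made for illustration.

```python
def pattern_score(app_permissions, weighted_patterns):
    """Sum the weights of all patterns fully contained in the app's permissions.

    weighted_patterns is a list of (pattern, weight) pairs, where pattern is a
    set of permission names. Positive weights are assumed to indicate
    malware-leaning patterns, negative weights clean-leaning ones.
    """
    perms = set(app_permissions)
    return sum(w for pattern, w in weighted_patterns if set(pattern) <= perms)

def is_potentially_malicious(app_permissions, weighted_patterns, threshold=0.0):
    """Flag the app when the aggregated score exceeds the threshold."""
    return pattern_score(app_permissions, weighted_patterns) > threshold

# Hypothetical contrasting patterns and weights.
patterns = [({"SEND_SMS", "READ_CONTACTS"}, 1.5),
            ({"INTERNET"}, -0.5)]
print(is_potentially_malicious({"SEND_SMS", "READ_CONTACTS", "INTERNET"}, patterns))
print(is_potentially_malicious({"INTERNET"}, patterns))
```

Each pattern here votes only when it matches, so an application with no matching pattern receives a score of zero and is not flagged under this rule.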

16.
The explosion of DNA and protein sequence data in public and private databases has encouraged interdisciplinary research in biology and information technology. Gene expression profiles are simply sequences of numbers, and the need for tools that analyze them to extract useful information has grown significantly. To predict the cancer class of patients from their gene expression profiles, this paper presents a classification framework that combines a pair of classifiers trained with mutually exclusive features. The idea behind feature selection with nonoverlapping correlation is to encourage the classifier ensemble, which consists of multiple classifiers, to learn different aspects of the training data, so that the classifiers can search a wide solution space. Experimental results show that the classifier ensemble achieves higher recognition accuracy than conventional classifiers.

17.
Real-world classification tasks may involve high-dimensional data with missing values. The traditional approach is to impute the missing data first and then apply standard classification algorithms to the imputed data. This approach assumes that a distribution or a set of feature relations exists in the data, and then estimates the missing items from the observed values. A reasonable assumption is a prerequisite for accurate imputation. In high-dimensional datasets, however, the distribution or feature relations are often complex or even impossible to capture, leading to inaccurate imputation. In this paper, we propose a complete-case projection subspace ensemble framework, in which two alternative partition strategies, bootstrap subspace partition and missing-pattern-sensitive subspace partition, are developed for incomplete datasets with even and uneven missing patterns, respectively. Multiple component classifiers are then trained separately in these subspaces, and a final ensemble classifier is constructed by a weighted majority vote of the component classifiers. In the experiments, we demonstrate the effectiveness of the proposed framework on eight high-dimensional UCI datasets and apply the two partition strategies to datasets with different missing patterns. The results indicate that the proposed algorithm significantly outperforms existing imputation methods in most cases.

18.
Ensemble classifiers have recently been shown to be an effective way to improve prediction performance. However, it is often difficult to construct an appropriate classifier for complex data, for example data with many dimensions or hierarchical attributes. This study proposes a method for constructing an ensemble classifier based on key attributes. In addition to the high precision shared by common ensemble classifiers, its results are highly intelligible and thus easy to understand. Furthermore, experimental results on real data collected from China Mobile show that the key-attributes-based ensemble classifier performs well in both classifier construction and customer churn prediction.
