首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 515 毫秒
1.
针对重现概念漂移检测中的概念表征和分类器选择问题,提出了一种适用于含重现概念漂移的数据流分类的算法——基于主要特征抽取的概念聚类和预测算法(Conceptual clustering and prediction through main feature extraction, MFCCP)。MFCCP通过计算不同批次样本的主要特征及影响因子的差异度以识别重复出现的概念,为每个概念维持且及时更新一个分类器,并依据Hoeffding不等式选择最合适的分类器对当前样本集实施分类,以 提高对概念漂移的反应能力。在3个数据集上的实验表明:MFCCP在含重现概念漂移的数据集上的分类准确率,对概念漂移的反应能力及对概念漂移检测的准确率均明显优于其他4种 对比算法,且MFCCP也适用于对不含重现概念漂移的数据流进行分类。  相似文献   

2.
This paper presents a novel ensemble classifier system designed to process data streams featuring occasional changes in their characteristics (concept drift). The ensemble is especially effective when the concepts reappear (recurring context). The system collects information on emerging contexts in a pool of elementary classifiers trained on subsequent data chunks. The pool is updated only when concept drift is detected. In contrast to other ensemble solutions, classifiers are not removed from the pool, and therefore, knowledge of past contexts is preserved for future use. To ensure high classification performance, the number of classifiers contributing to decision-making is fixed and limited. Only selected elements from the pool can join the decision-making ensemble. The process of selecting classifiers and adjusting their weights is realized by an evolutionary-based optimization algorithm that aims to minimize the system misclassification rate. Performance of the system is evaluated through a series of experiments presenting some key features of the system.  相似文献   

3.
袁泉  郭江帆 《计算机应用》2018,38(6):1591-1595
针对数据流中概念漂移和噪声问题,提出一种新型的增量式学习的数据流集成分类算法。首先,引入噪声过滤机制过滤噪声;然后,引入假设检验方法对概念漂移进行检测,以增量式C4.5决策树为基分类器构建加权集成模型;最后,实现增量式学习实例并随之动态更新分类模型。实验结果表明,该集成分类器对概念漂移的检测精度达到95%~97%,对数据流抗噪性保持在90%以上。该算法分类精度较高,且在检测概念漂移的准确性和抗噪性方面有较好的表现。  相似文献   

4.
目前关于概念漂移数据流的分类研究已经取得了许多成果,但大部分没有充分考虑到数据流中概念重复出现的情况,这将耗费大量的计算和内存资源,增加了分类错误的可能性。为此,基于概念的重复性提出了一种数据流集成分类算法,该算法运用集成分类思想处理数据流中的概念漂移,但在学习过程中不会将暂时失效的概念及对应基分类器删除,而是把它们的基本信息存储起来,方便以后调用,并可根据概念间的转换关系预测即将到来的概念,在提高分类精度的同时又提高了时间效率。实验结果验证了算法的有效性。  相似文献   

5.
Traditional approaches for text data stream classification usually require the manual labeling of a number of documents, which is an expensive and time consuming process. In this paper, to overcome this limitation, we propose to classify text streams by keywords without labeled documents so as to reduce the burden of labeling manually. We build our base text classifiers with the help of keywords and unlabeled documents to classify text streams, and utilize classifier ensemble algorithms to cope with concept drifting in text data streams. Experimental results demonstrate that the proposed method can build good classifiers by keywords without manual labeling, and when the ensemble based algorithm is used, the concept drift in the streams can be well detected and adapted, which performs better than the single window algorithm.  相似文献   

6.
在监督或半监督学习的条件下对数据流集成分类进行研究是一个很有意义的方向.从基分类器、关键技术、集成策略等三个方面进行介绍,其中,基分类器主要介绍了决策树、神经网络、支持向量机等;关键技术从增量、在线等方面介绍;集成策略主要介绍了boosting、stacking等.对不同集成方法的优缺点、对比算法和实验数据集进行了总结与分析.最后给出了进一步研究方向,包括监督和半监督学习下对于概念漂移的处理、对于同质集成和异质集成的研究,无监督学习下的数据流集成分类等.  相似文献   

7.
Mining data streams is the process of extracting information from non-stopping, rapidly flowing data records to provide knowledge that is reliable and timely. Streaming data algorithms need to be one pass and operate under strict limitations of memory and response time. In addition, the classification of streaming data requires learning in an environment where the data characteristics might change constantly. Many of the classification algorithms presented in literature assume a 100 % labeling rate, which is impractical and expensive when data records are rapidly flowing in. In this paper, a new incremental grid density based learning framework, the GC3 framework, is proposed to perform classification of streaming data with concept drift and limited labeling. The proposed framework uses grid density clustering to detect changes in the input data space. It maintains an evolving ensemble of classifiers to learn and adapt to the model changes over time. The framework also uses a uniform grid density sampling mechanism to obtain a uniform subset of samples for better classification performance with a lower labeling rate. The entire framework is designed to be one-pass, incremental and work with limited memory to perform any-time classification on demand. Experimental comparison with state of the art concept drift handling systems demonstrate the GC3 frameworks ability to provide high classification performance, using fewer models in the ensemble and with only 4-6 % of the samples labeled. The results show that the GC3 framework is effective and attractive for use in real world data stream classification applications.  相似文献   

8.
Among the many issues related to data stream applications, those involved in predictive tasks such as classification and regression, play a significant role in Machine Learning (ML). The so-called ensemble-based approaches have characteristics that can be appealing to data stream applications, such as easy updating and high flexibility. In spite of that, some of the current approaches consider unsuitable ways of updating the ensemble along with the continuous stream processing, such as growing it indefinitely or deleting all its base learners when trying to overcome a concept drift. Such inadequate actions interfere with two inherent characteristics of data streams namely, its possible infinite length and its need for prompt responses. In this paper, a new ensemble-based algorithm, suitable for classification tasks, is proposed. It relies on applying boosting to new batches of data aiming at maintaining the ensemble by adding a certain number of base learners, which is established as a function of the current ensemble accuracy rate. The updating mechanism enhances the model flexibility, allowing the ensemble to gather knowledge fast to quickly overcome high error rates, due to concept drift, while maintaining satisfactory results by slowing down the updating rate in stable concepts. Results comparing the proposed ensemble-based algorithm against eight other ensembles found in the literature show that the proposed algorithm is very competitive when dealing with data stream classification.  相似文献   

9.
为了能有效应对数据流中的概念漂移现象,提出结合无监督学习的数据流分类算法.该算法以集成式分类技术为基础,在分类过程中引入属性约简,利用聚类算法对数据进行聚类,通过对比分类和聚类结果的准确率,判断是否发生概念漂移.实验表明,文中算法在综合时间花销和准确率上取得较好效果.  相似文献   

10.
It is challenging to use traditional data mining techniques to deal with real-time data stream classifications. Existing mining classifiers need to be updated frequently to adapt to the changes in data streams. To address this issue, in this paper we propose an adaptive ensemble approach for classification and novel class detection in concept drifting data streams. The proposed approach uses traditional mining classifiers and updates the ensemble model automatically so that it represents the most recent concepts in data streams. For novel class detection we consider the idea that data points belonging to the same class should be closer to each other and should be far apart from the data points belonging to other classes. If a data point is well separated from the existing data clusters, it is identified as a novel class instance. We tested the performance of this proposed stream classification model against that of existing mining algorithms using real benchmark datasets from UCI (University of California, Irvine) machine learning repository. The experimental results prove that our approach shows great flexibility and robustness in novel class detection in concept drifting and outperforms traditional classification models in challenging real-life data stream applications.  相似文献   

11.
一种基于混合集成方法的数据流概念漂移检测方法   总被引:1,自引:0,他引:1  
近年来,数据流分类问题研究受到了普遍关注,而漂移检测是其中一个重要的研究问题。已有的分类模型有单一集成模型和混合模型,其漂移检测机制多基于理想的分布假设。单一模型集成可能导致分类误差扩大,噪音环境下分类效果受到了一定影响,而混合集成模型多存在分类精度和时间性能难以两者兼顾的问题。为此,基于简单的WE集成框架,构建了基于决策树和bayes混合模型的集成分类方法 WE-DTB,并利用典型的概念漂移检测机制Hoeffding Bounds和μ检验来进行数据流环境下概念漂移的检测和分类。大量实验表明,WE-DTB能够有效检测概念漂移且具有较好的分类精度及时空性能。  相似文献   

12.
作为一种典型的大数据,数据流具有连续、无限、概念漂移和快速到达等特点,因此传统的分类技术无法直接有效地应用于数据流挖掘。本文在经典的精度加权集成(Accuracy weighted ensemble,AWE)算法的基础上提出概念自适应快速决策树更新集成(Concept very fast decision tree update ensemble,CUE)算法。该算法不仅在基分类器的权重分配方面进行了改进,而且在解决数据块大小的敏感性问题以及增加基分类器之间的相异性方面,有明显的改善。实验表明在分类准确率上,CUE算法高于AWE算法。最后,提出聚类动态分类器选择(Dynamic classifier selection with clustering,DCSC)算法。该算法基于分类器动态选择的思想,没有繁琐的赋权值机制,所以时间效率较高。实验结果验证了DCSC算法的有效和高效性,并能有效地处理概念漂移。  相似文献   

13.
概念漂移是数据流学习领域中的一个难点问题,同时数据流中存在的类不平衡问题也会严重影响算法的分类性能。针对概念漂移和类不平衡的联合问题,在基于数据块集成的方法上引入在线更新机制,结合重采样和遗忘机制提出了一种增量加权集成的不平衡数据流分类方法(incremental weighted ensemble for imbalance learning, IWEIL)。该方法以集成框架为基础,利用基于可变大小窗口的遗忘机制确定基分类器对窗口内最近若干实例的分类性能,并计算基分类器的权重,随着新实例的逐个到达,在线更新IWEIL中每个基分器及其权重。同时,使用改进的自适应最近邻SMOTE方法生成符合新概念的新少数类实例以解决数据流中类不平衡问题。在人工数据集和真实数据集上进行实验,结果表明,相比于DWMIL算法,IWEIL在HyperPlane数据集上的G-mean和recall指标分别提升了5.77%和6.28%,在Electricity数据集上两个指标分别提升了3.25%和6.47%。最后,IWEIL在安卓应用检测问题上表现良好。  相似文献   

14.
对数据流分类分析的常用方法是集成学习。为了得到更好的分类效果,给出一种基于堆叠集成的数据流分类分析方法。该方法通过构造一个分类器对基分类器进行集成。实验结果表明,与基于投票或加权投票的集成方法相比,基于堆叠集成方法对概念漂移的快速适应能力以及预测准确率得到了提高。  相似文献   

15.
一种面向周期性概念漂移的数据流分类算法   总被引:1,自引:0,他引:1  
数据流挖掘已在许多领域得到应用,概念漂移检测是数据流挖掘研究中的一个重点.目前关于数据流中的概念检测的研究虽然取得了很多成果,却没有充分考虑到数据流概念"周期性"出现的特点.针对周期性概念漂移的特点,提出了当"历史概念"重现时,利用对应的模型来对数据流进行分类的方法,从而减小模型更新的代价,加快分类预测的速度.实验证明这种方法提高了运行效率.  相似文献   

16.
Liang  Shunpan  Pan  Weiwei  You  Dianlong  Liu  Ze  Yin  Ling 《Applied Intelligence》2022,52(12):13398-13414

Multi-label learning has attracted many attentions. However, the continuous data generated in the fields of sensors, network access, etc., that is data streams, the scenario brings challenges such as real-time, limited memory, once pass. Several learning algorithms have been proposed for offline multi-label classification, but few researches develop it for dynamic multi-label incremental learning models based on cascading schemes. Deep forest can perform representation learning layer by layer, and does not rely on backpropagation, using this cascading scheme, this paper proposes a multi-label data stream deep forest (VDSDF) learning algorithm based on cascaded Very Fast Decision Tree (VFDT) forest, which can receive examples successively, perform incremental learning, and adapt to concept drift. Experimental results show that the proposed VDSDF algorithm, as an incremental classification algorithm, is more competitive than batch classification algorithms on multiple indicators. Moreover, in dynamic flow scenarios, the adaptability of VDSDF to concept drift is better than that of the contrast algorithm.

  相似文献   

17.
概念漂移数据流挖掘算法综述   总被引:1,自引:0,他引:1  
丁剑  韩萌  李娟 《计算机科学》2016,43(12):24-29, 62
数据流是一种新型的数据模型,具有动态、无限、高维、有序、高速和变化等特性。在真实的数据流环境中,一些数据分布是随着时间改变的,即具有概念漂移特征,称为可变数据流或概念漂移数据流。因此处理数据流模型的方法需要处理时空约束和自适应调整概念变化。对概念漂移问题和概念漂移数据流分类、聚类和模式挖掘等内容进行综述。首先介绍概念漂移的类型和常用概念改变检测方法。为了解决概念漂移问题,数据流挖掘中常使用滑动窗口模型对新近事务进行处理。数据流分类常用的模型包括单分类模型和集成分类模型,常用的方法包括决策树、分类关联规则等。数据流聚类方式通常包括基于k- means的和非基于k- means的。模式挖掘可以为分类、聚类和关联规则等提供有用信息。概念漂移数据流中的模式包括频繁模式、序列模式、episode、模式树、模式图和高效用模式等。最后详细介绍其中的频繁模式挖掘算法和高效用模式挖掘算法。  相似文献   

18.
This work aims to connect two rarely combined research directions, i.e., non-stationary data stream classification and data analysis with skewed class distributions. We propose a novel framework employing stratified bagging for training base classifiers to integrate data preprocessing and dynamic ensemble selection methods for imbalanced data stream classification. The proposed approach has been evaluated based on computer experiments carried out on 135 artificially generated data streams with various imbalance ratios, label noise levels, and types of concept drift as well as on two selected real streams. Four preprocessing techniques and two dynamic selection methods, used on both bagging classifiers and base estimators levels, were considered. Experimentation results showed that, for highly imbalanced data streams, dynamic ensemble selection coupled with data preprocessing could outperform online and chunk-based state-of-art methods.  相似文献   

19.
基于多分类器的数据流中的概念漂移挖掘   总被引:4,自引:0,他引:4  
数据流中概念漂移的检测是当前数据挖掘领域的重要研究分支, 近年来得到了广泛的关注. 本文提出了一种称为 M_ID4 的数据流挖掘算法. 它是在大容量数据流挖掘中, 通过尽量少的训练样本来实现概念漂移检测的快速方法. 利用多分类器综合技术, M_ID4 实现了数据流中概念漂移的增量式检测和挖掘. 实验结果表明, M_ID4 算法在处理数据流的概念漂移上表现出比已有同类算法更高的精确度和适应性.  相似文献   

20.
The problem addressed in this study concerns mining data streams with concept drift. The goal of the article is to propose and validate a new approach to mining data streams with concept-drift using the ensemble classifier constructed from the one-class base classifiers. It is assumed that base classifiers of the proposed ensemble are induced from incoming chunks of the data stream. Each chunk consists of prototypes and information about whether the class prediction of these instances, carried-out at earlier steps, has been correct. Each data chunk can be updated by using the instance selection technique when new data arrive. When a new data chunk is formed, the ensemble model is also updated on the basis of weights assigned to each one-class classifier. In this article, two well-known instance-based learning algorithms—the CNN and the ENN—have been adopted to solve the one-class classification problems and, consequently, update the proposed classifier ensemble. The proposed approaches have been validated experimentally, and the computational experiment results are shown and discussed. The experiment results prove that the proposed approach using the ensemble classifier constructed from the one-class base classifiers with instance selection for chunk updating can outperform well-known approaches for data streams with concept drift.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号