Similar Documents
 A total of 20 similar documents were found (search time: 639 ms)
1.
Mining Concept Drift in Data Streams Based on Multiple Classifiers   Cited in total: 4 (self-citations: 0, citations by others: 4)
Detecting concept drift in data streams is an important research branch of data mining and has attracted wide attention in recent years. This paper proposes a data stream mining algorithm called M_ID4, a fast method for detecting concept drift in high-volume data streams using as few training samples as possible. By combining multiple classifiers, M_ID4 achieves incremental detection and mining of concept drift in data streams. Experimental results show that M_ID4 handles concept drift in data streams with higher accuracy and adaptability than existing algorithms of the same kind.

2.
The last decade has seen a surge of interest in adaptive learning algorithms for data stream classification, with applications ranging from predicting ozone level peaks, learning stock market indicators, to detecting computer security violations. In addition, a number of methods have been developed to detect concept drifts in these streams. Consider a scenario where we have a number of classifiers with diverse learning styles and different drift detectors. Intuitively, the current 'best' (classifier, detector) pair is application dependent and may change as a result of the stream evolution. Our research builds on this observation. We introduce the Tornado framework that implements a reservoir of diverse classifiers, together with a variety of drift detection algorithms. In our framework, all (classifier, detector) pairs proceed, in parallel, to construct models against the evolving data streams. At any point in time, we select the pair which currently yields the best performance. To this end, we introduce the CAR measure, which is employed to balance classification, adaptation and resource utilization requirements. We further incorporate two novel stacking-based drift detection methods, namely the FHDDMS and FHDDMS_add approaches. The experimental evaluation confirms that the current 'best' (classifier, detector) pair is not only heavily dependent on the characteristics of the stream, but also that this selection evolves as the stream flows. Further, our FHDDMS variants detect concept drifts accurately in a timely fashion while outperforming the state-of-the-art.
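The pairing-and-selection idea can be sketched in a few lines. The abstract does not give the CAR formula, so the combined score below (a weighted mix of accuracy, detection delay and runtime) is only an illustrative stand-in, and the classifier/detector names and statistics are made up.

```python
# Illustrative sketch of selecting the current 'best' (classifier, detector) pair.
# The scoring function is NOT the published CAR measure; it is a hypothetical
# stand-in that balances classification, adaptation and resource use.

def car_like_score(accuracy, detection_delay, runtime,
                   w_acc=0.6, w_adapt=0.3, w_res=0.1):
    """Higher is better: reward accuracy, penalize slow drift detection and cost."""
    return w_acc * accuracy - w_adapt * detection_delay - w_res * runtime

# Each pair reports its current statistics (all values here are invented).
pairs = {
    ("hoeffding_tree", "FHDDMS"): {"accuracy": 0.91, "detection_delay": 0.05, "runtime": 0.20},
    ("naive_bayes",    "ADWIN"):  {"accuracy": 0.88, "detection_delay": 0.02, "runtime": 0.10},
    ("perceptron",     "DDM"):    {"accuracy": 0.84, "detection_delay": 0.10, "runtime": 0.05},
}

best_pair = max(pairs, key=lambda p: car_like_score(**pairs[p]))
print("currently selected pair:", best_pair)
```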

3.
To address the poor generalization, limited interpretability, and low learning efficiency of traditional classifiers, a zero-order TSK-FC fuzzy classifier is proposed. To apply this classifier to large-scale data, an incremental zero-order TSK-IFC fuzzy classifier is further proposed, which trains the fuzzy rule parameters with an incremental fuzzy clustering algorithm (IFCM($c+p$)) and improves parameter-learning efficiency through appropriate matrix transformations. Simulation experiments show that, compared with the FCPM-IRLS fuzzy classifier and radial basis function neural networks, the proposed fuzzy classifiers maintain good performance on data sets of different sizes, and the TSK-IFC fuzzy classifier is particularly strong for large-scale data classification.
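As a reference for how a zero-order TSK system produces an output, the sketch below implements standard zero-order TSK inference with Gaussian memberships; the rule centres, widths and constant consequents are invented for illustration, and the incremental IFCM($c+p$) training step is not shown.

```python
import numpy as np

def tsk0_predict(x, centers, widths, consequents):
    """Zero-order TSK inference: firing strengths are products of Gaussian
    memberships, and the output is the normalized weighted sum of rule constants."""
    diff = (x - centers) / widths                      # shape: (n_rules, n_features)
    firing = np.exp(-0.5 * np.sum(diff ** 2, axis=1))  # firing strength per rule
    return np.dot(firing, consequents) / (np.sum(firing) + 1e-12)

# Toy example with two rules over two features (all numbers are made up).
centers = np.array([[0.0, 0.0], [1.0, 1.0]])
widths = np.array([[0.5, 0.5], [0.5, 0.5]])
consequents = np.array([0.0, 1.0])   # constant output per rule
print(tsk0_predict(np.array([0.9, 0.8]), centers, widths, consequents))
```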

4.
After a concept drift is detected, existing drift-handling algorithms usually retrain the classifier on the new concept and "forget" previously trained classifiers. In the early stage of a drift, only a few samples of the new concept are available, so the newly built classifier cannot be trained sufficiently in a short time and its classification performance is usually poor. Furthermore, existing data stream classification algorithms based on online transfer learning can only use the knowledge of a single classifier to assist learning of the new concept; when the historical concept is not very similar to the new one, the classification accuracy of the model is unsatisfactory. To address these problems, this paper proposes CMOL, a data stream classification algorithm that can exploit the knowledge of multiple historical classifiers. CMOL adopts a dynamic classifier-weight adjustment mechanism and updates the classifier pool according to the classifier weights, so that the pool covers as many concepts as possible. Experiments show that, compared with other related algorithms, CMOL adapts to new concepts more quickly when concept drift occurs and achieves higher classification accuracy.
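The weight-adjustment and pool-update mechanism described above might look roughly like the sketch below; the decay rule, the replacement policy and the stub classifier are all assumptions for illustration, not the published CMOL procedure.

```python
class MajorityClassStub:
    """Toy incremental classifier used only to make the sketch runnable."""
    def __init__(self):
        self.counts = {}
    def learn(self, x, y):
        self.counts[y] = self.counts.get(y, 0) + 1
    def predict(self, x):
        return max(self.counts, key=self.counts.get) if self.counts else 0

def update_pool(pool, weights, x, y, new_classifier=None, beta=0.9):
    """Decay the weight of every pool member that misclassifies (x, y); if a
    drift was signalled upstream, replace the lowest-weight member with a
    fresh model so the pool keeps covering as many concepts as possible."""
    for i, clf in enumerate(pool):
        if clf.predict(x) != y:
            weights[i] *= beta
    if new_classifier is not None:
        worst = min(range(len(pool)), key=lambda i: weights[i])
        pool[worst], weights[worst] = new_classifier, 1.0
    return pool, weights

# Usage: a pool of two stubs and one online update step.
pool, weights = [MajorityClassStub(), MajorityClassStub()], [1.0, 1.0]
pool[0].learn(None, 1)
pool, weights = update_pool(pool, weights, None, 0)
print(weights)
```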

5.
We propose a particle filter-based learning method, PF-LR, for learning logistic regression models from evolving data streams. The method inherently handles concept drifts in a data stream and is able to learn an ensemble of logistic regression models with particle filtering. A key feature of PF-LR is that in its resampling step, particles are sampled from the ones that maximize the classification accuracy on the current data batch. Our experiments show that PF-LR gives good performance, even with relatively small batch sizes. It reacts to concept drifts more quickly than conventional particle filters while being robust to noise. In addition, PF-LR learns more accurate models and is more computationally efficient than the gradient descent method for learning logistic regression models. Furthermore, we evaluate PF-LR on both synthetic and real data sets and find that PF-LR outperforms some other state-of-the-art streaming mining algorithms on most of the data sets tested.
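A minimal particle-filter sketch in this spirit is shown below; the number of particles, the Gaussian jitter used as transition noise, and the accuracy-based resampling rule are assumptions inferred from the abstract, not the exact PF-LR procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pf_lr_step(particles, X_batch, y_batch, noise=0.05, top_frac=0.3):
    """One PF-LR-style step: jitter the particles (transition), score each one
    by its classification accuracy on the current batch, and resample from the
    best-scoring particles."""
    particles = particles + rng.normal(0.0, noise, particles.shape)  # transition
    preds = sigmoid(particles @ X_batch.T) > 0.5                     # (P, B) predictions
    acc = (preds == y_batch).mean(axis=1)                            # accuracy per particle
    k = max(1, int(top_frac * len(particles)))
    best = np.argsort(acc)[-k:]                                      # particles maximizing accuracy
    resampled = particles[rng.choice(best, size=len(particles))]
    return resampled, particles[acc.argmax()]                        # new swarm + point estimate

# Toy usage: 100 particles over 2 weights, one synthetic batch.
particles = rng.normal(0.0, 1.0, (100, 2))
X = rng.normal(0.0, 1.0, (32, 2))
y = (X @ np.array([1.5, -2.0]) > 0).astype(int)
particles, w_hat = pf_lr_step(particles, X, y)
print("current weight estimate:", w_hat)
```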

6.
In open environments, data streams are generated at high speed, are unbounded in volume, and exhibit concept drift. In data stream classification tasks, producing large amounts of training data through manual labeling is expensive and impractical. Streams that contain a small number of labeled samples, a large number of unlabeled samples, and concept drift pose great challenges to machine learning. However, existing research mainly focuses on supervised data stream classification, and semi-supervised classification of drifting data streams has not yet received sufficient attention....

7.
As data and information keep growing, incremental learning ability becomes more and more important for machine learning approaches. Online algorithms try to discard irrelevant information rather than synthesize all available information (as opposed to classic batch learning algorithms). In this study, we attempt to increase the prediction accuracy of an incremental version of the Naive Bayes model by integrating instance-based learning. A large-scale comparison of the proposed method with other state-of-the-art algorithms on several datasets shows that the proposed method produces better accuracy in most cases.
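For reference, an incremental categorical Naive Bayes model can be maintained with simple counts, as in the sketch below; the instance-based integration that the study adds on top is not shown, and the smoothing is a crude assumption.

```python
from collections import defaultdict
import math

class IncrementalNaiveBayes:
    """Categorical Naive Bayes that absorbs one labeled example at a time:
    training only increments class and (class, feature, value) counts."""
    def __init__(self, alpha=1.0):
        self.alpha = alpha                      # Laplace-style smoothing (assumed)
        self.class_counts = defaultdict(int)
        self.feat_counts = defaultdict(int)     # key: (class, feature_index, value)

    def learn_one(self, x, y):
        self.class_counts[y] += 1
        for j, v in enumerate(x):
            self.feat_counts[(y, j, v)] += 1

    def predict_one(self, x):
        total = sum(self.class_counts.values())
        best, best_lp = None, -math.inf
        for c, n_c in self.class_counts.items():
            lp = math.log(n_c / total)
            for j, v in enumerate(x):
                lp += math.log((self.feat_counts[(c, j, v)] + self.alpha)
                               / (n_c + self.alpha))   # crude smoothing for brevity
            if lp > best_lp:
                best, best_lp = c, lp
        return best

nb = IncrementalNaiveBayes()
nb.learn_one(("sunny", "hot"), "no")
nb.learn_one(("rainy", "mild"), "yes")
print(nb.predict_one(("sunny", "mild")))
```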

8.
Liang  Shunpan  Pan  Weiwei  You  Dianlong  Liu  Ze  Yin  Ling 《Applied Intelligence》2022,52(12):13398-13414

Multi-label learning has attracted much attention. However, the continuous data generated by sensors, network access and similar sources, i.e., data streams, brings challenges such as real-time processing, limited memory, and single-pass constraints. Several learning algorithms have been proposed for offline multi-label classification, but little work develops dynamic multi-label incremental learning models based on cascading schemes. Deep forest performs representation learning layer by layer and does not rely on backpropagation. Using this cascading scheme, this paper proposes VDSDF, a multi-label data stream deep forest learning algorithm based on a cascaded Very Fast Decision Tree (VFDT) forest, which can receive examples successively, perform incremental learning, and adapt to concept drift. Experimental results show that the proposed VDSDF algorithm, as an incremental classification algorithm, is more competitive than batch classification algorithms on multiple indicators. Moreover, in dynamic stream scenarios, VDSDF adapts to concept drift better than the compared algorithms.


9.
袁泉  郭江帆 《计算机应用》2018,38(6):1591-1595
To deal with concept drift and noise in data streams, a new incremental ensemble classification algorithm for data streams is proposed. First, a noise-filtering mechanism is introduced to remove noise; then, a hypothesis-testing method is introduced to detect concept drift, and a weighted ensemble model is built with incremental C4.5 decision trees as base classifiers; finally, instances are learned incrementally and the classification model is updated dynamically. Experimental results show that the ensemble classifier detects concept drift with 95%-97% accuracy and keeps its noise robustness on data streams above 90%. The algorithm achieves high classification accuracy and performs well in both drift-detection accuracy and noise resistance.
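The hypothesis-testing step can be illustrated with a two-proportion z-test (the "u-test") that compares the error rate of a recent window against a reference window; the window sizes and the one-sided 0.05 significance level below are assumptions, not the paper's exact settings.

```python
import math

def drift_by_u_test(err_ref, n_ref, err_recent, n_recent, z_crit=1.645):
    """Two-proportion z-test: signal drift when the recent error rate is
    significantly higher than the reference error rate (one-sided, 0.05 level)."""
    p_pool = (err_ref * n_ref + err_recent * n_recent) / (n_ref + n_recent)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_ref + 1 / n_recent))
    if se == 0:
        return False
    z = (err_recent - err_ref) / se
    return z > z_crit

# Example: error rose from 10% over 1000 reference examples to 18% over 200 recent ones.
print(drift_by_u_test(0.10, 1000, 0.18, 200))   # True -> treat as concept drift
```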

10.
A leaders set derived with the leaders clustering method can be used in place of a large training set to reduce the computational burden of a classifier. We previously showed that the weighted k-nearest leader-based classifier is a fast and efficient leader-based classifier. However, there is some uncertainty in calculating the relative importance (weight) of the prototypes. This paper proposes a generalization of the earlier k-nearest leader-based classifier in which a soft computing approach is used to resolve this uncertainty. Combined principles of rough set theory and fuzzy set theory are used to analyze the proposed method. The proposed method, the rough-fuzzy weighted k-nearest leader classifier (RF-wk-NLC), uses a two-level hierarchy of prototypes together with their relative importance. Experiments on standard data sets show that RF-wk-NLC improves performance compared with the earlier related methods.
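The underlying weighted k-nearest-leader decision can be sketched as below: classify by the k nearest leader prototypes, with each vote scaled by the leader's relative importance. The rough-fuzzy weighting scheme itself is not reproduced; the weights here are plain inputs with invented values.

```python
import numpy as np

def weighted_knl_predict(x, leaders, labels, weights, k=3):
    """Vote among the k nearest leader prototypes, each vote scaled by the
    leader's relative importance (weight); return the winning class label."""
    d = np.linalg.norm(leaders - x, axis=1)
    nearest = np.argsort(d)[:k]
    scores = {}
    for i in nearest:
        scores[labels[i]] = scores.get(labels[i], 0.0) + weights[i]
    return max(scores, key=scores.get)

# Toy usage: four leaders; the weights stand in for the rough-fuzzy importances.
leaders = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]])
labels = ["A", "A", "B", "B"]
weights = [0.9, 0.7, 0.8, 0.6]
print(weighted_knl_predict(np.array([0.1, 0.2]), leaders, labels, weights))
```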

11.
Research on the Concept Drift Problem in Streaming Data Classification   Cited in total: 3 (self-citations: 0, citations by others: 3)
Traditional streaming data classification algorithms rely on sliding windows to refine an existing classifier, or build multiple classifiers to track the drift process, but they cannot adapt their classification to the strength of the concept drift. Building on the mainstream CVFDT and ensemble classifier algorithms, a new streaming data classification algorithm, SADT, is proposed. The algorithm dynamically decides whether concept drift has occurred and automatically chooses between refining and rebuilding the classifier, making it suitable for classifying different types of data. Analysis and experiments show that the algorithm adapts better when handling concept drift.
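The refine-or-rebuild decision can be caricatured as below; the two thresholds on drift strength are invented for illustration, since the abstract does not specify how SADT quantifies drift.

```python
def react_to_drift(drift_strength, minor=0.05, major=0.20):
    """Pick an action from the estimated drift strength: keep the model,
    refine it in place, or rebuild it from scratch (thresholds are assumed)."""
    if drift_strength < minor:
        return "keep"        # no meaningful drift
    if drift_strength < major:
        return "optimize"    # weak drift: update the existing classifier
    return "rebuild"         # strong drift: train a new classifier

print(react_to_drift(0.12))  # -> "optimize"
```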

12.
李燕  张玉红  胡学钢 《计算机科学》2010,37(12):138-142
Classifying noisy data streams with concept drift has become one of the hot topics in data stream mining. This paper proposes CDSMM, a data stream classification algorithm based on a hybrid model of C4.5 and Naive Bayes. It uses C4.5 as the base classifier, filters noise with a Naive Bayes classifier, and detects concept drift with the u-test from hypothesis testing, updating the model dynamically. Experimental results show that CDSMM achieves better classification accuracy than comparable algorithms when handling noisy drifting data streams.

13.
Data stream classification is a hot topic in data mining research. The great challenge is that the class priors may evolve over the data sequence. Algorithms have been proposed to estimate the dynamic class priors and adjust the classifier accordingly. However, the existing algorithms do not perform well on prior estimation due to the lack of samples from the target distribution. Sample size has a great effect on parameter estimation, and small-sample effects greatly degrade estimation performance. In this paper, we propose a novel parameter estimation method called transfer estimation. Transfer estimation makes use of samples not only from the target distribution but also from similar distributions. We apply this new estimation method to the existing algorithms and obtain an improved algorithm. Experiments on both synthetic and real data sets show that the improved algorithm outperforms the existing algorithms on both class prior estimation and classification.
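A minimal version of the transfer idea, estimating the target class priors as a mixture of the small target sample and counts from a similar historical distribution, is sketched below; the shrinkage weight is an assumption rather than the paper's estimator.

```python
import numpy as np

def transfer_prior_estimate(target_counts, source_counts, lam=0.6):
    """Shrink the prior estimated from a small target sample toward the prior
    of a similar source distribution (lam controls how much we trust the target)."""
    target_counts = np.asarray(target_counts, dtype=float)
    source_counts = np.asarray(source_counts, dtype=float)
    p_target = target_counts / target_counts.sum()
    p_source = source_counts / source_counts.sum()
    return lam * p_target + (1 - lam) * p_source

# 20 target samples vs. 10,000 samples from a similar historical window.
print(transfer_prior_estimate([14, 6], [5500, 4500]))
```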

14.
In many real-world applications, pattern recognition systems are designed a priori using limited and imbalanced data acquired from complex changing environments. Since new reference data often becomes available during operations, performance could be maintained or improved by adapting these systems through supervised incremental learning. To avoid knowledge corruption and sustain a high level of accuracy over time, an adaptive multiclassifier system (AMCS) may integrate information from diverse classifiers that are guided by a population-based evolutionary optimization algorithm. In this paper, an incremental learning strategy based on dynamic particle swarm optimization (DPSO) is proposed to evolve heterogeneous ensembles of classifiers (where each classifier corresponds to a particle) in response to new reference samples. This new strategy is applied to video-based face recognition, using an AMCS that consists of a pool of fuzzy ARTMAP (FAM) neural networks for classification of facial regions, and a niching version of DPSO that optimizes all FAM parameters such that the classification rate is maximized. Given that diversity within a dynamic particle swarm is correlated with diversity within a corresponding pool of base classifiers, DPSO properties are exploited to generate and evolve diversified pools of FAM classifiers, and to efficiently select ensembles among the pools based on accuracy and particle swarm diversity. Performance of the proposed strategy is assessed in terms of classification rate and resource requirements under different incremental learning scenarios, where new reference data is extracted from real-world video streams. Simulation results indicate the DPSO strategy provides an efficient way to evolve ensembles of FAM networks in an AMCS. Maintaining particle diversity in the optimization space yields a level of accuracy that is comparable to AMCS using reference ensemble-based and batch learning techniques, but requires significantly lower computational complexity than assessing diversity among classifiers in the feature or decision spaces.

15.

The rapid growth of information technology drives organizations to generate vast volumes of high-velocity data streams. Concept drift is a crucial issue, and discovering sequential patterns over data streams is even more challenging. Ensemble classifiers learn the data incrementally so that they can react quickly to concept drifts, and they have to handle both the gradual and sudden concept drifts that occur in real-time data streams. A novel ensemble classifier is therefore needed that reacts quickly to various types of concept drift while maintaining classification accuracy. This work proposes stream data mining on the fly using an adaptive online learning rule (SOAR) model to handle both gradual and sudden pattern changes and to improve mining accuracy. Simply adding more classifiers fails because the ensemble tends to include redundant classifiers instead of high-quality ones; SOAR therefore includes classifiers with different diversity levels in the ensemble to provide fast recovery from both kinds of concept drift. Moreover, SOAR synthesizes the essential features of block-based and online ensembles and updates the weight of each classifier according to its quality. It uses adaptive windowing to handle both gradual and sudden concept drifts. To reduce the computational cost and analyze the data stream quickly, SOAR caches the primitive patterns that occur into a bitmap together with their internal relationships. Finally, the experimental results show that SOAR achieves better classification accuracy over data streams.

16.
An Improved Minimum Distance Classifier: the Weighted Minimum Distance Classifier   Cited in total: 12 (self-citations: 0, citations by others: 12)
任靖  李春平 《计算机应用》2005,25(5):992-994
The minimum distance classifier is a simple and effective classification method. The main way to improve its performance is to choose a more effective distance metric. By analyzing the classification principles of the multiple-restriction classifier and the decision tree classifier, a weighted minimum distance classifier based on the standardized Euclidean distance is proposed. By defining weighted distances for nominal and string attributes and adding range constraints on attribute values, the classifier extends the applicability of the minimum standardized Euclidean distance classifier and improves its classification accuracy. Experimental results show that the weighted minimum distance classifier achieves high classification accuracy.
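For numeric attributes, the decision rule amounts to assigning x to the class whose mean is nearest under a weighted, standardized Euclidean distance; the sketch below shows that rule, with the attribute weights left as plain inputs (the paper's handling of nominal and string attributes and of range constraints is not reproduced).

```python
import numpy as np

def weighted_min_distance_predict(x, class_means, class_labels, std, attr_weights):
    """Assign x to the class with the smallest weighted standardized Euclidean
    distance: sqrt(sum_j w_j * ((x_j - m_j) / s_j)^2)."""
    diffs = (class_means - x) / std                    # standardize each attribute
    d2 = np.sum(attr_weights * diffs ** 2, axis=1)     # weighted squared distances
    return class_labels[int(np.argmin(d2))]

# Toy usage: two classes over two numeric attributes (all numbers invented).
class_means = np.array([[1.0, 10.0], [3.0, 30.0]])
class_labels = ["A", "B"]
std = np.array([0.5, 5.0])            # per-attribute standard deviations
attr_weights = np.array([1.0, 2.0])   # relative attribute importance (assumed)
print(weighted_min_distance_predict(np.array([1.2, 12.0]),
                                    class_means, class_labels, std, attr_weights))
```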

17.
Clustering entities into dense parts is an important issue in social network analysis. Real social networks usually evolve over time, and efficiently clustering dynamic social networks remains an open problem. In this paper, a dynamic social network is modeled as an initial graph with an infinite change stream, called the change stream model, which naturally eliminates the parameter-setting problem of the snapshot graph model. Based on the change stream model, the incremental version of the well-known k-clique clustering problem is studied, and incremental k-clique clustering algorithms are proposed based on a local DFS (depth first search) forest updating technique. It is theoretically proved that the proposed algorithms outperform the corresponding static ones and the incremental spectral clustering algorithm in terms of time complexity. The practical performance of our algorithms is extensively evaluated and compared with the baseline algorithms on the ENRON and DBLP datasets. Experimental results show that the incremental k-clique clustering algorithms are much more efficient than the corresponding static ones, do not suffer from the accumulating errors of the incremental spectral clustering algorithm, and capture the evolving details of the clusters that snapshot-graph-based algorithms miss.

18.
The ability to predict a student's performance could be useful in a great number of ways associated with university-level distance learning. Students' marks in a few written assignments can constitute the training set for a supervised machine learning algorithm. With the explosive increase of data and information, incremental learning ability has become more and more important for machine learning approaches. Online algorithms try to forget irrelevant information instead of synthesizing all available information (as opposed to classic batch learning algorithms). Combining classifiers has been proposed as a new direction for improving classification accuracy, yet most ensemble algorithms operate in batch mode. A better proposal is therefore an online ensemble of classifiers that combines an incremental version of Naive Bayes, the 1-NN and the WINNOW algorithms using the voting methodology. Among other significant conclusions, it was found that the proposed algorithm is the most appropriate for building a software support tool.
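The voting combination itself is straightforward, as the minimal sketch below shows; the string labels are invented, and the votes are assumed to come from the incremental Naive Bayes, 1-NN and WINNOW members named in the abstract.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine the class labels predicted by the ensemble members
    by simple unweighted majority voting."""
    return Counter(predictions).most_common(1)[0][0]

# One instance, three members' votes (e.g. Naive Bayes, 1-NN, WINNOW):
print(majority_vote(["pass", "fail", "pass"]))   # -> "pass"
```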

19.
Although classification in centralized environments has been widely studied in recent years, classification in P2P networks remains an important research problem due to the popularity of P2P computing environments. The main goal of classification in P2P networks is to efficiently reduce prediction error with small network overhead. In this paper, we propose an OS-ELM based ensemble classification framework for distributed classification in a hierarchical P2P network. In the framework, we apply the incremental learning principle of OS-ELM to the hierarchical P2P network to generate an ensemble classifier. The ensemble classifier can be implemented in the P2P network in two ways: one-by-one ensemble classification and parallel ensemble classification. Furthermore, we propose a data space coverage based peer selection approach to reduce the high communication cost and long delay. We also design a two-layer index structure to efficiently support peer selection: a peer creates a local Quad-tree to index its local data, and a super-peer creates a global Quad-tree to summarize its local indexes. Extensive experimental studies verify the efficiency and effectiveness of the proposed algorithms.
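For reference, the incremental step at the heart of OS-ELM keeps the hidden-layer output weights up to date with a recursive least-squares update; the sketch below shows that update for one data chunk, with the random feature map, sizes and the small ridge term chosen arbitrarily for the example (they are not part of the framework described above).

```python
import numpy as np

rng = np.random.default_rng(1)

def hidden(X, W, b):
    """Random-feature hidden layer (sigmoid), as in ELM-style models."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))

def os_elm_update(P, beta, H, T):
    """One OS-ELM sequential step (recursive least squares):
    P    <- P - P H^T (I + H P H^T)^(-1) H P
    beta <- beta + P H^T (T - H beta)"""
    K = np.linalg.inv(np.eye(H.shape[0]) + H @ P @ H.T)
    P = P - P @ H.T @ K @ H @ P
    beta = beta + P @ H.T @ (T - H @ beta)
    return P, beta

# Toy usage: 2 inputs, 10 hidden nodes, 1 output; initial batch, then one chunk.
n_hidden = 10
W, b = rng.normal(size=(2, n_hidden)), rng.normal(size=n_hidden)
X0, T0 = rng.normal(size=(30, 2)), rng.normal(size=(30, 1))
H0 = hidden(X0, W, b)
P = np.linalg.inv(H0.T @ H0 + 1e-6 * np.eye(n_hidden))   # small ridge for numerical stability
beta = P @ H0.T @ T0
X1, T1 = rng.normal(size=(20, 2)), rng.normal(size=(20, 1))
P, beta = os_elm_update(P, beta, hidden(X1, W, b), T1)
print(beta.shape)   # (10, 1)
```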

20.
Most data-mining algorithms assume static behavior of the incoming data. In the real world, the situation is different: most continuously collected data streams are generated by dynamic processes that may change over time, in some cases even drastically. The change in the underlying concept, also known as concept drift, causes the data-mining model generated from past examples to become less accurate and less relevant for classifying the current data. Most online learning algorithms deal with concept drift by generating a new model every time a drift is detected. On the one hand, this ensures accurate and relevant models at all times and thus increases classification accuracy. On the other hand, it suffers from a major drawback, namely the high computational cost of generating new models. The problem gets worse when drifts are detected frequently, so a compromise between computational effort and accuracy is needed. This work describes a series of incremental algorithms that are shown empirically to produce more accurate classification models than batch algorithms in the presence of concept drift, while being computationally cheaper than existing incremental methods. The proposed incremental algorithms are based on an advanced decision-tree learning methodology called "Info-Fuzzy Network" (IFN), which can induce compact and accurate classification models. The algorithms are evaluated on real-world streams of traffic and intrusion-detection data.
