首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
现有的时间序列异步周期模式挖掘方法是在获取1-pattern有效段及周期的基础上再以枚举法得到i-patterns,时间复杂度较高。为解决该问题,提出一种改进的异步周期模式挖掘方法。在时间序列符号化后,使用基于Sequitur的候选模式算法获取候选i-patterns及其事件位置序列,通过基于OEOP的i-patterns有效段生成算法得到1-pattern和i-patterns的有效段及周期,从而生成有效子序列。实验结果表明,该方法具有较高的挖掘效率。  相似文献   

2.
Incremental mining has attracted the attention of many researchers due to its usefulness in online applications. Many algorithms have thus been proposed for incrementally mining frequent itemsets. Maintaining a frequent-itemset lattice (FIL) is difficult for databases with large numbers of frequent itemsets, especially huge databases, due to the storage of links of nodes in the lattice. However, generating association rules from a FIL has been shown to be more effective than traditional methods such as directly generating rules from frequent itemsets or frequent closed itemsets. Therefore, when the number of frequent itemsets is not huge (i.e., they can be stored in the lattice without excessive memory overhead), the lattice-based approach outperforms approaches which mine association rules from frequent itemsets/frequent closed itemsets. However, incremental algorithms for building FILs have not yet been proposed. This paper proposes an effective approach for the maintenance of a FIL based on the pre-large concept in incremental mining. The building process of a FIL is first improved using two proposed theorems regarding the paternity relation between two nodes in the lattice. An effective approach for maintaining a FIL with dynamically inserted data is then proposed based on the pre-large and the diffset concepts. The experimental results show that the proposed approach outperforms the batch approach for building a FIL in terms of execution time.  相似文献   

3.
改进型关联规则增量式更新算法与实现   总被引:9,自引:0,他引:9  
关联规则是数据挖掘中的重要研究内容之一。目前,已经提出了许多算法用于高效的发现大规模数据库中的关联规则,但是对关联规则的维护问题的研究工作却很少。本文对在事务数据库不变,只对最小支持度和最小可信度进行改变的情况下,如何进行关联规则的维护问题进行了探讨,并提出了一种高效的增量式更新算法。  相似文献   

4.
增量式K-Medoids聚类算法   总被引:3,自引:0,他引:3  
高小梅  冯志  冯兴杰 《计算机工程》2005,31(Z1):181-183
聚类是一种非常有用的数据挖掘方法,可用于发现隐藏在数据背后的分组和数据分布信息。目前已经提出了许多聚类算法及其变种,但在增量式聚类算法研究方面所做的工作较少。当数据集因更新而发生变化时,数据挖掘的结果也应该进行相应的更新。由于数据量大,在更新后的数据集上重新执行聚类算法以更新挖掘结果显然比较低效,因此亟待研究增量式聚类算法。该文通过对K-Medoids聚类算法的改进,提出一种增量式K-Medoids聚类算法。它能够很好地解决传统聚类算法在伸缩性、数据定期更新时所面临的问题。  相似文献   

5.
Periodicity detection in time series databases   总被引:7,自引:0,他引:7  
Periodicity mining is used for predicting trends in time series data. Discovering the rate at which the time series is periodic has always been an obstacle for fully automated periodicity mining. Existing periodicity mining algorithms assume that the periodicity, rate (or simply the period) is user-specified. This assumption is a considerable limitation, especially in time series data where the period is not known a priori. In this paper, we address the problem of detecting the periodicity rate of a time series database. Two types of periodicities are defined, and a scalable, computationally efficient algorithm is proposed for each type. The algorithms perform in O(n log n) time for a time series of length n. Moreover, the proposed algorithms are extended in order to discover the periodic patterns of unknown periods at the same time without affecting the time complexity. Experimental results show that the proposed algorithms are highly accurate with respect to the discovered periodicity rates and periodic patterns. Real-data experiments demonstrate the practicality of the discovered periodic patterns.  相似文献   

6.
Frequent itemset mining allows us to find hidden, important information from large databases. Moreover, processing incremental databases in the itemset mining area has become more essential because a huge amount of data has been accumulated continually in a variety of application fields and users want to obtain mining results from such incremental data in more efficient ways. One of the major problems in incremental itemset mining is that the corresponding mining results can be very large-scale according to threshold settings and data volumes. In addition, it is considerably hard to analyze all of them and find meaningful information. Furthermore, not all of the mining results become actually important information. In this paper, to solve these problems, we propose an algorithm for mining weighted maximal frequent itemsets from incremental databases. By scanning a given incremental database only once, the proposed algorithm can not only conduct its mining operations suitable for the incremental environment but also extract a smaller number of important itemsets compared to previous approaches. The proposed method also has an effect on expert and intelligent systems since it can automatically provide more meaningful pattern results reflecting characteristics of given incremental databases and threshold settings, which can help users analyze the given data more easily. Our comprehensive experimental results show that the proposed algorithm is more efficient and scalable than previous state-of-the-art algorithms.  相似文献   

7.
一种实用的关联规则增量式更新算法   总被引:2,自引:0,他引:2  
薛锦  陈原斌 《计算机工程与应用》2003,39(13):212-213,217
关联规则是数据挖掘中的一个重要研究内容。目前已经提出了许多用于高效地发现大规模数据库中的关联规则的算法,而对已发现规则的更新及维护问题的研究却较少。该文提出了一种实用的关联规则增量式更新算法,以处理事务数据库中增加了新的事务数据集后相应的关联规则的更新问题,并对其性能进行了分析。  相似文献   

8.
一种有效的关联规则增量式更新算法   总被引:6,自引:2,他引:6  
关联规则是数据挖掘中的一个重要研究内容。目前已经提出了许多用于高效地发现大规模数据库中的关联规则的算法,而对已发现规则的更新及维护问题的研究却较少。文章提出了基于频繁模式树的关联规则增量式更新算法,以处理事务数据库中增加了新的事务数据集后相应关联规则的更新问题,并对其性能进行了分析。  相似文献   

9.
The mining of partial periodic patterns is an interesting type of data mining that is widely used in the analysis of markets, such as for stock management and sales management. However, the existence of huge data sets make the scalability of data-mining algorithms a very important objective, and in recent years parallel computing has been applied to general data-mining algorithms. This paper addresses the problem of mining multiple partial periodic patterns in a parallel computing environment. To reduce the cost of communication between the processors, our approach employs the independence property of prime numbers to classify partial periodic patterns into multiple independent sets. Moreover, a novel method of distributing mining tasks among the processors is proposed. A set of simulations is used to demonstrate the benefits of our approach.  相似文献   

10.
OSAF-tree--可迭代的移动序列模式挖掘及增量更新方法   总被引:1,自引:0,他引:1  
移动通信技术和无限定位技术的发展积累了海量的、动态增长的时空数据.利用数据挖掘技术从移动用户的时空行为轨迹当中挖掘用户移动序列模式,在移动通信、交通管理、基于位置服务等领域有着广泛的应用前景.由于移动环境网络资源珍贵、数据量大的特点,传统的序列模式挖掘方法在效率上很难满足需求.OSAF-tree算法基于投影的概念,只需要对数据库进行一遍扫描,就可以很好地处理移动序列模式的挖掘及其增量更新和迭代挖掘问题,这是一个非常高效的算法.与已有的方法相比,OSAF-tree算法在性能和I/O代价等方面都具有明显的优势.  相似文献   

11.
In this paper, we study the issues of mining and maintaining association rules in a large database of customer transactions. The problem of mining association rules can be mapped into the problems of finding large itemsets which are sets of items brought together in a sufficient number of transactions. We revise a graph-based algorithm to further speed up the process of itemset generation. In addition, we extend our revised algorithm to maintain discovered association rules when incremental or decremental updates are made to the databases. Experimental results show the efficiency of our algorithms. The revised algorithm is a significant improvement over the original one on mining association rules. The algorithms for maintaining association rules are more efficient than re-running the mining algorithms for the whole updated database and outperform previously proposed algorithms that need multiple passes over the database. Received 4 August 1999 / Revised 18 March 2000 / Accepted in revised form 18 October 2000  相似文献   

12.
快速挖掘全局最大频繁项目集   总被引:18,自引:1,他引:18  
挖掘最大频繁项目集是多种数据挖掘应用中的关键问题.现行可用的最大频繁项目集挖掘算法大多基于单机环境,针对分布式环境下的全局最大频繁项目集挖掘尚不多见.若将基于单机环境的最大频繁项目集挖掘算法运用于分布式环境,或运用分布式环境下的全局频繁项目集挖掘算法来挖掘全局最大频繁项目集,均会产生大量的候选频繁项目集,且网络通信代价高.为此,提出了快速挖掘全局最大频繁项目集算法FMGMFI(fast mining global maximum frequent itemsets),该算法采用FP-tree存储结构,可方便地从各局部FP-tree的相关路径中得到项目集的频度,同时采用自顶向下和自底向上的双向搜索策略,可有效地降低网络通信代价.实验结果表明,FMGMF算法是有效、可行的.  相似文献   

13.
Mining sequential patterns by pattern-growth: the PrefixSpan approach   总被引:12,自引:0,他引:12  
Sequential pattern mining is an important data mining problem with broad applications. However, it is also a difficult problem since the mining may have to generate or examine a combinatorially explosive number of intermediate subsequences. Most of the previously developed sequential pattern mining methods, such as GSP, explore a candidate generation-and-test approach [R. Agrawal et al. (1994)] to reduce the number of candidates to be examined. However, this approach may not be efficient in mining large sequence databases having numerous patterns and/or long patterns. In this paper, we propose a projection-based, sequential pattern-growth approach for efficient mining of sequential patterns. In this approach, a sequence database is recursively projected into a set of smaller projected databases, and sequential patterns are grown in each projected database by exploring only locally frequent fragments. Based on an initial study of the pattern growth-based sequential pattern mining, FreeSpan [J. Han et al. (2000)], we propose a more efficient method, called PSP, which offers ordered growth and reduced projected databases. To further improve the performance, a pseudoprojection technique is developed in PrefixSpan. A comprehensive performance study shows that PrefixSpan, in most cases, outperforms the a priori-based algorithm GSP, FreeSpan, and SPADE [M. Zaki, (2001)] (a sequential pattern mining algorithm that adopts vertical data format), and PrefixSpan integrated with pseudoprojection is the fastest among all the tested algorithms. Furthermore, this mining methodology can be extended to mining sequential patterns with user-specified constraints. The high promise of the pattern-growth approach may lead to its further extension toward efficient mining of other kinds of frequent patterns, such as frequent substructures.  相似文献   

14.
The goal of analyzing a time series database is to find whether and how frequent a periodic pattern is repeated within the series. Periodic pattern mining is the problem that regards temporal regularity. However, most of the existing algorithms have a major limitation in mining interesting patterns of users interest, that is, they can mine patterns of specific length with all the events sequentially one after another in exact positions within this pattern. Though there are certain scenarios where a pattern can be flexible, that is, it may be interesting and can be mined by neglecting any number of unimportant events in between important events with variable length of the pattern. Moreover, existing algorithms can detect only specific type of periodicity in various time series databases and require the interaction from user to determine periodicity. In this paper, we have proposed an algorithm for the periodic pattern mining in time series databases which does not rely on the user for the period value or period type of the pattern and can detect all types of periodic patterns at the same time, indeed these flexibilities are missing in existing algorithms. The proposed algorithm facilitates the user to generate different kinds of patterns by skipping intermediate events in a time series database and find out the periodicity of the patterns within the database. It is an improvement over the generating pattern using suffix tree, because suffix tree based algorithms have weakness in this particular area of pattern generation. Comparing with the existing algorithms, the proposed algorithm improves generating different kinds of interesting patterns and detects whether the generated pattern is periodic or not. We have tested the performance of our algorithm on both synthetic and real life data from different domains and found a large number of interesting event sequences which were missing in existing algorithms and the proposed algorithm was efficient enough in generating and detecting periodicity of flexible patterns on both types of data.  相似文献   

15.
Shared-nothing并行事务数据库系统中规则的挖掘与更新算法   总被引:1,自引:0,他引:1  
关联规则是数据挖掘中的一个重要研究内容.本文提出了Shared—nothing并行事务数据库系统(简称SNPDBS)中一种快速的关联规则挖掘算法SNPMAR,并考虑当最小支持度发生变化后SNPDBS中关联规则的高效更新问题,提出了一种有效的关联规则更新算法SNPIUA.  相似文献   

16.
刘佳新 《计算机工程》2012,38(12):39-41
现有的增量式挖掘算法在支持度发生变化时,需要对序列数据库进行重复挖掘,为减少由此产生的时空消耗,提出一种高效的增量式序列模式挖掘算法。算法采用频繁序列树作为序列存储结构,当序列数据库和最小支持度发生变化时,通过执行更新操作,实现频繁序列树的更新,利用深度优先遍历频繁序列树找到序列数据库中所有的序列模式。实验结果表明,与IncSpan算法和PrefixSpan算法相比,该算法的挖掘效率较高。  相似文献   

17.
Data mining is a method for extracting useful information that is necessary for a system from a database. As the types of data processed by the system are diversified, the transformed pattern mining techniques for processing these type of data have been proposed. Unlike the traditional pattern mining methods, erasable pattern mining is a technique for finding the patterns that can be removed by coming with a small profit. Erasable pattern mining should be able to process data by considering both the environment that the data are generated from and the characteristics of the data. An uncertain database is a database that is composed of uncertain data. Since erasable patterns discovered from uncertain data contain significant information, these patterns need to be extracted. In addition, databases gradually increase, because the data from various fields is generated and accumulated over data streams. Data streams should be processed as intelligently as possible to provide the useful data to the system in real time. In this paper, we propose an efficient erasable pattern mining algorithm that processes uncertain data that is generated over data streams. The uncertain erasable patterns discovered through the suggested technique are more meaningful information by considering the probability of the item and the profit. Moreover, the proposed method can perform efficient mining operations by using both tree and list structures. The performance of the suggested algorithm is verified through the performance tests compared with state-of-the-art algorithms using real data sets and synthetic data sets.  相似文献   

18.
随着数据集规模的不断增大,提高频繁项集的挖掘效率成为数据挖掘领域的研究重点.频繁项集的增量更新挖掘算法因其可以利用已挖掘发现的信息提高对新数据集的挖掘效率,成为重要的研究方向.但现有频繁项集增量更新算法大多基于APRIORI算法框架,性能提高有限.最近出现的建立在FP-TREE等树形结构上的增量更新算法又往往存在树形结构调整困难、已发现频繁项集及树形结构保存效率较低等问题,算法性能有待进一步地提高.对此,通过分析增量挖掘过程中的关键信息,提出了一种基于磁盘存储1项集计数的增量FP_GROWTH算法(IU_FPGROWTH_1COUNTING).该算法无需保存临时树形结构及临时挖掘结果,可以在原数据集及支持度均发生变化时,减少FP_GROWTH算法对数据集的扫描,提高频繁项集的挖掘效率.在生成以及真实数据集上进行了验证实验以及性能分析,结果表明IU_FPGROWTH_1COUNTING是一种有效的频繁项集增量更新挖掘算法.  相似文献   

19.
基于频繁模式树的关联规则增量式更新算法   总被引:48,自引:1,他引:48  
研究了大型事务数据库中关联规则的增量式更新总是,提出了一种基于频繁模式树的关联规则增量式更新算法,以处理最小支持度或事务数据库发生变化后相应关联规则的更新问题,并对其性能进行了分析。  相似文献   

20.
增量式维护关联规则的时间窗口技术   总被引:3,自引:0,他引:3  
数据库中的知识发现是指在大型数据集中识别有效、新奇、潜在有用、且最终可理解模式的非平凡的过程。人们已经提出了许多种知识发现算法,然而,人们对由于数据随时间变化而导致的所发现知识的更新维护问题却较少研究。本文提出一种用于增量式关联规则维护的时间窗口技术。该技术可以集中在当前数据中发现强关联规则。避免利用过时数据,为了避免在已有数据上重新发现,降低数据存储开销,我们保存了次强关联规则。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号