Found 20 similar documents; search took 265 ms
1.
Jason J. Jung 《Expert systems with applications》2012,39(3):3169-3173
Many association rule mining (ARM) algorithms, e.g., Apriori and FP-tree, have been proposed to discover meaningful frequent patterns in large datasets. Such algorithms are particularly difficult to apply to temporal databases, which change continuously over time, because they generally rely on repeating time-consuming tasks such as scanning the database. To deal with this problem, we propose a constraint graph-based method for maintaining frequent patterns (FP) discovered from temporal databases. The constraint graph, represented as a set of constraints between pairs of items, can be established from the temporal persistency of the patterns; that is, patterns can be used to build the constraint graph once they have appeared in a set of FPs. Two types of constraints can be generated, by users and by adaptation. Based on our scheme, we find that a large amount of the dataset is efficiently reduced during the mining process and while gathering information during updates.
2.
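Inverse frequent itemset mining based on FP-Tree   Total citations: 2, self-citations: 0, citations by others: 2
After extending the existing definition of the inverse frequent itemset mining problem and exploring three concrete applications of inverse frequent itemsets, this paper proposes an FP-tree-based inverse frequent itemset mining method. The method first applies a divide-and-conquer strategy, partitioning the target constraints into several sub-constraints and solving one linear sub-constraint problem at each step; after several iterations it finds a target FP-tree that satisfies all of the given constraints. It then generates from the target FP-tree a temporary transaction database, TempD, that contains only frequent items. Finally, it obtains the target dataset by scattering infrequent items into TempD. Theoretical analysis and experiments show that the method is correct and efficient and that, whereas existing methods can output only one target dataset, it can output many.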
3.
Qinghua Zou Wesley Chu David Johnson Henry Chiu 《Knowledge and Information Systems》2002,4(4):466-482
Efficient algorithms to mine frequent patterns are crucial to many tasks in data mining. Since the Apriori algorithm was
proposed in 1994, there have been several methods proposed to improve its performance. However, most still adopt its candidate
set generation-and-test approach. In addition, many methods do not generate all frequent patterns, making them inadequate
to derive association rules. We propose a pattern decomposition (PD) algorithm that can significantly reduce the size of the
dataset on each pass, making it more efficient to mine all frequent patterns in a large dataset. The proposed algorithm avoids
the costly process of candidate set generation and saves time by reducing the size of the dataset. Our empirical evaluation
shows that the algorithm outperforms Apriori by one order of magnitude and is faster than FP-tree algorithm.
Received 14 May 2001 / Revised 5 September 2001 / Accepted in revised form 26 October 2001
Correspondence and offprint requests to: Qinghua Zou, Department of Computer Science, California University–Los Angeles, CA 90095, USA. Email: zou@cs.ucla.edu
4.
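A maximal pattern mining algorithm based on an improved FP-tree   Total citations: 2, self-citations: 0, citations by others: 2
Frequent pattern mining is a very important branch of data mining. Because of its inherent computational complexity, however, mining the complete set of frequent patterns from dense data is very difficult, and the resulting set is often so large that it is hard to understand and apply. Maximal frequent patterns (maximal patterns) implicitly contain all frequent patterns in compressed form and occupy far less storage than the complete set, so mining maximal patterns is of great significance. This paper improves the traditional FP-tree structure and proposes IFP-Max, an effective maximal pattern mining algorithm based on the improved FP-tree. By introducing the concept of a suffix subtree, the algorithm avoids generating candidate maximal frequent patterns during mining, which greatly improves its time efficiency and space scalability. Experiments show that IFP-Max mines about an order of magnitude faster than MAFIA and GenMax.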
5.
Guoliang Li Jianhua Feng Jianyong Wang Lizhu Zhou 《Data mining and knowledge discovery》2009,18(3):472-516
Existing algorithms of mining frequent XML query patterns (XQPs) employ a candidate generate-and-test strategy. They involve
expensive candidate enumeration and costly tree-containment checking. Further, most of existing methods compute the frequencies
of candidate query patterns from scratch periodically by checking the entire transaction database, which consists of XQPs
transferred from user query logs. However, it is not straightforward to maintain such discovered frequent patterns in real
XML databases as there may be frequent updates that may not only invalidate some existing frequent query patterns but also
generate some new frequent query patterns. Therefore, a drawback of existing methods is that they are rather inefficient for
the evolution of transaction databases. To address the above-mentioned problems, this paper proposes an efficient algorithm, ESPRIT, to mine frequent XQPs without costly tree-containment checking. ESPRIT transforms XML queries into sequences using a one-to-one mapping technique and mines the frequent sequences to generate frequent
XQPs. We propose two efficient incremental algorithms, ESPRIT-i and ESPRIT-i+, to incrementally mine frequent XQPs. We devise several novel optimization techniques of query rewriting, cache lookup, and
cache replacement to improve the answerability and the hit rate of caching. We have implemented our algorithms and conducted
a set of experimental studies on various datasets. The experimental results demonstrate that our algorithms achieve high efficiency
and scalability and outperform state-of-the-art methods significantly.
6.
Mining Condensed Frequent-Pattern Bases   Total citations: 4, self-citations: 1, citations by others: 3
Frequent-pattern mining has been studied extensively and has many useful applications. However, frequent-pattern mining often generates too many patterns to be truly efficient or effective. In many applications, it is sufficient to generate and examine frequent patterns with a sufficiently good approximation of the support frequency instead of in full precision. Such a compact but close-enough frequent-pattern base is called a condensed frequent-pattern base. In this paper, we propose and examine several alternatives for the design, representation, and implementation of such condensed frequent-pattern bases. Several algorithms for computing such pattern bases are proposed. Their effectiveness at pattern compression and methods for efficiently computing them are investigated. A systematic performance study is conducted on different kinds of databases, and demonstrates the effectiveness and efficiency of our approach in handling frequent-pattern mining in large databases.
7.
The FP-growth algorithm using the FP-tree has been widely studied for frequent pattern mining because it can dramatically improve performance compared to the candidate generation-and-test paradigm of Apriori. However, it still requires two database scans, which are not consistent with efficient data stream processing. In this paper, we present a novel tree structure, called CP-tree (compact pattern tree), that captures database information with one scan (insertion phase) and provides the same mining performance as the FP-growth method (restructuring phase). The CP-tree introduces the concept of dynamic tree restructuring to produce a highly compact frequency-descending tree structure at runtime. An efficient tree restructuring method, called the branch sorting method, that restructures a prefix-tree branch-by-branch, is also proposed in this paper. Moreover, the CP-tree provides full functionality for interactive and incremental mining. Extensive experimental results show that the CP-tree is efficient for frequent pattern mining, interactive, and incremental mining with a single database scan.
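To make the "two database scans" of the classic FP-tree concrete, here is a minimal sketch of the standard construction that the abstract contrasts with the CP-tree. It is not the CP-tree or its branch sorting method; the names `Node` and `build_fp_tree` and the tie-breaking rule are assumptions made for illustration.

```python
from collections import Counter


class Node:
    """One prefix-tree node: an item, its count, and children keyed by item."""

    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}


def build_fp_tree(transactions, min_support):
    """Build a plain FP-tree (no header table) with two passes over the data."""
    # Scan 1: count how many transactions contain each item.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_support}

    # Scan 2: insert each transaction with its frequent items sorted by
    # descending count (ties broken by item name, an assumption made here
    # only to keep the result deterministic).
    root = Node()
    for t in transactions:
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            node = node.children.setdefault(item, Node(item))
            node.count += 1
    return root, frequent
```

Because the insertion order in the second scan depends on counts gathered in the first, the whole database must be read twice, which is the cost the CP-tree abstract says it avoids.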
8.
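Mining maximal frequent itemsets is a key problem in many data mining applications. Much previous work adopts an Apriori-like candidate itemset generate-and-test approach. Candidate itemset generation is costly, however, especially when there are many strong patterns and/or long patterns. This paper proposes DMFIA (discover maximum frequent itemsets algorithm), a fast algorithm for mining maximal frequent itemsets based on the frequent pattern tree (FP-tree), and its updating algorithm UMFIA (update maximum frequent itemsets algorithm). UMFIA makes full use of previous mining results to reduce the cost of discovering new maximal frequent itemsets in the updated database.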
9.
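FP-growth is currently one of the more efficient frequent pattern mining algorithms. It generates no candidate itemsets, but recursively constructing conditional FP-trees incurs high CPU and storage overhead. To address this, a frequent pattern mining algorithm, IFPmine, is proposed. First, to save memory, it adopts a constrained-subtree mining method; second, it uses an array technique to reduce tree traversal time, thereby improving efficiency. Experimental results show that the IFP algorithm is a fairly effective frequent pattern mining algorithm: its mining efficiency is better than that of the STFP-tree and FP-tree algorithms, while it requires less memory than either.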
10.
Carson Kai-Sang Leung Quamrul I. Khan Zhan Li Tariqul Hoque 《Knowledge and Information Systems》2007,11(3):287-311
Since its introduction, frequent-pattern mining has been the subject of numerous studies, including incremental updating.
Many existing incremental mining algorithms are Apriori-based, which are not easily adoptable to FP-tree-based frequent-pattern
mining. In this paper, we propose a novel tree structure, called CanTree (canonical-order tree), that captures the content of the transaction database and orders tree nodes according to some canonical order. By exploiting
its nice properties, the CanTree can be easily maintained when database transactions are inserted, deleted, and/or modified.
For example, the CanTree does not require adjustment, merging, and/or splitting of tree nodes during maintenance. No rescan
of the entire updated database or reconstruction of a new tree is needed for incremental updating. Experimental results show
the effectiveness of our CanTree in the incremental mining of frequent patterns. Moreover, the applicability of CanTrees is
not confined to incremental mining; CanTrees can also be applicable to other frequent-pattern mining tasks including constrained
mining and interactive mining.
Carson K.-S. Leung received his B.Sc.(Honours), M.Sc., and Ph.D. degrees, all in computer science, from the University of British Columbia,
Canada. Currently, he is an Assistant Professor at the University of Manitoba, Canada. His research interests include the
areas of databases, data mining, and data warehousing. His work has been published in refereed journals and conferences such
as ACM Transactions on Database Systems (TODS), IEEE International Conference on Data Engineering (ICDE), and IEEE International Conference on Data Mining (ICDM).
Quamrul I. Khan received his B.Sc. degree in computer science from North South University, Bangladesh, in 2001. He then worked as a Test
Engineer and a Software Engineer for a few years before he started his current M.Sc. degree program in computer science at
the University of Manitoba under the academic supervision of Dr. C. K.-S. Leung.
Zhan Li received her B.Eng. degree in computer engineering from Harbin Engineering University, China, in 2002. Currently, she is
pursuing her M.Sc. degree in computer science at the University of Manitoba under the academic supervision of Dr. C. K.-S.
Leung.
Tariqul Hoque received his B.Sc. degree in computer science from North South University, Bangladesh, in 2001. Currently, he is pursuing
his M.Sc. degree in computer science at the University of Manitoba under the academic supervision of Dr. C. K.-S. Leung.
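The CanTree abstract above rests on one idea: if tree paths follow a fixed canonical item order rather than a frequency order, inserting or deleting a transaction only changes counts along one path. The sketch below illustrates that idea under stated assumptions: lexicographic order stands in for the canonical order, the class and method names are hypothetical, and `delete` assumes the transaction was previously inserted. It is not the authors' implementation.

```python
class CanTree:
    """Prefix tree whose paths follow a fixed canonical (here: sorted) item order."""

    def __init__(self):
        self.root = {"count": 0, "children": {}}

    def _update(self, transaction, delta):
        node = self.root
        path = []
        for item in sorted(set(transaction)):  # canonical order, never re-sorted
            parent = node
            node = parent["children"].setdefault(
                item, {"count": 0, "children": {}})
            node["count"] += delta
            path.append((parent, item, node))
        # Remove nodes whose count dropped to zero, deepest first.
        for parent, item, node in reversed(path):
            if node["count"] <= 0:
                del parent["children"][item]

    def insert(self, transaction):
        self._update(transaction, +1)

    def delete(self, transaction):
        # Assumes the transaction was inserted earlier.
        self._update(transaction, -1)

    def support(self, itemset):
        """Number of stored transactions that contain every item in itemset."""
        wanted = sorted(set(itemset))
        if not wanted:
            raise ValueError("itemset must be non-empty")

        def walk(node, idx):
            total = 0
            for item, child in node["children"].items():
                if item == wanted[idx]:
                    if idx == len(wanted) - 1:
                        total += child["count"]
                    else:
                        total += walk(child, idx + 1)
                elif item < wanted[idx]:
                    total += walk(child, idx)
                # item > wanted[idx]: paths are sorted, so it cannot occur deeper.
            return total

        return walk(self.root, 0)
```

Since the order does not depend on item frequencies, an update never forces nodes to be swapped, merged, or split, which matches the maintenance property the abstract describes.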
11.
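After reviewing the existing definition of the maximum-length frequent itemset mining problem and exploring several concrete applications of maximum-length frequent itemsets, this paper proposes MLFI, a new algorithm for mining maximum-length frequent itemsets based on the FP-tree (Frequent Pattern tree) structure. The algorithm mines the maximum-length frequent itemsets using only traversal operations on the initial FP-tree; throughout its execution, only that single initial FP-tree is used. Theoretical analysis and experiments show that the algorithm speeds up mining and improves mining efficiency.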
12.
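Research on and improvement of the FP-tree algorithm for association rules   Total citations: 1, self-citations: 0, citations by others: 1
The traditional association-rule-based FP-tree algorithm is widely used for mining frequent itemsets and does not need to generate candidate sets during mining. However, when mining larger databases it runs slowly, consumes a large amount of memory, or cannot construct a memory-based FP-tree at all. To solve these problems, this paper proposes an improved FP-tree algorithm that uses little memory and can meet the needs of mining large databases.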
13.
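An FP-tree-based algorithm for mining maximal frequent itemsets   Total citations: 7, self-citations: 0, citations by others: 7
Mining association rules is an important research topic in data mining, and mining maximal frequent itemsets is one of its key problems. Many previous algorithms for mining maximal frequent itemsets first generate candidates and then test them; however, generating candidate itemsets is costly, especially when there are many long patterns. This paper improves the FP-tree structure and proposes DMFIA-1, a fast FP-tree-based algorithm for mining maximal frequent itemsets. The algorithm does not need to generate candidate maximal frequent itemsets and mines maximal frequent itemsets more efficiently than the DMFIA algorithm. The improved FP-tree is unidirectional: each node keeps only a pointer to its parent, which saves about one third of the tree's space.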
14.
Efficient mining of maximal frequent itemsets from databases on a cluster of workstations   Total citations: 2, self-citations: 2, citations by others: 0
In this paper, we propose two parallel algorithms for mining maximal frequent itemsets from databases. A frequent itemset
is maximal if none of its supersets is frequent. One parallel algorithm is named distributed max-miner (DMM), and it requires very low communication and synchronization overhead in distributed computing systems. DMM has the
local mining phase and the global mining phase. During the local mining phase, each node mines the local database to discover
the local maximal frequent itemsets, then they form a set of maximal candidate itemsets for the top-down search in the subsequent
global mining phase. A new prefix tree data structure is developed to facilitate the storage and counting of the global candidate
itemsets of different sizes. This global mining phase using the prefix tree can work with any local mining algorithm. Another
parallel algorithm, named parallel max-miner (PMM), is a parallel version of the sequential max-miner algorithm (Proc of ACM SIGMOD Int Conf on Management of Data, 1998,
pp 85–93). Most of existing mining algorithms discover the frequent k-itemsets on the kth pass over the databases, and then generate the candidate (k + 1)-itemsets for the next pass. Compared to those level-wise algorithms, PMM looks ahead at each pass and prunes more candidate
itemsets by checking the frequencies of their supersets. Both DMM and PMM were implemented on a cluster of workstations, and
their performance was evaluated for various cases. They demonstrate very good performance and scalability even when there
are large maximal frequent itemsets (i.e., long patterns) in databases.
Congnan Luo
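The definition used in the abstract above, that a frequent itemset is maximal if none of its supersets is frequent, can be shown with a small brute-force filter. This is only an illustration of the definition, not DMM or PMM; the function name is hypothetical, and the input is assumed to be the complete collection of frequent itemsets.

```python
def maximal_itemsets(frequent_itemsets):
    """Keep only itemsets that have no proper superset in the collection.

    Assumes frequent_itemsets is the complete set of frequent itemsets,
    so "no frequent superset" reduces to "no proper superset listed here".
    """
    sets = {frozenset(s) for s in frequent_itemsets}
    return {s for s in sets if not any(s < other for other in sets)}
```

The quadratic comparison is fine for a handful of itemsets; the algorithms in this listing exist precisely because this naive check does not scale to long patterns.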
15.
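FP-growth is a classic algorithm for mining frequent itemsets. It uses the FP-tree, a compact data structure, to store all the information in a transaction database that is relevant to frequent itemset mining, but it is not suitable for mining weighted frequent itemsets. This paper analyzes the problems in existing weighted frequent itemset mining algorithms, improves the FP-tree to construct a new weighted FP-tree, and proposes an effective algorithm for mining weighted frequent itemsets. Finally, an example illustrates the mining process, and experiments verify the algorithm's effectiveness.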
16.
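Frequent pattern mining is a very important part of database mining, yet much previous research is based on Apriori's iterative approach of generating and testing candidate sets. These methods generally suffer from having to scan the database many times and iteratively test a large number of generated candidates, a drawback that is especially pronounced when mining long patterns. The FP-growth method adopts a divide-and-conquer strategy, needs only two scans of the database, and avoids generating large numbers of candidates. The SQL-based frequent pattern mining method in this paper builds on this foundation and improves the method using subqueries and DBMS extension techniques (such as user-defined functions).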
17.
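FP-tree-based maximal frequent pattern mining algorithms are currently among the more efficient frequent pattern mining algorithms. To address their need to recursively generate conditional FP-trees and their generation of large numbers of candidate maximal frequent itemsets, and building on an analysis of the FPMax and DMFIA algorithms, this paper proposes a dimensionality-reduction-based maximal frequent pattern mining algorithm (BDRFI). The algorithm replaces the traditional FP-tree with a digital frequent pattern tree, the DFP-tree, which improves the efficiency of superset checking; its predictive pruning strategy reduces the number of mining passes; and its mining approach based on reducing itemset dimensionality reduces the number of candidates and avoids recursively generating conditional frequent pattern trees, improving efficiency. Experimental results show that BDRFI is 2 to 8 times as efficient as comparable algorithms.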
18.
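A fast projection-based frequent pattern tree mining algorithm without candidate generation   Total citations: 8, self-citations: 0, citations by others: 8
1. Overview. In recent years, research on frequent pattern mining in transaction databases, time-series databases, and various other kinds of databases has become increasingly widespread. Much previous work adopts Apriori or a similar iterative candidate generate-and-check algorithm, using candidate itemsets to find frequent itemsets. These algorithms all rely on an important anti-monotone Apriori property: no infrequent (k-1)-itemset can be a subset of a frequent k-itemset. Therefore, if any (k-1)-subset of a candidate k-itemset is not among the frequent (k-1)-itemsets, the candidate cannot be frequent either, and can therefore …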
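The anti-monotone property described in the excerpt above is what lets Apriori-style algorithms discard candidates without counting them. The sketch below shows that pruning step in isolation: it joins frequent (k-1)-itemsets into k-itemsets and keeps a candidate only if every (k-1)-subset is frequent. The function name and the simple pairwise join are assumptions for illustration; support counting against the database is not shown.

```python
from itertools import combinations


def apriori_candidates(frequent_prev, k):
    """Generate candidate k-itemsets from the frequent (k-1)-itemsets.

    A candidate survives only if all of its (k-1)-subsets are frequent,
    which is the anti-monotone Apriori property.
    """
    prev = {frozenset(s) for s in frequent_prev}
    candidates = set()
    for a in prev:
        for b in prev:
            union = a | b
            if len(union) != k:
                continue
            if all(frozenset(sub) in prev
                   for sub in combinations(union, k - 1)):
                candidates.add(union)
    return candidates
```

For example, with frequent 2-itemsets {a,b}, {a,c}, {b,c}, and {b,d}, the candidate {a,b,d} is pruned because {a,d} is not frequent, while {a,b,c} is kept for support counting.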
19.
Incrementally fast updated frequent pattern trees   Total citations: 3, self-citations: 0, citations by others: 3
The frequent-pattern-tree (FP-tree) is an efficient data structure for association-rule mining without generation of candidate itemsets. It is used to compress a database into a tree structure that stores only large items. It, however, needs to process all transactions in a batch way. In real-world applications, new transactions are usually inserted into databases. In this paper, we thus attempt to modify the FP-tree construction algorithm to handle new transactions efficiently. A fast updated FP-tree (FUFP-tree) structure is proposed, which makes the tree update process easier. An incremental FUFP-tree maintenance algorithm is also proposed to reduce the execution time of reconstructing the tree when new transactions are inserted. Experimental results show that the proposed FUFP-tree maintenance algorithm runs faster than the batch FP-tree construction algorithm when handling new transactions and generates nearly the same tree structure as the FP-tree algorithm. The proposed approach can thus achieve a good trade-off between execution time and tree complexity.
20.
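An FP-tree-based algorithm for mining global maximal frequent itemsets   Total citations: 12, self-citations: 1, citations by others: 12
Mining maximal frequent itemsets is a key problem in many data mining applications. To update the set of candidate maximal frequent itemsets, existing algorithms must scan the entire database repeatedly; moreover, most are single-machine algorithms, and algorithms for mining global maximal frequent itemsets are rare. To this end, the MGMF algorithm is proposed. It uses the FP-tree structure and, in a manner similar to FP-tree mining, can mine all maximal frequent itemsets in a single pass, with very simple and fast superset checking. In addition, MGMF adopts the message-broadcasting idea of the distributed PDDM algorithm and therefore has good scalability and parallelism. Experiments show that MGMF is effective and feasible.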