首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
关联规则挖掘Apriori算法的研究与改进   总被引:6,自引:1,他引:6  
关联规则挖掘是数据挖掘研究领域中的一个重要任务,旨在挖掘事务数据库中有趣的关联.Apriori算法是关联规则挖掘中的经典算法.然而Apriori算法存在着产生候选项目集效率低和频繁扫描数据等缺点.对Apriori算法的原理及效率进行分析,指出了一些不足,并且提出了改进的Apriori_LB算法.该算法基于新的数据结构,改进了产生候选项集的连接方法.在详细阐述了Apriori_LB算法后,对Apriori算法和Apriori_LB算法进行了分析和比较,实验结果表明改进的Apriori_LB算法优于Apriori算法,特别是对最小支持度较小或者项数较少的事务数据库进行挖掘时,效果更加显著.  相似文献   

2.
冯洁  陶宏才 《微计算机信息》2007,23(18):164-166
关联规则的发现是数据挖掘的一个重要方面,产生频繁项集是其中一个关键步骤。提出了一种基于十字链表快速挖掘频繁项集的算法,该算法只需扫描一次数据库,充分利用已有信息产生频繁项集,无需存储候选项集。通过与其它一些算法比较,说明该算法有更好的性能。  相似文献   

3.
关联规则的发现是数据挖掘中的一个重要问题,其核心是频繁模式的挖掘,通常采用的APriori算法要多次扫描数据库并产生大量的候选项集,开销很大。本文采用基于布尔矩阵关联挖掘的算法,只需扫描一次数据库而且不需要链接产生候选项集,从而提高算法的效率。并通过实例说明了它是一种有效的关联规则挖掘方法。  相似文献   

4.
针对Apriori算法中I/O负载大和减枝过程中生成大量中间结果两个性能瓶颈问题,提出了一种事务矩阵和项集矩阵的Apriori改进算法.算法的基本思想是:扫描数据库生成事务矩阵,通过事务矩阵和项集矩阵之间的运算代替Apriori算法中的数据库扫描得到频繁项集,减少I/O负载,加快候选项集的验证速度;通过对频繁项集矩阵的操作,减少生成候选频繁项集的数目,避免Apriori算法减枝步骤中对候选项集的分解和判断.通过仿真验证了改进算法的有效性.  相似文献   

5.
虽然FP-Growth算法能够有效地从数据库中挖掘频繁模式,但如何由其挖掘出的频繁模式中高效地产生关联规则仍是一个相当复杂的问题。该文提出了用于组织频繁模式的线索频繁模式树(TFPT)和一个从TFPT中挖掘关联规则的高效算法—最短模式优先算法(SPF)。挖掘模式Y的关联规则时,SPF算法应用了两个优化策略,避免了对大量的不可能成为规则XY-X左部的Y的子集的检查,从而获得了很好的性能。实验表明:与类FP-Growth算法结合时,SPF算法运行速度远远快于Apriori算法,并有相当好的可伸缩性。  相似文献   

6.
马慧  汤庸  潘炎 《计算机工程》2006,32(17):132-134
随着各种形式的数据的迅速增长,业务数据中的时态信息挖掘问题受到人们普遍关注。该文提出了一种带有效时间区间的时态关联规则,给出了一种基于FP-树的挖掘方法。该方法利用分区挖掘的思想,以分区为单位表示项集的有效时间区间,并为每个分区构建FP-树,大大简化了对某个项集在其有效时间区间中的出现次数的计算,从而更有效地计算时态置信度。最后用一个例子对该方法的执行过程进行了阐述。  相似文献   

7.
挖掘关联规则是数据挖掘领域的一个重要研究方向,人们已经提出了许多用于发现数据库中关联规则的算法,但对关联规则的增量维护问题的研究较少.深入分析了增量更新情况,使用了目前较高效的最大频繁模式挖掘算法FP-Max,并对其进行改进.基本思想:①基于FP-树;②考虑了数据集中,数据增加情况下FP-树的更新;③对FP-Max算法进行改进来更新、维护已经挖掘出来的最大频繁模式.  相似文献   

8.
基于数组的关联规则挖掘算法   总被引:12,自引:0,他引:12  
孟祥萍  钱进  刘大有 《计算机工程》2003,29(15):98-99,109
提高频繁项集挖掘算法的效率是关联规则挖掘研究的一个重点领域。文章提出了基于数组的关联规则挖掘算法,只需要扫描数据库1次,通过不断减少数据库中的事务个数,并且利用一维数组对候选2-项集进行计数来提高挖掘效率。实验表明,该文所提出的算法效率比经典Apriori算法快2~3倍。  相似文献   

9.
一种直接在Trans-树中挖掘频繁模式的新算法   总被引:6,自引:1,他引:5  
范明  王秉政 《计算机科学》2003,30(8):117-120
Frequent pattern mining plays an essential role in many important data mining tasks. FP-growth is a very efficient algorithm for frequent pattern mining. However, it still suffers from creating conditional FP-tree separately and recursively during the mining process. In this paper, we propose a new algorithm, called Least-Item-First Pat-tern Growth (LIFPG), for mining frequent patterns. LIFPG mines frequent patterns directly in Trans-tree withoutusing any additional data structures. The key idea is that least items are always considered first when the current pat-tern growth. By this way, conditional sub-tree can be created directly in Trans-tree by adjusting node-links and re-counting counts of some nodes. Experiments show that, in comparison with FP-Growth, our algorithm is about fourtimes faster and saves half of memory;it also has good time and space scalability with the number of transactions,and has an excellent performance in dense dataset mining as well.  相似文献   

10.
基于索引数组和复合频繁模式树的频繁闭项集挖掘算法   总被引:1,自引:0,他引:1  
频繁闭项集惟一确定频繁项集且规模小得多.CROP是一种基于复合频繁模式树的、频繁闭项集高效挖掘算法,但存在着候选结点过多的问题.这些非闭合结点的生成、检查和剪裁带来了大量不必要的操作.提出了一种改进的频繁闭项集挖掘算法CROP_Index.该算法用"索引数组"来组织数据,找到频繁共同出现的项集.基于二进制位图,给出了一个包含索引的计算方法,并利用索引启发信息合并,得到复合型频繁模式树的初始结点;同时给出一些新的性质,使得改进的算法只生成闭合结点,从而节省了大量不必要的操作,缩小了搜索空间.实验结果表明该算法效率较高.  相似文献   

11.
Temporal regularity of itemset appearance can be regarded as an important criterion for measuring the interestingness of itemsets in several applications. A frequent itemset can be said to be regular-frequent in a database if it appears at a regular period. Therefore, the problem of mining a complete set of regular-frequent itemsets requires the specification of a support and a regularity threshold. However, in practice, it is often difficult for users to provide an appropriate support threshold. In addition, the use of a support threshold tends to produce a large number of regular-frequent itemsets and it might be better to ask for the number of desired results. We thus propose an efficient algorithm for mining top-k regular-frequent itemsets without setting a support threshold. Based on database partitioning and support estimation techniques, the proposed algorithm also uses a best-first search strategy with only one database scan. We then compare our algorithm with the state-of-the-art algorithms for mining top-k regular-frequent itemsets. Our experimental studies on both synthetic and real data show that our proposal achieves high performance for small and large values of k.  相似文献   

12.
用VB对基于Apriori算法的数据挖掘的实现   总被引:4,自引:0,他引:4  
于卫红 《计算机工程》2004,30(2):196-196,F003
Apriori算法是一种最有影响的挖掘关联规则频繁项集的算法。文章以100期彩票的开奖结果作为挖掘对象,利用该算法从中找出相对频繁出现的数字组合,并用VB进行了程序实现。  相似文献   

13.
本文通过对关联规则挖掘中由候选项集生成频繁项集算法的分析.引入了格论的一些思想来改进算法,其中心思想是:通过在属性集和事务数据库的基础上进行建格,然后在格的基础上直接进行规则提取。在实验的基础上对Apriori算法和改进的算法进行了比较,实验结果表明.在特定的数据库中,改进的算法在挖掘效率上优于Apriori算法。  相似文献   

14.
The FP-growth algorithm using the FP-tree has been widely studied for frequent pattern mining because it can dramatically improve performance compared to the candidate generation-and-test paradigm of Apriori. However, it still requires two database scans, which are not consistent with efficient data stream processing. In this paper, we present a novel tree structure, called CP-tree (compact pattern tree), that captures database information with one scan (insertion phase) and provides the same mining performance as the FP-growth method (restructuring phase). The CP-tree introduces the concept of dynamic tree restructuring to produce a highly compact frequency-descending tree structure at runtime. An efficient tree restructuring method, called the branch sorting method, that restructures a prefix-tree branch-by-branch, is also proposed in this paper. Moreover, the CP-tree provides full functionality for interactive and incremental mining. Extensive experimental results show that the CP-tree is efficient for frequent pattern mining, interactive, and incremental mining with a single database scan.  相似文献   

15.
关联规则挖掘算法FP-Growth虽然效率比Apriori要快一个数量级,但存在频繁模式树可能过大而内存无法容纳和数据挖掘过程串行处理等两大缺点。提出一种分布式并行关联规则挖掘算法,该算法针对分布式应用数据架构,不需要产生全局FPtree,避免全局FP-tree可能过大而内存无法容纳的问题,算法在各个主要步骤上都实现了并行处理。算法测试结果和分析表明,与传统的关联规则挖掘算法FP-Growth相比,该算法通过多节点分布式并行处理显著提高了执行效率和处理能力。  相似文献   

16.
A core issue of the association rule extracting process in the data mining field is to find the frequent patterns in the database of operational transactions. If these patterns discovered, the decision making process and determining strategies in organizations will be accomplished with greater precision. Frequent pattern is a pattern seen in a significant number of transactions. Due to the properties of these data models which are unlimited and high-speed production, these data could not be stored in memory and for this reason it is necessary to develop techniques that enable them to be processed online and find repetitive patterns. Several mining methods have been proposed in the literature which attempt to efficiently extract a complete or a closed set of different types of frequent patterns from a dataset. In this paper, a method underpinned upon Cellular Learning Automata (CLA) is presented for mining frequent itemsets. The proposed method is compared with Apriori, FP-Growth and BitTable methods and it is ultimately concluded that the frequent itemset mining could be achieved in less running time. The experiments are conducted on several experimental data sets with different amounts of minsup for all the algorithms as well as the presented method individually. Eventually the results prod to the effectiveness of the proposed method.  相似文献   

17.
    
In this paper, we introduce item-centric mining, a new semantics for mining long-tailed datasets. Our algorithm, TopPI, finds for each item its top-k most frequent closed itemsets. While most mining algorithms focus on the globally most frequent itemsets, TopPI guarantees that each item is represented in the results, regardless of its frequency in the database.TopPI allows users to efficiently explore Web data, answering questions such as “what are the k most common sets of songs downloaded together with the ones of my favorite artist?”. When processing retail data consisting of 55 million supermarket receipts, TopPI finds the itemset “milk, puff pastry” that appears 10,315 times, but also “frangipane, puff pastry” and “nori seaweed, wasabi, sushi rice” that occur only 1120 and 163 times, respectively. Our experiments with analysts from the marketing department of our retail partner demonstrate that item-centric mining discover valuable itemsets. We also show that TopPI can serve as a building-block to approximate complex itemset ranking measures such as the p-value.Thanks to efficient enumeration and pruning strategies, TopPI avoids the search space explosion induced by mining low support itemsets. We show how TopPI can be parallelized on multi-cores and distributed on Hadoop clusters. Our experiments on datasets with different characteristics show the superiority of TopPI when compared to standard top-k solutions, and to Parallel FP-Growth, its closest competitor.  相似文献   

18.
频繁项集挖掘是数据挖掘中的一个经典的问题。然而,大部分算法需要扫描数据库多次,算法效率比较低。该文提出了一个效率比较好的挖掘频繁项集的新算法,在这个算法中,所有的事务都是以二进制的形式表示,所以挖掘极大频繁项集的任务就变成了从二进制集中发现频繁模式。而且,这种算法只需要扫描原始数据库一次。最后,利用试验来证明这种算法的效率和优势。  相似文献   

19.
In recent years, data stream mining has become an important research topic. With the emergence of new applications, the data we process are not again static, but the continuous dynamic data stream. Examples include network traffic analysis, Web click stream mining, network intrusion detection, and on-line transaction analysis. In this paper, we propose a new framework for data stream mining, called the weighted sliding window model. The proposed model allows the user to specify the number of windows for mining, the size of a window, and the weight for each window. Thus users can specify a higher weight to a more significant data section, which will make the mining result closer to user’s requirements. Based on the weighted sliding window model, we propose a single pass algorithm, called WSW, to efficiently discover all the frequent itemsets from data streams. By analyzing data characteristics, an improved algorithm, called WSW-Imp, is developed to further reduce the time of deciding whether a candidate itemset is frequent or not. Empirical results show that WSW-Imp outperforms WSW under the weighted sliding window model.  相似文献   

20.
在关联规则挖掘中,主要的问题是如何高效地产生频繁项集。对近年来一些基于十字链表的Apriori算法进行研究和分析,发现它们的候选频繁项集生成方法有很大的改进空间。提出一个基于十字链表的改进算法,优化候选频繁项集的生成方法,减少对事务数据库的扫描,大大提高了挖掘效率。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号