Similar literature
 20 similar documents found (search time: 15 ms)
1.
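A Survey of Data Mining Techniques Based on Association Rules   Total citations: 4, self-citations: 0, citations by others: 4
This paper describes four commonly used data mining techniques. Taking association rule mining as its basis, it explains the basic idea of Apriori, the classic association rule mining algorithm. An association rule mining experiment shows how the algorithm is used in practice, and the paper summarizes the algorithm's shortcomings.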
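
The basic idea of Apriori mentioned in the abstract above can be sketched as follows. This is a minimal illustration of the standard level-wise scheme (join, prune, count), not the code used in the surveyed paper; the function name `apriori` and the absolute-count parameter `min_support` are assumptions made for this example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets.

    min_support is an absolute count (an assumption of this sketch).
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count individual items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Count step: one pass over the transactions per level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result
```

The repeated passes over the transactions and the potentially large candidate sets at each level are the kind of shortcoming that several of the other entries in this list set out to reduce.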

2.
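This paper presents a classification of association rules in data mining and analyzes several typical algorithms. It then proposes AR_SET, a generalized association rule mining algorithm that uses set "OR" and "AND" operations to find frequent itemsets, improving the efficiency and speed of mining.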

3.
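Frequent Pattern Mining Based on a Hash Chain Structure   Total citations: 5, self-citations: 0, citations by others: 5
Researchers have proposed a number of frequent pattern mining algorithms, but each still has shortcomings under certain mining conditions. This paper proposes an improved hash chain address structure and a new data mining algorithm, HCS-Mine. The algorithm uses the hash chain structure, does not need to generate huge candidate itemsets, and is simple and efficient.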

4.

The most time-consuming step in discovering association rules is identifying the frequent patterns, especially when the database contains long patterns. This paper proposes Flex, an algorithm for identifying frequent patterns that is especially efficient when the patterns are long; it works by successively constructing the nodes of a lexicographic tree. Support is computed with a vertical counting strategy to speed up discovery. The experimental results show that Flex outperforms Apriori, a well-known and widely used algorithm for pattern discovery.
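
The vertical counting strategy mentioned in the abstract above can be illustrated in general terms: each item is mapped to the set of transaction ids that contain it, and the support of an itemset is the size of the intersection of those sets. The sketch below shows only this generic vertical layout; it is not the Flex algorithm, and the names `vertical_index` and `tidset_support` are hypothetical.

```python
def vertical_index(transactions):
    """Map each item to the set of transaction ids (tids) that contain it."""
    index = {}
    for tid, items in enumerate(transactions):
        for item in items:
            index.setdefault(item, set()).add(tid)
    return index


def tidset_support(itemset, index):
    """Support count of a non-empty itemset: size of the tid-set intersection."""
    tidsets = [index.get(item, set()) for item in itemset]
    return len(set.intersection(*tidsets))
```

Once the index is built, counting a candidate no longer requires rescanning the transactions, which is the usual motivation for a vertical layout.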

5.
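Mining maximal frequent itemsets is an important aspect of many data mining applications, and fast algorithms for it are a current research focus. Traditional algorithms for mining maximal frequent itemsets scan the database many times and generate a large number of candidate itemsets. To address this, the paper proposes FMMFIBFM, a fast algorithm for mining maximal frequent itemsets based on an F-matrix. FMMFIBFM uses the FP-tree storage structure, needs to scan the database only twice, and generates no candidate frequent itemsets, which effectively improves the efficiency of frequent itemset mining. Experimental results show that FMMFIBFM is effective and feasible.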

6.
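Discovering frequent itemsets is the most basic and most important problem in association rule mining. This paper proposes a frequent itemset mining algorithm based on binary representation, which uses properties of binary numbers to generate candidate itemsets quickly and compute their support. The overall performance of the algorithm is improved to some extent.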
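
One common way to use a binary representation for support counting is to encode each transaction as an integer bitmask, so that a subset test becomes a single bitwise AND. The sketch below shows that general idea only; the abstract does not give the paper's actual encoding, and the names `encode`, `bitmask_support`, and `item_bits` are assumptions.

```python
def encode(items, item_bits):
    """Encode a collection of items as an integer with one bit per item."""
    mask = 0
    for item in items:
        mask |= 1 << item_bits[item]
    return mask


def bitmask_support(candidate_mask, transaction_masks):
    """Count transactions whose bitmask contains every bit of the candidate."""
    return sum(
        1 for t in transaction_masks
        if (t & candidate_mask) == candidate_mask
    )
```

With this encoding, the union of two itemsets is a bitwise OR of their masks, which is one way candidates can be generated cheaply.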

7.
DRFP-tree: disk-resident frequent pattern tree   Total citations: 4, self-citations: 3, citations by others: 1
Frequent itemset mining methods mainly address time scalability and rely heavily on available physical memory. However, the size of real-world databases to be mined is increasing exponentially, so main memory size is a serious bottleneck for existing methods. It is therefore necessary to develop new methods that do not rely fully on physical memory and that use secondary storage in the mining process. This motivates the work described in this paper: we propose DRFP-growth (DRFP stands for Disk Resident Frequent Pattern), a disk-based approach similar to FP-growth. DRFP-growth uses the DRFP-tree, which is treated exactly as an FP-tree when constructed in main memory and takes on a modified structure when it becomes disk resident, to overcome the main memory bottleneck. This way, we are able to mine frequent itemsets from databases of arbitrary size without being restricted by the available physical memory. In other words, we initially try to mine the database using the original FP-growth and expand into secondary memory only if we run out of physical memory. DRFP-growth is therefore comparable to FP-growth for small databases and high support thresholds. On the other hand, with DRFP-growth we are still able to mine huge databases at low support thresholds (the only limitation is the available secondary storage rather than physical memory). The reported test results demonstrate that the proposed approach succeeds in cases where main-memory-based approaches fail.

8.
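A Parallel Algorithm for Mining Association Rules in Databases   Total citations: 2, self-citations: 1, citations by others: 1
This paper proposes a parallel algorithm for mining association rules in databases, discusses the relevant data structures, and gives a qualitative analysis of the algorithm. The algorithm applies not only to Boolean attributes but also to non-Boolean attributes.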

9.
Web usage mining is the application of data mining techniques to large web log databases in order to extract usage patterns. However, most previous studies on usage pattern discovery focus only on mining intra-transaction associations, i.e., associations among items within the same user's transactions. A cross-transaction association rule describes the association relationships among different users' transactions. In this paper, the closure property of frequent itemsets is used to mine cross-transaction association rules from web log databases; the set of closed frequent itemsets determines the complete set of frequent itemsets exactly and is usually much smaller than it. We give the basic notion of frequent cross-transaction closed itemsets and prove the necessary related theorems. An efficient algorithm, MFCCPS (Mining Frequent Cross-Transaction Closed Pageviews Sets), is designed and implemented. Finally, extensive experiments on two synthetic datasets show that our approach outperforms previous methods.

10.
Based on an analysis of current intrusion detection technologies, this paper introduces data mining technology into the Intrusion Detection System (IDS) and proposes a system architecture as well as a pattern strategy with automatic updates. With data mining, frequent patterns can be extracted from a large number of network events, so effective detection rules can be discovered and used to guide the IDS's analysis of network intrusions. Meanwhile, the automatic-update pattern strategy, which relies on real-time network analysis, greatly improves the efficiency and accuracy of the mining. Together, they are effective in addressing the high misreport and false alert rates of traditional intrusion detection systems.

11.
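Starting from the concepts and representations of association rules and classification rules, this paper points out that if association rule mining is restricted to rules associated with one specified item, it becomes classification rule mining. It argues that classification rules are a special case of association rules and identifies the characteristics that association rules have in this special case. Based on this argument, it proposes an algorithm that discovers classification rules within an association rule mining algorithm by using a constrained conditional probability distribution.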

12.
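Association rule mining is one of the most common and important data mining tasks; classic algorithms include Apriori, FP-Growth, and Eclat. With the explosive growth of data, traditional algorithms can no longer meet the needs of big data mining, and distributed, parallel association rule mining algorithms are needed. MapReduce is a popular distributed parallel computing model that is widely used because it is simple to use, scales well, and provides automatic load balancing and automatic fault tolerance. This paper classifies and surveys existing parallel association rule mining algorithms based on the MapReduce model, summarizes their respective strengths, weaknesses, and scope of application, and discusses directions for future research.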

13.
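With the continuing development of database technology and the wide use of database management systems, large database systems have become common across industries, and the volume of stored data has grown sharply. Data mining is the process of extracting valid or important information from massive databases. Association rule mining is a very important research topic in data mining and is widely applied in commerce, medical insurance, finance, telecommunications, and other sectors. Over time, the size of the mined database changes and users' data needs change as well, so efficiently updating previously derived association rules on an extended database is of great practical value. This is the problem of incremental association rule mining.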

14.
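Research on Association Rule Mining   Total citations: 2, self-citations: 0, citations by others: 2
This paper introduces the basic concepts of association rule mining, proposes a classification of association rules, and analyzes and evaluates several typical algorithms.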

15.
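A New Efficient Algorithm for Mining Association Rules   Total citations: 1, self-citations: 2, citations by others: 1
Building on the Apriori algorithm, this paper proposes a new algorithm that continually shrinks the original transaction database according to support while it runs and uses a new method to generate candidate sets, improving the efficiency of association rule mining.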

16.
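Association rule mining is an important aspect of data mining research, and one important problem within it is evaluating the interestingness of the mined rules. Previous research has found that in practice it is often easy to mine a large number of rules from a data source, but most of them are of no interest to the user. This paper discusses two aspects of rule interestingness measures, subjective and objective, and then describes how templates can be used to mine interesting rules.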
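
The abstract above does not say which objective measures the paper covers. As an illustration only, the sketch below computes three widely used objective measures for a rule A → C: support, confidence, and lift. The function name `rule_measures` and its signature are assumptions made for this example.

```python
def rule_measures(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(transactions)
    if n == 0:
        raise ValueError("transactions must not be empty")
    a, c = frozenset(antecedent), frozenset(consequent)

    n_a = sum(1 for t in transactions if a <= set(t))         # contain antecedent
    n_c = sum(1 for t in transactions if c <= set(t))         # contain consequent
    n_ac = sum(1 for t in transactions if (a | c) <= set(t))  # contain both

    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    # Lift compares confidence with the baseline frequency of the consequent;
    # a value near 1 suggests the two sides are close to independent.
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift
```

A rule can have high confidence and still have lift below 1, which is one reason confidence alone is often a poor filter for interesting rules.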

17.
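Optimization of Algorithms for Mining Association Rules   Total citations: 9, self-citations: 0, citations by others: 9
When mining association rules, generating the large itemsets in the early iterations is very important. This paper proposes a hash-table-based algorithm that optimizes candidate itemset generation and is especially effective for generating candidate 2-itemsets. Because the set of candidate itemsets generated in the early iterations is smaller, the database can be pruned more effectively, which reduces the computational cost of later iterations and also the number of I/O requests.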
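
A hash table can shrink the set of candidate 2-itemsets roughly as follows: during the first scan, every pair in each transaction is hashed into a bucket counter, and a pair of frequent items is later kept as a candidate only if its bucket count reaches the minimum support. This is a minimal sketch of that general technique, assuming string items; the paper's actual hash function and pruning details are not given in the abstract, and `bucket_of`, `hashed_candidate_pairs`, and `num_buckets` are hypothetical names.

```python
from itertools import combinations
from zlib import crc32


def bucket_of(pair, num_buckets):
    """Deterministically hash a sorted pair of string items to a bucket index."""
    return crc32("|".join(pair).encode("utf-8")) % num_buckets


def hashed_candidate_pairs(transactions, min_support, num_buckets=101):
    """Return candidate 2-itemsets that survive the hash-bucket filter."""
    item_counts = {}
    bucket_counts = [0] * num_buckets

    # Single scan: count items and hash every pair of each transaction.
    for t in transactions:
        items = sorted(set(t))
        for item in items:
            item_counts[item] = item_counts.get(item, 0) + 1
        for pair in combinations(items, 2):
            bucket_counts[bucket_of(pair, num_buckets)] += 1

    frequent_items = sorted(i for i, c in item_counts.items() if c >= min_support)

    # A bucket count is an upper bound on the true support of every pair that
    # hashes to it, so this filter never discards a frequent pair.
    return [
        pair for pair in combinations(frequent_items, 2)
        if bucket_counts[bucket_of(pair, num_buckets)] >= min_support
    ]
```

Hash collisions can let some infrequent pairs through, so the surviving candidates still need an exact support count in the next pass.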

18.
19.
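Association rule mining has long been an important area of data mining, and new mining algorithms keep appearing. Based on an in-depth analysis of the properties of the FP-tree, this paper improves the FP-tree construction process so that the tree is built with a single scan of the transaction database. This shortens association rule mining time and improves efficiency; experiments verify its effectiveness.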

20.
Designing Templates for Mining Association Rules   Total citations: 3, self-citations: 0, citations by others: 3
Current approaches to data mining usually address specific user requests, while no general design criteria for the extraction of association rules are available for the end-user. In this paper, we propose a classification of association rule types, which provides a general framework for the design of association rule mining applications. Based on the identified association rule types, we introduce predefined templates as a means to capture the user specification of mining applications. Furthermore, we propose a general language to design templates for the extraction of arbitrary association rule types.
