Found 20 similar documents (search time: 15 ms)
1.
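A Survey of Data Mining Techniques Based on Association Rules Total citations: 4, self-citations: 0, citations by others: 4
Describes four commonly used data mining techniques and, taking association rule mining as the basis, explains the basic idea of Apriori, the classic association rule mining algorithm. An association rule mining experiment shows how the algorithm is used in practice, and the algorithm's shortcomings are summarized.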
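The level-wise idea behind Apriori can be sketched as follows. This is a minimal illustration, not the implementation used in the surveyed paper: the function name `apriori`, the use of an absolute support count for `min_support`, and the sample baskets are all assumptions made for the example.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets.

    min_support is an absolute count (an assumption of this sketch).
    """
    transactions = [frozenset(t) for t in transactions]
    # Pass 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Count step: one pass over the data per level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result

# Hypothetical sample data.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]
frequent_itemsets = apriori(baskets, min_support=3)
```

The repeated database pass in the count step is the cost that several of the other entries in this list try to avoid.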
2.
3.
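Frequent Pattern Mining Based on a Hash Chain Structure Total citations: 5, self-citations: 0, citations by others: 5
Researchers have proposed a number of frequent pattern mining algorithms, but each still has shortcomings under certain mining conditions. This paper proposes an improved hash chain address structure and a new data mining algorithm, HCS-Mine. The algorithm uses the hash chain structure, does not need to generate a huge candidate itemset, and is simple and efficient.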
4.
International Journal of Computer Mathematics, 2012, 89(8): 967-976
The most time-consuming step in discovering association rules is identifying the frequent patterns, especially when the database contains long patterns. This paper proposes Flex, an algorithm for identifying frequent patterns that is especially efficient when the patterns are long; it works by successively constructing the nodes of a lexicographic tree. A vertical counting strategy is used in support computation to speed up discovery. Experimental results show that Flex outperforms Apriori, a well-known and widely used algorithm for pattern discovery.
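Vertical counting, as mentioned in the abstract above, generally means storing for each item the set of transaction ids that contain it, so that the support of an itemset is the size of an intersection. The sketch below shows only that general idea under assumed names (`build_tidsets`, `vertical_support`); it does not reproduce Flex or its lexicographic tree.

```python
def build_tidsets(transactions):
    """Map each item to the set of transaction ids (tids) that contain it."""
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)
    return tidsets

def vertical_support(itemset, tidsets):
    """Support count of a non-empty itemset: size of the tidset intersection."""
    sets = [tidsets.get(item, set()) for item in itemset]
    if not sets:
        raise ValueError("itemset must be non-empty")
    return len(set.intersection(*sets))

# Hypothetical sample data.
db = [["a", "b", "c"], ["a", "c"], ["b", "c"], ["a", "b", "c", "d"]]
tids = build_tidsets(db)
ab_support = vertical_support(["a", "b"], tids)
```

Once the tidsets are built, no further pass over the original transactions is needed to count a candidate.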
5.
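Yang Ping, Computer Engineering and Applications, 2003, 39(34): 197-200
Mining maximal frequent itemsets is an important part of many data mining applications, and fast algorithms for it are a current research focus. Traditional maximal frequent itemset mining algorithms scan the database many times and generate large numbers of candidate itemsets. This paper therefore proposes FMMFIBFM, a fast maximal frequent itemset mining algorithm based on an F-matrix. FMMFIBFM uses the FP-tree storage structure, needs only two database scans, and generates no candidate frequent itemsets, which effectively improves mining efficiency. Experimental results show that FMMFIBFM is effective and feasible.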
6.
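Discovering frequent itemsets is the most basic and most important problem in association rule mining. This paper proposes a frequent itemset mining algorithm based on binary representation, which uses the properties of binary numbers to generate candidate itemsets quickly and compute their support. The overall performance of the algorithm is improved to some extent.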
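One common way to use a binary representation is to give each item a bit position and encode each transaction as an integer, so that containment becomes a bitwise test and joining two candidates becomes a bitwise OR. The abstract does not describe its encoding, so the sketch below is an assumption about the general approach, with hypothetical function names.

```python
def encode(transactions):
    """Assign each item a bit position and encode every transaction as an int."""
    index = {}
    masks = []
    for items in transactions:
        mask = 0
        for item in items:
            bit = index.setdefault(item, len(index))
            mask |= 1 << bit
        masks.append(mask)
    return index, masks

def itemset_mask(itemset, index):
    """Bitmask with one bit set per item of the itemset."""
    mask = 0
    for item in itemset:
        mask |= 1 << index[item]
    return mask

def bit_support(candidate_mask, masks):
    """A transaction contains the candidate iff all candidate bits are set in it."""
    return sum(1 for m in masks if (m & candidate_mask) == candidate_mask)

# Hypothetical sample data.
db = [["a", "b", "c"], ["a", "c"], ["b", "c"], ["a", "b", "c", "d"]]
index, masks = encode(db)
ab = itemset_mask(["a"], index) | itemset_mask(["b"], index)  # join by OR
ab_support = bit_support(ab, masks)
```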
7.
DRFP-tree: disk-resident frequent pattern tree Total citations: 4, self-citations: 3, citations by others: 1
Frequent itemset mining methods basically address time scalability and greatly rely on available physical memory. However,
the size of real-world databases to be mined is exponentially increasing, and hence main memory size is a serious bottleneck
of the existing methods. So, it is necessary to develop new methods that do not fully rely on physical memory; new methods
that utilize the secondary storage in the mining process should be the target. This motivates the work described in this paper;
we mainly propose Disk Resident Frequent Pattern growth (DRFP-growth), a disk-based approach similar to FP-growth. DRFP-growth uses the DRFP-tree, which is treated exactly as
an FP-tree when constructed in main memory and takes on a modified structure when it becomes disk resident, to overcome the
main memory bottleneck. This way, we are able to mine for frequent itemsets from databases of arbitrary sizes without being
restricted by the available physical memory. In other words, we initially try to mine the database using the original FP-growth;
we expand into the secondary memory only if we run out of physical memory. So, DRFP-growth is very comparable to FP-growth
for small databases and high support threshold values. On the other hand, using DRFP-growth, we are still able to mine huge
databases for low support threshold values (the only limitation is the available secondary storage rather than physical memory).
The reported test results demonstrate how the proposed approach succeeds in cases where main-memory-based approaches fail.
8.
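A Parallel Algorithm for Mining Association Rules in Databases Total citations: 2, self-citations: 1, citations by others: 1
Proposes a parallel algorithm for mining association rules in databases, discusses the relevant data structures, and gives a qualitative analysis of the algorithm. The algorithm applies not only to Boolean attributes but also to non-Boolean attributes.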
9.
Jian Chen, Jian Yin, Jin Huang, Liangyi Ou, Journal of Communication and Computer, 2005, 2(5): 6-11, 81
Web usage mining is the application of data mining techniques to large web log databases in order to extract usage patterns. However, most previous studies on usage pattern discovery focus only on mining intra-transaction associations, i.e., the associations among items within the same user's transactions. A cross-transaction association rule describes the association relationships among different users' transactions. In this paper, the closure property of frequent itemsets is used to mine cross-transaction association rules from web log databases: the set of frequent closed itemsets determines the complete set of frequent itemsets exactly and is usually much smaller than it. We give the basic notion of frequent cross-transaction closed itemsets and prove the necessary related theorems. An efficient algorithm, MFCCPS (Mining Frequent Cross-Transaction Closed Pageviews Sets), is designed and implemented. Finally, extensive experiments on two synthetic datasets show that our approach outperforms previous methods.
10.
XIA Hongxia, SHEN Qi, HAO Rui, Journal of Communication and Computer, 2005, 2(3): 29-33, 55
Based on an analysis of current intrusion detection technologies, this paper introduces data mining technology into the Intrusion Detection System (IDS) and proposes a system architecture together with a pattern strategy of automatic update. By adopting data mining technology, frequent patterns can be mined from a large number of network events, so that effective detection rules can be discovered and used to guide the IDS analysis of network intrusions. Meanwhile, the automatic pattern update strategy, which relies on real-time network analysis, greatly improves the efficiency and accuracy of the mining. Integrating the two is effective in addressing the high misreport and false alert rates of traditional intrusion detection systems.
11.
12.
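Association rule mining is one of the most common and most important data mining tasks; classic algorithms include Apriori, FP-Growth, and Eclat. With the explosive growth of data, traditional algorithms can no longer meet the needs of big data mining, and distributed, parallel association rule mining algorithms are required. MapReduce is a popular distributed parallel computing model that is widely used because it is simple to use, scales well, and provides automatic load balancing and automatic fault tolerance. This paper classifies and surveys existing parallel association rule mining algorithms based on the MapReduce model, summarizes their respective strengths, weaknesses, and scope of application, and outlines directions for future research.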
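To make the MapReduce formulation concrete, the sketch below simulates one counting pass in a single process: mappers emit (itemset, 1) pairs from their partition, a shuffle groups them by key, and reducers sum and filter. It uses no real MapReduce framework; the function names and the partitioning are assumptions for illustration, and it does not represent any particular algorithm from the survey.

```python
from collections import defaultdict
from itertools import combinations

def map_phase(partition, k):
    """Mapper: emit (itemset, 1) for every k-item subset of each transaction."""
    for transaction in partition:
        for itemset in combinations(sorted(set(transaction)), k):
            yield itemset, 1

def shuffle(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups, min_support):
    """Reducer: sum the counts per itemset and keep those meeting the threshold."""
    totals = {key: sum(values) for key, values in groups.items()}
    return {key: total for key, total in totals.items() if total >= min_support}

def count_frequent(partitions, k, min_support):
    emitted = (pair for part in partitions for pair in map_phase(part, k))
    return reduce_phase(shuffle(emitted), min_support)

# Hypothetical data split across two "nodes".
partitions = [
    [["a", "b", "c"], ["a", "c"]],
    [["b", "c"], ["a", "b", "c", "d"]],
]
frequent_pairs = count_frequent(partitions, k=2, min_support=3)
```

Enumerating every k-subset in the mapper is the simplest choice but grows quickly with transaction length, which is one reason practical designs restrict mappers to a broadcast candidate set.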
13.
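Feng Song, Digital Community & Smart Home, 2008, (Z1)
With the continuing development of database technology and the wide use of database management systems, large database systems have become common in every industry and the volume of stored data has grown sharply. Data mining is the process of extracting valid or important information from massive databases. Association rule mining is a very important research topic in data mining and is widely applied in commerce, health insurance, finance, telecommunications, and other sectors. Over time the size of the mined database changes, and users' data needs change as well, so efficiently updating previously derived association rules from the expanded database has significant practical value. This is the problem of incremental association rule mining.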
14.
15.
16.
17.
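Optimizing Algorithms for Mining Association Rules Total citations: 9, self-citations: 0, citations by others: 9
When mining association rules, the early iterations that generate large itemsets are critical. This paper proposes a hash-table-based algorithm that optimizes candidate itemset generation and is especially effective for generating candidate 2-itemsets. Because the candidate sets generated in the early iterations are small, the database can be pruned more effectively, which reduces the computational cost of later iterations and also reduces the number of I/O requests.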
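A typical form of hash-based filtering for candidate 2-itemsets works as sketched below: while counting single items in the first pass, every pair in each transaction is hashed into a bucket counter, and a pair is later kept as a candidate only if both items are frequent and its bucket count reaches the threshold. The abstract gives no details of its hash table, so the hash function (`zlib.crc32`), the bucket count, and the function names here are assumptions.

```python
import zlib
from collections import Counter
from itertools import combinations

def bucket_of(pair, num_buckets):
    """Deterministic bucket index for a sorted pair of string items (assumed hash)."""
    return zlib.crc32("|".join(pair).encode("utf-8")) % num_buckets

def hash_filtered_pairs(transactions, min_support, num_buckets=101):
    """Return candidate 2-itemsets that survive the bucket-count filter."""
    item_counts = Counter()
    buckets = [0] * num_buckets
    # Single pass: count items and hash every pair into a bucket.
    for t in transactions:
        items = sorted(set(t))
        item_counts.update(items)
        for pair in combinations(items, 2):
            buckets[bucket_of(pair, num_buckets)] += 1
    frequent_items = sorted(i for i, c in item_counts.items() if c >= min_support)
    # A bucket count is an upper bound on the count of any pair hashed into it,
    # so a bucket below min_support rules out all of its pairs.
    return [
        pair for pair in combinations(frequent_items, 2)
        if buckets[bucket_of(pair, num_buckets)] >= min_support
    ]

# Hypothetical sample data.
db = [["a", "b", "c"], ["a", "c"], ["b", "c"], ["a", "b", "c", "d"]]
candidates = hash_filtered_pairs(db, min_support=3)
```

Because different pairs can share a bucket, the filter may keep some infrequent pairs but never discards a frequent one; the surviving candidates still need an exact count.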
18.
19.
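Li Tao, Digital Community & Smart Home, 2007, (18)
Association rule mining has long been an important area of data mining, and new mining algorithms keep emerging. Based on an in-depth analysis of the properties of the FP-tree, this paper improves the FP-tree construction process so that the tree is built in a single scan of the transaction database. This shortens association rule mining time and improves efficiency; experiments confirm the effectiveness of the approach.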
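For context, the conventional FP-tree construction needs two scans: one to count item frequencies and one to insert each transaction with its items in descending frequency order. The sketch below shows that conventional two-scan version only; it is not the single-scan improvement proposed in the paper, and it omits the header table and node links. The class and function names are assumptions.

```python
from collections import Counter

class FPNode:
    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}  # item -> FPNode

def build_fp_tree(transactions, min_support):
    # Scan 1: count item frequencies and keep only frequent items.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_support}
    root = FPNode(None)
    # Scan 2: insert each transaction, items sorted by descending frequency
    # (ties broken by item name so that shared prefixes line up).
    for t in transactions:
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
            child.count += 1
            node = child
    return root, frequent

# Hypothetical sample data.
db = [["a", "b", "c"], ["a", "c"], ["b", "c"], ["a", "b", "c", "d"]]
root, frequent = build_fp_tree(db, min_support=2)
```

The second scan depends on the global frequency order from the first, which is the dependency a single-scan construction has to work around.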
20.
Designing Templates for Mining Association Rules Total citations: 3, self-citations: 0, citations by others: 3
Current approaches to data mining usually address specific user requests, while no general design criteria for the extraction of association rules are available for the end-user. In this paper, we propose a classification of association rule types, which provides a general framework for the design of association rule mining applications. Based on the identified association rule types, we introduce predefined templates as a means to capture the user specification of mining applications. Furthermore, we propose a general language to design templates for the extraction of arbitrary association rule types.