1.
2.
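By introducing the concepts of itemset projection and counting factor, a privacy-preserving algorithm for mining cross-table association rules is proposed. The algorithm first works bottom-up, level by level, computing the projections of large cross-table itemsets within each single table together with the count sets of those projections. It then computes the support of the large cross-table itemsets from the projection count sets according to a support-count computation protocol, without disclosing the data in the original tables. The algorithm targets large databases and handles general inter-table relationships expressed through semantically related attributes. Experiments show that the algorithm is effective.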
3.
4.
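A sequential pattern mining algorithm that avoids scanning duplicate projected databases  Total citations: 5; self-citations: 0; citations by others: 5
Sequential pattern mining is widely used in Web clickstream analysis, natural disaster prediction, DNA and protein sequence pattern discovery, and other fields. PrefixSpan, which is based on frequent pattern growth, is currently one of the best-performing sequential pattern mining algorithms. However, when mining dense datasets and long sequential patterns, large numbers of duplicate projected databases arise and degrade the performance of this class of algorithms. The SPMDS algorithm applies a one-way hash function, such as MD5, to the pseudo-projection of each projected database to check whether a duplicate projected database already exists, which avoids scanning many duplicate databases. It also uses several necessary conditions to simplify the search over projected databases, further improving performance. Both experiments and analysis show that SPMDS outperforms PrefixSpan.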
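The duplicate check described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the SPMDS implementation: it assumes a pseudo-projection is a list of (sequence index, offset) pairs, and the names `projection_key` and `ProjectionCache` are hypothetical. The abstract does not say how a detected duplicate is reused, so the sketch only reports which earlier prefix produced the same projection. A digest match indicates a probable duplicate; a careful implementation might still compare the pointer lists.

```python
import hashlib

def projection_key(pseudo_projection):
    """MD5 digest of a pseudo-projection given as (sequence index, offset) pairs."""
    # Sort the pairs so the digest does not depend on the order they were collected in.
    canonical = ";".join(f"{sid}:{off}" for sid, off in sorted(pseudo_projection))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()

class ProjectionCache:
    """Remembers digests of projected databases that have already been scanned."""

    def __init__(self):
        self._seen = {}  # digest -> prefix that first produced this projection

    def check(self, prefix, pseudo_projection):
        """Return the earlier prefix with the same projection, or record it and return None."""
        key = projection_key(pseudo_projection)
        if key in self._seen:
            return self._seen[key]
        self._seen[key] = tuple(prefix)
        return None

cache = ProjectionCache()
first = cache.check(["a", "b"], [(0, 2), (1, 3)])   # new projection, gets recorded
second = cache.check(["b"], [(1, 3), (0, 2)])       # same pointers, different order
third = cache.check(["c"], [(0, 1)])                # different projection
```

Here `second` identifies the prefix whose projected database is identical, so a miner could skip the second scan.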
5.
6.
7.
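Sequential pattern mining extracts, from a sequence database, patterns that occur frequently relative to time or to other patterns. To address the high cost of constructing projected databases and the low scanning efficiency of the PrefixSpan algorithm, the algorithm is improved in three ways: extending by sequences instead of by itemsets, abandoning the mining of projected databases whose number of sequences is below the threshold min_support, and recursing directly on locally frequent items. The improved method is applied to mining Web user behavior patterns in order to analyze regularities in log records. Experimental analysis shows that the improved algorithm is somewhat more efficient than PrefixSpan.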
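The second of these improvements, skipping projected databases that contain fewer sequences than min_support, can be illustrated with a short sketch. It assumes each projected database is a list of suffixes and each suffix is a list of single items; the function name `locally_frequent_items` is hypothetical and the code is not taken from the paper.

```python
from collections import Counter

def locally_frequent_items(projected_db, min_support):
    """Return {item: support} for items found in at least min_support suffixes."""
    # Pruning: with fewer than min_support sequences, no item can reach the
    # threshold, so the scan is skipped entirely.
    if len(projected_db) < min_support:
        return {}
    counts = Counter()
    for suffix in projected_db:
        counts.update(set(suffix))  # count each item at most once per sequence
    return {item: n for item, n in counts.items() if n >= min_support}

small = [["b", "c"]]
large = [["b", "c"], ["c", "b"], ["c"]]
pruned = locally_frequent_items(small, 2)    # skipped without scanning
frequent = locally_frequent_items(large, 2)  # scanned normally
```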
8.
9.
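Application of the VF algorithm to chemical structure retrieval  Total citations: 5; self-citations: 3; citations by others: 2
A two-dimensional substructure matching algorithm based on attributed relational graphs (the VF algorithm) is implemented. The algorithm needs relatively little memory at run time and is well suited to processing large batches of data. The program is written in Java, was validated against the 3DFS program on the NCI open database, and has been put to use as a substructure search tool in an existing chemical structure database.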
10.
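To compare and analyze existing high-performance regular expression matching algorithms comprehensively, recent algorithms such as DFA, D2FA, CD2FA, mDFA, and XFA are implemented, and their storage space and matching time are evaluated on the Snort rule set. The results show that, compared with mDFA, XFA reduces storage space by 84.9% to 89.9% but increases matching time by 38.9% to 174.6%. XFA also scales well in both storage space and matching efficiency: when the number of rules grows 8-fold, the storage space of mDFA grows 64-fold, whereas that of XFA grows only 16-fold and its matching time increases by only 61.3%.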
11.
12.
In this paper, we deal with mining sequential patterns in multiple time sequences. Building on a state-of-the-art sequential pattern mining algorithm PrefixSpan for mining transaction databases, we propose MILE (MIning in muLtiple sEquences), an efficient algorithm to facilitate the mining process. MILE recursively utilizes the knowledge of existing patterns to avoid redundant data scanning, and therefore can effectively speed up the new patterns' discovery process. Another unique feature of MILE is that it can incorporate prior knowledge of the data distribution in time sequences into the mining process to further improve the performance. Extensive empirical results show that MILE is significantly faster than PrefixSpan. As MILE consumes more memory than PrefixSpan, we also present a solution to trade time efficiency in memory-constrained environments.
Xingquan Zhu
13.
14.
Mining sequential patterns by pattern-growth: the PrefixSpan approach  Total citations: 12; self-citations: 0; citations by others: 12
Jian Pei, Jiawei Han, Mortazavi-Asl B., Jianyong Wang, Pinto H., Qiming Chen, Dayal U., Mei-Chun Hsu. IEEE Transactions on Knowledge and Data Engineering, 2004, 16(11): 1424-1440
Sequential pattern mining is an important data mining problem with broad applications. However, it is also a difficult problem since the mining may have to generate or examine a combinatorially explosive number of intermediate subsequences. Most of the previously developed sequential pattern mining methods, such as GSP, explore a candidate generation-and-test approach [R. Agrawal et al. (1994)] to reduce the number of candidates to be examined. However, this approach may not be efficient in mining large sequence databases having numerous patterns and/or long patterns. In this paper, we propose a projection-based, sequential pattern-growth approach for efficient mining of sequential patterns. In this approach, a sequence database is recursively projected into a set of smaller projected databases, and sequential patterns are grown in each projected database by exploring only locally frequent fragments. Based on an initial study of the pattern growth-based sequential pattern mining, FreeSpan [J. Han et al. (2000)], we propose a more efficient method, called PrefixSpan, which offers ordered growth and reduced projected databases. To further improve the performance, a pseudoprojection technique is developed in PrefixSpan. A comprehensive performance study shows that PrefixSpan, in most cases, outperforms the Apriori-based algorithm GSP, FreeSpan, and SPADE [M. Zaki (2001)] (a sequential pattern mining algorithm that adopts vertical data format), and PrefixSpan integrated with pseudoprojection is the fastest among all the tested algorithms. Furthermore, this mining methodology can be extended to mining sequential patterns with user-specified constraints. The high promise of the pattern-growth approach may lead to its further extension toward efficient mining of other kinds of frequent patterns, such as frequent substructures.
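The recursive projection and pseudoprojection ideas in this abstract can be sketched as follows. This is a simplified illustration rather than the authors' implementation: it assumes every sequence element is a single item (no itemsets), and the function name `prefixspan` and its signature are chosen here for the example. Each projected database is held as a list of (sequence index, offset) pointers into the original data instead of copied suffixes.

```python
from collections import defaultdict

def prefixspan(sequences, min_support):
    """Return {pattern tuple: support} for sequences made of single items."""
    results = {}

    def mine(prefix, projection):
        # projection is a pseudo-projection: (sequence index, start offset) pairs
        # pointing into `sequences`, so no suffix is ever copied.
        occurrences = defaultdict(list)
        for sid, start in projection:
            seq = sequences[sid]
            seen = set()
            for pos in range(start, len(seq)):
                item = seq[pos]
                if item not in seen:  # keep only the first occurrence per suffix
                    seen.add(item)
                    occurrences[item].append((sid, pos + 1))
        for item, next_projection in occurrences.items():
            # Grow the pattern only with locally frequent items.
            if len(next_projection) >= min_support:
                pattern = prefix + (item,)
                results[pattern] = len(next_projection)
                mine(pattern, next_projection)

    mine((), [(sid, 0) for sid in range(len(sequences))])
    return results

db = [list("abcb"), list("acb"), list("bc")]
patterns = prefixspan(db, 2)
```

With a minimum support of 2, this toy database yields eight patterns, the longest being `("a", "c", "b")`.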
15.
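Traditional data mining algorithms have limited computing capacity when handling massive datasets. To address this problem, a distributed sequential pattern mining algorithm based on MapReduce, MR-PrefixSpan, is proposed. Building on the PrefixSpan algorithm, it partitions the pattern mining task, uses the Map function to process the sequential patterns obtained from different prefixes, and constructs the projected databases in parallel, which improves mining efficiency and reduces the search space. The Reduce function merges the intermediate results to obtain the global sequential patterns. Experimental results on a Hadoop cluster show that MR-PrefixSpan reduces database scanning time and achieves a high parallel speedup and good scalability.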
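The division of work described here can be imitated in plain Python. The sketch below is an assumption-laden illustration, not MR-PrefixSpan itself: it runs on one machine without Hadoop, treats each sequence as a list of single items, and uses hypothetical names (`project`, `mine_local`, `map_task`, `reduce_task`, `mr_prefixspan`). Each map task takes one frequent item as its prefix, builds that prefix's projected database, and mines it locally. The reduce step merges the partial results, which never collide because every task's patterns start with a different item.

```python
from collections import Counter
from functools import reduce

def project(sequences, item):
    """Suffixes that follow the first occurrence of item in each sequence."""
    return [seq[seq.index(item) + 1:] for seq in sequences if item in seq]

def mine_local(prefix, projected, min_support):
    """Recursive PrefixSpan over copied suffixes; yields (pattern, support) pairs."""
    counts = Counter(item for seq in projected for item in set(seq))
    for item, support in counts.items():
        if support >= min_support:
            pattern = prefix + (item,)
            yield pattern, support
            yield from mine_local(pattern, project(projected, item), min_support)

def map_task(sequences, item, min_support):
    """One map task: build the projected database for prefix (item,) and mine it."""
    projected = project(sequences, item)
    local = {(item,): len(projected)}
    local.update(mine_local((item,), projected, min_support))
    return local

def reduce_task(merged, local):
    """Reduce step: fold one task's patterns into the global result."""
    merged.update(local)
    return merged

def mr_prefixspan(sequences, min_support):
    counts = Counter(item for seq in sequences for item in set(seq))
    frequent = [item for item, n in counts.items() if n >= min_support]
    # On a real cluster these map tasks would run in parallel on different nodes.
    partials = (map_task(sequences, item, min_support) for item in frequent)
    return reduce(reduce_task, partials, {})

db = [list("abcb"), list("acb"), list("bc")]
global_patterns = mr_prefixspan(db, 2)
```

Partitioning by the first item keeps the map tasks independent of one another, which is what makes the merge a simple union.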
16.
17.
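Analysis and study of the PrefixSpan and CloSpan algorithms  Total citations: 1; self-citations: 0; citations by others: 1
Discovering sequential patterns, that is, finding all frequent subsequences in a sequence database, is an active branch of data mining. This paper introduces the basic concepts of sequential pattern mining, then describes the classic PrefixSpan algorithm and CloSpan, a closed sequential pattern mining algorithm built on the PrefixSpan framework. It analyzes and compares their execution and characteristics and summarizes the strengths and weaknesses of each. It points out that PrefixSpan is suited to mining short sequences, whereas CloSpan outperforms PrefixSpan on long sequences or at low thresholds and performs better when mining large databases. The results are a useful reference for the design of sequential pattern mining.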