Similar literature
 17 similar articles found; search took 93 ms
1.
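Because frequent itemsets suffer from data and pattern redundancy, algorithms for mining maximal frequent itemsets over data streams have attracted considerable attention. This paper proposes MMFI-SW, an algorithm for mining maximal frequent itemsets within a sliding window over a data stream. The algorithm first uses an FP-tree-like data structure to record the most recently arrived stream data while deleting outdated data and a large number of infrequent items, and then applies a new method to output the maximal frequent itemsets from the sliding window efficiently. Theoretical analysis and experimental results show that MMFI-SW has low time complexity.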
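To make the redundancy argument concrete, the sketch below keeps only the maximal itemsets from a collection of frequent itemsets, that is, those with no frequent proper superset. It is a brute-force illustration and not the MMFI-SW procedure, which works on an FP-tree-like structure; the function name `maximal_itemsets` is hypothetical.

```python
def maximal_itemsets(frequent):
    """Return the itemsets that have no proper superset in `frequent`.

    Brute-force illustration only (quadratic in the number of itemsets);
    this is not how MMFI-SW derives its output.
    """
    sets = [frozenset(s) for s in frequent]
    # `s < other` is Python's proper-subset test on (frozen)sets.
    return {s for s in sets if not any(s < other for other in sets)}
```

Every frequent itemset is a subset of at least one maximal itemset, so the maximal ones are a compact summary of which itemsets are frequent, although the individual support counts of the subsets are not preserved.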

2.
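Ding Bangxu (丁邦旭). 《硅谷》 (Silicon Valley), 2012, (5): 152-153
The characteristics of data streams require that a mining algorithm obtain its results in a single scan and with low space complexity. Taking these characteristics into account, the paper proposes MFIM, a new sliding-window-based algorithm for mining frequent itemsets over data streams. The algorithm represents the transaction sequence in the sliding window as a matrix of binary vectors and uses this structure to record the dynamic changes of frequent itemsets, so that frequent itemsets in the stream are mined efficiently. Theoretical analysis and experimental results show that the algorithm achieves good time and space complexity.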
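A minimal sketch of the binary-vector idea follows, assuming one bit vector per item with one bit per window slot. The abstract does not give MFIM's actual matrix layout or update rules, so the class `SlidingWindowBitMatrix` and its methods are hypothetical.

```python
class SlidingWindowBitMatrix:
    """One bit vector (a Python int) per item over the last `window_size`
    transactions. Hypothetical layout; not taken from the MFIM paper."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.mask = (1 << window_size) - 1  # keeps only `window_size` bits
        self.rows = {}                      # item -> bit vector

    def add(self, transaction):
        # Shift every row left by one slot; the mask drops the oldest slot.
        for item in list(self.rows):
            shifted = (self.rows[item] << 1) & self.mask
            if shifted:
                self.rows[item] = shifted
            else:
                del self.rows[item]         # item has left the window
        # Set the newest slot (bit 0) for each item in the new transaction.
        for item in set(transaction):
            self.rows[item] = self.rows.get(item, 0) | 1

    def support(self, itemset):
        # AND the rows of all items, then count the remaining 1-bits.
        bits = self.mask
        for item in itemset:
            bits &= self.rows.get(item, 0)
        return bin(bits).count("1")
```

With this layout, each arriving transaction is processed once, and the support of any itemset inside the window is a bitwise AND followed by a bit count.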

3.
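Existing methods for mining frequent itemsets over data streams introduce too many sub-frequent itemsets and have poor time and space performance and low output accuracy. To address this, the paper uses Chebyshev's inequality to construct a probabilistic error bound for periodic sampling of itemset frequency and gives a method for dynamically detecting changes in itemset support. It proposes FI-PS, a periodic-sampling-based algorithm for mining frequent itemsets over data streams. The algorithm tracks changes in itemset support to determine how stable the support is, and uses this as the basis for adjusting the window size and the sampling period, thereby guaranteeing with high probability that the itemset support error is bounded above. Theoretical analysis and experiments show that the algorithm is effective and achieves good execution performance while keeping the mining results reasonably accurate.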
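As a rough illustration of how Chebyshev's inequality bounds a sampled support, the sketch below uses the textbook form of the inequality for the mean of n uncorrelated 0/1 observations. The exact bound constructed for FI-PS is not given in the abstract; the function name and the independence assumption are this example's own.

```python
def chebyshev_error_bound(n, eps):
    """Upper bound on P(|sampled support - true support| >= eps).

    Assumes n uncorrelated 0/1 observations of "itemset present", which is
    this sketch's assumption and not a detail taken from the FI-PS paper.
    """
    if n <= 0 or eps <= 0:
        raise ValueError("n and eps must be positive")
    # Var(Bernoulli(p)) = p(1 - p) <= 1/4, so Var(sample mean) <= 1/(4n).
    # Chebyshev: P(|mean - p| >= eps) <= Var(mean) / eps^2.
    return min(1.0, 1.0 / (4.0 * n * eps * eps))
```

For example, with 1000 samples and eps = 0.05 the bound is about 0.1, so the sampled support is within 0.05 of the true support with probability of at least roughly 0.9; taking more samples tightens the bound.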

4.
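Wang Xikui (王希馗). 《硅谷》 (Silicon Valley), 2011, (10): 191-192, 157
Mining all frequent itemsets of a dense dataset with the Apriori and FP-growth algorithms is costly. To address this, the paper proposes an association rule mining algorithm based on an array of linked lists. The method builds a transaction linked list for each item in the array, needs only one scan of the database to obtain the support of each candidate quickly, and thus discovers frequent itemsets efficiently. Analysis and comparison with classic algorithms show that the algorithm mines faster.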
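The one-scan idea can be sketched as follows, assuming each item keeps the list of ids of the transactions that contain it. Python lists and sets stand in for the paper's linked-list array, and both function names are hypothetical.

```python
def build_tid_lists(transactions):
    """Single pass over the database: item -> list of transaction ids.

    A dict of Python lists stands in for the paper's array of linked lists.
    """
    tid_lists = {}
    for tid, transaction in enumerate(transactions):
        for item in set(transaction):
            tid_lists.setdefault(item, []).append(tid)
    return tid_lists


def support(tid_lists, candidate):
    """Support of a candidate = number of ids shared by all of its items."""
    lists = [tid_lists.get(item, []) for item in candidate]
    if not lists:
        return 0
    common = set(lists[0])
    for other in lists[1:]:
        common &= set(other)
    return len(common)
```

After the lists are built, no further database scan is needed: the support of any candidate comes from intersecting the lists of its items.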

5.
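To improve the performance of frequent itemset mining algorithms based on the vertical database representation, the paper presents the idea of using an index array to speed up computation. It introduces the concept of the index array and a method for computing it, and proposes a new, efficient frequent itemset mining algorithm, Index-FIMiner. The algorithm greatly reduces unnecessary tidset intersections and the corresponding frequency checks. The paper also proves that a representative item can be joined directly with every combination of the itemsets in its inclusion index, and that the support of each resulting itemset equals the support of the representative item, which lowers the cost of processing these frequent itemsets and improves performance. Experimental results show that Index-FIMiner has high mining efficiency.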

6.
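Mining continuous frequent access paths with common sequential pattern mining algorithms is inefficient and yields only frequent access paths. Based on a study of the properties of access paths, this paper presents an algorithm that mines continuous frequent access paths from ordinary Web logs. A novel data structure is designed to compress storage space and to store the information needed for mining. A partitioned search is used: a suffix tree is built for each frequent node, and continuous frequent access paths are mined by traversing that suffix tree. This approach requires no candidate generation, and a single pass mines all continuous frequent access paths that have the root node as their suffix.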

7.
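Apriori is one of the most classic of the widely used association rule mining algorithms. However, it scans the database repeatedly, which incurs high I/O cost; it generates a huge set of candidate 2-itemsets while deriving the frequent 2-itemsets; and when filtering for frequent k-itemsets it does not exclude elements that should not take part in the combinations. All of this makes the algorithm inefficient. To address these three factors, the paper proposes an algorithm that obtains frequent itemsets by multiplying compressed transaction matrices. It scans the database only once, builds the transaction matrix after compression, and obtains the frequent itemsets through matrix operations, which effectively improves the efficiency of association rule mining.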
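The matrix idea can be illustrated for 2-itemsets: if M is the 0/1 transaction matrix (rows are transactions, columns are items), entry (i, j) of the product of M's transpose with M is the number of transactions that contain both item i and item j. The sketch below omits the paper's compression step and its handling of longer itemsets, neither of which is described in the abstract; the function name is hypothetical.

```python
def frequent_pairs(transactions, min_support):
    """Frequent 2-itemsets from the product (M transposed) times M.

    Illustrative only: no compression, and only pairs are handled.
    """
    items = sorted({item for t in transactions for item in t})
    # Boolean transaction matrix: one row per transaction, one column per item.
    matrix = [[1 if item in t else 0 for item in items] for t in transactions]
    n = len(items)
    # counts[i][j] = dot product of columns i and j = co-occurrence count.
    counts = [[sum(row[i] * row[j] for row in matrix) for j in range(n)]
              for i in range(n)]
    return {(items[i], items[j]): counts[i][j]
            for i in range(n) for j in range(i + 1, n)
            if counts[i][j] >= min_support}
```

The transaction matrix is built in one pass over the database, and every pair count is then read off the product without generating candidate 2-itemsets one by one.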

8.
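For the problem of intelligent decision-making in equipment cognitive testability, the paper proposes a trade-off optimization method for cognitive testability diagnosis schemes based on clouds and frequent itemsets. It studies how information flows in equipment cognitive testability are described and converted between the qualitative and quantitative domains, and gives a data-synopsis-based method for generating the central cloud, which is used to clean and filter the transaction data. It then studies a data mining method based on frequent itemsets and newly added itemsets and proposes a data correlation analysis method based on the 2-norm and covariance, which together implement the data mining process for trade-off optimization of cognitive testability diagnosis schemes based on clouds and frequent itemsets. The result is a simulation diagnosis and trade-off optimization model for cognitive testability built on a storage layer, a cloud layer, an application layer, and a decision layer, which the paper explains further. The approach can lay a foundation for the intelligent development of trade-off optimization for equipment cognitive testability diagnosis schemes.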

9.
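For the problem of mining frequent subgraphs in biological networks, the paper proposes MaxFP, an algorithm based on the FP-tree structure. Taking metabolic pathways as the object of study, and building on a simplified model suited to biological network graphs, the algorithm uses an improved FP-growth algorithm that generates no candidate sets to mine closed frequent subgraphs in biological networks. It takes into account the shortcomings of applying frequent-itemset-based algorithms to networks and adapts FP-growth to the characteristics of biological networks. Experiments show that MaxFP runs faster than Apriori-based frequent pattern mining algorithms, and that it not only mines the maximal frequent subgraphs but also finds more frequent subgraphs that are biologically meaningful.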

10.
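Yuan Hongyan (袁鸿雁). 《硅谷》 (Silicon Valley), 2010, (5): 70, 39
Association rule mining is an important part of data mining research and has attracted growing attention. This paper therefore focuses on association rule mining: it first introduces the basic knowledge and concepts, then classifies and discusses the commonly used algorithms, and finally analyzes several typical ones.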

11.
Frequent pattern mining is the most important phase of association rule mining because of its time and space complexity. Several methods have tried to speed up association rule mining by making frequent pattern mining more efficient. Because the datasets to be mined are large, many parallel and distributed approaches have been introduced that partition the data or distribute the mining work across multiple processors or computers, improving the efficiency of the mining process. In this paper, we propose a Hadoop-based parallel implementation of the PrePost+ algorithm for frequent itemset mining. In our approach, the construction of the N-lists of itemsets is distributed among the mappers, while the final pruning and the extraction of frequent itemsets are carried out by the reducers, following the MapReduce parallel programming model. The experimental results show that our Hadoop-based PrePost+ (HBPrePost+) algorithm outperforms PARMA, one of the best existing parallel methods for frequent itemset mining, in execution time.

12.
Uncertain frequent pattern mining has been much discussed in recent decades. It is widely used in various fields and helps analysts understand collected data through the frequencies of items. Past studies have focused on discrete models. However, a discrete model only explains the presence of combinations of items, without giving specific data intervals. To compensate for this drawback, we focus on continuous uncertain data and improve a continuous uncertain frequent tree for extracting frequent patterns, notably with respect to time cost. Attribute overlapping is usually what causes the high time cost in the extraction phase. To avoid long branches in the tree, two approaches are proposed. The first names each attribute at a given level with an uncertain frequent pattern; by using links and reshaping the uncertain frequency tree, it reduces the number of combinations. The second, called uncertain frequent pattern map transforming, uses a discrete transformation to decrease the time cost. In experiments, our two approaches were compared with several mainstream approaches. According to the results, our approaches not only took less time to find frequent patterns but also exhibited high accuracy on continuous uncertain data.

13.
Journal of the Chinese Institute of Engineers (《中国工程学刊》), 2012, 35(5): 547-554
Developing algorithms for mining least association rules (ARs) is one of the more challenging areas in data mining. Compared with frequent pattern mining, the main obstacles are exclusive measurements, complexity, and excessive computational cost. Indeed, most previous studies still use Apriori-like algorithms. To address this issue, this article proposes a new correlation measurement called definite factor (DF) and a scalable trie-based algorithm named significant least pattern growth (SLP-Growth). The algorithm generates the least patterns based on interval support and then determines their significance using DF. Experiments with real datasets show that SLP-Growth can discover highly positively correlated and significant least ARs. It also outperforms the fast FP-Growth algorithm by up to two times, which verifies its efficiency.

14.
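Multi-relational frequent pattern discovery can find complex frequent patterns involving multiple relations directly from complex structured data, avoiding the limitations of traditional methods. Unlike the mainstream methods based on inductive logic programming, the paper proposes a semantics-oriented, condensed multi-relational frequent pattern discovery method based on conjunctive query containment. Its theoretical and technical basis is novel, and it resolves two kinds of semantic redundancy. Experiments show that the method has advantages in understandability, functionality, efficiency, and scalability.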

15.
Multiple streams of binary data occur commonly in practice. In this paper, we propose a hierarchical statistical model to describe multi-stream binary data that exhibit over-dispersion. In this model, a group of binary streams in a multi-stream dataset is modeled by a beta-binomial hierarchical mixture distribution. Using this hierarchical model structure, a cumulative sum (CUSUM) chart based on the log-likelihood ratio is developed to monitor all the data streams simultaneously. The performance of the CUSUM chart is investigated and compared with conventional monitoring schemes through numerical studies and a real-world dataset. The CUSUM method using the hierarchical model is shown to be effective and to have advantages over the conventional methods. Copyright © 2015 John Wiley & Sons, Ltd.
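For readers unfamiliar with log-likelihood-ratio CUSUM charts, the sketch below shows the basic recursion on a single binary stream with a plain Bernoulli model. This is a deliberate simplification and not the paper's method, which monitors many streams at once under a beta-binomial hierarchical model; the function name and the parameters `p0`, `p1`, and `threshold` are assumptions of this example.

```python
import math


def bernoulli_cusum(stream, p0, p1, threshold):
    """One-sided CUSUM for a shift in success probability from p0 to p1.

    Returns the 0-based index of the first alarm, or None if none occurs.
    Single-stream Bernoulli simplification, not the beta-binomial chart.
    """
    up = math.log(p1 / p0)                # log-likelihood ratio when x == 1
    down = math.log((1 - p1) / (1 - p0))  # log-likelihood ratio when x == 0
    stat = 0.0
    for i, x in enumerate(stream):
        # Accumulate the log-likelihood ratio, resetting at zero.
        stat = max(0.0, stat + (up if x else down))
        if stat >= threshold:
            return i
    return None
```

The statistic stays near zero while the stream behaves as expected under `p0` and climbs when observations favor `p1`; a higher threshold trades slower detection for fewer false alarms.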

16.
Sequential pattern mining is a prominent method for discovering knowledge in large databases. Common sequential pattern mining algorithms handle static databases. In practice, however, databases grow rapidly, which creates the need for mining algorithms designed for that setting. Once the database is updated, the previous mining result becomes incorrect, and the entire mining process must be restarted on the updated sequence database. Incremental mining of sequential patterns avoids rescanning the entire database. Previous approaches are Apriori-based frameworks, whereas a more advanced technique is needed to give the desired solution. We propose an algorithm called STISPM for incremental mining of sequential patterns using a sequence tree structure. STISPM uses a depth-first approach together with backward tracking and a dynamic lookahead pruning strategy that removes infrequent patterns. Each path from the root node to a leaf node represents a sequential pattern in the database. The structure of the sequence tree makes it well suited to incremental sequential pattern mining. The sequence tree also stores every sequential pattern with its count, so whenever the support threshold is changed, our algorithm, which uses the frequent sequence tree as its storage structure, can find all the sequential patterns without mining the database again.

17.
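To mine valuable sequence data in large transaction databases, the paper proposes SMBR, an efficient bitmap-based sequential pattern mining algorithm. SMBR represents the database with bitmaps and introduces a simplified bitmap representation structure. The algorithm first generates candidate sequences by sequence extension and item extension, and then generates frequent sequences through fast position operations on the bitmap of the original sequence and the bitmap of the extending item. Experiments show that, on large transaction databases, the method not only improves mining efficiency but also greatly reduces the memory needed for the temporary data produced during mining, so sequential patterns are mined efficiently.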
