Similar literature
 Found 20 similar documents (search time: 15 ms)
1.
Discovery of maximum length frequent itemsets   Total citations: 1, self-citations: 0, citations by others: 1
The use of frequent itemsets has been limited by the high computational cost as well as the large number of resulting itemsets. In many real-world scenarios, however, it is often sufficient to mine a small representative subset of frequent itemsets with low computational cost. To that end, in this paper, we define a new problem of finding the frequent itemsets with a maximum length and present a novel algorithm to solve this problem. Indeed, maximum length frequent itemsets can be efficiently identified in very large data sets and are useful in many application domains. Our algorithm generates the maximum length frequent itemsets by adapting a pattern fragment growth methodology based on the FP-tree structure. Also, a number of optimization techniques have been exploited to prune the search space. Finally, extensive experiments on real-world data sets validate the proposed algorithm.
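
To make the problem statement in entry 1 concrete, here is a minimal brute-force sketch of what "maximum length frequent itemsets" means. It is not the FP-tree-based algorithm the abstract describes; the function names and the toy data are assumptions chosen for illustration, and the enumeration is exponential, so it only suits tiny inputs.

```python
from itertools import combinations


def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def max_length_frequent_itemsets(transactions, min_support):
    """Return all frequent itemsets of the greatest possible length.

    Brute force (illustrative only): try lengths from longest to shortest
    and stop at the first length that yields a frequent itemset.
    """
    items = sorted(set().union(*transactions))
    longest = max(len(t) for t in transactions)
    for k in range(longest, 0, -1):
        found = [frozenset(c) for c in combinations(items, k)
                 if support(frozenset(c), transactions) >= min_support]
        if found:
            return found
    return []


# Hypothetical toy database of four transactions.
transactions = [frozenset("abc"), frozenset("abcd"),
                frozenset("abd"), frozenset("cd")]
result = max_length_frequent_itemsets(transactions, min_support=2)
```

With a minimum support of 2, no 4-item set qualifies, so the search stops at length 3 and returns the two 3-item sets that occur in two transactions each.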

2.
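An FP-tree-based algorithm for mining maximal frequent itemsets   Total citations: 1, self-citations: 0, citations by others: 1
Mining maximal frequent itemsets is one of the most important fundamental problems in data mining. Building on an analysis of existing algorithms, this paper proposes the FP-MMFI algorithm, which extends the FP-growth algorithm to maximal frequent itemset mining. It introduces the concept of a frequent path, which can be used to compress the FP-tree effectively and shrink the search space, and it uses projection to optimize superset checking, reducing the number of item comparisons. Experimental results show that the algorithm outperforms existing algorithms of the same kind.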
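
The superset check mentioned in entry 2 can be illustrated with a naive sketch: an itemset is maximal when no frequent proper superset exists. This quadratic filter is an assumption made for illustration; it is not FP-MMFI's projection-based check, and the function name and sample data are hypothetical.

```python
def maximal_itemsets(frequent):
    """Keep only itemsets that have no proper superset in `frequent`."""
    frequent = [frozenset(s) for s in frequent]
    return [s for s in frequent
            if not any(s < other for other in frequent)]


# Hypothetical collection of frequent itemsets (downward closed).
frequent = [{"a"}, {"b"}, {"c"}, {"d"},
            {"a", "b"}, {"a", "c"}, {"b", "c"},
            {"a", "b", "c"}]
maximal = maximal_itemsets(frequent)
```

Every subset of {a, b, c} is discarded because {a, b, c} covers it, while {d} survives because nothing in the collection extends it.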

3.
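Inter-transaction frequent itemsets extend traditional single-dimensional intra-transaction association rules to multidimensional inter-transaction association rules, but the number of inter-transaction frequent itemsets grows rapidly as the sliding time window widens. Exploiting the properties of frequent closed itemsets, this paper proposes the concept of inter-transaction frequent closed itemsets and an algorithm for mining them (FCITA). The algorithm uses partitioning and conditional-database techniques to avoid generating a huge extended database, and compresses transactions in an extended binary form to make support counting more efficient. In addition, dynamic ordering and hash tables greatly reduce the number of tests for frequent closed itemsets. Simulation comparisons show that FCITA mines efficiently.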

4.
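A new algorithm for mining maximal frequent itemsets   Total citations: 5, self-citations: 0, citations by others: 5
Ma Lisheng (马丽生), Deng Huiwen (邓辉文), Qi Yi (齐逸). 《计算机应用》 (Journal of Computer Applications), 2006, 26(11): 2670-2673
Mining maximal frequent itemsets is one of the most important fundamental problems in data mining. Building on an analysis of existing algorithms, this paper proposes a new algorithm for mining maximal frequent itemsets. Experiments show that it outperforms existing algorithms of the same kind.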

5.
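To improve the efficiency of frequent pattern mining on uncertain datasets, and to address the heavy computation that existing algorithms incur when deciding whether to create a sub-header table for an item in the header table, this paper presents an approximate mining strategy, AAT-Mine, which improves overall mining efficiency at the cost of losing a small fraction of the frequent itemsets. The algorithm was tested on three different typical datasets and compared with the best current algorithm and with typical algorithms. The experimental results confirm that the approximate algorithm AAT-Mine improves both time and space efficiency.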

6.
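An effective parallel algorithm for mining frequent itemsets   Total citations: 1, self-citations: 0, citations by others: 1
Traditional parallel algorithms for mining frequent itemsets suffer from unbalanced load across nodes, excessive synchronization overhead, and heavy communication. To address these problems, this paper proposes MRPD, a parallel algorithm that redistributes data through multiple transfers. At step l, MRPD repartitions the database into several groups and transfers the groups in multiple rounds according to the needs of each node; once a node has obtained complete groups, it computes frequent itemsets asynchronously; when all nodes have finished, the full set of frequent itemsets is obtained. Theoretical analysis and experimental results show that MRPD is effective.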

7.
DSM-FI: an efficient algorithm for mining frequent itemsets in data streams   Total citations: 4, self-citations: 4, citations by others: 0
Online mining of data streams is an important data mining problem with broad applications. However, it is also a difficult problem since the streaming data possess some inherent characteristics. In this paper, we propose a new single-pass algorithm, called DSM-FI (data stream mining for frequent itemsets), for online incremental mining of frequent itemsets over a continuous stream of online transactions. According to the proposed algorithm, each transaction of the stream is projected into a set of sub-transactions, and these sub-transactions are inserted into a new in-memory summary data structure, called SFI-forest (summary frequent itemset forest) for maintaining the set of all frequent itemsets embedded in the transaction data stream generated so far. Finally, the set of all frequent itemsets is determined from the current SFI-forest. Theoretical analysis and experimental studies show that the proposed DSM-FI algorithm uses stable memory, makes only one pass over an online transactional data stream, and outperforms the existing algorithms of one-pass mining of frequent itemsets.
Suh-Yin Lee

8.
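A parallel algorithm for fast mining of frequent itemsets   Total citations: 3, self-citations: 0, citations by others: 3
He Bo (何波), Wang Huaqiu (王华秋), Liu Zhen (刘贞), Wang Yue (王越). 《计算机应用》 (Journal of Computer Applications), 2006, 26(2): 391-0392
Traditional parallel algorithms for mining frequent itemsets suffer from data skew, heavy communication, frequent synchronization, and many database scans. To address these problems, this paper proposes FPMFI, a parallel algorithm for fast mining of frequent itemsets. FPMFI lets each computer node compute its local frequent itemsets independently, then exchanges data with a central node to aggregate them, and finally obtains the global frequent itemsets. Theoretical analysis and experimental results show that FPMFI is effective.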
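
The local-then-global pattern in entry 8 can be sketched as follows. The abstract does not give FPMFI's actual protocol, so this sketch substitutes the well-known partition idea: any itemset that is frequent over the whole database must be frequent in at least one partition, so the union of local results is a safe candidate set for a second, global count. The nodes are simulated sequentially, and all names and data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def count_subsets(partition):
    """Count every non-empty itemset in a partition (toy, exponential)."""
    counts = Counter()
    for t in partition:
        for k in range(1, len(t) + 1):
            for combo in combinations(sorted(t), k):
                counts[frozenset(combo)] += 1
    return counts


def local_frequent(partition, min_ratio):
    """Step run independently on each node: its locally frequent itemsets."""
    counts = count_subsets(partition)
    return {s for s, n in counts.items() if n >= min_ratio * len(partition)}


def global_frequent(partitions, min_ratio):
    """Central node: merge local candidates, then sum per-node counts."""
    candidates = set()
    for p in partitions:
        candidates |= local_frequent(p, min_ratio)
    total = sum(len(p) for p in partitions)
    counts = Counter()
    for p in partitions:
        for s in candidates:
            counts[s] += sum(1 for t in p if s <= t)
    return {s: n for s, n in counts.items() if n >= min_ratio * total}


# Two hypothetical nodes, three transactions each.
partitions = [
    [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}],
    [{"b", "c"}, {"a", "b"}, {"b"}],
]
result = global_frequent(partitions, min_ratio=0.5)
```

Note that {a, c} is locally frequent on the first node but fails the global threshold, which is why the second counting pass is needed.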

9.
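Frequent itemset mining is limited by the large number of candidate frequent itemsets and high computational cost, and mining only the maximum length frequent itemsets is sufficient for many applications. This paper proposes an algorithm for mining maximum length frequent itemsets based on an ordered FP-tree structure. The header table of the ordered FP-tree is modified by adding a max-level field that records the maximum height of the item in the ordered FP-tree. During mining, only items whose max-level is greater than or equal to the length of the maximum length frequent itemsets found so far are traversed; no conditional pattern bases are generated, no conditional FP-trees are built recursively, and the support of the maximum length frequent itemsets is computed as well. Experimental results show that the algorithm mines efficiently and quickly.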

10.
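A data stream frequent itemset mining algorithm based on a variable-size sliding window   Total citations: 2, self-citations: 0, citations by others: 2
Data stream frequent itemset mining algorithms based on the traditional sliding window mechanism focus largely on fast and accurate results and pay less attention to the time-varying nature of data streams. This paper improves the traditional sliding window mechanism. Taking both the massive volume and the time-varying nature of data streams into account, it proposes V-Stream, a data stream frequent itemset mining algorithm based on a variable-size sliding window mechanism. The algorithm uses a group of transaction linked lists as its synopsis data structure and adaptively adjusts the window size according to changes in the data distribution of the stream. Simulation experiments on Eclipse show that V-Stream improves on the Manku algorithm in both the time and space efficiency of mining frequent itemsets from data streams.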

11.
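Based on a study and analysis of association rule mining techniques and the classic Apriori and FP-growth algorithms, this paper proposes an improved frequent itemset mining algorithm. The algorithm stores the data in a matrix and uses matrix operations to compute the support counts of itemsets, which effectively reduces the number of scans of the transaction database; it builds the frequent pattern tree from an adjacency matrix of ordered frequent items, which effectively reduces the number of branches and levels in the tree. The mining process is analyzed through a worked example.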
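
As a rough illustration of the matrix idea in entry 11, the sketch below encodes the database once as a 0/1 matrix (rows are transactions, columns are items) and then answers support queries from the matrix alone, without rescanning the original data. The layout and function names are assumptions; the abstract does not specify the paper's exact matrix operations.

```python
def build_matrix(transactions, items):
    """One database scan: row per transaction, column per item, 1 if present."""
    return [[1 if item in t else 0 for item in items] for t in transactions]


def support_count(matrix, items, itemset):
    """Count rows whose entries are 1 in every column of `itemset`."""
    cols = [items.index(i) for i in itemset]
    return sum(all(row[c] for c in cols) for row in matrix)


# Hypothetical toy database.
items = ["a", "b", "c"]
transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]
matrix = build_matrix(transactions, items)
ab_support = support_count(matrix, items, ["a", "b"])
```

Taking the row-wise AND of the selected columns and summing the result is the same computation expressed as a vector operation, which is what makes the matrix form attractive.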

12.
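In recent years, uncertain data has appeared widely in areas such as sensor networks and Web applications. Mining uncertain data has become a new research hotspot, covering clustering, classification, frequent itemset mining, outlier detection, and more, with frequent itemset mining among the key problems. This paper surveys the two basic classes of traditional frequent itemset mining algorithms, analyzes the methods built on them for mining frequent itemsets from uncertain data and uncertain data streams, and discusses possible directions for future research.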

13.
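Yu Hong (于红), Wang Xiukun (王秀坤), Meng Jun (孟军). 《控制与决策》 (Control and Decision), 2007, 22(5): 520-524
This paper introduces the concepts of a complete prefix path and an ordered FP-tree, gives a method for building an ordered FP-tree according to the level at which each data item lies, and uses the ordered FP-tree to represent the data. It proposes the MFIM algorithm, which mines maximal frequent itemsets using the complete prefix paths of the ordered FP-tree and uses those paths to optimize the mining. Experimental results show that the algorithm performs well when mining long patterns from dense datasets.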

14.
Mining frequent itemsets from transactional data streams is challenging because of the exponential explosion of itemsets and the limited memory available for mining them. Given a domain of I unique items, the number of possible itemsets can be up to 2^I − 1. When the length of a data stream approaches a very large number N, the possibility of an itemset being frequent becomes larger and more difficult to track with limited memory. The existing studies on finding frequent items from high-speed data streams are false-positive oriented. That is, they control memory consumption in the counting process by an error parameter ε, and allow items with support below the specified minimum support s but above s − ε to be counted as frequent. However, such false-positive oriented approaches cannot be effectively applied to frequent itemset mining for two reasons. First, the false-positive items found increase the number of false-positive frequent itemsets exponentially. Second, minimizing the number of false-positive items by using a small ε makes memory consumption large. Therefore, such approaches may make the problem computationally intractable with bounded memory consumption. In this paper, we develop algorithms that can effectively mine frequent item(set)s from high-speed transactional data streams with bounded memory consumption. Our algorithms are based on the Chernoff bound, in which we use a running error parameter to prune item(set)s and a reliability parameter to control memory. While our algorithms are false-negative oriented, that is, certain frequent itemsets may not appear in the results, the number of false-negative itemsets can be controlled by a predefined parameter so that a desired recall rate of frequent itemsets can be guaranteed. Our extensive experimental studies show that the proposed algorithms have high accuracy, require less memory, and consume less CPU time. They significantly outperform the existing false-positive algorithms.
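
For readers unfamiliar with the false-positive oriented style that entry 14 contrasts itself against, here is a minimal sketch of Lossy Counting for single items, one well-known example of that style. It is not the Chernoff-bound algorithm the abstract proposes. The error parameter is written `epsilon`, and the function names and sample stream are assumptions for illustration.

```python
from math import ceil


def lossy_count(stream, epsilon):
    """Lossy Counting: keep per-item [count, max_undercount] entries.

    Entries are pruned at every bucket boundary, which bounds memory at
    the price of undercounting each item by at most epsilon * n.
    """
    width = ceil(1 / epsilon)          # bucket width
    entries = {}
    n = 0
    for item in stream:
        n += 1
        bucket = ceil(n / width)       # id of the current bucket
        if item in entries:
            entries[item][0] += 1
        else:
            entries[item] = [1, bucket - 1]
        if n % width == 0:             # bucket boundary: drop rare items
            entries = {k: v for k, v in entries.items()
                       if v[0] + v[1] > bucket}
    return entries, n


def frequent_items(entries, n, s, epsilon):
    """Report items whose stored count reaches (s - epsilon) * n."""
    return {k for k, (f, _) in entries.items() if f >= (s - epsilon) * n}


# Hypothetical stream of eight items.
entries, n = lossy_count(list("abacabad"), epsilon=0.25)
frequent = frequent_items(entries, n, s=0.5, epsilon=0.25)
```

The relaxed output threshold `(s - epsilon) * n` is what admits false positives: an item with true support between s − ε and s can be reported, which is exactly the behaviour the abstract argues scales badly from items to itemsets.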

15.
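In recent years, with the emergence of new applications such as network traffic analysis, online transaction analysis, and network fraud detection, data stream mining has become an increasingly important topic. For frequent itemset mining over data streams, the great majority of current research works within the traditional window models: the time-fading window model, the landmark window model, and the sliding window model. In 2009 Pauray S.M. Tsai proposed a new window model, the weighted sliding window model, and designed two data stream frequent itemset mining algorithms based on it, WSW and WSW-Imp, where WSW-Imp improves on WSW. Building on a study of the weighted sliding window model and the WSW-Imp algorithm, this paper further improves WSW-Imp and designs the algorithm WSW-Imp2. It proves theoretically that WSW-Imp2 is more efficient than WSW-Imp, and the experimental results confirm this.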

16.
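A maximal itemset mining algorithm based on an improved FP-tree   Total citations: 1, self-citations: 0, citations by others: 1
Mining maximal frequent itemsets is a key problem in many data mining applications. FP-growth is one of the most effective frequent pattern mining algorithms available, but when mining maximal itemsets it recursively generates a large number of conditional FP-trees, which makes it inefficient in time and space. This paper therefore proposes a fast algorithm for mining maximal itemsets based on an improved FP-tree. The improved FP-tree is unidirectional and each node keeps only a pointer to its parent, which saves a large amount of storage. The algorithm also introduces item sequence sets and their basic operations, so that mining maximal frequent itemsets generates neither large candidate sets nor conditional FP-trees and all maximal frequent itemsets can be found quickly. A worked example shows that the proposed algorithm is feasible.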

17.
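Frequent itemset mining over network flow data is an important foundation of network traffic analysis. This paper proposes STFWFI, a novel frequent itemset mining algorithm based on the lexicographically ordered prefix tree LOP-Tree. The algorithm adopts a sliding time-fading window model that better matches the characteristics of network flows, effectively reducing the time and space complexity of mining frequent itemsets. On this tree structure it proposes SDNW, a new method for computing node weights based on statistical distribution, to replace the traditional statistical counting method, improving the accuracy of node estimates for network flows. Experimental results show that the algorithm performs well when mining frequent itemsets from network flows.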

18.
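To improve the mining efficiency of the classic Apriori association rule algorithm and address its bottleneck, this paper proposes an association rule algorithm that stores frequent itemsets in a linked structure and generates maximal frequent itemsets. The algorithm stores transactions as bit vectors; while generating frequent itemsets, it links the transactions that contain each frequent item into a list attached to that item, and from this generates the maximal frequent itemsets. The algorithm reduces the number of scans of the transaction database and the number of candidate itemsets generated, and thus the time needed to generate maximal frequent itemsets. Experimental results show that the algorithm improves computational efficiency.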

19.
A data stream is a massive, open-ended sequence of data elements continuously generated at a rapid rate. Mining data streams is more difficult than mining static databases because of the huge, high-speed, and continuous characteristics of streaming data. In this paper, we propose a new one-pass algorithm called DSM-MFI (Data Stream Mining for Maximal Frequent Itemsets), which mines the set of all maximal frequent itemsets in landmark windows over data streams. A new summary data structure called the summary frequent itemset forest (abbreviated as SFI-forest) is developed for incrementally maintaining the essential information about maximal frequent itemsets embedded in the stream so far. Theoretical analysis and experimental studies show that the proposed algorithm is efficient and scalable for mining the set of all maximal frequent itemsets over the entire history of the data streams.

20.
Frequent closed itemsets (FCI) play an important role in quickly pruning redundant rules, so many algorithms for mining FCI have been developed. Algorithms based on vertical data formats have the advantage that they need to scan the database only once and compute the support of itemsets quickly. In recent years, the BitTable (Dong & Han, 2007) and IndexBitTable (Song, Yang, & Xu, 2008) approaches have been applied to mining frequent itemsets with significant results. However, they always use a fixed-size Bit-Vector for each item (equal to the number of transactions in the database), which consumes more memory for storing Bit-Vectors and more time for computing intersections among them. Besides, they apply only to mining frequent itemsets; no BitTable-based algorithm for mining FCI has been proposed. This paper introduces a new method for mining FCI from transaction databases. First, the Dynamic Bit-Vector (DBV) approach is presented, together with algorithms for quickly computing the intersection of two DBVs. A lookup table is used to quickly compute the support of an itemset (the number of 1 bits in a DBV). Next, the subsumption concept, which saves memory and computing time, is discussed. Finally, an algorithm based on DBV and the subsumption concept for mining frequent closed itemsets quickly is proposed. We compare our method with CHARM and find that the proposed algorithm is more efficient than CHARM in both mining time and memory usage.
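
The vertical bit-vector representation and lookup-table support count mentioned in entry 20 can be sketched as below. This shows only the fixed-size bit-vector baseline, with a Python integer standing in for the vector; the DBV trimming of leading and trailing zero bytes and the subsumption step are not reproduced, and all names and data are assumptions for illustration.

```python
# 256-entry lookup table: number of 1 bits in each possible byte value.
POPCOUNT = [bin(b).count("1") for b in range(256)]


def to_bitvector(item, transactions):
    """Bit i is set when transaction i contains `item`."""
    bits = 0
    for i, t in enumerate(transactions):
        if item in t:
            bits |= 1 << i
    return bits


def support(bits):
    """Count set bits one byte at a time using the lookup table."""
    count = 0
    while bits:
        count += POPCOUNT[bits & 0xFF]
        bits >>= 8
    return count


# Hypothetical toy database, scanned once to build the vertical format.
transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]
vec = {item: to_bitvector(item, transactions) for item in "abc"}

# The bit vector of an itemset is the AND of its items' bit vectors.
ab = vec["a"] & vec["b"]
ab_support = support(ab)
```

Because the vector of an itemset is just the intersection of its items' vectors, the support of any candidate follows from a bitwise AND and a bit count, with no further database scan.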
