Similar literature
1.
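Existing probabilistic frequent itemset mining algorithms build their trees by pattern growth and generate a large number of tree nodes. This leads to high memory consumption and to inefficient discovery of probabilistic frequent itemsets. To address these problems, an improved uncertain-data frequent pattern growth algorithm (PUFP-Growth) is proposed. The algorithm reads the transactions of the uncertain transaction database one at a time and builds a compact tree structure similar to the frequent pattern tree (FP-Tree). At the same time, it updates the dynamic arrays in the header table, which store the expected values of all itemsets that share the same tail node. After all transactions have been inserted into the improved uncertain-data frequent pattern tree (PUFP-Tree), all probabilistic frequent itemsets are obtained by traversing these arrays. Experimental results and theoretical analysis show that PUFP-Growth discovers probabilistic frequent itemsets effectively. Compared with the uncertain frequent pattern growth (UF-Growth) algorithm and the compressed uncertain frequent pattern mining (CUFP-Mine) algorithm, PUFP-Growth mines probabilistic frequent itemsets from uncertain data more efficiently and uses less memory.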
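The abstract describes the quantity that PUFP-Growth accumulates in its header-table arrays as an expected value per itemset. The sketch below computes that measure by brute force. It assumes that items occur independently with the listed existential probabilities, and that "probabilistic frequent" means expected support at or above a threshold (`min_esup`, a hypothetical parameter name). It enumerates itemsets directly and is not the PUFP-Tree.

```python
from itertools import combinations
from math import prod

def expected_support(db, itemset):
    """Expected support: sum, over transactions containing every item, of the
    product of the items' existential probabilities (independence assumed)."""
    return sum(
        prod(tx[i] for i in itemset)
        for tx in db
        if all(i in tx for i in itemset)
    )

def expected_frequent_itemsets(db, min_esup):
    """Level-wise brute force over all itemsets; returns {itemset: esup}."""
    items = sorted({i for tx in db for i in tx})
    result = {}
    for k in range(1, len(items) + 1):
        level = {}
        for itemset in combinations(items, k):
            esup = expected_support(db, itemset)
            if esup >= min_esup:
                level[itemset] = esup
        if not level:  # expected support is anti-monotone: no longer itemset can qualify
            break
        result.update(level)
    return result

# Each uncertain transaction maps an item to its existential probability.
db = [{"a": 0.9, "b": 0.5}, {"a": 0.6, "c": 0.8}, {"a": 1.0, "b": 0.4}]
print(expected_frequent_itemsets(db, 0.8))
```

Expected support is anti-monotone, so the level-wise loop stops at the first length that has no qualifying itemset. Tree-based methods such as UF-Growth and PUFP-Growth exist to avoid this explicit enumeration.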

2.
Many algorithms have been proposed that apply fuzzy-set theory to discover fuzzy association rules from quantitative databases. The fuzzy frequent pattern (FFP)-tree and compressed fuzzy frequent pattern (CFFP)-tree algorithms were proposed to mine incomplete sets of fuzzy frequent itemsets from tree-based structures. The multiple fuzzy frequent pattern (MFFP)-tree algorithm was later proposed to keep more linguistic terms for mining fuzzy frequent itemsets. Because the MFFP-tree algorithm inherits the properties of the FFP-tree algorithm, it needs numerous tree nodes to build the MFFP-tree structure for mining the desired multiple fuzzy frequent itemsets. In this paper, the compressed multiple fuzzy frequent pattern (CMFFP)-tree algorithm is designed to keep not only the linguistic term with the maximum membership value but also the other frequent linguistic terms, so that the complete set of fuzzy frequent itemsets can be mined. In the CMFFP-tree algorithm, the frequent linguistic terms are sorted in descending order of their occurrence frequencies to build the CMFFP-tree structure. The construction process is the same as in the CFFP-tree algorithm, except that more information is kept for the later mining process that discovers the complete set of fuzzy frequent itemsets. Each node in the CMFFP-tree uses an additional array to keep the membership values of its prefix path, computed with the intersection operation. A CMFFP-mine algorithm is also designed to efficiently mine the multiple fuzzy frequent itemsets from the CMFFP-tree structure. Experiments compare the proposed CMFFP-tree algorithm with the MFFP-tree and CFFP-tree algorithms in terms of execution time and number of tree nodes.
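The fuzzy support used by this family of algorithms (FFP, CFFP, MFFP, CMFFP) can be sketched as follows. The sketch makes three assumptions:

- Each transaction has already been fuzzified into membership values per linguistic term.
- The intersection operation is the minimum operator. This is a common choice, but the abstract does not name the operator.
- "Occurrence frequency" means a term's total fuzzy support.

The function names are hypothetical.

```python
from collections import defaultdict

def fuzzy_support(db, terms):
    """Fuzzy support of a set of linguistic terms: over transactions holding
    all of them, sum the minimum membership value (min as fuzzy intersection)."""
    return sum(
        min(tx[t] for t in terms)
        for tx in db
        if all(t in tx for t in terms)
    )

def frequent_terms_in_order(db, min_sup):
    """Keep every linguistic term whose fuzzy support reaches min_sup and
    sort them by descending support (ties broken by name)."""
    totals = defaultdict(float)
    for tx in db:
        for term, membership in tx.items():
            totals[term] += membership
    frequent = [t for t, s in totals.items() if s >= min_sup]
    return sorted(frequent, key=lambda t: (-totals[t], t))

# Fuzzified transactions: "item.term" -> membership value in [0, 1].
db = [
    {"A.low": 0.8, "B.mid": 0.6},
    {"A.low": 0.4, "B.mid": 1.0},
    {"A.high": 0.2},
]
print(frequent_terms_in_order(db, 0.5), fuzzy_support(db, ["A.low", "B.mid"]))
```

The MFFP/CMFFP setting differs from FFP/CFFP in that it keeps every frequent term, not only the maximum-membership term of each item.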

3.
Frequent itemset mining (FIM) is a fundamental research topic that consists of discovering useful and meaningful relationships between items in transaction databases. However, FIM has two important limitations. First, it assumes that all items are equally important. Second, it ignores the fact that data collected in real-life environments is often inaccurate, imprecise, or incomplete. To address these issues and mine more useful and meaningful knowledge, weighted itemset mining and uncertain itemset mining have been proposed. The former lets a user assign weights to items to specify their relative importance. The latter uses existential probabilities to represent uncertainty in transactions. However, no prior work has addressed both issues at the same time. In this paper, we address this research problem by defining a new type of pattern, the high expected weighted itemset (HEWI), and designing the HEWI-Uapriori algorithm to discover HEWIs efficiently. HEWI-Uapriori finds HEWIs with an Apriori-like two-phase approach. The algorithm introduces a property named high upper-bound expected weighted downward closure (HUBEWDC) to prune the search space and unpromising itemsets early. Extensive experiments on real-life and synthetic datasets evaluate the proposed algorithm in terms of runtime, memory consumption, and number of patterns found. The results show that the proposed algorithm has excellent performance and scalability compared with traditional methods for weighted itemset mining and uncertain itemset mining.
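The abstract does not define the HEWI measure or the HUBEWDC bound. The sketch below therefore uses stand-ins, which are assumptions and not the paper's definitions:

- An itemset's weight is the average of its item weights.
- Its expected support is computed under independent existential probabilities.
- The pruning bound is the largest item weight in the database multiplied by the expected support.

That bound never underestimates the measure and never increases when items are added. This is the kind of downward-closure property an Apriori-like method needs.

```python
from math import prod

def expected_support(db, itemset):
    """Sum of products of existential probabilities (independence assumed)."""
    return sum(prod(tx[i] for i in itemset)
               for tx in db if all(i in tx for i in itemset))

def expected_weighted_support(db, weights, itemset):
    """Stand-in measure (assumption): average item weight * expected support."""
    avg_weight = sum(weights[i] for i in itemset) / len(itemset)
    return avg_weight * expected_support(db, itemset)

def upper_bound(db, weights, itemset):
    """Stand-in pruning bound (assumption): largest item weight * expected
    support. It is >= the measure and never grows for supersets, so an
    itemset whose bound is below the threshold can be pruned together with
    all of its supersets."""
    return max(weights.values()) * expected_support(db, itemset)

weights = {"a": 0.8, "b": 0.4}
db = [{"a": 0.5, "b": 1.0}, {"a": 1.0}]
print(expected_weighted_support(db, weights, ("a", "b")),
      upper_bound(db, weights, ("a", "b")))
```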

4.
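Fast updating of maximal frequent itemsets   Total citations: 29, self-citations: 0, citations by others: 29
Mining maximal frequent itemsets is a key problem in many data mining applications. DMFIA overcomes the shortcomings of Apriori-based maximal frequent itemset mining algorithms by using the FP-tree storage structure and a top-down search strategy, which makes the mining of maximal frequent itemsets considerably more efficient. However, DMFIA has a weakness when there are many frequent items but the maximal frequent itemsets are of relatively low dimensionality. In that case it must search through many levels and generates a large number of candidate itemsets at each level, which hurts its efficiency. The paper therefore proposes IDMFIA (the Improved algorithm of DMFIA). IDMFIA uses a bidirectional search strategy that combines top-down and bottom-up search. This allows it to prune early both the supersets of shorter maximal frequent itemsets and the subsets of longer maximal frequent itemsets. The paper also proposes FUMFIA (Fast Updating Maximum Frequent Itemsets Algorithm), an algorithm for updating maximal frequent itemsets. FUMFIA makes full use of the FP-tree that has already been built and of the maximal frequent itemsets that have already been mined, so it can maintain them efficiently. Experimental results show that IDMFIA and FUMFIA effectively improve the efficiency of mining and updating maximal frequent itemsets.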
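The definitions that IDMFIA relies on can be checked with a small brute-force sketch. This is not DMFIA or IDMFIA. A frequent itemset is maximal when no proper superset is frequent. It follows that every superset of a maximal frequent itemset is infrequent, and every proper subset is frequent but not maximal. Transactions are assumed to be Python sets, and the function names are illustrative.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_sup):
    """Level-wise brute force; returns {frozenset: support count}."""
    items = sorted({i for t in transactions for i in t})
    freq = {}
    for k in range(1, len(items) + 1):
        level = {}
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            count = sum(1 for t in transactions if candidate <= t)
            if count >= min_sup:
                level[candidate] = count
        if not level:  # no frequent k-itemset means no frequent (k+1)-itemset
            break
        freq.update(level)
    return freq

def maximal_itemsets(freq):
    """Frequent itemsets that have no frequent proper superset."""
    return {s for s in freq if not any(s < other for other in freq)}

transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
print(maximal_itemsets(frequent_itemsets(transactions, 2)))
```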

5.
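The maximal frequent itemset mining algorithm DMFIA generates a large number of candidate itemsets when the candidate itemsets are high-dimensional but the maximal frequent itemsets are low-dimensional. To address this drawback, FP-MFIA is proposed: an improved maximal frequent itemset mining algorithm based on the frequent pattern tree (FP-tree) structure. Using the item header table of the FP-tree, the algorithm mines maximal frequent itemsets level by level with a bottom-up search strategy, which speeds up each candidate counting operation. During mining, it derives low-dimensional infrequent itemsets from the conditional pattern bases of each level. This prunes candidate itemsets and reduces their dimension as early as possible, which greatly reduces the number of candidates. The algorithm also makes full use of the properties of maximal frequent itemsets to reduce the search space. Mining times were compared under different support thresholds. At low minimum support, FP-MFIA is more than twice as fast as both DMFIA and the dimension-reduction-based maximal frequent pattern mining algorithm (BDRFI). This indicates that FP-MFIA has a clear advantage when candidate itemsets are high-dimensional.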
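FP-MFIA reads conditional pattern bases level by level from the FP-tree header table. These bases can be reproduced without building the tree: sort each transaction's frequent items in the global frequency order and, for a chosen item, collect the prefixes that precede it. The sketch below assumes that ties in frequency are broken alphabetically, although implementations differ on this. It is not FP-MFIA itself.

```python
from collections import Counter

def conditional_pattern_base(transactions, item, min_sup):
    """Prefix paths (with counts) that would end at `item` in an FP-tree."""
    support = Counter(i for t in transactions for i in set(t))
    frequent = {i for i, c in support.items() if c >= min_sup}
    # Global FP-tree order: descending support, ties broken by name.
    order = sorted(frequent, key=lambda i: (-support[i], i))
    rank = {i: r for r, i in enumerate(order)}
    base = Counter()
    for t in transactions:
        path = sorted((i for i in set(t) if i in frequent), key=rank.__getitem__)
        if item in path:
            prefix = tuple(path[:path.index(item)])
            if prefix:  # an empty prefix contributes nothing
                base[prefix] += 1
    return base

transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c", "d"]]
print(conditional_pattern_base(transactions, "c", 2))
```

Each key is a prefix path, and each value is the count that path would carry in the FP-tree. FP-MFIA uses low-dimensional infrequent itemsets found in such bases to prune candidates early.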

6.
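For uncertain data, the traditional way of deciding whether an itemset is frequent cannot accurately express how frequent the itemset is. For large datasets, the set of frequent itemsets is also huge and redundant. To address these shortcomings, UFCIM is proposed: an algorithm for mining frequent closed itemsets from uncertain data, built on the horizontal-format mining algorithm Apriori. UFCIM uses a confidence probability to express how reliably an itemset is frequent; the higher the confidence, the more reliable the judgment that the itemset is frequent. Frequent closed itemsets are a lossless compressed representation of frequent itemsets, so the compact frequent closed itemsets are used in place of the large set of frequent itemsets. Experimental results show that the algorithm quickly mines frequent closed itemsets from uncertain data. It reduces itemset redundancy while preserving the accuracy and completeness of the itemsets.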
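The abstract does not spell out how the "confidence probability" is computed. One standard way to express how reliably an itemset is frequent in uncertain data is its frequentness probability, P(support ≥ min_sup). Here the itemset is assumed to appear in each transaction independently with some probability. The sketch below computes it with the Poisson-binomial dynamic program. It is offered as an assumption, not as UFCIM's exact definition.

```python
def frequentness_probability(probs, min_sup):
    """P(support >= min_sup) when the itemset occurs in transaction j
    independently with probability probs[j] (Poisson-binomial DP)."""
    # dist[k] = probability that exactly k of the transactions seen so far
    # contain the itemset
    dist = [1.0]
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            nxt[k] += q * (1 - p)   # itemset absent from this transaction
            nxt[k + 1] += q * p     # itemset present in this transaction
        dist = nxt
    return sum(dist[min_sup:])

print(frequentness_probability([0.5, 0.5], 1))
```

An itemset could then be treated as frequent when this probability reaches a chosen confidence level. For example, if two transactions each contain the itemset with probability 0.5, then P(support ≥ 1) is 0.75.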

7.
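High-utility sequential pattern mining is an important topic in data mining, with applications in bioinformatics, consumer behavior analysis, and other areas. Traditional methods are based on frequent itemset patterns. High-utility sequential pattern mining differs from them: it considers not only the internal and external utilities of itemsets but also the temporal order of itemsets, so its computational complexity is high. A number of algorithms have been proposed for this problem, but the time and space efficiency of mining remains a main research focus in the area. In view of this, this paper proposes HUSP-FP, a pattern-growth-based high-utility sequential pattern mining algorithm. High-utility sequential itemsets must satisfy the transaction-weighted utility closure property. Using this property, the algorithm first removes unpromising items and builds a global tree. It then uses pattern growth to obtain all high-utility sequential patterns from the global tree, without generating candidate itemsets. In the experiments, the time and space costs of HUSP-FP were compared with three currently efficient algorithms: HUSP-Miner, USPAN, and HUS-Span. The results show that the proposed algorithm still mines the relevant sequential patterns effectively at small thresholds. It also achieves considerable improvements in both computation time and memory usage.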
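The pruning step described above removes unpromising items before the global tree is built. It can be illustrated with the sequence-weighted utility (SWU), the usual upper bound in high-utility sequential pattern mining. The sketch assumes that each sequence is a list of itemsets mapping items to quantities (internal utility), and that a profit table gives unit profits (external utility). Whether HUSP-FP uses exactly this bound is an assumption.

```python
def sequence_utility(seq, profit):
    """Total utility of a sequence: quantity * unit profit, summed over
    every item occurrence in every itemset of the sequence."""
    return sum(q * profit[i] for itemset in seq for i, q in itemset.items())

def promising_items(db, profit, min_util):
    """Keep items whose sequence-weighted utility (sum of the utilities of
    the sequences that contain them) reaches min_util. SWU upper-bounds the
    utility of any pattern containing the item, so pruning is safe."""
    swu = {}
    for seq in db:
        su = sequence_utility(seq, profit)
        for item in {i for itemset in seq for i in itemset}:
            swu[item] = swu.get(item, 0) + su
    return {i for i, u in swu.items() if u >= min_util}

profit = {"a": 2, "b": 1, "c": 5}
db = [
    [{"a": 1, "b": 2}, {"c": 1}],
    [{"a": 3}, {"b": 1}],
    [{"b": 4}],
]
print(promising_items(db, profit, 10))
```

For example, suppose the unit profits are a=2, b=1 and c=5, and the sequences are ⟨{a×1, b×2}, {c×1}⟩, ⟨{a×3}, {b×1}⟩ and ⟨{b×4}⟩. The SWUs are then 16, 20 and 9. With a threshold of 10, only a and b are kept.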

8.
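A frequent closed itemset mining algorithm based on index arrays and compound frequent pattern trees   Total citations: 1, self-citations: 0, citations by others: 1
Frequent closed itemsets uniquely determine the frequent itemsets, and there are far fewer of them. CROP is an efficient frequent closed itemset mining algorithm based on a compound frequent pattern tree, but it produces too many candidate nodes. Generating, checking, and pruning these non-closed nodes involves a large number of unnecessary operations. This paper proposes CROP_Index, an improved frequent closed itemset mining algorithm. CROP_Index organizes the data with an "index array" to find itemsets whose items frequently occur together. The paper gives a method, based on binary bitmaps, for computing the inclusion index. The index heuristics are then merged to obtain the initial nodes of the compound frequent pattern tree. The paper also gives some new properties under which the improved algorithm generates only closed nodes. This saves a large number of unnecessary operations and narrows the search space. Experimental results show that the algorithm is efficient.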
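The abstract mentions binary bitmaps and an inclusion index but gives no details. One plausible reading, stated here as an assumption, is that the index of an itemset is its closure: the items present in every transaction that contains it. The sketch below computes supports and closures with bitwise AND. The function names are hypothetical.

```python
def item_bitmaps(transactions):
    """Bit j of an item's bitmap is set when transaction j contains the item."""
    bitmaps = {}
    for j, t in enumerate(transactions):
        for item in t:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << j)
    return bitmaps

def tid_mask(itemset, bitmaps, n):
    """Bitwise AND of the item bitmaps: the transactions containing the itemset."""
    mask = (1 << n) - 1
    for item in itemset:
        mask &= bitmaps[item]
    return mask

def support(itemset, bitmaps, n):
    return bin(tid_mask(itemset, bitmaps, n)).count("1")

def closure(itemset, bitmaps, n):
    """Items present in every transaction that contains `itemset`."""
    mask = tid_mask(itemset, bitmaps, n)
    return frozenset(i for i, b in bitmaps.items() if b & mask == mask)

def is_closed(itemset, bitmaps, n):
    return closure(itemset, bitmaps, n) == frozenset(itemset)

transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}]
bitmaps = item_bitmaps(transactions)
print(closure({"b"}, bitmaps, len(transactions)))
```

Take the transactions {a,b,c}, {a,b} and {a,c}. The closure of {b} is {a,b}, so {b} is not closed: every transaction containing b also contains a. An itemset that occurs in no transaction gets every item as its closure.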

9.
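To address the data and path redundancy of itemsets in the UF-tree, an ordered compressed uncertain tree, the SCUF-tree, is designed. Each SCUF-tree node stores the different supports of its element. This compresses the storage space and makes it easy to port existing maximal frequent itemset algorithms for certain (deterministic) data. Building on the design of the maximal frequent itemset mining algorithm MMFI, the paper proposes UMMFI, an algorithm for mining uncertain maximal frequent itemsets. UMMFI adopts the NBN strategy, which proceeds level by level and node by node. Experimental results show that UMMFI has good time and space efficiency and adaptability.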
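The redundancy that the SCUF-tree targets shows up when node counts are compared. In a UF-tree, the same item with a different existential probability starts a separate node. The sketch below contrasts that with a compressed node that stores one node per item plus a counter of the probabilities seen. It is a hypothetical illustration of the idea, not the SCUF-tree layout, and it uses alphabetical order instead of a support-based order.

```python
from collections import Counter

class UFNode:
    """UF-tree style node: children keyed by (item, probability)."""
    def __init__(self):
        self.count = 0
        self.children = {}

class CompressedNode:
    """Hypothetical compressed node: children keyed by item only; each node
    counts the probabilities with which its item occurred."""
    def __init__(self):
        self.probs = Counter()
        self.children = {}

def insert_uf(root, tx):
    node = root
    for item in sorted(tx):  # fixed global order (alphabetical here)
        node = node.children.setdefault((item, tx[item]), UFNode())
        node.count += 1

def insert_compressed(root, tx):
    node = root
    for item in sorted(tx):
        node = node.children.setdefault(item, CompressedNode())
        node.probs[tx[item]] += 1

def count_nodes(node):
    """Number of nodes below `node` (the root itself is not counted)."""
    return sum(1 + count_nodes(c) for c in node.children.values())

txs = [{"a": 0.9, "b": 0.5}, {"a": 0.8, "b": 0.5}, {"a": 0.9}]
uf_root, compressed_root = UFNode(), CompressedNode()
for tx in txs:
    insert_uf(uf_root, tx)
    insert_compressed(compressed_root, tx)
print(count_nodes(uf_root), count_nodes(compressed_root))
```

For the transactions {a: 0.9, b: 0.5}, {a: 0.8, b: 0.5} and {a: 0.9}, the UF-tree style builds 4 nodes and the compressed style builds 2.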

10.
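The UF-growth algorithm builds a large number of tree nodes and branches and repeatedly recomputes the supports of candidate items. To address these limitations, a compressed UF-tree algorithm is proposed, which changes the tree-building rule. When an item in a transaction matches the item of a node on some branch of the tree, the item is merged into that branch. Otherwise, a new branch is created from that branch node, and the leaf node stores the current transaction ID. A probability vector is built for each single item. Candidate itemsets are generated by searching the tree branches, and their supports are computed from the transaction IDs and probability vectors to mine the frequent itemsets. Experimental comparison and analysis show that the compressed UF-tree algorithm is feasible and more efficient.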
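The support computation described above can be sketched in a vertical layout. Each item gets a probability vector indexed by transaction ID. A candidate's expected support is the sum, over the transaction IDs collected for it, of the product of its items' probabilities. Item probabilities are assumed to be independent, and the names are hypothetical.

```python
from math import prod

def probability_vectors(db):
    """Vertical layout: item -> {transaction ID: existential probability}."""
    vectors = {}
    for tid, tx in enumerate(db):
        for item, p in tx.items():
            vectors.setdefault(item, {})[tid] = p
    return vectors

def expected_support_from_tids(candidate, tids, vectors):
    """Sum over the given transaction IDs of the product of the candidate's
    item probabilities; an ID that lacks some item contributes 0."""
    return sum(prod(vectors[i].get(t, 0.0) for i in candidate) for t in tids)

db = [{"a": 0.9, "b": 0.5}, {"a": 0.6, "c": 0.8}, {"a": 1.0, "b": 0.4}]
vectors = probability_vectors(db)
print(expected_support_from_tids(("a", "b"), [0, 2], vectors))
```

A transaction ID that does not contain every item of the candidate contributes zero. Passing a superset of the relevant IDs therefore does not change the result.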

11.
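Apriori-based weighted frequent itemset mining algorithms scan the dataset many times. To address this, a weighted frequent itemset mining algorithm based on dynamic itemset counting is proposed. The algorithm uses a weighted key-tree data structure together with dynamic itemset counting. It satisfies the downward closure property and generates candidate frequent itemsets dynamically, which reduces the number of dataset scans. Experimental results show that the algorithm generates weighted frequent itemsets efficiently and with good time performance.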

12.
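Current research on temporal association rules suffers from low mining efficiency and poor rule interpretability, and it does not consider the temporal relationships between itemsets. To address these problems, a new temporal association rule mining algorithm based on a frequent itemset tree is proposed, building on earlier related work. Time-series data are reduced in dimension and discretized, and frequent itemsets are generated with vector operations, which makes frequent itemset mining more efficient. A new frequent itemset tree structure is then proposed for mining temporal association rules, to take account of the temporal relationships between itemsets and to exploit the advantages of tree structures. Frequent itemsets are mined while the tree is being built, without generating candidate itemsets, which improves rule mining efficiency. Experiments show that, compared with other algorithms, the proposed algorithm performs better in mining efficiency and rule interpretability and has good application prospects.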

13.
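Generalized association rule mining is an important extended research topic in data mining. First, this paper proposes a generalized extended canonical-order tree (GECT) structure and its incremental mining algorithm, GECT-IM. The algorithm scans the original taxonomic transaction database only once and maps all transaction information into a compressed GECT. It then scans the updated transaction set to obtain the counts of its itemsets. Using related properties and operations, it can then discover most of the updated generalized frequent itemsets. Second, the GECT can be large, and GECT-IM may still need to traverse the initial GECT. To address these limitations, the paper defines the concepts of database update and reconstruction and introduces a quantifiable pre-minimum support threshold. On that basis, it proposes an improved pre-large generalized extended canonical-order tree (PGECT) structure and its incremental mining algorithm, PGECT-IM. PGECT-IM avoids traversing the initial GECT, which further improves the efficiency of incremental generalized association rule mining. Experiments show that GECT-IM and its optimized version PGECT-IM are more efficient and scalable than existing incremental mining algorithms.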
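Generalized frequent itemsets are usually counted by extending each transaction with the ancestors of its items in the taxonomy. A purchase of "skim milk" then also counts toward "milk". The sketch below shows that classic extension step, not the GECT structure, and uses a hypothetical child-to-parent map. It skips itemsets that pair an item with its own ancestor, because their support equals the support of the item alone.

```python
from collections import Counter
from itertools import combinations

def ancestors(item, parent):
    """All ancestors of `item`, walking a child -> parent taxonomy map."""
    out = []
    while item in parent:
        item = parent[item]
        out.append(item)
    return out

def extend(transaction, parent):
    """Add every ancestor of every item in the transaction."""
    extended = set(transaction)
    for item in transaction:
        extended.update(ancestors(item, parent))
    return extended

def generalized_counts(transactions, parent, k):
    """Support counts of generalized k-itemsets (as sorted tuples)."""
    counts = Counter()
    for t in transactions:
        for combo in combinations(sorted(extend(t, parent)), k):
            # Skip itemsets that pair an item with its own ancestor (redundant).
            if any(y in ancestors(x, parent) or x in ancestors(y, parent)
                   for x, y in combinations(combo, 2)):
                continue
            counts[combo] += 1
    return counts

parent = {"skim": "milk", "whole": "milk", "white": "bread"}  # hypothetical taxonomy
transactions = [{"skim", "white"}, {"whole", "white"}, {"whole"}]
print(generalized_counts(transactions, parent, 2))
```

Suppose skim and whole sit under milk, and white sits under bread. The transactions {skim, white}, {whole, white} and {whole} then give {bread, milk} and {milk, white} a support of 2 each.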

14.
Mining frequent itemsets has emerged as a fundamental problem in data mining and plays an essential role in many important data mining tasks. In this paper, we propose a novel vertical data representation called N-list. It originates from the PPC-tree, an FP-tree-like coding prefix tree that stores crucial information about frequent itemsets. Based on the N-list data structure, we develop PrePost, an efficient algorithm for mining all frequent itemsets. PrePost owes its efficiency to three factors. First, N-lists are compact, because transactions with common prefixes share the same nodes of the PPC-tree. Second, counting the supports of itemsets is transformed into intersecting N-lists. An efficient strategy reduces the complexity of intersecting two N-lists to O(m + n), where m and n are the cardinalities of the two N-lists. Third, in some cases PrePost exploits the single-path property of N-lists to find frequent itemsets directly, without generating candidate itemsets. We have experimentally evaluated PrePost against four state-of-the-art algorithms for mining frequent itemsets on a variety of real and synthetic datasets. The results show that PrePost is the fastest in most cases. It consumes more memory when the datasets are sparse, but it is still the fastest.
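The O(m + n) N-list intersection can be sketched as a merge of two lists of PP-codes (pre-order rank, post-order rank, count) sorted by pre-order rank. A node x is an ancestor of node y exactly when x.pre < y.pre and x.post > y.post. This sketch covers only the two-item case. It assumes that the first list belongs to the item that sits higher in the PPC-tree, and it uses hand-written codes instead of building the tree.

```python
from typing import NamedTuple

class PPCode(NamedTuple):
    pre: int    # pre-order rank of the PPC-tree node
    post: int   # post-order rank of the node
    count: int  # node count

def intersect(anc_list, desc_list):
    """Linear merge of two N-lists sorted by pre-order rank. The result keeps
    the codes of anc_list nodes that have descendants in desc_list, with
    counts summed from those descendants."""
    result = []
    i = j = 0
    while i < len(anc_list) and j < len(desc_list):
        anc, desc = anc_list[i], desc_list[j]
        if anc.pre < desc.pre:
            if anc.post > desc.post:          # anc is an ancestor of desc
                if result and result[-1].pre == anc.pre:
                    last = result[-1]
                    result[-1] = last._replace(count=last.count + desc.count)
                else:
                    result.append(PPCode(anc.pre, anc.post, desc.count))
                j += 1
            else:                             # desc lies after anc's subtree
                i += 1
        else:                                 # desc precedes anc: no later anc contains it
            j += 1
    return result

def support(nlist):
    return sum(code.count for code in nlist)

# Hand-built N-lists for transactions {a,b,c}, {a,b}, {a,c}, {b} (root = 0 in pre-order)
a = [PPCode(1, 3, 3)]
b = [PPCode(2, 1, 2), PPCode(5, 4, 1)]
c = [PPCode(3, 0, 1), PPCode(4, 2, 1)]
print(intersect(a, b), support(intersect(a, b)))
```

Intersecting the N-list of a with that of b yields [(1, 3, 2)], so {a, b} has support 2. Each loop iteration advances i or j, which gives the O(m + n) bound.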

15.
The frequent pattern tree (FP-tree) is an efficient data structure for association-rule mining without generation of candidate itemsets. It compresses a database into a tree structure that stores only large items. However, it must process all transactions in batch. We previously proposed the Fast Updated FP-tree (FUFP-tree) structure to handle new transactions efficiently and to simplify the tree update process. In this paper, we propose the pre-large tree structure to mine association rules incrementally, based on the concept of pre-large itemsets. Because of the properties of pre-large itemsets, the proposed approach does not need to rescan the original database until a certain number of new transactions have been inserted. The approach therefore achieves good execution time for tree construction, especially when only a small number of transactions are inserted each time. Experimental results also show that the proposed approach performs well when incrementally handling new transactions.
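The pre-large literature calls the number of new transactions that can be inserted before the original database must be rescanned the safety bound: f = ⌊(S_u − S_l) d / (1 − S_u)⌋. Here d is the size of the original database, and S_l and S_u are the lower and upper support thresholds. The abstract does not state this formula, so treat it as background rather than as this paper's exact rule. The sketch uses exact fractions to avoid floating-point rounding at the floor.

```python
from fractions import Fraction
from math import floor

def safety_bound(d, s_lower, s_upper):
    """Number of new transactions that can be inserted before the original
    d transactions must be rescanned: floor((Su - Sl) * d / (1 - Su))."""
    sl = Fraction(str(s_lower))  # exact decimal value, e.g. "0.3" -> 3/10
    su = Fraction(str(s_upper))
    return floor((su - sl) * d / (1 - su))

print(safety_bound(100, 0.3, 0.5))
```

With d = 100, S_l = 0.3 and S_u = 0.5, up to 40 transactions can be inserted without rescanning.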

16.
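A matrix-based frequent itemset mining algorithm   Total citations: 9, self-citations: 3, citations by others: 6
The main problem in association rule mining is how to mine frequent itemsets efficiently. Based on set theory and matrix theory, this paper proposes a matrix-based frequent itemset mining algorithm. The algorithm scans the database only once, turning all transactions into matrix rows and all items and itemsets into matrix columns. Operations on the matrix then produce all frequent itemsets in a single pass. When the support threshold changes, the database does not need to be rescanned. Experimental results show that the algorithm mines more efficiently than Apriori.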
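The matrix idea can be sketched with a 0/1 matrix whose rows are transactions and whose columns are items. An itemset's support is the number of rows in which all of its columns are 1. Supports are computed once, so changing the threshold only re-filters them. The sketch enumerates every itemset, which is exponential in the number of items. The function names are hypothetical, and the code illustrates the representation rather than the paper's matrix operations.

```python
from itertools import combinations

def build_matrix(transactions):
    """One scan: rows are transactions, columns are items (0/1 entries)."""
    items = sorted({i for t in transactions for i in t})
    rows = [[1 if i in t else 0 for i in items] for t in transactions]
    return items, rows

def all_supports(items, rows):
    """Support of every non-empty itemset: the number of rows whose columns
    for the itemset are all 1 (a column-wise AND)."""
    col = {i: k for k, i in enumerate(items)}
    supports = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            supports[itemset] = sum(all(row[col[i]] for i in itemset) for row in rows)
    return supports

def frequent(supports, min_sup):
    """Changing min_sup only re-filters; the database is not rescanned."""
    return {s: c for s, c in supports.items() if c >= min_sup}

transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}]
items, rows = build_matrix(transactions)
supports = all_supports(items, rows)
print(frequent(supports, 2))
```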

17.
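Incremental frequent itemset mining is an active research topic. The FP-Growth-based Pre-FUFP algorithm handles updates of frequent patterns effectively, but it must traverse the FP-tree recursively, which makes it inefficient. The Pre-FIUT algorithm is proposed to address this. It introduces the frequent items ultrametric tree structure to obtain frequent itemsets more efficiently. Pre-FIUT is built on FIUT and determines frequent itemsets by checking the supports of the leaf nodes of the frequent items ultrametric trees. It combines this with the concept of pre-large (sub-frequent) itemsets to mine frequent itemsets incrementally. Experiments show that Pre-FIUT scans and updates data quickly, uses memory sensibly, and obtains frequent itemsets accurately.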

18.
This paper proposes an efficient method, frequent items ultrametric trees (FIUT), for mining frequent itemsets in a database. FIUT uses a special frequent items ultrametric tree (FIU-tree) structure to improve its efficiency in obtaining frequent itemsets. Compared with related work, FIUT has four major advantages. First, it minimizes I/O overhead by scanning the database only twice. Second, the FIU-tree is an improved way to partition a database, obtained by clustering transactions, and it significantly reduces the search space. Third, only the frequent items of each transaction are inserted as nodes into the FIU-tree, which gives compressed storage. Finally, all frequent itemsets are generated by checking the leaves of each FIU-tree without traversing the tree recursively, which significantly reduces computing time. FIUT was compared with FP-growth, a well-known and widely used algorithm, and the simulation results showed that FIUT outperforms FP-growth. Further extensions of this approach and their implications are also discussed.
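FIUT's two-scan flow can be sketched with counters standing in for the FIU-trees:

1. The first scan finds the frequent items.
2. The second scan reduces each transaction to its frequent items and groups identical results.
3. Each k-itemset support is obtained by decomposing the longer groups into k-subsets and adding their counts, with no recursive tree traversal.

This is a simplified sketch, not the paper's implementation.

```python
from collections import Counter
from itertools import combinations

def fiut_like(transactions, min_sup):
    """Returns {frozenset: support} for all frequent itemsets."""
    # Scan 1: frequent single items.
    item_counts = Counter(i for t in transactions for i in set(t))
    frequent_items = {i for i, c in item_counts.items() if c >= min_sup}
    # Scan 2: reduce each transaction to its frequent items and group
    # identical results (a stand-in for inserting them into FIU-trees).
    groups = Counter(frozenset(i for i in t if i in frequent_items)
                     for t in transactions)
    max_len = max((len(g) for g in groups), default=0)
    result = {frozenset([i]): item_counts[i] for i in frequent_items}
    # For each k, decompose every group of length >= k into its k-subsets
    # and read supports off the accumulated counts (the "leaves").
    for k in range(2, max_len + 1):
        counts = Counter()
        for itemset, n in groups.items():
            if len(itemset) >= k:
                for sub in combinations(sorted(itemset), k):
                    counts[frozenset(sub)] += n
        result.update({s: c for s, c in counts.items() if c >= min_sup})
    return result

transactions = [["a", "b", "c"], ["a", "b"], ["a", "c", "d"], ["b", "c"]]
print(fiut_like(transactions, 2))
```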

19.
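Design of a high-speed boundary-scan master controller   Total citations: 2, self-citations: 1, citations by others: 1
The paper first analyzes how boundary-scan testing works and what it requires of a test support system. It then proposes a design for a high-speed, USB-based boundary-scan test master controller. A CY7C68013 serves as the USB 2.0 interface controller, and a CPLD implements the JTAG master hard core, which converts between the JTAG protocol and the USB bus protocol. The JTAG TCK clock frequency is adjustable, up to 48 MHz. With this boundary-scan controller, users can carry out boundary-scan tests conveniently and efficiently.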
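A JTAG master such as the one described steps the IEEE 1149.1 TAP controller through its 16 states by choosing the TMS value sampled on each rising TCK edge. The sketch below is a small software model of that state machine, plus a breadth-first search for the shortest TMS sequence between two states. The state names follow the standard; the function names are illustrative and do not come from the paper.

```python
from collections import deque

# IEEE 1149.1 TAP controller: state -> (next state if TMS = 0, if TMS = 1)
TAP = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle":    ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan":   ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR":       ("Shift-DR", "Exit1-DR"),
    "Shift-DR":         ("Shift-DR", "Exit1-DR"),
    "Exit1-DR":         ("Pause-DR", "Update-DR"),
    "Pause-DR":         ("Pause-DR", "Exit2-DR"),
    "Exit2-DR":         ("Shift-DR", "Update-DR"),
    "Update-DR":        ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan":   ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR":       ("Shift-IR", "Exit1-IR"),
    "Shift-IR":         ("Shift-IR", "Exit1-IR"),
    "Exit1-IR":         ("Pause-IR", "Update-IR"),
    "Pause-IR":         ("Pause-IR", "Exit2-IR"),
    "Exit2-IR":         ("Shift-IR", "Update-IR"),
    "Update-IR":        ("Run-Test/Idle", "Select-DR-Scan"),
}

def run(tms_bits, state="Test-Logic-Reset"):
    """Apply one TMS bit per TCK rising edge and return the final state."""
    for bit in tms_bits:
        state = TAP[state][bit]
    return state

def tms_path(start, goal):
    """Shortest TMS bit sequence from start to goal (breadth-first search)."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, bits = queue.popleft()
        if state == goal:
            return bits
        for bit in (0, 1):
            nxt = TAP[state][bit]
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, bits + [bit]))
    return None

print(tms_path("Test-Logic-Reset", "Shift-IR"))
```

From Test-Logic-Reset, the shortest path to Shift-IR is TMS = 0, 1, 1, 0, 0. Holding TMS high for five clocks returns any state to Test-Logic-Reset.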

20.
The frequent pattern tree (FP-tree) is an efficient data structure for association-rule mining without generation of candidate itemsets. It compresses a database into a tree structure that stores only large items. However, it must process all transactions in batch, whereas in real-world applications new transactions are usually inserted into databases incrementally. We previously proposed the Fast Updated FP-tree (FUFP-tree) structure to handle new transactions efficiently and to simplify the tree update process. In this paper, we modify the FUFP-tree construction based on the concept of pre-large itemsets, which are defined by a lower support threshold and an upper support threshold. With pre-large itemsets, the original database does not need to be rescanned until a certain number of new transactions have been inserted. The proposed approach can thus achieve good execution time for tree construction, especially when only a small number of transactions are inserted each time. Experimental results also show that the proposed Pre-FUFP maintenance algorithm performs well when incrementally handling new transactions.
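The two-threshold classification works as follows: an itemset is large if its support ratio reaches S_u, pre-large if the ratio lies in [S_l, S_u), and small otherwise. In the sketch below, the update function merges kept counts with the counts from new transactions. It tracks only itemsets that were large or pre-large in the original database, which is the part the pre-large approach keeps in memory. Names and parameters are hypothetical.

```python
def classify(count, n_tx, s_lower, s_upper):
    """Large if count / n_tx >= s_upper, pre-large if it lies in
    [s_lower, s_upper), small otherwise."""
    ratio = count / n_tx
    if ratio >= s_upper:
        return "large"
    if ratio >= s_lower:
        return "pre-large"
    return "small"

def reclassify(kept, d, new_counts, t, s_lower, s_upper):
    """Update itemsets that were large or pre-large in the original d
    transactions with their counts in t new transactions. Itemsets that
    were small originally are not tracked here; the safety bound on the
    number of new transactions covers them."""
    return {s: classify(c + new_counts.get(s, 0), d + t, s_lower, s_upper)
            for s, c in kept.items()}

print(reclassify({"a": 55, "b": 35, "c": 31}, 100, {"a": 2, "b": 8}, 10, 0.3, 0.5))
```

For example, take S_l = 0.3, S_u = 0.5 and 100 original transactions. An itemset with count 31 is pre-large. If it does not occur in 10 new transactions, its ratio drops to 31/110 and it becomes small.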


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号