首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到18条相似文献,搜索用时 171 毫秒
1.
王乐  熊松泉  常艳芬  王水 《自动化学报》2015,41(9):1616-1626
高效用模式挖掘是数据挖掘领域的一个重要研究内容; 由于其计算过程包含对模式的内、外效用值的处理, 计算复杂度较大, 因此挖掘算法的主要研究热点问题就是提高算法的时间效率.针对此问题, 本文给出一个基于模式增长方式的高效用模式挖掘算法HUPM-FP, 该算法可以从全局树上挖掘高效用模式, 避免产生候选项集.实验中, 采用6个典型数据集进行实验, 并和目前效率较好的算法FHM (Faster high-utility itemset mining)做了对比, 实验结果表明本文给出的算法时空效率都有较大的提高, 特别是时间效率提高较大, 可以达到1个数量级以上.  相似文献   

2.
大数据环境下高效用项集挖掘算法中过多的候选项集极大地降低了算法的时空效率,提出了一种减少候选项集的数据流高效用项集挖掘算法。首先,通过数据流中当前窗口的一次扫描建立一个全局树,并降低全局树中头表入口与节点的冗余效用值;然后,基于全局树生成候选模式,基于增长算法降低局部树的候选项集效用;最终,从候选模式中选出高效用模式。基于真实数据流的实验结果表明,本算法的时空效率与内存占用比均优于其他数据流的高效用模式挖掘算法。  相似文献   

3.
目前TOP-K高效用模式挖掘算法需要产生候选项集,特别是当数据集比较大或者数据集中包含较多长事务项集时,算法的时间和空间效率会受到更大的影响.针对此问题,通过将事务项集和项集效用信息有效地保存到树结构HUP-Tree,给出一个不需要候选项集的挖掘算法TOPKHUP;HUPTree树能保证从中计算到每个模式的效用值,不需要再扫描数据集来计算模式的效用值,从而使挖掘算法的时空效率得到较大的提高.采用7个典型数据集对算法的性能进行测试,实验结果证明TOPKHUP的时间和空间效率都优于已有算法,并对K值的变化保持平稳.  相似文献   

4.
由于数据规模的快速增长,高效用序列模式挖掘算法效率严重下降.针对这种情况,提出基于MapReduce的高效用序列模式挖掘算法HusMaR.算法基于MapReduce框架,使用效用矩阵高效地生成候选项;使用随机映射策略均衡计算资源;使用基于领域的剪枝策略来防止组合爆炸.实验结果表明,在大规模数据集下,算法取得了较高的并行效率.  相似文献   

5.
针对序列模式的高效用模式挖掘过程中搜索空间大、计算复杂度高的问题,提出一种基于多效用阈值的分布式高效用序列模式挖掘算法。采用数组结构保存模式的效用信息,解决效用矩阵导致的内存消耗大的缺点。设计1-项集与2-项集的深度剪枝策略,深入地缩小候选模式的搜索空间,减少搜索时间成本与缓存成本。提出挖掘算法的分布式实现方案,通过并行处理进一步降低模式挖掘的时间。基于中等规模与大规模的序列数据集分别进行实验,实验结果表明,该算法有效减少了候选模式的数量,降低了挖掘的时间成本与存储成本,对于大数据集表现出较好的可扩展能力与稳定性。  相似文献   

6.
摘 要: 高效用模式挖掘被广泛应用于数据挖掘领域。为了挖掘指定数量的高效用模式,一些基于树结构和效用表结构的top-k高效用挖掘算法被提出,但前者在挖掘过程中产生了大量候选模式,后者在效用模式增长时需要进行多次比较。同时,由于在信息社会,数据量呈爆炸性增长。因此,在数据集过大的情况下,挖掘高效用模式需以大量存储空间以及计算开销为代价。为了解决这两个问题,基于MapReduce的top-k高效用模式挖掘算法(TKHUP_MaR)被提出。该算法通过两次扫描数据库,利用三次MapReduce来实现并行top-k高效用模式的挖掘。通过实验表明TKHUP_MaR 算法在并行挖掘top-k高效用模式的过程中是有效的。  相似文献   

7.
《微型机与应用》2016,(22):22-25
由于高效用模式挖掘较为复杂,提高其挖掘算法的效率是数据挖掘的研究热点。HUP-miner算法是典型的基于垂直模式类的高效用模式挖掘算法,虽然能够有效地减少效用列表的总个数,但对于项集的划分,效用列表需要更多的空间。针对该问题,在HUI-miner算法的基础上充分考虑了1-扩展集中项集的关联性,减少了效用列表个数,提出了改进的IHUI-miner算法。实验结果表明,改进算法IHUI-miner在时间效率和减少效用列表的个数上都优于HUP-miner与HUI-miner算法。  相似文献   

8.
为了提高大规模消息流话题预测的准确性与效率,提出基于高效用项集挖掘的消息流话题预测算法。计算时间窗口中词汇的内部效用与外部效用,根据会话内所有词汇的效用计算最小效用值;采用高效用项集挖掘算法产生候选话题模式集,随之提取最终的话题模式。为了提高高效用项集挖掘的时间效率与存储效率,设计三角项集效用树保存项集的效用信息,设计话题搜索树保存候选话题模式集。最终基于真实消息流数据集进行实验,结果显示该算法有效地提高了话题预测的准确率,并且实现了较快的响应时间。  相似文献   

9.
如何在海量数据集中提高频繁项集的挖掘效率是目前研究的热点.随着数据量的不断增长,使用传统算法产生频繁项集的计算代价依然很高.为此,提出一种基于Spark的频繁项集快速挖掘算法(fast mining algorithm of frequent itemset based on spark,Fmafibs),利用位运算速度快的特点,设计了一种新颖的模式增长策略.该算法首先采用位串表达项集,利用位运算来快速生成候选项集;其次,针对超长位串计算效率低的问题,考虑将事务垂直分组处理,将同一事务不同组之间的频繁项集通过连接获得候选项集,最后进行聚合筛选得到最终频繁项集.算法在Spark环境下,以频繁项集挖掘领域基准数据集进行实验验证.实验结果表明所提方法在保证挖掘结果准确的同时,有效地提高了挖掘效率.  相似文献   

10.
全集高效用模式挖掘算法存在的关键问题之一是会产生冗余的高效用项集,这将导致用户很难在大量的高效用项集中发现有用的信息,严重降低了高效用模式挖掘算法的性能。为解决这一问题,衍生出了精简高效用模式挖掘算法,其主要包括最大高效用模式、闭合高效用模式、top-k高效用模式以及三者之间的组合高效用模式挖掘算法等。首先,介绍了精简高效用模式的相关问题描述;然后,从有无候选项集生成、一两阶段挖掘方法、数据结构类型和剪枝策略等角度,重点分类总结了精简高效用模式挖掘方法;最后,给出了精简高效用模式的进一步研究方向,包括处理基于负项的高效用精简模式、处理基于时间的高效用精简模式及处理动态复杂的数据等。  相似文献   

11.
Mining high utility itemsets by dynamically pruning the tree structure   总被引:2,自引:2,他引:0  
Mining high utility itemsets is one of the most important research issues in data mining owing to its ability to consider nonbinary frequency values of items in transactions and different profit values for each item. Mining such itemsets from a transaction database involves finding those itemsets with utility above a user-specified threshold. In this paper, we propose an efficient concurrent algorithm, called CHUI-Mine (Concurrent High Utility Itemsets Mine), for mining high utility itemsets by dynamically pruning the tree structure. A tree structure, called the CHUI-Tree, is introduced to capture the important utility information of the candidate itemsets. By recording changes in support counts of candidate high utility items during the tree construction process, we implement dynamic CHUI-Tree pruning, and discuss the rationality thereof. The CHUI-Mine algorithm makes use of a concurrent strategy, enabling the simultaneous construction of a CHUI-Tree and the discovery of high utility itemsets. Our algorithm reduces the problem of huge memory usage for tree construction and traversal in tree-based algorithms for mining high utility itemsets. Extensive experimental results show that the CHUI-Mine algorithm is both efficient and scalable.  相似文献   

12.
Incrementally mining high utility patterns based on pre-large concept   总被引:1,自引:1,他引:0  
In traditional association rule mining, most algorithms are designed to discover frequent itemsets from a binary database. Utility mining was thus proposed to measure the utility values of purchased items for revealing high utility itemsets from a quantitative database. In the past, a two-phase high utility mining algorithm was thus proposed for efficiently discovering high utility itemsets from a quantitative database. In dynamic data mining, transactions may be inserted, deleted, or modified from a database. In this case, a batch mining procedure must rescan the whole updated database to maintain the up-to-date information. Designing an efficient approach for handling dynamic databases is thus a critical research issue in utility mining. In this paper, an incremental mining algorithm is proposed for efficiently maintaining discovered high utility itemsets based on pre-large concepts. Itemsets are first partitioned into three parts according to whether they have large (high), pre-large, or small transaction-weighted utilization in the original database and in inserted transactions. Individual procedures are then executed for each part. Experimental results show that the proposed incremental high utility mining algorithm outperforms existing algorithms.  相似文献   

13.
Traditional frequent pattern mining methods consider an equal profit/weight for all items and only binary occurrences (0/1) of the items in transactions. High utility pattern mining becomes a very important research issue in data mining by considering the non-binary frequency values of items in transactions and different profit values for each item. However, most of the existing high utility pattern mining algorithms suffer in the level-wise candidate generation-and-test problem and generate too many candidate patterns. Moreover, they need several database scans which are directly dependent on the maximum candidate length. In this paper, we present a novel tree-based candidate pruning technique, called HUC-Prune (High Utility Candidates Prune), to solve these problems. Our technique uses a novel tree structure, called HUC-tree (High Utility Candidates tree), to capture important utility information of the candidate patterns. HUC-Prune avoids the level-wise candidate generation process by adopting a pattern growth approach. In contrast to the existing algorithms, its number of database scans is completely independent of the maximum candidate length. Extensive experimental results show that our algorithm is very efficient for high utility pattern mining and it outperforms the existing algorithms.  相似文献   

14.
High utility itemset mining considers the importance of items such as profit and item quantities in transactions. Recently, mining high utility itemsets has emerged as one of the most significant research issues due to a huge range of real world applications such as retail market data analysis and stock market prediction. Although many relevant algorithms have been proposed in recent years, they incur the problem of generating a large number of candidate itemsets, which degrade mining performance. In this paper, we propose an algorithm named MU-Growth (Maximum Utility Growth) with two techniques for pruning candidates effectively in mining process. Moreover, we suggest a tree structure, named MIQ-Tree (Maximum Item Quantity Tree), which captures database information with a single-pass. The proposed data structure is restructured for reducing overestimated utilities. Performance evaluation shows that MU-Growth not only decreases the number of candidates but also outperforms state-of-the-art tree-based algorithms with overestimated methods in terms of runtime with a similar memory usage.  相似文献   

15.
Association-rule mining, which is based on frequency values of items, is the most common topic in data mining. In real-world applications, customers may, however, buy many copies of products and each product may have different factors, such as profits and prices. Only mining frequent itemsets in binary databases is thus not suitable for some applications. Utility mining is thus presented to consider additional measures, such as profits or costs according to user preference. In the past, a two-phase mining algorithm was designed for fast discovering high utility itemsets from databases. When data come intermittently, the approach needs to process all the transactions in a batch way. In this paper, an incremental mining algorithm for efficiently mining high utility itemsets is proposed to handle the above situation. It is based on the concept of the fast-update (FUP) approach, which was originally designed for association mining. The proposed approach first partitions itemsets into four parts according to whether they are high transaction-weighted utilization itemsets in the original database and in the newly inserted transactions. Each part is then executed by its own procedure. Experimental results also show that the proposed algorithm executes faster than the two-phase batch mining algorithm in the intermittent data environment  相似文献   

16.
A new mining approach for uncertain databases using CUFP trees   总被引:1,自引:0,他引:1  
In the past, many algorithms have been proposed to mine frequent itemsets from transactional databases, in which the presence or absence of items in transactions was certainly known. In some applications, items may also be uncertain in transactions with their existential probabilities ranging from 0 to 1 in the uncertain dataset. Apparently, the processing in uncertain datasets is quite different from those in certain datasets. The UF-tree algorithm was proposed to construct the UF-tree structure from an uncertain dataset and mine frequent itemsets from the tree. In the UF-tree construction process, however, only the same items with the same existential probabilities in transactions were merged together in the tree, thus causing many redundant nodes in the tree. In this paper, a new tree structure called the compressed uncertain frequent-pattern tree (CUFP tree) is designed to efficiently keep the related information in the mining process. In the CUFP tree, the same items will be merged in a branch of the tree even when the existential probabilities in transactions are not the same. A mining algorithm called the CUFP-mine algorithm is then proposed based on the tree structure to find uncertain frequent patterns. Experimental results show that the proposed approach has a better performance than UF-tree algorithm both in the execution time and in the number of tree nodes.  相似文献   

17.
Traditional association-rule mining only concerns the occurrence frequencies of the items in a binary database. In real-world applications, customers may buy several copies of the purchased items. Other factors such as profit, quantity, or price should be concerned to measure the utilities of the purchased items. High-utility itemsets mining was thus proposed to consider the factors of quantity and profit. Two-phase model was the most commonly way to keep the transaction-weighted utilization downward closure property, thus reducing the numerous candidates in utility mining. Most methods for finding high-utility itemsets are used to handle a static database. In practical applications, transactions are changed whether insertion, deletion, or modification. Some itemsets may arise as the new high-utility itemsets or become invalid knowledge in the updated database. In this paper, a maintenance Fast Updated High Utility Pattern tree for transaction MODification (FUP-HUP-tree-MOD) algorithm is thus proposed to effective maintain and update the built HUP tree for mining high-utility itemsets in dynamic databases without candidate generation. Experiments are conducted to show better performance of the proposed algorithm compared to the two-phase algorithm and the HUP tree algorithm in batch mode.  相似文献   

18.
Fuzzy utility mining has been an emerging research issue because of its simplicity and comprehensibility. Different from traditional fuzzy data mining, fuzzy utility mining considers not only quantities of items in transactions but also their profits for deriving high fuzzy utility itemsets. In this paper, we introduce a new fuzzy utility measure with the fuzzy minimum operator to evaluate the fuzzy utilities of itemsets. Besides, an effective fuzzy utility upper-bound model based on the proposed measure is designed to provide the downward-closure property in fuzzy sets, thus reducing the search space of finding high fuzzy utility itemsets. A two-phase fuzzy utility mining algorithm, named TPFU, is also proposed and described for solving the problem of fuzzy utility mining. At last, the experimental results on both synthetic and real datasets show that the proposed algorithm has good performance.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号