Similar Literature (20 records found)
1.
Abstract

To overcome the limitations of high-utility itemset mining, more compact, lossless, and concise representations of high-utility itemsets (HUIs) have been proposed in previous works, such as closed HUIs (CHUIs) and maximal HUIs (MHUIs). Focusing on MHUI mining, in this article we present efficient approaches to directly mine MHUIs from transactional databases without generating any candidates. The proposed algorithms, which all execute in a single phase, use several efficient data structures and pruning techniques, such as EUCP combined with EUCS, CUIP combined with FUCS, and the P-set structure, to significantly reduce the search space and remove non-promising itemsets, thus increasing the performance of the MHUI mining process. Furthermore, whereas previous works assumed that the unit profit of items is fixed, which is not practical in many real-world applications, our work resolves this issue by applying a new utility calculation in the mining process to reflect the true nature of real-world databases, thus generating more accurate results.
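Below is a minimal sketch of the maximality notion used above: once a set of HUIs is known, an itemset is a maximal HUI only if no proper superset is also a HUI. The item names and the helper function are illustrative only; the paper itself mines MHUIs directly without first materializing all HUIs.

```python
def maximal_huis(huis):
    """Filter a set of high-utility itemsets down to the maximal ones.

    huis: iterable of itemsets, each given as a set/frozenset of items.
    An itemset is maximal if no proper superset is also a HUI.
    """
    huis = [frozenset(x) for x in huis]
    maximal = []
    for x in huis:
        if not any(x < y for y in huis):   # x < y means "proper subset"
            maximal.append(x)
    return maximal

# Example: {a,b,c} subsumes {a,b}, so only {a,b,c} and {b,d} remain.
print(maximal_huis([{"a", "b"}, {"a", "b", "c"}, {"b", "d"}]))
```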

2.
High-utility itemset mining (HUIM) is a popular data mining task with applications in numerous domains. However, traditional HUIM algorithms often produce a very large set of high-utility itemsets (HUIs). As a result, analyzing HUIs can be very time-consuming for users. Moreover, a large set of HUIs also makes HUIM algorithms less efficient in terms of execution time and memory consumption. To address this problem, closed high-utility itemsets (CHUIs), concise and lossless representations of all HUIs, were proposed recently. Although mining CHUIs is useful and desirable, it remains a computationally expensive task. This is because current algorithms often generate a huge number of candidate itemsets and are unable to prune the search space effectively. In this paper, we address these issues by proposing a novel algorithm called CLS-Miner. The proposed algorithm utilizes the utility-list structure to directly compute the utilities of itemsets without producing candidates. It also introduces three novel strategies to reduce the search space, namely chain-estimated utility co-occurrence pruning, lower branch pruning, and pruning by coverage. Moreover, an effective method for checking whether an itemset is a subset of another itemset is introduced to further reduce the time required for discovering CHUIs. To evaluate the performance of the proposed algorithm and its novel strategies, extensive experiments have been conducted on six benchmark datasets having various characteristics. Results show that the proposed strategies are highly efficient and effective, that the proposed CLS-Miner algorithm outperforms the current state-of-the-art CHUD and CHUI-Miner algorithms, and that CLS-Miner scales linearly.
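As a rough illustration of the utility-list idea mentioned above (a per-itemset list of transaction entries from which utilities are computed without candidate generation), here is a minimal sketch. The field and class names are assumptions; the exact lists and pruning strategies of CLS-Miner are not reproduced.

```python
from collections import namedtuple

# One entry per transaction containing the itemset:
#   tid   - transaction id
#   iutil - utility of the itemset in that transaction
#   rutil - remaining utility of items that could still extend it
Element = namedtuple("Element", ["tid", "iutil", "rutil"])

class UtilityList:
    def __init__(self, itemset, elements):
        self.itemset = frozenset(itemset)
        self.elements = list(elements)

    def utility(self):
        # Exact utility of the itemset in the database.
        return sum(e.iutil for e in self.elements)

    def remaining_upper_bound(self):
        # sum(iutil + rutil) upper-bounds the utility of extensions of this
        # itemset, so the branch can be pruned when it falls below the threshold.
        return sum(e.iutil + e.rutil for e in self.elements)

ul = UtilityList({"a", "b"}, [Element(1, 10, 4), Element(3, 7, 0)])
print(ul.utility(), ul.remaining_upper_bound())   # 17 21
```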

3.
Most approaches for discovering frequent itemsets derive association rules from a binary database. Profit, cost, and quantity are not considered in traditional association-rule mining. Utility mining was proposed to measure the utilities of purchased products and derive high-utility itemsets (HUIs). Many algorithms have been proposed to efficiently find HUIs from a static database. In real-world applications, however, transactions are inserted, deleted, or modified in dynamic situations. Existing batch approaches have to re-process the updated database since previously discovered HUIs are not maintained. In this paper, a Fast UPdated (FUP) strategy with a utility measure and a maintenance algorithm, called FUP-HUI-MOD, are developed to efficiently maintain and update discovered HUIs. When transactions are modified, the proposed algorithm partitions the transactions before and after the modification into two parts, creating four cases. Each case is handled by a specific procedure to update the discovered HUIs. With the designed FUP-HUI-MOD algorithm, the original database does not need to be rescanned each time, in contrast to state-of-the-art high-utility itemset mining algorithms running in batch mode. Experiments show that the proposed algorithm outperforms batch algorithms in maintaining HUIs.

4.
Mining high-utility itemsets (HUIs) containing negative item values is an emerging data mining task. To mine a set of HUIs with negative items that meets user needs, a top-k high-utility itemset mining algorithm with negative items, THN, is proposed. To improve the time and space performance of THN, a strategy that automatically raises the minimum utility threshold is introduced, and a pattern-growth, depth-first search is adopted; the redefined subtree utility and redefined local utility are used to prune the search space; transaction merging and dataset projection techniques are used to avoid scanning the database multiple times; and, to speed up utility counting, a utility-array technique is used to compute itemset utilities. Experimental results show that the memory consumption of THN is about 1/60 that of HUINIV-Mine and about 1/2 that of FHN, its execution time is about 1/10 that of FHN, and the algorithm performs even better on dense datasets.
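The threshold-raising idea can be illustrated with a generic top-k device: keep the k best utilities seen so far in a min-heap and treat its smallest element as the current minimum-utility threshold. This is a hedged, assumption-level sketch, not THN's own raising strategies.

```python
import heapq

class TopKThreshold:
    """Maintain the k best utilities seen so far and the induced
    minimum-utility threshold (a generic top-k device; the paper's
    own threshold-raising strategies are more elaborate)."""

    def __init__(self, k):
        self.k = k
        self.heap = []          # min-heap of the current top-k utilities

    def offer(self, utility):
        if len(self.heap) < self.k:
            heapq.heappush(self.heap, utility)
        elif utility > self.heap[0]:
            heapq.heapreplace(self.heap, utility)

    @property
    def min_utility(self):
        # Once k itemsets are known, anything below this can be pruned.
        return self.heap[0] if len(self.heap) == self.k else 0

tk = TopKThreshold(k=3)
for u in [40, 15, 60, 25, 55]:
    tk.offer(u)
print(tk.min_utility)   # 40
```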

5.
The utility of an itemset is considered as the value of that itemset, and utility mining aims at identifying the itemsets with high utilities. Temporal high utility itemsets are the itemsets whose utility is larger than a pre-specified threshold in the current time window of the data stream. Discovery of temporal high utility itemsets is an important process for mining interesting patterns such as association rules from data streams. In this paper, we propose a novel method, THUI (Temporal High Utility Itemsets)-Mine, for mining temporal high utility itemsets from data streams efficiently and effectively. To the best of our knowledge, this is the first work on mining temporal high utility itemsets from data streams. The novel contribution of THUI-Mine is that it can effectively identify the temporal high utility itemsets by generating fewer candidate itemsets, so that the execution time for mining all high utility itemsets in data streams can be reduced substantially. In this way, all temporal high utility itemsets under all time windows of a data stream can be discovered with less memory space and execution time, which meets the critical time and space efficiency requirements for mining data streams. Through experimental evaluation, THUI-Mine is shown to significantly outperform other existing methods such as the Two-Phase algorithm under various experimental conditions.
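A simplified sketch of the time-window setting follows: utilities are aggregated only over the transactions in the current window of the stream. This illustrates the problem definition under an assumed sliding-window model, not THUI-Mine's candidate-generation mechanism; class and method names are hypothetical.

```python
from collections import deque

class SlidingWindowUtility:
    """Keep the most recent `window_size` transactions of a stream and
    report itemset utilities within the current window."""

    def __init__(self, window_size):
        self.window = deque(maxlen=window_size)    # old transactions expire

    def add_transaction(self, items_with_utilities):
        # items_with_utilities: dict item -> utility of that item in the transaction
        self.window.append(items_with_utilities)

    def utility(self, itemset):
        itemset = set(itemset)
        total = 0
        for tx in self.window:
            if itemset.issubset(tx):               # itemset appears in this transaction
                total += sum(tx[i] for i in itemset)
        return total

w = SlidingWindowUtility(window_size=2)
w.add_transaction({"a": 5, "b": 3})
w.add_transaction({"a": 4, "c": 6})
w.add_transaction({"a": 2, "b": 1})                # the first transaction expires
print(w.utility({"a"}))                            # 6
```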

6.
High utility itemset mining considers the importance of items, such as profit and item quantities, in transactions. Recently, mining high utility itemsets has emerged as one of the most significant research issues due to a huge range of real-world applications such as retail market data analysis and stock market prediction. Although many relevant algorithms have been proposed in recent years, they incur the problem of generating a large number of candidate itemsets, which degrades mining performance. In this paper, we propose an algorithm named MU-Growth (Maximum Utility Growth) with two techniques for effectively pruning candidates during the mining process. Moreover, we suggest a tree structure, named MIQ-Tree (Maximum Item Quantity Tree), which captures database information in a single pass. The proposed data structure is restructured to reduce overestimated utilities. Performance evaluation shows that MU-Growth not only decreases the number of candidates but also outperforms state-of-the-art tree-based algorithms with overestimation methods in terms of runtime, with similar memory usage.

7.
Mining high utility itemsets by dynamically pruning the tree structure
Mining high utility itemsets is one of the most important research issues in data mining, owing to its ability to consider non-binary frequency values of items in transactions and different profit values for each item. Mining such itemsets from a transaction database involves finding those itemsets with utility above a user-specified threshold. In this paper, we propose an efficient concurrent algorithm, called CHUI-Mine (Concurrent High Utility Itemsets Mine), for mining high utility itemsets by dynamically pruning the tree structure. A tree structure, called the CHUI-Tree, is introduced to capture the important utility information of the candidate itemsets. By recording changes in the support counts of candidate high utility items during tree construction, we implement dynamic CHUI-Tree pruning and discuss its rationale. The CHUI-Mine algorithm uses a concurrent strategy, enabling the simultaneous construction of a CHUI-Tree and the discovery of high utility itemsets. Our algorithm reduces the problem of huge memory usage for tree construction and traversal in tree-based algorithms for mining high utility itemsets. Extensive experimental results show that the CHUI-Mine algorithm is both efficient and scalable.

8.
王敬华  罗相洲  吴倩 《计算机应用》2016,36(11):3062-3066
High-utility itemset mining has received wide attention in the data mining field, but it does not take the effect of itemset length on the utility value into account, which motivated high average-utility itemset mining. Existing high average-utility itemset mining algorithms, however, require a large amount of time to discover the itemsets. To address this problem, an improved high average-utility itemset mining algorithm, FHAUI, is presented. FHAUI stores utility information in utility lists and mines all high average-utility itemsets by comparing these lists; it also employs a two-dimensional matrix to effectively reduce the number of join comparisons between 2-itemsets. FHAUI was evaluated on several classic datasets. Experimental results show that it greatly reduces the number of utility-list join comparisons and substantially improves runtime performance.

9.
A parallel algorithm for mining high-utility patterns based on cluster partitioning
Mining high-utility patterns in large databases produces a large number of in-memory utility-pattern trees, which occupies considerable memory and can miss some high-utility itemsets. To address this problem, a parallel high-utility pattern mining algorithm based on cluster partitioning, PUCP, is proposed on the Hadoop distributed computing platform. First, similar transactions in the database are grouped into several data subsets by clustering; next, the partitioned subsets are distributed to the nodes of the Hadoop platform to build utility-pattern trees; finally, the conditional pattern bases of the same item are gathered on one node for mining, reducing the number of cross-node operations. Experimental results and theoretical analysis show that, without affecting the reliability of the mining results, PUCP improves mining efficiency by 61.2% and 16.6% over the mainstream serial high-utility pattern mining algorithm UP-Growth and the existing parallel algorithm PHUI-Growth, respectively, and that using the Hadoop platform effectively relieves the memory pressure of mining large-scale data.

10.
A new algorithm for mining maximal frequent itemsets
马丽生  邓辉文  齐逸 《计算机应用》2006,26(11):2670-2673
Mining maximal frequent itemsets is one of the most fundamental problems in data mining. Based on an analysis of existing algorithms, a new algorithm for mining maximal frequent itemsets is proposed. Experiments show that it outperforms existing algorithms of the same kind.

11.

High-utility itemset mining is a prominent data-mining technique in which the profit or weight of itemsets plays a crucial role in defining meaningful patterns. High average-utility itemset (HAUI) mining is an advancement over high-utility itemset mining that introduces an unbiased measure, the average utility, to relate the utility of itemsets to their length. Several existing HAUI mining algorithms use various upper bounds, such as the average-utility upper bound, the revised tighter upper bound, and the looser upper bound, to preserve pruning methods. However, these upper bounds overestimate the average utility of itemsets and slow down the mining process. This paper presents a fast high average-utility itemset miner (FHAIM) algorithm, which uses two improved upper bounds and several efficient pruning strategies to avoid processing unpromising candidate itemsets. Moreover, a novel list structure named the recommended average-utility list (RAUL) is presented to store the average utility and the information required for pruning. The RAUL of an itemset can be constructed by joining the RAULs of its subsets, avoiding excessive database scans. We have performed substantial experiments on various benchmark datasets to evaluate the performance of FHAIM in comparison with two existing HAUI mining algorithms. Experimental results show that FHAIM outperforms the existing HAUI mining algorithms in terms of runtime, memory usage, join counts, and scalability.
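The list-join idea can be sketched as follows: the per-transaction list of a longer itemset is obtained by matching the transaction ids of two of its subsets' lists, so no extra database scan is needed. The abstract does not give the RAUL's actual fields, so this minimal version (tid-to-utility maps, hypothetical function names) is an assumption.

```python
def join_lists(list_pa, list_pb):
    """Join the per-transaction lists of two itemsets P+{a} and P+{b}
    that share the same prefix P, producing the list of P+{a,b}.
    Each list maps tid -> utility of the itemset in that transaction.
    (The actual RAUL carries extra pruning fields not shown here, and a
    full implementation also subtracts the shared prefix utility.)"""
    joined = {}
    for tid, util_a in list_pa.items():
        if tid in list_pb:
            joined[tid] = util_a + list_pb[tid]
    return joined

def average_utility(tid_utils, itemset_length):
    # Average utility = total utility of the itemset / its number of items.
    return sum(tid_utils.values()) / itemset_length

ra = {1: 10, 3: 7, 5: 2}     # utilities of {a}, keyed by transaction id
rb = {1: 4, 5: 9}            # utilities of {b}, keyed by transaction id
rab = join_lists(ra, rb)     # {1: 14, 5: 11}
print(average_utility(rab, itemset_length=2))   # 12.5
```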


12.
Frequent-itemset mining only considers the frequency of occurrence of items and does not reflect any other factors, such as price or profit. Utility mining is an extension of frequent-itemset mining that considers cost, profit, or other measures of user preference. Traditionally, the utility of an itemset is the summation of its utilities in all transactions, regardless of its length. The average utility measure is therefore adopted in this paper to better reveal the utility effect of combining several items than the original utility measure. It is defined as the total utility of an itemset divided by the number of items in it. Average-utility itemsets, like the original utility itemsets, do not have the "downward-closure" property. A mining algorithm is then proposed to efficiently find the high average-utility itemsets. It uses the summation, over the transactions containing the target itemset, of the maximal utility among the items in each such transaction as an upper bound to overestimate the actual average utility of the itemset, and processes the data in two phases. As expected, the high average-utility itemsets mined in the proposed way are fewer than the high utility itemsets under the same threshold, so the proposed approach can be executed under a larger threshold than the original, with a more significant and relevant criterion. Experimental results also show the performance of the proposed algorithm.
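A small worked sketch of the two quantities described above follows, assuming a toy database where each transaction maps items to their utilities; the helper names are illustrative. The average utility divides the itemset's total utility by its length, and the upper bound sums, over the transactions containing the itemset, the largest single-item utility in each such transaction.

```python
def average_utility(itemset, database):
    """database: list of transactions, each a dict item -> utility."""
    itemset = set(itemset)
    total = sum(sum(tx[i] for i in itemset)
                for tx in database if itemset.issubset(tx))
    return total / len(itemset)

def average_utility_upper_bound(itemset, database):
    """Overestimate used for pruning: the sum, over transactions containing
    the itemset, of the maximum single-item utility in that transaction."""
    itemset = set(itemset)
    return sum(max(tx.values())
               for tx in database if itemset.issubset(tx))

db = [{"a": 6, "b": 2, "c": 9},
      {"a": 3, "b": 5},
      {"b": 4, "c": 1}]
print(average_utility({"a", "b"}, db))              # (8 + 8) / 2 = 8.0
print(average_utility_upper_bound({"a", "b"}, db))  # 9 + 5 = 14
```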

13.
Many fuzzy data mining approaches have been proposed for finding fuzzy association rules with a predefined minimum support from quantitative transaction databases. Since each item has its own utility, utility itemset mining has become increasingly important. However, common problems with existing approaches are that an appropriate minimum support is difficult to determine and that the derived rules usually expose common-sense knowledge, which may not be interesting from a business point of view. This study thus proposes an algorithm for mining high-coherent-utility fuzzy itemsets, using properties of propositional logic to overcome these problems. Quantitative transactions are first transformed into fuzzy sets. Then, the utility of each fuzzy itemset is calculated according to a given external utility table. If the value is larger than or equal to the minimum utility ratio, the itemset is considered a high-utility fuzzy itemset. Finally, contingency tables are calculated and used to check whether a high-utility fuzzy itemset satisfies four criteria; if so, it is a high-coherent-utility fuzzy itemset. Experiments on the foodmart and simulated datasets show that the itemsets derived by the proposed algorithm not only yield better profit than selling the items separately, but also provide fewer yet more useful utility itemsets for decision-makers.

14.
High-utility itemset mining (HUIM) is a critical issue that concerns not only the occurrence frequencies of itemsets, as in association-rule mining (ARM), but also the factors of quantity and profit in real-life applications. Many algorithms have been developed to efficiently mine high-utility itemsets (HUIs) from a static database. Discovered HUIs may become invalid, or new HUIs may arise, when transactions are inserted, deleted, or modified. Existing approaches are required to re-process the updated database and re-mine HUIs each time, as previously discovered HUIs are not maintained. The pre-large concept was previously proposed to efficiently maintain and update discovered information in ARM, but it cannot be directly applied to HUIM. In this paper, a maintenance algorithm with transaction modification based on a new pre-large strategy, PRE-HUI-MOD, is presented to efficiently maintain and update the discovered HUIs. When transactions in the original database are modified, the discovered information is divided into three parts with nine cases, and a specific procedure is performed to maintain and update the discovered information for each case. With the designed PRE-HUI-MOD algorithm, it is unnecessary to rescan the original database until the accumulated total utility of the modified transactions reaches the designed safety bound, which greatly reduces the cost of multiple database scans compared to batch-mode approaches.

15.
Discovering maximal frequent itemsets is a key problem in data mining applications. This paper proposes an interactive algorithm for mining maximal frequent itemsets based on an inverted matrix. The algorithm converts the transaction database into an inverted matrix, which shrinks the candidate subsets and facilitates interactive mining, and builds a COFI-tree independently for each frequent item, reducing the dependence on memory capacity during mining.

16.
MAFIA: a maximal frequent itemset algorithm
We present a new algorithm for mining maximal frequent itemsets from a transactional database. The search strategy of the algorithm integrates a depth-first traversal of the itemset lattice with effective pruning mechanisms that significantly improve mining performance. Our implementation for support counting combines a vertical bitmap representation of the data with an efficient bitmap compression scheme. In a thorough experimental analysis, we isolate the effects of individual components of MAFIA including search space pruning techniques and adaptive compression. We also compare our performance with previous work by running tests on very different types of data sets. Our experiments show that MAFIA performs best when mining long itemsets and outperforms other algorithms on dense data by a factor of three to 30.
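The vertical-bitmap support counting described above can be sketched as follows, using Python integers as arbitrary-length bitmaps; MAFIA's actual implementation and its compression scheme are more involved, and the names here are illustrative.

```python
def build_bitmaps(transactions):
    """Vertical representation: one integer bitmap per item, where bit t
    is set if transaction t contains the item."""
    bitmaps = {}
    for t, items in enumerate(transactions):
        for item in items:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << t)
    return bitmaps

def support(itemset, bitmaps, n_transactions):
    bits = (1 << n_transactions) - 1          # start with all bits set
    for item in itemset:
        bits &= bitmaps.get(item, 0)          # AND the item bitmaps together
    return bin(bits).count("1")               # popcount = support of the itemset

txs = [{"a", "b", "c"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
bm = build_bitmaps(txs)
print(support({"a", "c"}, bm, len(txs)))      # 3
```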

17.
Mining high-utility itemsets (HUIs) from a transaction database refers to the discovery of itemsets with high utilities, such as profits. Most existing studies discover HUIs from a transaction database in two phases. In phase 1, different overestimation methods are applied to calculate upper bounds on the utilities of itemsets. Since overestimated utilities are adopted, the itemsets whose overestimated utilities are no less than a user-specified threshold are selected as candidate HUIs, and they are verified by scanning the database one more time in phase 2. However, a large number of candidate HUIs incurs two problems: 1) excessive memory is required to store the candidates; 2) a large amount of running time is needed to calculate their exact utilities. The vertical data format has recently been applied to mine HUIs, but this kind of method cannot handle transactions containing the same items effectively, so the size of the database cannot be reduced sufficiently and the overall performance of such algorithms is degraded. Thus, an algorithm named HUITWU is proposed in this paper for mining HUIs. A novel data structure, HUITWU-Tree, is adopted to efficiently calculate the utilities of itemsets in a database. Extensive studies on both sparse and dense datasets have demonstrated that the proposed algorithm is more than an order of magnitude faster and consumes less memory than state-of-the-art algorithms.
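As background for the TWU-based naming above, here is a minimal sketch of transaction-weighted utility, assuming transactions are maps from items to utilities: TWU(X) sums the utilities of the transactions containing X and never underestimates the utility of X or its supersets, which is what justifies pruning. This is the standard TWU notion, not the HUITWU-Tree structure itself.

```python
def transaction_utility(tx):
    # tx: dict item -> utility of that item in the transaction
    return sum(tx.values())

def twu(itemset, database):
    """Transaction-weighted utility: the sum of the utilities of the
    transactions that contain the itemset. If TWU(X) is below the
    minimum-utility threshold, X and all its supersets can be pruned."""
    itemset = set(itemset)
    return sum(transaction_utility(tx)
               for tx in database if itemset.issubset(tx))

db = [{"a": 5, "b": 2, "c": 3},   # transaction utility = 10
      {"a": 4, "c": 1},           # transaction utility = 5
      {"b": 6, "c": 2}]           # transaction utility = 8
print(twu({"a"}, db))             # 15
print(twu({"b", "c"}, db))        # 18
```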

18.
李慧  刘贵全  瞿春燕 《计算机科学》2015,42(5):82-87, 123
Mining meaningful itemsets from transaction databases has been studied for more than a decade. However, most studies use either frequency/support (as in frequent-itemset mining) or utility/profit (as in high-utility itemset mining) as the primary measure. Either measure alone has limitations: an itemset with very high frequency may have a very low utility, while an itemset with very high utility is often infrequent, so recommending such itemsets to users is not meaningful. This work combines the two measures, aiming to find itemsets that are both highly frequent and of high utility. The main challenge is that utility is neither monotone nor anti-monotone. An efficient algorithm, FHIMA, is therefore proposed. FHIMA follows the idea of PrefixSpan and avoids generating infrequent candidate itemsets during mining. In addition, properties of the utility and quality upper bounds are used to effectively shrink the search space, greatly improving the efficiency of FHIMA.

19.
Fast and memory efficient mining of frequent closed itemsets
This paper presents a new scalable algorithm for discovering closed frequent itemsets, a lossless and condensed representation of all the frequent itemsets that can be mined from a transactional database. Our algorithm exploits a divide-and-conquer approach and a bitwise vertical representation of the database, and adopts a particular visiting and partitioning strategy of the search space based on an original theoretical framework that formalizes the problem of closed itemset mining in detail. The algorithm adopts several optimizations aimed at saving both space and time when computing itemset closures and their supports. In particular, since one of the main problems in this type of algorithm is the multiple generation of the same closed itemset, we propose a new effective and memory-efficient pruning technique that, unlike other previous proposals, does not require the whole set of closed patterns mined so far to be kept in main memory. This technique also permits each visited partition of the search space to be mined independently, in any order, and thus also in parallel. Tests conducted on many publicly available datasets show that our algorithm is scalable and outperforms other state-of-the-art algorithms such as CLOSET+ and FP-CLOSE, in some cases by more than one order of magnitude. More importantly, the performance improvements become more and more significant as the support threshold is decreased.
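The closure operation at the heart of closed-itemset mining can be sketched as follows: the closure of an itemset is the set of items common to every transaction containing it, and the itemset is closed exactly when it equals its own closure. This is a naive illustration of the definition, not the paper's divide-and-conquer, bitmap-based algorithm.

```python
def closure(itemset, transactions):
    """Closure of an itemset: all items common to every transaction that
    contains it. An itemset is closed iff it equals its own closure."""
    itemset = set(itemset)
    covering = [set(t) for t in transactions if itemset <= set(t)]
    if not covering:
        return itemset
    closed = covering[0]
    for t in covering[1:]:
        closed &= t                       # intersect the covering transactions
    return closed

txs = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "d"}]
print(closure({"a"}, txs))        # {'a', 'b'}  -> {'a'} is not closed
print(closure({"a", "b"}, txs))   # {'a', 'b'}  -> closed
```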

20.
To improve the efficiency of mining on-shelf utility itemsets with negative item values, a parallel on-shelf utility itemset mining algorithm with negative values, DTPHoun, is proposed. The algorithm is based on the MapReduce framework and fully exploits the on-shelf time-period factor, partitioning the original transaction database into shards by time period. The mining process is cast as MapReduce jobs: the Map stage mines candidate itemsets in the partitioned databases, and the Reduce stage computes the on-shelf utility values of the candidates in parallel. Experimental results show that the algorithm achieves high mining efficiency.
