Similar literature
 20 similar articles found (search time: 31 ms)
1.
Traditional association-rule mining considers only the occurrence frequencies of items in a binary database. In real-world applications, customers may buy several copies of an item, and other factors such as profit, quantity, or price should be taken into account to measure the utility of the purchased items. High-utility itemset mining was thus proposed to consider the factors of quantity and profit. The two-phase model is the most common way to maintain the transaction-weighted utilization downward closure property, which reduces the number of candidates in utility mining. Most methods for finding high-utility itemsets handle a static database. In practical applications, however, transactions change through insertion, deletion, or modification, so some itemsets may emerge as new high-utility itemsets while others become invalid in the updated database. In this paper, a maintenance algorithm, Fast Updated High Utility Pattern tree for transaction MODification (FUP-HUP-tree-MOD), is proposed to effectively maintain and update the built HUP tree for mining high-utility itemsets in dynamic databases without candidate generation. Experiments show that the proposed algorithm performs better than the two-phase algorithm and the HUP tree algorithm in batch mode.
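
The quantities named in this abstract can be illustrated with a small sketch. The transactions, profits, and function names below are hypothetical and not taken from the paper; the code only computes the utility of an itemset and its transaction-weighted utilization (TWU), assuming the usual definitions (utility = quantity times unit profit, TWU = sum of the utilities of the transactions that contain the itemset).

```python
# Hypothetical toy data: each transaction maps item -> purchased quantity.
transactions = [
    {"a": 1, "b": 2},
    {"a": 2, "c": 1},
    {"b": 1, "c": 3},
]
# Hypothetical unit profit of each item.
profit = {"a": 5, "b": 2, "c": 1}


def utility_in(itemset, tx):
    """Utility of `itemset` in one transaction (0 if it is not fully contained)."""
    if not set(itemset).issubset(tx):
        return 0
    return sum(tx[i] * profit[i] for i in itemset)


def utility(itemset):
    """Utility of `itemset` summed over the whole database."""
    return sum(utility_in(itemset, tx) for tx in transactions)


def transaction_utility(tx):
    """Total utility of all items in one transaction."""
    return sum(q * profit[i] for i, q in tx.items())


def twu(itemset):
    """Sum of the utilities of the transactions that contain `itemset`."""
    return sum(transaction_utility(tx)
               for tx in transactions if set(itemset).issubset(tx))


for itemset in [("a",), ("b",), ("a", "b")]:
    print(itemset, "utility =", utility(itemset), "TWU =", twu(itemset))
```

In this toy data, {b} has utility 6 while its superset {a, b} has utility 9, so utility alone is not downward closed. TWU can only stay the same or shrink when an itemset grows, which is why it can be used to prune candidates.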

2.
Incrementally mining high utility patterns based on pre-large concept   Total citations: 1 (self-citations: 1, citations by others: 0)
In traditional association rule mining, most algorithms are designed to discover frequent itemsets from a binary database. Utility mining was proposed to measure the utility values of purchased items and reveal high utility itemsets in a quantitative database, and a two-phase high utility mining algorithm was later proposed to discover such itemsets efficiently. In dynamic data mining, transactions may be inserted into, deleted from, or modified in a database. In this case, a batch mining procedure must rescan the whole updated database to keep the discovered information up to date. Designing an efficient approach for handling dynamic databases is thus a critical research issue in utility mining. In this paper, an incremental mining algorithm is proposed for efficiently maintaining discovered high utility itemsets based on the pre-large concept. Itemsets are first partitioned into three parts according to whether they have large (high), pre-large, or small transaction-weighted utilization in the original database and in the inserted transactions. An individual procedure is then executed for each part. Experimental results show that the proposed incremental high utility mining algorithm outperforms existing algorithms.
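
The three-way partition described above can be sketched as follows. This is a minimal illustration, assuming an itemset is labeled by comparing its transaction-weighted utilization ratio against an upper and a lower threshold; the function name, parameter names, and threshold values are hypothetical and do not come from the paper.

```python
from itertools import product

LABELS = ("large", "pre-large", "small")


def classify(twu, total_utility, upper, lower):
    """Label an itemset by its TWU ratio (assumed rule: two thresholds)."""
    ratio = twu / total_utility
    if ratio >= upper:
        return "large"
    if ratio >= lower:
        return "pre-large"
    return "small"


# Crossing the label in the original database with the label in the
# inserted transactions gives the combinations a maintenance procedure
# has to handle.
cases = list(product(LABELS, repeat=2))

print(classify(30, 100, upper=0.4, lower=0.2))
print(len(cases), "combinations:", cases)
```

An itemset that is pre-large in the original database acts as a buffer: a few inserted transactions can move it to large or small, but a small itemset cannot jump straight to large without first passing through the pre-large band.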

3.
High-utility itemset mining (HUIM) is a critical issue that concerns not only the occurrence frequencies of itemsets, as in association-rule mining (ARM), but also the factors of quantity and profit in real-life applications. Many algorithms have been developed to efficiently mine high-utility itemsets (HUIs) from a static database. Discovered HUIs may become invalid, or new HUIs may arise, when transactions are inserted, deleted, or modified. Existing approaches must re-process the updated database and re-mine HUIs each time, because previously discovered HUIs are not maintained. A pre-large concept was previously proposed to efficiently maintain and update the discovered information in ARM, but it cannot be applied directly to HUIM. In this paper, a maintenance algorithm for transaction modification (PRE-HUI-MOD), based on a new pre-large strategy, is presented to efficiently maintain and update the discovered HUIs. When transactions in the original database are modified, the discovered information is divided into three parts with nine cases, and a specific procedure is performed to maintain and update the discovered information for each case. With the PRE-HUI-MOD algorithm, the original database does not need to be rescanned until the accumulated total utility of the modified transactions reaches the designed safety bound, which greatly reduces the cost of multiple database scans compared with batch-mode approaches.

4.
Association-rule mining, which is based on the frequency values of items, is the most common topic in data mining. In real-world applications, however, customers may buy many copies of a product, and each product may have different factors such as profit and price. Mining only frequent itemsets in binary databases is thus not suitable for some applications. Utility mining was introduced to consider additional measures, such as profits or costs, according to user preference. In the past, a two-phase mining algorithm was designed for quickly discovering high utility itemsets from databases. When data arrive intermittently, that approach needs to process all the transactions in batch mode. In this paper, an incremental algorithm for efficiently mining high utility itemsets is proposed to handle this situation. It is based on the concept of the fast-update (FUP) approach, which was originally designed for association mining. The proposed approach first partitions itemsets into four parts according to whether they are high transaction-weighted utilization itemsets in the original database and in the newly inserted transactions. Each part is then handled by its own procedure. Experimental results show that the proposed algorithm executes faster than the two-phase batch mining algorithm in an intermittent data environment.
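
The four-part partition can be sketched as below. This is an illustrative reading of the FUP idea applied to transaction-weighted utilization (TWU), not the paper's procedure: all names are hypothetical, and the code assumes that the TWU of an itemset in the original database is stored only if the itemset was high there.

```python
def fup_update(stored_twu, tu_orig, twu_new, tu_new, ratio):
    """Decide the status of one itemset after new transactions arrive.

    stored_twu: TWU in the original database, or None if the itemset was
                not high there (assumption: its TWU was not kept).
    tu_orig:    total utility of the original database.
    twu_new:    TWU of the itemset in the inserted transactions.
    tu_new:     total utility of the inserted transactions.
    ratio:      minimum high-utility threshold as a fraction.
    """
    high_orig = stored_twu is not None
    high_new = twu_new >= ratio * tu_new
    threshold = ratio * (tu_orig + tu_new)

    if high_orig and high_new:
        # Both parts meet the ratio, so their sum does as well.
        return "case 1: stays high"
    if high_orig and not high_new:
        # The stored value lets us decide without touching the original data.
        status = "stays high" if stored_twu + twu_new >= threshold else "becomes low"
        return "case 2: " + status
    if not high_orig and high_new:
        # The original TWU is unknown, so the original database is needed.
        return "case 3: rescan original database"
    # Both parts are below the ratio, so their sum is below it too.
    return "case 4: stays low"


print(fup_update(40, 100, 10, 20, 0.25))
print(fup_update(26, 100, 2, 20, 0.25))
print(fup_update(None, 100, 10, 20, 0.25))
print(fup_update(None, 100, 2, 20, 0.25))
```

Only case 3 requires going back to the original database, which is where an incremental approach of this kind saves work over re-mining in batch mode.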

5.
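Updating frequent itemsets in distributed database systems based on DDMINER   Total citations: 13 (self-citations: 0, citations by others: 13)
Ji Genlin, Yang Ming, Zhao Bin, Sun Zhihui. Chinese Journal of Computers (《计算机学报》), 2003, 26(10): 1387-1392
This paper presents DDMINER, an architecture for a distributed data mining system, and examines the problem of updating frequent itemsets in distributed database systems, considering both the addition and the deletion of transactions. Based on DDMINER, it proposes ULF, an algorithm for updating local frequent itemsets, and UGF, an algorithm for updating global frequent itemsets. The algorithms generate relatively few candidate frequent itemsets, and when the global frequent itemsets are computed, the communication cost of transmitting the support counts of candidate local frequent itemsets is O(n). The algorithms were implemented in Java and their performance was studied. Experimental results show that they are correct, feasible, and efficient.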

6.
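An efficient fast-update algorithm for association rules   Total citations: 2 (self-citations: 0, citations by others: 2)
Apriori and FP-tree, the two classic algorithms for mining association rules, both process all transactions in batch mode. In practice, however, new transactions arrive frequently, so association rules must be updated continually. To improve update efficiency and reduce the number of scans of the original database, an improved algorithm is proposed that builds on the fast updated frequent pattern tree (FUFP-tree) algorithm and uses the concept of sub-frequent items. Experimental results show that the new algorithm performs well.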

7.
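Generalized association rule mining is an important extended research topic in data mining. This paper first proposes a generalized extended canonical-order tree (GECT) structure and its incremental mining algorithm, GECT-IM. The algorithm scans the original taxonomy transaction database only once to map all transaction information into a GECT in compressed form. It then scans the updated transaction dataset to obtain the counts of the itemsets in it and, using the relevant properties and operations, discovers most of the generalized frequent itemsets after the update. Second, to address the large size of the GECT and the fact that GECT-IM may still need to traverse the initial GECT, the paper defines the concepts of database update and reconstruction and, based on a quantifiable pre-minimum support threshold, proposes an improved pre-large generalized extended canonical-order tree (PGECT) structure and its incremental mining algorithm, PGECT-IM. Because traversal of the initial GECT is effectively avoided, the efficiency of incremental generalized association rule mining is further improved. Experiments show that GECT-IM and its optimized variant PGECT-IM achieve higher mining efficiency and better scalability than existing incremental mining algorithms.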

8.
Most approaches for discovering frequent itemsets derive association rules from a binary database. Profit, cost, and quantity are not considered in traditional association-rule mining. Utility mining was proposed to measure the utilities of purchased products and derive high-utility itemsets (HUIs). Many algorithms have been proposed to efficiently find HUIs in a static database. In real-world applications, however, transactions are inserted, deleted, or modified dynamically. Existing batch approaches have to re-process the updated database because previously discovered HUIs are not maintained. In this paper, a Fast UPdated (FUP) strategy with a utility measure and a maintenance algorithm, called FUP-HUI-MOD, are developed to efficiently maintain and update discovered HUIs. When transactions are modified, the proposed algorithm partitions the transactions before and after the modification into two parts, creating four cases. Each case is handled by a specific procedure that updates the discovered HUIs. With the FUP-HUI-MOD algorithm, the original database does not have to be rescanned each time, unlike with state-of-the-art high-utility itemset mining algorithms in batch mode. Experiments show that the proposed algorithm outperforms batch algorithms in maintaining HUIs.

9.
The purpose of mining frequent itemsets is to identify groups of items that appear together in a transaction database more often than a user-specified threshold. However, numerous frequent itemsets may exist in a transaction database, which hinders decision making. Recently, the mining of frequent closed itemsets has become a major research issue because the set of frequent closed itemsets is a condensed yet complete representation of the frequent itemsets: all frequent itemsets can be derived from the frequent closed itemsets. Nonetheless, the number of transactions in a database can increase rapidly in a short period, and some transactions may become outdated. Thus, frequent closed itemsets may change when new transactions are added to or old transactions are deleted from the database, and updating previously found closed itemsets in these situations is challenging. This study proposes an efficient algorithm for incrementally mining frequent closed itemsets without scanning the original database. The proposed algorithm updates closed itemsets by performing several operations on the previously found closed itemsets and the added or deleted transactions, without searching the previously found closed itemsets. The experimental results show that the proposed algorithm significantly outperforms previous methods, which require a substantial amount of time to search previously found closed itemsets.
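
The claim that closed itemsets are a condensed yet complete representation can be checked on a toy example. The sketch below is a brute-force illustration, not the incremental algorithm of the paper; the data and names are hypothetical. It assumes the standard definition: a frequent itemset is closed if no proper superset has the same support.

```python
from itertools import combinations

# Hypothetical toy database and minimum support count.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a"}]
MIN_COUNT = 2


def support(itemset):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for tx in transactions if itemset <= tx)


# Enumerate all frequent itemsets by brute force (fine for a toy example).
items = sorted(set().union(*transactions))
frequent = {}
for size in range(1, len(items) + 1):
    for combo in combinations(items, size):
        count = support(frozenset(combo))
        if count >= MIN_COUNT:
            frequent[frozenset(combo)] = count

# Keep only itemsets with no proper superset of equal support.
closed = {x: s for x, s in frequent.items()
          if not any(x < y and s == t for y, t in frequent.items())}


def derived_support(itemset):
    """Support recovered from closed itemsets alone (0 means not frequent)."""
    return max((s for c, s in closed.items() if itemset <= c), default=0)


# Rebuild every frequent itemset and its support from the closed ones.
recovered = {x: derived_support(x)
             for c in closed
             for r in range(1, len(c) + 1)
             for x in map(frozenset, combinations(sorted(c), r))}

print(len(frequent), "frequent itemsets,", len(closed), "closed itemsets")
print("recovered matches frequent:", recovered == frequent)
```

Here five frequent itemsets are represented by three closed ones, and the support of each frequent itemset is the largest support among the closed itemsets that contain it.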

10.
The frequent pattern tree (FP-tree) is an efficient data structure for association-rule mining without generation of candidate itemsets. It compresses a database into a tree structure that stores only large items. It must, however, process all transactions in batch mode. In real-world applications, new transactions are usually inserted into databases incrementally. In the past, we proposed a Fast Updated FP-tree (FUFP-tree) structure to handle new transactions efficiently and make the tree update process easier. In this paper, we modify the FUFP-tree construction based on the concept of pre-large itemsets, which are defined by a lower support threshold and an upper support threshold. The approach does not need to rescan the original database until a certain number of new transactions have been inserted. It can thus achieve a good execution time for tree construction, especially when a small number of transactions are inserted each time. Experimental results also show that the proposed Pre-FUFP maintenance algorithm performs well when handling new transactions incrementally.
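
Why two thresholds allow rescans to be postponed can be shown with a short derivation in code. The abstract does not state a formula, so the bound below is derived here under stated assumptions: an itemset that is small in the original database of d transactions has a count below lower * d, and it can gain at most one occurrence per inserted transaction. It then cannot reach the upper threshold as long as the number t of inserted transactions satisfies (lower * d + t) / (d + t) <= upper, that is, t <= (upper - lower) * d / (1 - upper). The function and parameter names are hypothetical.

```python
import math
from fractions import Fraction


def safety_bound(d, upper, lower):
    """Number of insertions after which a rescan may become necessary.

    d:     number of transactions in the original database.
    upper: upper support threshold, given as a string such as "0.5".
    lower: lower support threshold, given as a string such as "0.4".
    Exact fractions are used to avoid floating-point rounding in floor().
    """
    upper, lower = Fraction(upper), Fraction(lower)
    return math.floor((upper - lower) * d / (1 - upper))


def worst_case_ratio(d, lower, t):
    """Highest support ratio a formerly small itemset can reach after t inserts."""
    lower = Fraction(lower)
    # Largest integer count strictly below lower * d.
    max_small_count = math.ceil(lower * d) - 1
    return Fraction(max_small_count + t, d + t)


bound = safety_bound(100, "0.5", "0.4")
print("safety bound for d=100:", bound)
print("worst-case ratio at the bound:", float(worst_case_ratio(100, "0.4", bound)))
print("safety bound for d=1000:", safety_bound(1000, "0.5", "0.4"))
```

Under these assumptions the bound scales linearly with the database size, so larger databases tolerate more insertions before the original data has to be read again.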

11.
Mining useful information and knowledge from large databases has evolved into an important research area in recent years. Among the classes of knowledge derived, sequential patterns in temporal transaction databases are very important because they can help model customer behavior. In the past, researchers usually assumed databases were static to simplify data-mining problems. In real-world applications, new transactions may be added to databases frequently. Designing an efficient and effective mining algorithm that can maintain sequential patterns as a database grows is thus important. In this paper, we propose a novel incremental mining algorithm for maintaining sequential patterns based on the concept of pre-large sequences, which reduces the need to rescan original databases. Pre-large sequences are defined by a lower support threshold and an upper support threshold that act as a gap, preventing sequences from moving directly from large to small and vice versa. The proposed algorithm does not require rescanning original databases until the accumulated number of newly added customer sequences exceeds a safety bound, which depends on database size. Thus, as databases grow larger, the number of new transactions allowed before a rescan is required also grows, and the proposed approach becomes increasingly efficient.

12.
Modification of records in databases is common in real-world applications. Developing an efficient and effective mining algorithm to maintain discovered information as the records in a database are updated is thus quite important in the field of data mining. Although association rules can be maintained under record modification by using deletion and insertion procedures, this requires twice the computation time needed for a single procedure. In this paper, we present a new modification algorithm to resolve this issue. The concept of pre-large itemsets is used to reduce the need for rescanning original databases and to save maintenance costs. The proposed algorithm does not require rescanning of original databases until a specified number of records have been modified. If the database is large, then the number of modified records allowed will also be large. This characteristic is especially useful for real-world applications.

13.
Association rule mining is an important topic in data mining. The problem is to discover all (or almost all) associations among items in a transaction database that satisfy some user-specified constraints, usually minimum support and minimum confidence. Class association rules (CARs) are a special type of association rule that can be applied to classification problems. Previous research showed that classification based on association rules achieves higher accuracy than an inductive learning algorithm or C4.5. As such, many methods have been proposed for mining CARs, although these use batch processing. However, datasets often change, with records added and/or deleted, and consequently updating CARs is a challenging problem. This paper proposes an efficient method for updating CARs when records are deleted. First, we use an MECR-tree to store nodes for the original dataset. The information in the nodes of this tree is updated based on the deleted records. Second, the concept of pre-large itemsets is used to avoid rescanning the original dataset. Finally, we propose an algorithm to efficiently update and generate CARs. We also analyze the time complexity to show the efficiency of the proposed algorithm. The experimental results show that the proposed method outperforms re-mining CARs from the dataset after record deletion.

14.
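An incremental updating algorithm for frequent itemsets based on FP_tree   Total citations: 1 (self-citations: 0, citations by others: 1)
Zhao Yan, Yao Yong, Liu Zhijing. Computer Engineering (《计算机工程》), 2008, 34(11): 63-65
This paper studies the problem of updating frequent itemsets and proposes an incremental updating algorithm for frequent itemsets based on the frequent pattern tree. The algorithm makes full use of existing mining results and effectively handles the update of frequent itemsets when the minimum support and the transaction database change at the same time. Its performance is analyzed and tested for the case where the transaction database changes through both additions and deletions, and the results show that the algorithm is efficient and feasible.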

15.
The frequent pattern tree (FP-tree) is an efficient data structure for association-rule mining without generation of candidate itemsets. It compresses a database into a tree structure that stores only large items. It must, however, process all transactions in batch mode. In the past, we proposed a Fast Updated FP-tree (FUFP-tree) structure to handle new transactions efficiently and make the tree update process easier. In this paper, we propose the structure of pre-large trees to incrementally mine association rules based on the concept of pre-large itemsets. Owing to the properties of the pre-large concept, the proposed approach does not need to rescan the original database until a certain number of new transactions have been inserted. It can thus achieve a good execution time for tree construction, especially when a small number of transactions are inserted each time. Experimental results also show that the proposed approach performs well when handling new transactions incrementally.

16.
Mining high-utility itemsets (HUIs) from a transaction database refers to the discovery of itemsets with high utilities, such as profits. Most existing studies discover HUIs from a transaction database in two phases. In phase 1, different overestimation methods are applied to calculate upper bounds on the utilities of itemsets, and the itemsets whose overestimated utilities are no less than a user-specified threshold are selected as candidate HUIs. In phase 2, the candidates are verified by scanning the database one more time. However, a large number of candidate HUIs causes two problems: 1) storing the candidates requires excessive memory; 2) calculating their exact utilities takes a large amount of running time. The vertical data format has recently been applied to mine HUIs. However, this kind of method cannot deal effectively with transactions that contain the same items, so the size of the database cannot be reduced sufficiently, and the overall performance of the algorithms is degraded as a result. This paper therefore proposes an algorithm, HUITWU, for mining HUIs. A novel data structure, the HUITWU-Tree, is adopted to efficiently calculate the utilities of itemsets in a database. Extensive studies with both sparse and dense datasets demonstrate that the proposed algorithm is more than an order of magnitude faster and consumes less memory than state-of-the-art algorithms.

17.
High-Utility Itemset Mining (HUIM) has been considered a major issue in recent decades because it reveals profit strategies that industry can use for decision-making. Most existing works have focused on mining high-utility itemsets from databases, which yields a large number of patterns; making precise decisions from such a large amount of discovered knowledge remains challenging. Closed high-utility itemset mining (CHUIM) provides a smart way to present concise high-utility itemsets that can be more effective for making correct decisions. However, none of the existing works have focused on handling large-scale databases to integrate discovered knowledge from several distributed databases. In this paper, we first present a large-scale information fusion architecture to integrate closed high-utility patterns discovered in several distributed databases. A generic composite model is used to cluster transactions according to their correlation, which ensures the correctness and completeness of the fusion model. The well-known MapReduce framework is then deployed in the developed DFM-Miner algorithm to handle big datasets for information fusion and integration. The approach is compared experimentally with the state-of-the-art CHUI-Miner and CLS-Miner algorithms for mining closed high-utility patterns, and the results indicate that the designed model handles large-scale databases well with less memory usage. Moreover, the designed MapReduce framework speeds up the mining of closed high-utility patterns in the developed fusion system.

18.
Mining frequent itemsets plays an important role in the mining of association rules. Frequent itemsets are typically mined from binary databases, although each item in a transaction may have a different significance. Mining Frequent Weighted Itemsets (FWI) from weighted item transaction databases addresses this issue. This paper therefore proposes algorithms for the fast mining of FWI from weighted item transaction databases. First, an algorithm for directly mining FWI using WIT-trees is presented. Next, some theorems concerning the fast mining of FWI are developed, and an advanced algorithm for mining FWI is proposed based on them. Finally, a Diffset strategy for the efficient computation of the weighted support of itemsets is described, and an algorithm for mining FWI using Diffsets is presented. A complete evaluation of the proposed algorithms is also presented.
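
The diffset idea mentioned above can be illustrated with a small sketch. It is not the paper's algorithm. It assumes, as the abstract does not spell out, that a transaction's weight is the mean weight of its items and that the weighted support of an itemset is the weight of the transactions containing it divided by the total weight. The data and names are hypothetical.

```python
import math

# Hypothetical item weights and transactions (transaction id -> items).
item_weight = {"a": 0.6, "b": 0.1, "c": 0.3}
transactions = {1: {"a", "b"}, 2: {"a", "c"}, 3: {"a", "b", "c"}, 4: {"b"}}

# Assumed definition: a transaction's weight is the mean weight of its items.
tw = {tid: sum(item_weight[i] for i in items) / len(items)
      for tid, items in transactions.items()}
total_tw = sum(tw.values())


def tidset(item):
    """Ids of the transactions that contain `item`."""
    return {tid for tid, items in transactions.items() if item in items}


def weighted_support(tids):
    """Share of the total transaction weight carried by `tids`."""
    return sum(tw[t] for t in tids) / total_tw


# Direct computation: intersect the tidsets of "a" and "b".
ws_ab_direct = weighted_support(tidset("a") & tidset("b"))

# Diffset computation: start from the support of "a" and subtract the
# weight of the transactions that contain "a" but not "b".
diffset_ab = tidset("a") - tidset("b")
ws_ab_diff = weighted_support(tidset("a")) - weighted_support(diffset_ab)

print("diffset of ab relative to a:", diffset_ab)
print("same result:", math.isclose(ws_ab_direct, ws_ab_diff))
```

When itemsets share most of their transactions, the diffset is much smaller than the tidset, so storing and subtracting differences is cheaper than intersecting full transaction id lists.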

19.
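High-utility sequential pattern mining is an important topic in data mining, with significant applications in areas such as bioinformatics and consumer behavior analysis. Unlike traditional frequent pattern mining methods, high-utility sequential pattern mining considers not only the internal and external utilities of itemsets but also their temporal order, so its computational complexity is high. Although a number of algorithms have been proposed for this problem, the time and space efficiency of mining algorithms remains a main research focus in the field. This paper therefore proposes HUSP-FP, a high-utility sequential pattern mining algorithm based on pattern growth. Relying on the requirement that high-utility sequential itemsets must satisfy the transaction utility closure property, the algorithm first builds a global tree after removing useless items, and then uses a pattern-growth method to obtain all high-utility sequential patterns from the global tree, avoiding the generation of candidate itemsets. In the experiments, the algorithm is compared in time and space with three currently efficient algorithms: HUSP-Miner, USPAN, and HUS-Span. The results show that the proposed algorithm can still mine the relevant sequential patterns effectively under small thresholds and achieves considerable improvements in both running time and space usage.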

20.
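This paper discusses how to update frequent itemsets efficiently in distributed database systems when the minimum support changes. It proposes ULFS, an algorithm for updating local frequent itemsets, and UGFS, an algorithm for updating global frequent itemsets, both based on changes in the minimum support. The algorithms make full use of previously mined results and generate relatively few candidate frequent itemsets. When the global frequent itemsets are computed, the communication cost for the support counts of candidate local frequent itemsets is O(n). The algorithms were implemented in Java and their performance was studied. Experimental results show that they are feasible, effective, and fast.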
