首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
Incrementally fast updated frequent pattern trees   总被引:3,自引:0,他引:3  
The frequent-pattern-tree (FP-tree) is an efficient data structure for association-rule mining without generation of candidate itemsets. It was used to compress a database into a tree structure which stored only large items. It, however, needed to process all transactions in a batch way. In real-world applications, new transactions are usually inserted into databases. In this paper, we thus attempt to modify the FP-tree construction algorithm for efficiently handling new transactions. A fast updated FP-tree (FUFP-tree) structure is proposed, which makes the tree update process become easier. An incremental FUFP-tree maintenance algorithm is also proposed for reducing the execution time in reconstructing the tree when new transactions are inserted. Experimental results also show that the proposed FUFP-tree maintenance algorithm runs faster than the batch FP-tree construction algorithm for handling new transactions and generates nearly the same tree structure as the FP-tree algorithm. The proposed approach can thus achieve a good trade-off between execution time and tree complexity.  相似文献   

2.
基于频繁模式树的关联规则增量式更新算法   总被引:48,自引:1,他引:48  
研究了大型事务数据库中关联规则的增量式更新总是,提出了一种基于频繁模式树的关联规则增量式更新算法,以处理最小支持度或事务数据库发生变化后相应关联规则的更新问题,并对其性能进行了分析。  相似文献   

3.
董林  舒红 《计算机应用》2013,33(11):3049-3051
为了得到有趣且有效的空间关联规则通常需要多次执行挖掘操作,可以使用增量维护算法来提高挖掘效率。然而,能够直接使用空间数据的关联规则增量更新算法尚属空白。为解决这一问题,对挖掘阈值改变和空间数据集更新后通过筛选或增量挖掘等方法实现规则维护的策略进行了分析,并提出适用于支持度阈值减小和空间图层增加这两类情况的增量挖掘算法——ISA。ISA算法不依赖于空间事务表的构建与更新,可以直接使用空间图层作为输入数据。在基于实际数据的实验中,采用ISA算法所得结果与类Apriori算法一致,耗时则相对缩短20.0%至71.0%;此外,对1372772条规则进行了基于筛选的更新,耗时低于0.1s。实验结果表明,所提出的空间关联规则增量维护策略和算法是可行、正确且高效的。  相似文献   

4.
数据采集手段的丰富,使获取、保存大量数据变得容易,从庞杂的数据中提取有用的知识和信息是数据挖掘的主要任务,关联规则是数据挖掘领域的一个重要分支。本文针对事务数据库中增加新的数据集后相应关联规则的更新和维护问题,提出了一种关联规则增量式增量算法  相似文献   

5.
曾小宁  肖水晶 《计算机应用》2007,27(6):1403-1406
引入扩展差别矩阵和扩展决策矩阵,提出了新的属性约简算法和增量更新算法,即基于扩展差别矩阵的属性约简算法和基于扩展决策矩阵的增量式规则提取算法,讨论了规则的增量更新算法。由于使用了增量更新算法和并行处理技术,从而提高了数据挖掘的效率,降低了时间复杂度。通过实验说明此算法是有效和可行的。  相似文献   

6.
数据采集手段的丰富,使获取、保存大量数据变得容易,从庞杂的数据中提取有用的知识和信息是数据挖掘的主要任务,关联规则是数据挖掘领域的一个重要分支。本文针对事务数据库中增加新的数据集后相应关联规则的更新和维护问题,提出了一种关联规则增量式增量算法  相似文献   

7.
A concept lattice is an ordered structure between concepts. It is particularly effective in mining association rules. However, a concept lattice is not efficient for large databases because the lattice size increases with the number of transactions. Finding an efficient strategy for dynamically updating the lattice is an important issue for real-world applications, where new transactions are constantly inserted into databases. To build an efficient storage structure for mining association rules, this study proposes a method for building the initial frequent closed itemset lattice from the original database. The lattice is updated when new transactions are inserted. The number of database rescans over the entire database is reduced in the maintenance process. The proposed algorithm is compared with building a lattice in batch mode to demonstrate the effectiveness of the proposed algorithm.  相似文献   

8.
A linear model tree is a decision tree with a linear functional model in each leaf. Previous model tree induction algorithms have been batch techniques that operate on the entire training set. However there are many situations when an incremental learner is advantageous. In this article a new batch model tree learner is described with two alternative splitting rules and a stopping rule. An incremental algorithm is then developed that has many similarities with the batch version but is able to process examples one at a time. An online pruning rule is also developed. The incremental training time for an example is shown to only depend on the height of the tree induced so far, and not on the number of previous examples. The algorithms are evaluated empirically on a number of standard datasets, a simple test function and three dynamic domains ranging from a simple pendulum to a complex 13 dimensional flight simulator. The new batch algorithm is compared with the most recent batch model tree algorithms and is seen to perform favourably overall. The new incremental model tree learner compares well with an alternative online function approximator. In addition it can sometimes perform almost as well as the batch model tree algorithms, highlighting the effectiveness of the incremental implementation. Editor: Johannes Fürnkranz  相似文献   

9.
Incrementally mining high utility patterns based on pre-large concept   总被引:1,自引:1,他引:0  
In traditional association rule mining, most algorithms are designed to discover frequent itemsets from a binary database. Utility mining was thus proposed to measure the utility values of purchased items for revealing high utility itemsets from a quantitative database. In the past, a two-phase high utility mining algorithm was thus proposed for efficiently discovering high utility itemsets from a quantitative database. In dynamic data mining, transactions may be inserted, deleted, or modified from a database. In this case, a batch mining procedure must rescan the whole updated database to maintain the up-to-date information. Designing an efficient approach for handling dynamic databases is thus a critical research issue in utility mining. In this paper, an incremental mining algorithm is proposed for efficiently maintaining discovered high utility itemsets based on pre-large concepts. Itemsets are first partitioned into three parts according to whether they have large (high), pre-large, or small transaction-weighted utilization in the original database and in inserted transactions. Individual procedures are then executed for each part. Experimental results show that the proposed incremental high utility mining algorithm outperforms existing algorithms.  相似文献   

10.
In the present scenario of global economy and World Wide Web, large sets of evolving and distributed data can be handled efficiently by incremental data mining. Frequent patterns are very important in knowledge discovery and data mining process, such as mining of association rules, correlations. FP-tree is a very versatile data structure used for mining of frequent patterns in knowledge discovery and data mining process. FP-tree is a compact representation of transaction database that contains frequency information of all relevant frequent patterns (FP) of the database. All of the existing incremental frequent pattern mining algorithms, such as AFPIM, CATS, CanTree, CP-tree, and SPO-tree, perform incremental mining by processing one transaction of the incremental part of database at a time and updating it to the FP-tree of initial (original) database. Here, in this paper, we propose a novel method that takes advantage of FP-tree representation of incremental transaction database for incremental mining. We propose a batch incremental processing algorithm BIT_FPGrowth that restructures and merges two small consecutive duration FP-trees to obtain a FP-tree of the FP-Growth algorithm. Our BIT_FPGrowth uses FP-tree as preprocessed data repository to get transactions (i.e., item-sets), unlike other sequential incremental algorithms that read transactions from database. BIT_FPGrowth algorithm takes less time for constructing FP-tree. Our experimental results show that, as the size of the database increases, increase in runtime of BIT_FPGrowth is much less and is least of all the other algorithms.  相似文献   

11.
增量更新关联规则挖掘主要解决事务数据库中交易记录不断更新和最小支持度发生变化时关联规则的维护问题。针对目前诸多增量更新关联规则挖掘算法存在效率低、计算成本高、规则难以维护等问题,提出一种基于倒排索引树的增量更新关联挖掘算法。该算法有效地将倒排索引技术与树型结构相结合,使得交易数据库中的数据不断更新和最小支持度随应用环境不同而不断改变时,以实现无需扫描原始交易数据库和不产生候选项集的情况下生成频繁项集。实验结果表明,该算法只需占用较小的存储空间、且检索项集的效率较高,能高效地解决增量更新关联规则难以维护的问题。  相似文献   

12.
The Frequent-Pattern-tree (FP tree) is an efficient data structure for association-rule mining without generation of candidate itemsets. It was used to represent a database into a tree structure which stored only frequent items. It, however, needed to process all transactions in a batch way. In the past, Hong et al. thus proposed an efficient incremental mining algorithm for handling newly inserted transactions. In addition to record insertion, record deletion from databases is also commonly seen in real-applications. In this paper, we thus attempt to modify the FP-tree construction algorithm for efficiently handling deletion of records. A fast updated FP-tree (FUFP-tree) structure is used, which makes the tree update process become easier. An FUFP-tree maintenance algorithm for the deletion of records is also proposed for reducing the execution time in reconstructing the tree when records are deleted. Experimental results also show that the proposed FUFP-tree maintenance algorithm for deletion of records runs faster than the batch FP-tree construction algorithm for handling deleted records and generates nearly the same tree structure as the FP-tree algorithm. The proposed approach can thus achieve a good trade-off between execution time and tree complexity.  相似文献   

13.
核属性蚁群算法的规则获取   总被引:1,自引:0,他引:1  
蚁群算法是一种新型的模拟进化算法,研究已经表明该算法具有许多优良的性质,并且在优化计算中已得到了很多应用.粗糙集理论作为一种智能数据分析和数据挖掘的新的数学工具,其主要优点在于它不需要任何关于被处理数据的先验或额外知识.本文从规则获取和优化两方面研究基于粗糙集理论和蚁群算法的分类规则挖掘方法.通过研究决策表和决策规则系数,建立基于粗糙集表示和度量的知识理论,将粗糙集理论与蚁群算法融合,采用粗糙集理论进行属性约简,利用蚁群算法获取最优分类规则,优势互补.实验结果比较表明,算法获取的分类规则,具有良好的预测能力和更为简洁的表示形式.  相似文献   

14.
一种有效的关联规则增量式更新算法   总被引:8,自引:2,他引:6  
关联规则是数据挖掘中的一个重要研究内容。目前已经提出了许多用于高效地发现大规模数据库中的关联规则的算法,而对已发现规则的更新及维护问题的研究却较少。文章提出了基于频繁模式树的关联规则增量式更新算法,以处理事务数据库中增加了新的事务数据集后相应关联规则的更新问题,并对其性能进行了分析。  相似文献   

15.
相关测度与增量式支持度和信任度的计算   总被引:5,自引:0,他引:5  
王晓峰  王天然 《软件学报》2002,13(11):2208-2214
通过相关测度的定义,从理论上探讨了增量式规则发现问题,并把分类规则挖掘和关联规则挖掘联系起来进行研究,为该问题的深入研究奠定了理论基础.相关测度刻画了给定关系和相关集合的数字特征.对相关测度的概念、定义、性质以及与支持度和信任度的关系等方面作了详细的分析和探讨,给出了基于相关集合的支持度和信任度的定义及计算方法.证明了测度增量定理和支持度增量定理,并给出了增量式支持度和信任度的计算公式.另外还详细地分析了数据增量对关联规则和信任度的影响,探讨了基于新支持度的候选项的修剪问题.所提出的相关测度及其思想为研究既能用于分类规则又能用于关联规则的统一数据挖掘方法提供了有价值的新思路.  相似文献   

16.
一种实用的关联规则增量式更新算法   总被引:2,自引:0,他引:2  
薛锦  陈原斌 《计算机工程与应用》2003,39(13):212-213,217
关联规则是数据挖掘中的一个重要研究内容。目前已经提出了许多用于高效地发现大规模数据库中的关联规则的算法,而对已发现规则的更新及维护问题的研究却较少。该文提出了一种实用的关联规则增量式更新算法,以处理事务数据库中增加了新的事务数据集后相应的关联规则的更新问题,并对其性能进行了分析。  相似文献   

17.
将Rough集理论应用于规则归纳系统,提出了一种基于粗糙集获取规则知识库的增量式学习方法,能够有效处理决策表中不一致情形,采用启发式算法获取决策表的最简规则,当新对象加入时在原有规则集基础上进行规则知识库的增量式更新,避免了为更新规则而重新运行规获取算法。并用UCI中多个数据集从规则集的规则数目、数据浓缩率、预测能力等指标对该算法进行了测试。实验表明了该算法的有效性。  相似文献   

18.
一种基于事务时间分割的关联规则增量式更新方法   总被引:1,自引:0,他引:1  
文章介绍了一种增量式关联规则更新方法,其核心思想是,将长事务以时间分割,分成一个连续的情节集合,当前情节期间获得的信息,依赖于当前的事务子集以及前面情节期间已经发现的信息。仅使用更新的事务和前面阶段的挖掘结果,增量式地产生频集。用Apriori类算法作为局部过程来产生频集,给出了具体的动态挖掘算法。  相似文献   

19.
Different from traditional association-rule mining, a new paradigm called Ratio Rule (RR) was proposed recently. Ratio rules are aimed at capturing the quantitative association knowledge, We extend this framework to mining ratio rules from distributed and dynamic data sources. This is a novel and challenging problem. The traditional techniques used for ratio rule mining is an eigen-system analysis which can often fall victim to noise. This has limited the application of ratio rule mining greatly. The distributed data sources impose additional constraints for the mining procedure to be robust in the presence of noise, because it is difficult to clean all the data sources in real time in real-world tasks. In addition, the traditional batch methods for ratio rule mining cannot cope with dynamic data. In this paper, we propose an integrated method to mining ratio rules from distributed and changing data sources, by first mining the ratio rules from each data source separately through a novel robust and adaptive one-pass algorithm (which is called Robust and Adaptive Ratio Rule (RARR)), and then integrating the rules of each data source in a simple probabilistic model. In this way, we can acquire the global rules from all the local information sources adaptively. We show that the RARR technique can converge to a fixed point and is robust as well. Moreover, the integration of rules is efficient and effective. Both theoretical analysis and experiments illustrate that the performance of RARR and the proposed information integration procedure is satisfactory for the purpose of discovering latent associations in distributed dynamic data source.  相似文献   

20.
夏英  刘婉蓉 《计算机应用》2008,28(12):3224-3226
现有的关联规则算法大多都致力于解决增量式更新问题,需要多次扫描数据集,无法对海量数据进行有效处理。针对此问题,提出了基于滑动窗口的关联规则增量式更新算法(SWIUA),利用滑动窗口进行数据更新,挖掘出用户感兴趣的关联规则。该算法只需要扫描原始数据集和更新的数据各一遍,降低了I/O时间;并采用优化策略对候选项集过滤和删除,提高了关联规则的挖掘性能,能有效处理大量新增数据。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号