首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
In this paper, we proposed an efficient algorithm, called PCP-Miner (Pointset Closed Pattern Miner), for mining frequent closed patterns from a pointset database, where a pointset contains a set of points. Our proposed algorithm consists of two phases. First, we find all frequent patterns of length two in the database. Second, for each pattern found in the first phase, we recursively generate frequent closed patterns by a frequent pattern tree in a depth-first search manner. Since the PCP-Miner does not generate unnecessary candidates, it is more efficient and scalable than the modified Apriori, SASMiner and MaxGeo. The experimental results show that the PCP-Miner algorithm outperforms the comparing algorithms by more than one order of magnitude.  相似文献   

2.
Inter-sequence pattern mining can find associations across several sequences in a sequence database, which can discover both a sequential pattern within a transaction and sequential patterns across several different transactions. However, inter-sequence pattern mining algorithms usually generate a large number of recurrent frequent patterns. We have observed mining closed inter-sequence patterns instead of frequent ones can lead to a more compact yet complete result set. Therefore, in this paper, we propose a model of closed inter-sequence pattern mining and an efficient algorithm called CISP-Miner for mining such patterns, which enumerates closed inter-sequence patterns recursively along a search tree in a depth-first search manner. In addition, several effective pruning strategies and closure checking schemes are designed to reduce the search space and thus accelerate the algorithm. Our experiment results demonstrate that the proposed CISP-Miner algorithm is very efficient and outperforms a compared EISP-Miner algorithm in most cases.  相似文献   

3.
目前TOP-K高效用模式挖掘算法需要产生候选项集,特别是当数据集比较大或者数据集中包含较多长事务项集时,算法的时间和空间效率会受到更大的影响.针对此问题,通过将事务项集和项集效用信息有效地保存到树结构HUP-Tree,给出一个不需要候选项集的挖掘算法TOPKHUP;HUP-Tree树能保证从中计算到每个模式的效用值,不需要再扫描数据集来计算模式的效用值,从而使挖掘算法的时空效率得到较大的提高.采用7个典型数据集对算法的性能进行测试,实验结果证明TOPKHUP的时间和空间效率都优于已有算法,并对K值的变化保持平稳.  相似文献   

4.
A transaction database usually consists of a set of time-stamped transactions. Mining frequent patterns in transaction databases has been studied extensively in data mining research. However, most of the existing frequent pattern mining algorithms (such as Apriori and FP-growth) do not consider the time stamps associated with the transactions. In this paper, we extend the existing frequent pattern mining framework to take into account the time stamp of each transaction and discover patterns whose frequency dramatically changes over time. We define a new type of patterns, called transitional patterns, to capture the dynamic behavior of frequent patterns in a transaction database. Transitional patterns include both positive and negative transitional patterns. Their frequencies increase/decrease dramatically at some time points of a transaction database. We introduce the concept of significant milestones for a transitional pattern, which are time points at which the frequency of the pattern changes most significantly. Moreover, we develop an algorithm to mine from a transaction database the set of transitional patterns along with their significant milestones. Our experimental studies on real-world databases illustrate that mining positive and negative transitional patterns is highly promising as a practical and useful approach for discovering novel and interesting knowledge from large databases.  相似文献   

5.
Mining frequent trajectory patterns in spatial-temporal databases   总被引:1,自引:0,他引:1  
In this paper, we propose an efficient graph-based mining (GBM) algorithm for mining the frequent trajectory patterns in a spatial-temporal database. The proposed method comprises two phases. First, we scan the database once to generate a mapping graph and trajectory information lists (TI-lists). Then, we traverse the mapping graph in a depth-first search manner to mine all frequent trajectory patterns in the database. By using the mapping graph and TI-lists, the GBM algorithm can localize support counting and pattern extension in a small number of TI-lists. Moreover, it utilizes the adjacency property to reduce the search space. Therefore, our proposed method can efficiently mine the frequent trajectory patterns in the database. The experimental results show that it outperforms the Apriori-based and PrefixSpan-based methods by more than one order of magnitude.  相似文献   

6.
虽然FP-Growth算法能够有效地从数据库中挖掘频繁模式,但如何由其挖掘出的频繁模式中高效地产生关联规则仍是一个相当复杂的问题。该文提出了用于组织频繁模式的线索频繁模式树(TFPT)和一个从TFPT中挖掘关联规则的高效算法—最短模式优先算法(SPF)。挖掘模式Y的关联规则时,SPF算法应用了两个优化策略,避免了对大量的不可能成为规则XY-X左部的Y的子集的检查,从而获得了很好的性能。实验表明:与类FP-Growth算法结合时,SPF算法运行速度远远快于Apriori算法,并有相当好的可伸缩性。  相似文献   

7.
Previous studies have presented convincing arguments that a frequent pattern mining algorithm should not mine all frequent patterns but only the closed ones because the latter leads to not only a more compact yet complete result set but also better efficiency. However, most of the previously developed closed pattern mining algorithms work under the candidate maintenance-and- test paradigm, which is inherently costly in both runtime and space usage when the support threshold is low or the patterns become long. In this paper, we present BIDE, an efficient algorithm for mining frequent closed sequences without candidate maintenance. It adopts a novel sequence closure checking scheme called Bl-Directional Extension and prunes the search space more deeply compared to the previous algorithms by using the BackScan pruning method. A thorough performance study with both sparse and dense, real, and synthetic data sets has demonstrated that BIDE significantly outperforms the previous algorithm: It consumes an order(s) of magnitude less memory and can be more than an order of magnitude faster. It is also linearly scalable in terms of database size.  相似文献   

8.
In this paper, we explore a new data mining capability that involves mining calling path patterns in global system for mobile communication (GSM) networks. Our proposed method consists of two phases. First, we devise a data structure to convert the original calling paths in the log file into a frequent calling path graph. Second, we design an algorithm to mine the calling path patterns from the frequent calling path graph obtained. By using the frequent calling path graph to mine the calling path patterns, our proposed algorithm does not generate unnecessary candidate patterns and requires less database scans. If the corresponding calling path graph of the GSM network can be fitted in the main memory, our proposed algorithm scans the database only once. Otherwise, the cellular structure of the GSM network is divided into several partitions so that the corresponding calling path sub-graph of each partition can be fitted in the main memory. The number of database scans for this case is equal to the number of partitioned sub-graphs. Therefore, our proposed algorithm is more efficient than the PrefixSpan and a priori-like approaches. The experimental results show that our proposed algorithm outperforms the a priori-like and PrefixSpan approaches by several orders of magnitude.  相似文献   

9.
Previous research works have presented convincing arguments that a frequent pattern mining algorithm should not mine all frequent but only the closed ones because the latter leads to not only more compact yet complete result set but also better efficiency. Upon discovery of frequent closed XML query patterns, indexing and caching can be effectively adopted for query performance enhancement. Most of the previous algorithms for finding frequent patterns basically introduced a straightforward generate-and-test strategy. In this paper, we present SOLARIA*, an efficient algorithm for mining frequent closed XML query patterns without candidate maintenance and costly tree-containment checking. Efficient algorithm of sequence mining is involved in discovering frequent tree-structured patterns, which aims at replacing expensive containment testing with cheap parent-child checking in sequences. SOLARIA* deeply prunes unrelated search space for frequent pattern enumeration by parent-child relationship constraint. By a thorough experimental study on various real-life data, we demonstrate the efficiency and scalability of SOLARIA* over the previous known alternative. SOLARIA* is also linearly scalable in terms of XML queries' size.  相似文献   

10.
在频繁模式挖掘过程中能够动态改变约束的算法比较少.提出了一种基于约束的频繁模式挖掘算法MCFP.MCFP首先按照约束的性质来建立频繁模式树,并且只需扫描一遍数据库,然后建立每个项的条件树,挖掘以该项为前缀的最大频繁模式,并用最大模式树来存储,最后根据最大模式来找出所有支持度明确的频繁模式.MCFP算法允许用户在挖掘频繁模式过程中动态地改变约束.实验表明,该算法与iCFP算法相比是很有效的.  相似文献   

11.
周明  李宏 《计算机工程》2007,33(2):74-76
传统频繁项集挖掘算法在处理稠密或长数据集(如基因表达数据集)时效率低且产生大量冗余模式,为解决这些问题一些学者提出了闭合模式的概念和挖掘闭合模式的算法,研究证明挖掘闭合模式可以显著减少项集数量并消除大量冗余模式。该文针对生物数据特点提出了一个新颖的挖掘频繁闭合模式的算法REMFOR,该算法在闭合模式概念和行枚举思想的基础上,采用垂直数据结构和fp-tree技术,对行集建立行fp-tree来挖掘频繁闭合模式。通过实例和实验证明该算法是正确有效的。  相似文献   

12.
刘佳新 《计算机工程》2012,38(12):39-41
现有的增量式挖掘算法在支持度发生变化时,需要对序列数据库进行重复挖掘,为减少由此产生的时空消耗,提出一种高效的增量式序列模式挖掘算法。算法采用频繁序列树作为序列存储结构,当序列数据库和最小支持度发生变化时,通过执行更新操作,实现频繁序列树的更新,利用深度优先遍历频繁序列树找到序列数据库中所有的序列模式。实验结果表明,与IncSpan算法和PrefixSpan算法相比,该算法的挖掘效率较高。  相似文献   

13.
Mining spatial association rules in image databases   总被引:2,自引:0,他引:2  
In this paper, we propose a novel spatial mining algorithm, called 9DLT-Miner, to mine the spatial association rules from an image database, where every image is represented by the 9DLT representation. The proposed method consists of two phases. First, we find all frequent patterns of length one. Next, we use frequent k-patterns (k ? 1) to generate all candidate (k + 1)-patterns. For each candidate pattern generated, we scan the database to count the pattern’s support and check if it is frequent. The steps in the second phase are repeated until no more frequent patterns can be found. Since our proposed algorithm prunes most of impossible candidates, it is more efficient than the Apriori algorithm. The experiment results show that 9DLT-Miner runs 2-5 times faster than the Apriori algorithm.  相似文献   

14.
Mining frequent patterns from univariate uncertain data   总被引:1,自引:0,他引:1  
In this paper, we propose a new algorithm called U2P-Miner for mining frequent U2 patterns from univariate uncertain data, where each attribute in a transaction is associated with a quantitative interval and a probability density function. The algorithm is implemented in two phases. First, we construct a U2P-tree that compresses the information in the target database. Then, we use the U2P-tree to discover frequent U2 patterns. Potential frequent U2 patterns are derived by combining base intervals and verified by traversing the U2P-tree. We also develop two techniques to speed up the mining process. Since the proposed method is based on a tree-traversing strategy, it is both efficient and scalable. Our experimental results demonstrate that the U2P-Miner algorithm outperforms three widely used algorithms, namely, the modified Apriori, modified H-mine, and modified depth-first backtracking algorithms.  相似文献   

15.
Sequential pattern mining has been studied extensively in the data mining community. Most previous studies require the specification of a min_support threshold for mining a complete set of sequential patterns satisfying the threshold. However, in practice, it is difficult for users to provide an appropriate min_support threshold. To overcome this difficulty, we propose an alternative mining task: mining top-k frequent closed sequential patterns of length no less than min_, where k is the desired number of closed sequential patterns to be mined and min_ is the minimal length of each pattern. We mine the set of closed patterns because it is a compact representation of the complete set of frequent patterns. An efficient algorithm, called TSP, is developed for mining such patterns without min_support. Starting at (absolute) min_support=1, the algorithm makes use of the length constraint and the properties of top-k closed sequential patterns to perform dynamic support raising and projected database pruning. Our extensive performance study shows that TSP has high performance. In most cases, it outperforms the efficient closed sequential pattern-mining algorithm, CloSpan, even when the latter is running with the best tuned min_support threshold. Thus, we conclude that, for sequential pattern mining, mining top-k frequent closed sequential patterns without min_support is more preferable than the traditional min_support-based mining.  相似文献   

16.
针对Web用户访问模式问题,采用最大频繁访问路径(MFP)方法可以挖掘出更有普遍意义的模式。给出一种新的用户访问模式树WUAP tree结构,并采用E OEM模型,综合考虑了页面拓扑结构及用户浏览路径等多个数据源,进一步提出了一种Web访问模式挖掘算法WUAP mine。该算法不用产生候选集和递归,只对事务数据库进行一次扫描,对WUAP tree结构进行深度优先遍历一次,就可从WUAP tree结构上直接查询出Web用户频繁访问模式。最后,从理论和实践上推导和验证了它的有效性和高效性。  相似文献   

17.
直接对生物序列进行频繁模式挖掘会产生很多冗余模式,闭合模式更能表达出序列的功能和结构。根据生物序列的特点,提出了基于相邻闭合频繁模式段的模式挖掘算法-JCPS。首先产生闭合相邻频繁模式段,然后对这些闭合频繁模式段进行组合,同时进行闭合检测,产生新的闭合频繁模式。通过对真实的蛋白质序列家族库的处理,证明该算法能有效处理生物序列数据。  相似文献   

18.
In recent years, generalization-based data mining techniques have become an interesting topic for many data scientists. Generalized itemset mining is an exploration technique that focuses on extracting high-level abstractions and correlations in a database. However, the problem that domain experts must always deal with is how to manage and interpret a large number of extracted patterns from a massive database of transactions. In generalized pattern mining, taxonomies that contain abstraction information for each dataset are defined, so the number of frequent patterns can grow enormously. Therefore, exploiting knowledge turns into a difficult and costly process. In this article, we introduce an approach that uses cardinality-based constraints with transaction id and numeric encoding to mine generalized patterns. We applied transaction id to support the computation of each frequent itemset as well as to encode taxonomies into a numeric type using two simple rules. We also attempted to apply the combination of cardinality cons- traints and closed or maximal patterns. Experiments show that our optimizations significantly improve the performance of the original method, and the importance of comprehensive information within closed and maximal patterns is worth considering in generalized frequent pattern mining.  相似文献   

19.
The purpose of mining frequent itemsets is to identify the items in groups that always appear together and exceed the user-specified threshold of a transaction database. However, numerous frequent itemsets may exist in a transaction database, hindering decision making. Recently, the mining of frequent closed itemsets has become a major research issue because sets of frequent closed itemsets are condensed yet complete representations of frequent itemsets. Therefore, all frequent itemsets can be derived from a group of frequent closed itemsets. Nonetheless, the number of transactions in a transaction database can increase rapidly in a short time period, and a number of the transactions may be outdated. Thus, frequent closed itemsets may be changed with the addition of new transactions or the deletion of old transactions from the transaction database. Updating previously closed itemsets when transactions are added or removed from the transaction database is challenging. This study proposes an efficient algorithm for incrementally mining frequent closed itemsets without scanning the original database. The proposed algorithm updates closed itemsets by performing several operations on the previously closed itemsets and added/deleted transactions without searching the previously closed itemsets. The experimental results show that the proposed algorithm significantly outperforms previous methods, which require a substantial length of time to search previously closed itemsets.  相似文献   

20.
本文首先提出了一种挖掘频集的高效算法PP。它采用了一种基于树的模式支持集表示,避免了反复扫描数据库和递归建造个数与频繁模式数相同的模式支持集,其效率比Apriori和FPGrowth高1—3个数量级。PP被进一步扩展成发现分类规则的有效算法CRM-PP。CRM-PP将多支持率剪裁集成到频集发现阶段,将二阶段挖掘法改进为单阶段挖掘法。CRM-PP的效率也比基于Apriori和FPGrowth的二阶段算法高1—3个数量级。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号