Similar documents
1.
2.
To date, association rule mining has mainly focused on the discovery of frequent patterns. Nevertheless, it is often interesting to focus on those that do not occur frequently. Existing algorithms for mining this kind of infrequent pattern are mainly based on exhaustive search methods and can be applied only over categorical domains. In a previous work, the use of grammar-guided genetic programming for the discovery of frequent association rules was introduced, showing that this proposal was competitive in terms of scalability, expressiveness, flexibility and the ability to restrict the search space. The goal of this work is to demonstrate that this proposal is also appropriate for the discovery of rare association rules. This approach allows one to obtain solutions within specified time limits and does not require large amounts of memory, as current algorithms do. It also provides mechanisms to discard noise from the rare association rule set by applying four different and specific fitness functions, which are compared and studied in depth. Finally, this approach is compared with other existing algorithms for mining rare association rules, and an analysis of the mined rules is performed. As a result, this approach mines rare rules with low and consistent execution times. The experimental study shows that this proposal obtains a small and accurate set of rules close to the size specified by the data miner.

3.
This paper presents new algorithms to efficiently mine max frequent generalized itemsets (g-itemsets) and essential generalized association rules (g-rules). These are compact and general representations for all frequent patterns and all strong association rules in the generalized environment. Our results fill an important gap among algorithms for frequent patterns and association rules by combining two concepts. First, generalized itemsets employ a taxonomy of items, rather than a flat list of items. This produces more natural frequent itemsets and associations such as (meat, milk) instead of (beef, milk), (chicken, milk), etc. Second, compact representations of frequent itemsets and strong rules, whose result size is exponentially smaller, can solve a standard dilemma in mining patterns: with small threshold values for support and confidence, the user is overwhelmed by the extraordinary number of identified patterns and associations; but with large threshold values, some interesting patterns and associations fail to be identified. Our algorithms can also expand those max frequent g-itemsets and essential g-rules into the much larger set of ordinary frequent g-itemsets and strong g-rules. While that expansion is not recommended in most practical cases, we do so in order to present a comparison with existing algorithms that only handle ordinary frequent g-itemsets. In this case, the new algorithm is shown to be thousands, and in some cases millions, of times faster than previous algorithms. Further, the new algorithm succeeds in analyzing deeper taxonomies, with depths of seven or more. Experimental results for previous algorithms were limited to taxonomies with a depth of at most three or four. For each of the two problems, a straightforward lattice-based approach is briefly discussed and then a classification-based algorithm is developed. In particular, the two classification-based algorithms are MFGI_class for mining max frequent g-itemsets and EGR_class for mining essential g-rules. The classification-based algorithms feature conceptual classification trees and dynamic generation and pruning algorithms.
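The effect of a taxonomy on support can be illustrated with a small sketch. This is not MFGI_class or EGR_class; it only shows the common baseline idea of extending each transaction with the ancestors of its items before counting. The `parent` mapping, the helper names, and the toy transactions are all hypothetical.

```python
def ancestors(item, parent):
    """Return all ancestors of item by following child -> parent links."""
    found = []
    while item in parent:
        item = parent[item]
        found.append(item)
    return found


def extend(transaction, parent):
    """Add every taxonomy ancestor of every item to the transaction."""
    extended = set(transaction)
    for item in transaction:
        extended.update(ancestors(item, parent))
    return frozenset(extended)


def support(itemset, transactions):
    """Fraction of transactions that contain every item of itemset."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


# Hypothetical two-level taxonomy (child -> parent) and toy data.
parent = {"beef": "meat", "chicken": "meat"}
raw = [{"beef", "milk"}, {"chicken", "milk"}, {"beef", "bread"}, {"chicken", "milk"}]
extended = [extend(t, parent) for t in raw]

print(support({"beef", "milk"}, extended))     # leaf-level itemset
print(support({"chicken", "milk"}, extended))  # leaf-level itemset
print(support({"meat", "milk"}, extended))     # generalized itemset
```

In this toy data the generalized itemset (meat, milk) is supported by more transactions than either leaf-level itemset, which is why it can pass a threshold that (beef, milk) and (chicken, milk) both miss.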

4.
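CBC-DS: A Data Stream Classification Algorithm Based on Frequent Closed Patterns   Total citations: 2, self-citations: 0, citations by others: 2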
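Association-rule-based classification algorithms usually generate class association rules from frequent patterns, but frequent pattern mining is prone to combinatorial explosion, which hurts efficiency. The emergence of data streams also poses new challenges for classification algorithms. Compared with frequent patterns, frequent closed patterns are fewer in number, and algorithms that mine them are usually more efficient. An efficient data stream classification algorithm based on frequent closed patterns, CBC-DS, is therefore proposed. The main contributions are: 1) a single-pass procedure for mining frequent closed itemsets based on an FP-Tree in reverse lexicographic order, used to mine class association rules; the procedure adopts a mixed item-ordering search strategy to meet the single-pass requirement of data stream mining, and uses bitmap techniques to improve efficiency; 2) the concept of "self-support", used to filter rules and improve classification accuracy. Experiments show that the bitmap technique speeds up the algorithm by more than a factor of 2, and that self-support improves average accuracy by about 0.5%. Overall, the average classification accuracy of CBC-DS is about 1% higher than that of the classical algorithm CMAR, and its rule mining is far faster than that of CMAR.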

5.
Dataless Transitions Between Concise Representations of Frequent Patterns   Total citations: 1, self-citations: 0, citations by others: 1
Solving many data mining problems requires discovering frequent patterns. Frequent itemsets are useful, e.g., in the discovery of association and episode rules, sequential patterns and clusters. Nevertheless, the number of frequent itemsets is usually huge. Therefore, a number of lossless representations of frequent itemsets have recently been proposed. Two such representations, namely the closed itemsets and the generators representation, are of particular interest as they can be applied efficiently to the discovery of the most interesting non-redundant association and episode rules. On the other hand, it has been shown experimentally that other representations of frequent patterns can be more concise and more quickly extractable than these two representations, even by several orders of magnitude. Hence, such concise representations seem to be an interesting alternative for materializing and reusing the knowledge of frequent patterns. The problem arises, however, of how to transform the intermediate representations into the desired ones efficiently and preferably without accessing the database. This article tackles this problem. As a result of investigating the properties of representations of frequent patterns, we offer a set of efficient algorithms for dataless transitioning between them.

6.
In today's global economy and World Wide Web, large sets of evolving and distributed data can be handled efficiently by incremental data mining. Frequent patterns are very important in knowledge discovery and data mining, for example in mining association rules and correlations. The FP-tree is a versatile data structure for mining frequent patterns. It is a compact representation of a transaction database that contains the frequency information of all relevant frequent patterns (FP) of the database. All of the existing incremental frequent pattern mining algorithms, such as AFPIM, CATS, CanTree, CP-tree, and SPO-tree, perform incremental mining by processing one transaction of the incremental part of the database at a time and updating the FP-tree of the initial (original) database with it. In this paper, we propose a novel method that takes advantage of the FP-tree representation of the incremental transaction database for incremental mining. We propose a batch incremental processing algorithm, BIT_FPGrowth, that restructures and merges two small FP-trees of consecutive durations to obtain an FP-tree for the FP-Growth algorithm. BIT_FPGrowth uses the FP-tree as a preprocessed data repository from which to get transactions (i.e., itemsets), unlike other sequential incremental algorithms that read transactions from the database. BIT_FPGrowth takes less time to construct the FP-tree. Our experimental results show that, as the size of the database increases, the runtime of BIT_FPGrowth increases much less, and the least among all the compared algorithms.

7.
Most incremental mining and online mining algorithms concentrate on finding association rules or patterns consistent with the entire current set of data. Users cannot easily obtain results from only the portion of the data that interests them. This may prevent the use of mining in online decision support for multidimensional data. To provide ad-hoc, query-driven, and online mining support, we first propose a relation called the multidimensional pattern relation to structurally and systematically store context and mining information for later analysis. Each tuple in the relation comes from a dataset inserted into the database. We then develop an online mining approach called three-phase online association rule mining (TOARM), based on the proposed multidimensional pattern relation, to support online generation of association rules under multidimensional considerations. The TOARM approach consists of three phases, during which the final sets of patterns satisfying various mining requests are found. It first selects and integrates related mining information in the multidimensional pattern relation and then, if necessary, re-processes itemsets without sufficient information against the underlying datasets. Some implementation considerations for the algorithm are also stated in detail. Experiments on homogeneous and heterogeneous datasets show the effectiveness of the proposed approach.

8.
A Study of an Improved Association Rule Mining Method   Total citations: 4, self-citations: 0, citations by others: 4
徐勇  周森鑫 《微机发展》2006,16(3):77-79
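Association pattern mining is one of the important branches of data mining research; it aims to discover associations or correlations among patterns. However, traditional mining methods based on the support-confidence framework have some shortcomings: first, they produce too many patterns (including frequent itemsets and rules); second, some of the mined rules are uninteresting to users, useless, or even wrong. It is therefore necessary to prune useless patterns effectively during mining. Evaluating patterns by their correlation is an effective pruning method. Analysis of the experimental results shows that introducing a correlation measure on top of the traditional mining method can effectively prune uncorrelated patterns, thereby reducing the number of frequent itemsets and rules.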
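A small example may help show why a correlation measure can prune rules that pass the support-confidence test. The abstract does not say which measure the authors use; the sketch below assumes lift (joint support divided by the product of the individual supports), and the toy transactions and function names are hypothetical.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of itemset."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) from the transactions."""
    joint = support(set(antecedent) | set(consequent), transactions)
    return joint / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """Lift > 1: positive correlation; < 1: negative; about 1: independent."""
    joint = support(set(antecedent) | set(consequent), transactions)
    expected = support(antecedent, transactions) * support(consequent, transactions)
    return joint / expected if expected > 0 else 0.0


# Toy data: 3 x {tea, coffee}, 1 x {tea}, 5 x {coffee}, 1 x {milk}.
transactions = (
    [frozenset({"tea", "coffee"})] * 3
    + [frozenset({"tea"})]
    + [frozenset({"coffee"})] * 5
    + [frozenset({"milk"})]
)

conf = confidence({"tea"}, {"coffee"}, transactions)
corr = lift({"tea"}, {"coffee"}, transactions)
print(f"confidence={conf:.2f}, lift={corr:.4f}")
```

Here the rule tea → coffee has a confidence of 0.75 but a lift below 1, because coffee already appears in 80% of all transactions. A lift-based filter would discard it even though it looks strong under the support-confidence framework.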

9.
Bridging rules take the antecedent and action from different conceptual clusters. They are distinguished from association rules (frequent itemsets) because (1) they can be generated from the infrequent itemsets that are pruned in association rule mining, and (2) they are measured by their importance, including the distance between two conceptual clusters, whereas frequent itemsets are measured only by their support. In this paper, we first design two algorithms for mining bridging rules between clusters, and then propose two non-linear metrics to measure their interestingness. We evaluate these algorithms experimentally and demonstrate that our approach is promising.

10.
An Improved Algorithm for Weighted Association Rules   Total citations: 9, self-citations: 2, citations by others: 7
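This paper discusses the weighted association rule problem and proposes an improved algorithm for weighted association rules over Boolean data. The algorithm first uses an ordinary association rule algorithm to generate frequent itemsets, and then generates weighted frequent itemsets from them. It also gives a method for setting the optimal minimum support, which guarantees that the frequent itemsets produced by the ordinary association rule algorithm are a superset of the weighted frequent itemsets. The algorithm is efficient and can make effective use of existing association rule algorithms.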

11.
Data mining is the process of extracting desirable knowledge or interesting patterns from existing databases for specific purposes. Most conventional data mining algorithms identify the relationships among transactions using binary values; however, transactions with quantitative values are common in real-world applications. This paper thus proposes a new data mining algorithm for extracting interesting knowledge from transactions stored as quantitative values. The proposed algorithm integrates fuzzy set concepts and the Apriori mining algorithm to find interesting fuzzy association rules in given transaction data sets. Experiments with student grades at I-Shou University were also carried out to verify the performance of the proposed algorithm.

12.
This paper studies the problem of mining frequent itemsets along with their temporal patterns from large transaction sets. A model is proposed in which users define a large set of temporal patterns that are interesting or meaningful to them. A temporal pattern defines the set of time points where the user expects a discovered itemset to be frequent. The model is general in that (i) no constraints are placed on the interesting patterns given by the users, and (ii) two measures, inclusiveness and exclusiveness, are used to capture how well the temporal patterns match the time points given by the discovered itemsets. Intuitively, these measures indicate to what extent a discovered itemset is frequent at time points included in a temporal pattern p, but not at time points not in p. Using these two measures, one is able to model many temporal data mining problems that have appeared in the literature, as well as some that have not yet been studied. By exploiting the relationships within and between itemset space and pattern space simultaneously, a series of pruning techniques is developed to speed up the mining process. Experiments show that these pruning techniques yield performance gains of up to 100 times over a direct extension of non-temporal data mining algorithms.

13.
A genetic-fuzzy mining approach for items with multiple minimum supports   Total citations: 2, self-citations: 2, citations by others: 0
Data mining is the process of extracting desirable knowledge or interesting patterns from existing databases for specific purposes. Mining association rules from transaction data is among the most common mining techniques. Most previous mining approaches set a single minimum support threshold for all the items and identify the relationships among transactions using binary values. In the past, we proposed a genetic-fuzzy data mining algorithm for extracting both association rules and membership functions from quantitative transactions under a single minimum support. In real applications, different items may have different criteria to judge their importance. In this paper, we thus propose an algorithm which combines clustering, fuzzy and genetic concepts for extracting reasonable multiple minimum support values, membership functions and fuzzy association rules from quantitative transactions. It first uses the k-means clustering approach to gather similar items into groups. All items in the same cluster are considered to have similar characteristics and are assigned similar values for initializing a better population. Each chromosome is then evaluated by the criteria of requirement satisfaction and suitability of membership functions to estimate its fitness value. Experimental results also show the effectiveness and the efficiency of the proposed approach.

14.
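A Granular Computing Method for Knowledge Discovery in Relational Databases   Total citations: 1, self-citations: 0, citations by others: 1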
邱桃荣  刘清  黄厚宽 《自动化学报》2009,35(8):1071-1079
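A granular computing method is proposed for mining multidimensional, multilevel association rules of different granularities from relational databases or information systems. First, a framework for knowledge discovery from relational databases or information systems is given, based on the partition model of granular computing. Second, a granular computing method for generating frequent k-itemsets is proposed. Finally, the proposed method is illustrated with a practical example, tested on two different kinds of datasets under different support thresholds, and compared with two classical methods. The test results show that the proposed granular computing method is effective. Moreover, granular computing makes the semantics of association rules clearer and easier to understand.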

15.
李广璞  黄妙华 《计算机科学》2018,45(Z11):1-11, 26
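As one of the main research areas of data mining, association analysis is mainly used to discover strong association features hidden in large datasets. Most association rule mining tasks can be divided into the generation of frequent patterns (frequent itemsets, frequent sequences, and frequent subgraphs) and the generation of rules. The former finds the itemsets, sequences, and subgraphs in the dataset that satisfy a minimum support threshold; the latter extracts high-confidence rules from the frequent patterns found in the previous step. Frequent itemset mining is a key problem in many data mining tasks and the core of association rule mining algorithms. For more than a decade, researchers have worked to improve the efficiency of frequent itemset generation from various angles, and a large number of efficient, scalable algorithms have been proposed. This paper analyzes frequent itemset mining in depth, introduces and reviews typical algorithms for complete, closed, and maximal frequent itemsets, and finally briefly analyzes research directions for frequent itemset mining algorithms.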
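The two-phase structure described in this abstract (frequent itemset generation, then rule generation) can be sketched as follows. This is a minimal level-wise, Apriori-style illustration on toy data, not any of the surveyed algorithms; the function names, thresholds, and transactions are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Phase 1: level-wise search for itemsets with support >= min_support."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    level = [frozenset([item]) for item in items]
    frequent = {}
    while level:
        # Count each candidate and keep those that meet the threshold.
        counts = {c: sum(1 for t in transactions if c <= t) for c in level}
        current = {c: k / n for c, k in counts.items() if k / n >= min_support}
        frequent.update(current)
        # Join: merge pairs of frequent k-itemsets into (k+1)-candidates.
        joined = {a | b for a, b in combinations(list(current), 2)
                  if len(a | b) == len(a) + 1}
        # Prune: every k-subset of a candidate must itself be frequent.
        level = [c for c in joined if all(c - {i} in current for i in c)]
    return frequent


def generate_rules(frequent, min_confidence):
    """Phase 2: split each frequent itemset into antecedent -> consequent."""
    rules = []
    for itemset, supp in frequent.items():
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                antecedent = frozenset(antecedent)
                conf = supp / frequent[antecedent]
                if conf >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, supp, conf))
    return rules


transactions = [frozenset(t) for t in (
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "diapers", "beer"},
)]

frequent = frequent_itemsets(transactions, min_support=0.6)
rules = generate_rules(frequent, min_confidence=0.9)
for antecedent, consequent, supp, conf in rules:
    print(sorted(antecedent), "->", sorted(consequent), supp, conf)
```

The lookup `frequent[antecedent]` is safe because every subset of a frequent itemset is also frequent, so its support was recorded in phase 1.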

16.
Recent years have witnessed an increasing interest in computing cosine similarity between high-dimensional documents, transactions, gene sequences, etc. Most previous studies limited their scope to pairs of items and cannot be adapted to the multi-itemset case. Therefore, from a frequent pattern mining perspective, there is still a critical need for discovering interesting patterns whose cosine similarity values are above given thresholds. The hardest part of this problem, however, is that cosine similarity does not have the anti-monotone property. To meet this challenge, we propose the notions of the conditional anti-monotone property and the Support-Ascending Set Enumeration Tree (SA-SET). We prove that cosine similarity has the conditional anti-monotone property and can therefore be used for interesting pattern mining if the itemset traversal sequence is defined by the SA-SET. We also identify the anti-monotone property of an upper bound of the cosine similarity, which can be used to further prune the candidate itemsets. An Apriori-like algorithm called CosMiner is then put forward to mine cosine interesting patterns from large-scale multi-item databases. Experimental results show that CosMiner can efficiently identify interesting patterns using the conditional anti-monotone property of the cosine similarity and the anti-monotone property of its upper bound, even at extremely low levels of support.

17.
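Many fast association rule mining algorithms have been proposed, but in practice users care about only some of the association rules; for example, they may only want rules that contain specified items. When such constraints are used in data preprocessing or incorporated into the mining algorithm, the execution time can be reduced significantly. To this end, a class of Boolean-expression constraints on the inclusion or exclusion of certain items is considered, and a fast FP-tree-based algorithm for mining constrained maximal frequent itemsets, CMFIMA, is proposed. The update problem is also studied, and an incremental updating algorithm for constrained maximal frequent itemsets, CMFIUA, is proposed.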

18.
Mining multiple-level association rules in large databases   Total citations: 2, self-citations: 0, citations by others: 2
A top-down progressive deepening method is developed for efficient mining of multiple-level association rules from large transaction databases based on the Apriori principle. A group of variant algorithms is proposed based on the ways of sharing intermediate results, with the relative performance tested and analyzed. The enforcement of different interestingness measurements to find more interesting rules, and the relaxation of rule conditions for finding "level-crossing" association rules, are also investigated. The study shows that efficient algorithms can be developed for discovering interesting and strong multiple-level association rules from large databases.

19.
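Association rule mining has been a very active area of data mining in recent years, and frequent itemset mining is its most important task. The set of maximal frequent itemsets is far smaller than the set of all frequent itemsets, and all frequent itemsets can be derived from the maximal ones, so much research has been devoted specifically to mining maximal frequent itemsets. This paper gives the basic concepts of association rules and related terminology, and analyzes and evaluates maximal frequent itemset mining algorithms, to help researchers improve existing algorithms and propose new ones with better performance.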
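The statement that all frequent itemsets can be derived from the maximal ones follows from downward closure: every non-empty subset of a frequent itemset is frequent. A minimal sketch of that expansion and its inverse is given below; the function names and the toy maximal itemsets are hypothetical. The maximal itemsets alone do not carry the supports of their subsets, so the expansion recovers only which itemsets are frequent.

```python
from itertools import combinations


def expand_maximal(maximal_itemsets):
    """Enumerate every non-empty subset of every maximal frequent itemset."""
    frequent = set()
    for maximal in maximal_itemsets:
        items = sorted(maximal)
        for size in range(1, len(items) + 1):
            for subset in combinations(items, size):
                frequent.add(frozenset(subset))
    return frequent


def maximal_only(frequent):
    """Keep the itemsets that have no proper superset in the collection."""
    return {s for s in frequent if not any(s < other for other in frequent)}


# Hypothetical maximal frequent itemsets.
maximal = [frozenset({"a", "b", "c"}), frozenset({"c", "d"})]

all_frequent = expand_maximal(maximal)
print(len(maximal), "maximal itemsets expand to", len(all_frequent), "frequent itemsets")
```

A maximal itemset with k items expands to 2^k - 1 subsets, which is why the maximal representation is so much smaller than the full collection.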

20.
Many association rule mining (ARM) algorithms, e.g., Apriori and FP-tree, have been proposed to discover meaningful frequent patterns in large datasets. However, such algorithms are difficult to apply to temporal databases, which change continuously over time, because they generally rely on repeating time-consuming tasks such as scanning the database. To deal with this problem, we propose a constraint graph-based method for maintaining frequent patterns (FP) discovered from temporal databases. The constraint graph, represented as a set of constraints between pairs of items, can be established from the temporal persistency of the patterns; that is, patterns that have appeared in a set of FPs can be used to build the constraint graph. Two types of constraints can be generated, by users and by adaptation. Using our scheme, we find that the amount of data to be processed is efficiently reduced during mining and while gathering information for updates.
