Similar literature
1.
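Quantitative association rule mining based on attribute mutual information entropy   Total citations: 2 (self-citations: 1, citations by others: 1)
Quantitative association rule mining suffers from a combinatorial explosion of quantitative attributes and their value intervals, which hurts algorithm efficiency. The paper proposes the BMIQAR algorithm, which examines the mutual information entropy between quantitative attributes to find attribute sets with strong informative relationships and derives frequent itemsets from these sets to generate rules. Experiments show that pruning at the attribute level shrinks the search space and improves performance, while still recovering the vast majority of high-confidence rules.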

2.
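A multiple-minimum-support association rule mining algorithm with maximum-value control   Total citations: 2 (self-citations: 0, citations by others: 2)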
何朝阳  赵剑锋  江水 《计算机工程》2006,32(11):103-105
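Most association rule mining algorithms use a single minimum support threshold, but in practice items occur with different frequencies and should have different minimum supports. This paper proposes an association rule mining algorithm with multiple minimum supports: each item is assigned its own minimum support, and maximum-value control is used for pruning while generating the candidate sets and the maximal frequent sets, which effectively improves the algorithm's efficiency. An example of supermarket sales items illustrates the use of the algorithm.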
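
The abstract does not spell out how maximum-value control works, so the following is only a minimal sketch under one assumed reading: an itemset's threshold is the largest minimum support among its items, which keeps the usual subset-based pruning valid. The function name `frequent_itemsets_max_constraint` and the `min_sup` dictionary are hypothetical, not taken from the paper.

```python
from itertools import combinations

def frequent_itemsets_max_constraint(transactions, min_sup):
    """Level-wise mining with one minimum support per item.

    ASSUMPTION: an itemset is frequent when its support reaches the LARGEST
    minimum support among its items. Under that rule every subset of a
    frequent itemset is also frequent, so Apriori-style pruning still applies.
    """
    n = len(transactions)
    tx = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(1 for t in tx if itemset <= t) / n

    def threshold(itemset):
        return max(min_sup[i] for i in itemset)

    items = sorted({i for t in tx for i in t})
    level = {frozenset([i]) for i in items
             if support(frozenset([i])) >= min_sup[i]}
    result = {}
    k = 1
    while level:
        for s in level:
            result[s] = support(s)
        k += 1
        # Join frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Keep a candidate only if all its (k-1)-subsets are frequent
        # and its own support reaches the maximum-based threshold.
        level = {
            c for c in candidates
            if all(frozenset(sub) in result for sub in combinations(c, k - 1))
            and support(c) >= threshold(c)
        }
    return result
```

With a rarely bought item given a low threshold, the item itself can be frequent while any itemset that pairs it with a high-threshold item must still meet the higher threshold.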

3.
An information-theoretic approach to quantitative association rule mining   Total citations: 1 (self-citations: 1, citations by others: 0)
Quantitative association rule (QAR) mining has been recognized as an influential research problem over the last decade due to the popularity of quantitative databases and the usefulness of association rules in real life. Unlike boolean association rules (BARs), which only consider boolean attributes, QARs consist of quantitative attributes which contain much richer information than the boolean attributes. However, the combination of these quantitative attributes and their value intervals always gives rise to the generation of an explosively large number of itemsets, thereby severely degrading the mining efficiency. In this paper, we propose an information-theoretic approach to avoid unrewarding combinations of both the attributes and their value intervals being generated in the mining process. We study the mutual information between the attributes in a quantitative database and devise a normalization on the mutual information to make it applicable in the context of QAR mining. To indicate the strong informative relationships among the attributes, we construct a mutual information graph (MI graph), whose edges are attribute pairs that have normalized mutual information no less than a predefined information threshold. We find that the cliques in the MI graph represent a majority of the frequent itemsets. We also show that frequent itemsets that do not form a clique in the MI graph are those whose attributes are not informatively correlated to each other. By utilizing the cliques in the MI graph, we devise an efficient algorithm that significantly reduces the number of value intervals of the attribute sets to be joined during the mining process. Extensive experiments show that our algorithm speeds up the mining process by up to two orders of magnitude. Most importantly, we are able to obtain most of the high-confidence QARs, whereas the QARs that are not returned by MIC are shown to be less interesting.
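
The MI graph idea can be sketched roughly as follows. The abstract does not give the normalization, so dividing by the larger of the two attribute entropies is an assumption made here for illustration, as are the names `mi_graph_edges` and `threshold`, the undirected edges, and the requirement that attributes are already discretized.

```python
from collections import Counter
from itertools import combinations
from math import log2

def entropy(xs):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(xs)
    return -sum(c / n * log2(c / n) for c in Counter(xs).values())

def mutual_information(xs, ys):
    # I(X;Y) = H(X) + H(Y) - H(X,Y)
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

def mi_graph_edges(columns, threshold):
    """columns maps an attribute name to its list of discretized values.

    ASSUMPTION: mutual information is normalized by the larger of the two
    attribute entropies; the paper's own normalization may differ.
    """
    edges = set()
    for a, b in combinations(sorted(columns), 2):
        ha, hb = entropy(columns[a]), entropy(columns[b])
        if ha == 0 or hb == 0:
            continue  # a constant attribute carries no information
        mi = mutual_information(columns[a], columns[b])
        if mi / max(ha, hb) >= threshold:
            edges.add((a, b))
    return edges
```

Only attribute pairs joined by an edge would then be considered when combining value intervals, which is where the reduction in search space comes from.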

4.
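Bidirectional mining of spatial association rules   Total citations: 9 (self-citations: 0, citations by others: 9)
Mining association rules in spatial databases must consider not only the relationships among the attributes of a tuple (vertical relationships) but, more importantly, the relationships between tuples (horizontal relationships), such as adjacency, intersection, and overlap. By analyzing the storage model of spatial databases and drawing on association rule mining methods for transaction databases, this paper gives a complete definition of spatial association rules and discusses interestingness measures for the rules. According to the direction of mining, spatial data mining is classified into vertical mining, horizontal mining, and bidirectional mining. For bidirectional mining, a new algorithm is proposed: it applies constraints derived from the mining task to shrink the mining space, then uses spatial computation to convert spatial relationships into non-spatial ones, and after several iterations obtains non-spatial itemsets from which spatial association rules are mined. On this basis, a workflow for bidirectional mining of spatial data is proposed and verified with an example.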

5.
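A new algorithm for mining association rules over multi-valued attributes   Total citations: 1 (self-citations: 0, citations by others: 1)
To address association rule mining over multi-valued attributes, the paper introduces the concept of a similar-attribute-set matrix and proposes a new multi-valued association rule mining algorithm, Qarmasm. The algorithm does not need to expand transaction attributes, performs reduction efficiently, can generate candidate frequent itemsets directly and compute their support, and thus finds frequent items effectively. A description of the algorithm and an analysis of its complexity are given. Comparison with classical algorithms shows that it has clear advantages.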

6.
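An efficient association rule mining algorithm based on attribute grouping   Total citations: 6 (self-citations: 0, citations by others: 6)
Mining frequent itemsets plays an important role in data mining. Several algorithms have been proposed for this problem; although some discover all frequent itemsets with a single database scan, their efficiency drops quickly when the number of attributes is large. This paper is the first to propose attribute grouping as a tool for mining association rules. It presents a frequent itemset mining algorithm based on attribute grouping that stores the information among database attributes in a matrix and extracts frequent itemsets from it without generating candidate itemsets. Experiments verify that the algorithm is fast and effective.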

7.
Researchers realized the importance of integrating fuzziness into association rules mining in databases with binary and quantitative attributes. However, most of the earlier algorithms proposed for fuzzy association rules mining either assume that fuzzy sets are given or employ a clustering algorithm, like CURE, to decide on fuzzy sets; in both cases the number of fuzzy sets is pre-specified. In this paper, we propose an automated method to decide on the number of fuzzy sets and for the autonomous mining of both fuzzy sets and fuzzy association rules. We achieve this by developing an automated clustering method based on multi-objective Genetic Algorithms (GA); the aim of the proposed approach is to automatically cluster values of a quantitative attribute in order to obtain a large number of large itemsets in less time. We compare the proposed multi-objective GA based approach with two other approaches, namely: 1) the CURE-based approach, which is known as one of the most efficient clustering algorithms; 2) the Chien et al. clustering approach, which is an automatic interval partition method based on variation of density. Experimental results on 100 K transactions extracted from the adult data of the USA census in year 2000 showed that the proposed automated clustering method exhibits good performance over both the CURE-based approach and Chien et al.’s work in terms of runtime, number of large itemsets and number of association rules.

8.
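To discover association rules in transaction databases, practical mining applications often judge the importance of different items by different criteria, manage the taxonomic relationships among items, and handle quantitative data sets. This paper therefore proposes a multi-level fuzzy association rule mining algorithm that uses multiple minimum supports in a quantitative transaction database to extract implicit knowledge from itemsets. The algorithm derives frequent itemsets using two kinds of support constraints and a top-down, progressively refined approach, and can also discover cross-level fuzzy association rules. An example shows that the multi-level fuzzy association rules derived under multiple minimum support constraints are easy to understand and meaningful, and that the algorithm has good efficiency and scalability.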

9.
In this paper, we examine a new data mining issue of mining association rules from customer databases and transaction databases. The problem is decomposed into two subproblems: identifying all the large itemsets from the transaction database and mining association rules from the customer database and the large itemsets identified. For the first subproblem, we propose an efficient algorithm to discover all the large itemsets from the transaction database. Experimental results show that by our approach, the total execution time can be reduced significantly. For the second subproblem, a relationship graph is constructed according to the identified large itemsets from the transaction database and the priorities of condition attributes from the customer database. Based on the relationship graph, we present an efficient graph-based algorithm to discover interesting association rules embedded in the transaction database and the customer database.

10.
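Association rule mining is one of the important areas of data mining, and methods that use rough set theory to mine association rules have attracted wide attention. For incomplete information systems, the paper proposes a fast ORD association rule mining algorithm based on rough set theory. The algorithm first performs attribute reduction with a rough-set-based attribute reduction algorithm, then obtains association rules with the ORD algorithm, a fast and efficient algorithm for pruning redundant itemsets and redundant rules. Experimental comparison with other popular algorithms of the same kind on four UCI data sets shows that the algorithm performs well.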

11.
In item promotion applications, there is a strong need for tools that can help to unlock the hidden profit within each individual customer’s transaction history. Discovering association patterns based on the data mining technique is helpful for this purpose. However, the conventional association mining approach, while generating “strong” association rules, cannot detect potential profit-building opportunities that can be exposed by “soft” association rules, which recommend items with looser but significant enough associations. This paper proposes a novel mining method that automatically detects hidden profit-building opportunities through discovering soft associations among items from historical transactions. Specifically, this paper proposes a relaxation method of association mining with a new support measurement, called soft support, that can be used for mining soft association patterns expressed with the “most” fuzzy quantifier. In addition, a novel measure for validating the soft-associated rules is proposed based on the estimated possibility of a conditioned quantified fuzzy event. The new measure is shown to be effective by comparison with several existing measures. A new association mining algorithm based on modification of the FT-Tree algorithm is proposed to accommodate this new support measure. Finally, the mining algorithm is applied to several data sets to investigate its effectiveness in finding soft patterns and content recommendation.

12.
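Employment information data contain many quantitative and categorical attributes. To handle them, the paper proposes a quantitative association rule mining method based on k-means. The method uses the k-means clustering algorithm to partition quantitative attributes into reasonable intervals, converting them into boolean attributes. It then applies an improved boolean association rule method to mine association rules and find associations between students' educational attributes and employment attributes. Finally, the mined rules are analyzed and applied. Experiments on employment information data show that the proposed method is effective and feasible for mining employment information.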
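
The discretization step can be illustrated with a plain one-dimensional k-means, shown here only as a sketch. The evenly spread initial centers, the function names `kmeans_1d` and `to_boolean_items`, and the `name=c<j>` item labels are all assumptions for illustration; the paper's improved boolean mining step is not reproduced.

```python
def kmeans_1d(values, k, iters=100):
    """Plain Lloyd's k-means on one numeric attribute.

    ASSUMPTION: initial centers are picked at evenly spaced positions in the
    sorted values, which makes the result deterministic.
    """
    vs = sorted(values)
    centers = [vs[(2 * i + 1) * len(vs) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centers[j]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous center.
        new_centers = [sum(c) / len(c) if c else centers[j]
                       for j, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return sorted(centers)

def to_boolean_items(name, values, centers):
    """Replace each value by the boolean item of its nearest cluster center."""
    def nearest(v):
        return min(range(len(centers)), key=lambda j: abs(v - centers[j]))
    return [f"{name}=c{nearest(v)}" for v in values]
```

Each resulting item such as `salary=c1` can be fed to any boolean association rule miner as an ordinary item.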

13.
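An improved graph-based association rule algorithm   Total citations: 1 (self-citations: 0, citations by others: 1)
Association rule mining is one of the most important topics in data mining research. The graph-based association rule mining algorithm DLG builds an association graph in a single database scan and then traverses the graph to generate frequent itemsets, which effectively improves mining performance. After analyzing the basic principle of DLG, the paper proposes an improved algorithm, DLG#. The improved algorithm builds an itemset association matrix while constructing the association graph, and during candidate generation combines the association graph with the Apriori property to prune redundant itemsets, reducing the number of candidate itemsets and simplifying their verification. Comparative experiments show that, across different data sets and support thresholds, the improved algorithm finds frequent itemsets faster, with a marked improvement when the average length of the frequent itemsets is large.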

14.
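Association rule mining has been a research hotspot in recent years, but its classical algorithm, Apriori, scans the database many times and generates a large number of candidate itemsets, which severely limits mining efficiency. To address this, the paper proposes a weighted association rule mining algorithm based on matrix compression. It scans the database only once, converts it into a 0-1 matrix, and compresses the matrix according to relevant properties, reducing the computation required during execution. To account for the importance of items, it also adopts weighting, setting the weights of item attributes by computing probabilities. Compared with Apriori, the algorithm can search directly for high-order frequent itemsets during mining. Experimental results show that it effectively improves the efficiency of association rule mining.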
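
The abstract does not say which properties drive the compression, so the sketch below assumes two common ones: an item column with too few ones cannot belong to a frequent itemset, and a transaction row with fewer than k ones cannot contain a k-itemset. The function names are hypothetical and the weighting step is left out.

```python
def build_matrix(transactions, items):
    """One row per transaction, one 0/1 column per item."""
    return [[1 if i in t else 0 for i in items] for t in transactions]

def compress(matrix, items, min_count, k):
    """ASSUMED compression rules, not necessarily the paper's:
    1. drop item columns whose column sum is below min_count;
    2. drop rows with fewer than k remaining ones.
    """
    keep = [j for j in range(len(items))
            if sum(row[j] for row in matrix) >= min_count]
    rows = [[row[j] for j in keep] for row in matrix]
    rows = [r for r in rows if sum(r) >= k]
    return rows, [items[j] for j in keep]

def support_count(matrix, items, itemset):
    """Count rows in which every column of the itemset holds a one."""
    cols = [items.index(i) for i in itemset]
    return sum(1 for row in matrix if all(row[c] for c in cols))
```

Both rules leave the support counts of frequent k-itemsets unchanged, so counting on the compressed matrix gives the same answers with fewer rows and columns to scan.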

15.
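An improved Apriori algorithm for alert correlation in intrusion detection systems   Total citations: 2 (self-citations: 1, citations by others: 1)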
王台华  万宇文  郭帆  余敏 《计算机应用》2010,30(7):1785-1788
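Among the many association rule mining algorithms, Apriori is the most classical, but it has the following drawbacks: it scans the database many times, generates a large number of candidate sets, and finds frequent itemsets iteratively. The paper proposes a method that obtains the maximal frequent itemsets with a one-step intersection operation. Support is obtained from the number of intersections, with no need to rescan the transaction database, and numbering some of the attributes reduces storage space and makes the candidate list easier to search, improving the algorithm's efficiency. Finally, association rules are formed for an intrusion detection system. Experimental results show that the optimized algorithm effectively improves the efficiency of association rule mining.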

16.
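The paper proposes a novel frequent pattern mining algorithm with clear advantages over existing ones. First, it does not need to generate candidate itemsets. Second, it needs fewer database scans: mining association rules on small and medium-sized databases requires only one scan of the transaction database, and mining large transaction databases requires at most two. The algorithm is therefore more efficient than existing frequent pattern mining algorithms.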

17.
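Discovering frequent itemsets is the most basic and most important problem in association rule mining. The paper proposes a frequent itemset mining algorithm based on binary representation, which exploits the properties of binary numbers to generate candidate itemsets and compute their support quickly. The overall performance of the algorithm is improved to some extent.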
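
One way a binary representation can speed up candidate generation and support counting is sketched below. The encoding (one bit per item, one integer per transaction) and the helper names are assumptions for illustration; the abstract does not describe the paper's actual encoding.

```python
from itertools import combinations

def encode(transactions, items):
    """Return one integer bitmask per transaction and the item -> bit map."""
    bit = {item: 1 << pos for pos, item in enumerate(items)}
    masks = []
    for t in transactions:
        m = 0
        for item in t:
            m |= bit[item]
        masks.append(m)
    return masks, bit

def support_count(masks, itemset_mask):
    # A transaction contains the itemset when every itemset bit is set in it.
    return sum(1 for m in masks if (m & itemset_mask) == itemset_mask)

def frequent_pairs(masks, bit, min_count):
    """Candidate pairs are built by OR-ing the bits of frequent single items."""
    frequent_items = [i for i in bit
                      if support_count(masks, bit[i]) >= min_count]
    pairs = {}
    for a, b in combinations(frequent_items, 2):
        count = support_count(masks, bit[a] | bit[b])
        if count >= min_count:
            pairs[(a, b)] = count
    return pairs
```

Containment checks reduce to a single AND and comparison per transaction, and merging two itemsets is a single OR.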

18.
Beyond Market Baskets: Generalizing Association Rules to Dependence Rules   Total citations: 11 (self-citations: 2, citations by others: 9)
One of the more well-studied problems in data mining is the search for association rules in market basket data. Association rules are intended to identify patterns of the type: A customer purchasing item A often also purchases item B. Motivated partly by the goal of generalizing beyond market basket data and partly by the goal of ironing out some problems in the definition of association rules, we develop the notion of dependence rules that identify statistical dependence in both the presence and absence of items in itemsets. We propose measuring significance of dependence via the chi-squared test for independence from classical statistics. This leads to a measure that is upward-closed in the itemset lattice, enabling us to reduce the mining problem to the search for a border between dependent and independent itemsets in the lattice. We develop pruning strategies based on the closure property and thereby devise an efficient algorithm for discovering dependence rules. We demonstrate our algorithm's effectiveness by testing it on census data, text data (wherein we seek term dependence), and synthetic data.
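
For a pair of items, the chi-squared test for independence uses a 2x2 contingency table over presence and absence. The sketch below covers only that two-item case; the function name `chi_squared_2x2` is hypothetical, and the lattice search and pruning strategies from the abstract are not shown.

```python
def chi_squared_2x2(transactions, a, b):
    """Chi-squared statistic for independence of items a and b.

    Cells cover all four presence/absence combinations; expected counts
    come from the marginal frequencies under the independence assumption.
    """
    n = len(transactions)
    observed = {(x, y): 0 for x in (True, False) for y in (True, False)}
    for t in transactions:
        observed[(a in t, b in t)] += 1
    pa = sum(1 for t in transactions if a in t) / n
    pb = sum(1 for t in transactions if b in t) / n
    chi2 = 0.0
    for (x, y), o in observed.items():
        expected = n * (pa if x else 1 - pa) * (pb if y else 1 - pb)
        if expected > 0:  # a zero expected count also has a zero observed count
            chi2 += (o - expected) ** 2 / expected
    return chi2
```

With one degree of freedom, a statistic above roughly 3.84 rejects independence at the 95% significance level.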

19.
Genetic-Fuzzy Data Mining With Divide-and-Conquer Strategy   Total citations: 1 (self-citations: 0, citations by others: 1)
Data mining is most commonly used in attempts to induce association rules from transaction data. Most previous studies focused on binary-valued transaction data. Transaction data in real-world applications, however, usually consist of quantitative values. This paper, thus, proposes a fuzzy data-mining algorithm for extracting both association rules and membership functions from quantitative transactions. A genetic algorithm (GA)-based framework for finding membership functions suitable for mining problems is proposed. The fitness of each set of membership functions is evaluated by the fuzzy-supports of the linguistic terms in the large 1-itemsets and by the suitability of the derived membership functions. The evaluation by the fuzzy supports of large 1-itemsets is much faster than that when considering all itemsets or interesting association rules. It can also help divide-and-conquer the derivation process of the membership functions for different items. The proposed GA framework, thus, maintains multiple populations, each for one item's membership functions. The final best sets of membership functions in all the populations are then gathered together to be used for mining fuzzy association rules. Experiments are conducted to analyze different fitness functions and different settings of supports and confidences. Experiments are also conducted to compare the proposed algorithm, the one with uniform fuzzy partition, and the existing one without divide-and-conquer, with results validating the performance of the proposed algorithm.
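
The fuzzy support of a linguistic term can be sketched as the average membership degree of an item's purchased quantities. The triangular shape, its parameters `a < b < c`, and the function names are assumptions for illustration; the GA that tunes the membership functions is not shown.

```python
def triangular(x, a, b, c):
    """Membership of x in a triangular fuzzy set with feet a, c and peak b.

    ASSUMPTION: a < b < c, so both slopes are well defined.
    """
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def fuzzy_support(quantities, a, b, c):
    """Average membership degree of the quantities in one linguistic term."""
    return sum(triangular(q, a, b, c) for q in quantities) / len(quantities)
```

A term whose fuzzy support reaches the minimum support would count as a large 1-itemset, which is the quantity the abstract says the fitness evaluation relies on.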

20.
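Research on weighted association rule mining algorithms   Total citations: 20 (self-citations: 0, citations by others: 20)
The paper discusses algorithms for mining weighted association rules. For boolean attributes, it gives an improved weighted association rule mining algorithm based on the mining algorithms MINWAL(O) and MINWAL(W); this algorithm effectively takes into account both the importance of the boolean attributes and the number of attributes in a rule. For quantitative attributes, it uses a competitive agglomeration algorithm to partition the quantitative attributes into several fuzzy sets and systematically presents an algorithm for mining weighted fuzzy association rules; this algorithm effectively takes into account both the importance of the quantitative attributes and the number of attributes in a rule, and is suitable for large databases.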
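
As a rough illustration of how item importance and rule length can both enter a support measure, the sketch below multiplies ordinary support by the mean weight of the items. This normalized definition is an assumption chosen for illustration and may differ from the measures used in MINWAL(O), MINWAL(W), or the improved algorithm; `weighted_support` and `weights` are hypothetical names.

```python
def weighted_support(transactions, weights, itemset):
    """ASSUMED definition: (mean weight of the items) x (ordinary support).

    Averaging over the items keeps long itemsets from gaining weight merely
    by containing more attributes.
    """
    itemset = frozenset(itemset)
    support = sum(1 for t in transactions if itemset <= set(t)) / len(transactions)
    mean_weight = sum(weights[i] for i in itemset) / len(itemset)
    return mean_weight * support
```

An itemset would then be kept when its weighted support reaches a weighted minimum support threshold.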
