20 similar documents found (search time: 15 ms)
3.
Quantitative association rule (QAR) mining has been recognized as an influential research problem over the last decade due to the popularity of quantitative databases and the usefulness of association rules in real life. Unlike boolean association rules (BARs), which only consider boolean attributes, QARs consist of quantitative attributes, which contain much richer information than boolean attributes. However, the combination of these quantitative attributes and their value intervals always gives rise to an explosively large number of itemsets, thereby severely degrading mining efficiency. In this paper, we propose an information-theoretic approach that avoids generating unrewarding combinations of both the attributes and their value intervals during the mining process. We study the mutual information between the attributes in a quantitative database and devise a normalization of the mutual information that makes it applicable in the context of QAR mining. To capture the strong informative relationships among the attributes, we construct a mutual information graph (MI graph), whose edges are the attribute pairs whose normalized mutual information is no less than a predefined information threshold. We find that the cliques in the MI graph represent a majority of the frequent itemsets. We also show that frequent itemsets that do not form a clique in the MI graph are those whose attributes are not informatively correlated with each other. By utilizing the cliques in the MI graph, we devise an efficient algorithm, MIC, that significantly reduces the number of value intervals of the attribute sets to be joined during the mining process. Extensive experiments show that our algorithm speeds up the mining process by up to two orders of magnitude. Most importantly, we are able to obtain most of the high-confidence QARs, whereas the QARs that are not returned by MIC are shown to be less interesting.
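The MI-graph idea can be illustrated with a small sketch. It assumes the attributes have already been discretized into interval labels, and it uses I(X;Y) / min(H(X), H(Y)) as the normalization; that choice is an assumption, since the abstract does not give the paper's formula. Maximal cliques are enumerated with the textbook Bron-Kerbosch algorithm, which is not necessarily what MIC uses. The table, its values, and the 0.5 threshold are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def entropy(column):
    """Shannon entropy, in bits, of a column of discrete values (e.g. interval labels)."""
    n = len(column)
    return -sum((c / n) * math.log2(c / n) for c in Counter(column).values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def normalized_mi(x, y):
    """Assumed normalization: I(X;Y) / min(H(X), H(Y)), which falls in [0, 1]."""
    denom = min(entropy(x), entropy(y))
    return mutual_information(x, y) / denom if denom > 0 else 0.0


def mi_graph(table, threshold):
    """Edges are attribute pairs whose normalized MI is at least the threshold."""
    return {frozenset((a, b)) for a, b in combinations(table, 2)
            if normalized_mi(table[a], table[b]) >= threshold}


def maximal_cliques(nodes, edges):
    """Bron-Kerbosch enumeration (without pivoting) of the maximal cliques."""
    adj = {v: {u for u in nodes if frozenset((u, v)) in edges} for v in nodes}
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(nodes), set())
    return cliques


# Toy table of already-discretized attributes (hypothetical data).
table = {
    "age": ["young", "young", "old", "old", "young", "old"],
    "income": ["low", "low", "high", "high", "low", "high"],
    "married": ["no", "no", "yes", "yes", "no", "yes"],
    "city": ["A", "B", "A", "B", "B", "A"],
}
edges = mi_graph(table, threshold=0.5)
cliques = maximal_cliques(list(table), edges)
```

Here age, income and married determine each other, so they form one clique. The city attribute is nearly independent of the others and ends up isolated. Under this reading, only attribute combinations inside a clique would be worth joining.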
4.
Bidirectional mining of spatial association rules  Total citations: 9, self-citations: 0, citations by others: 9
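Association rule mining in spatial databases must consider not only the relationships among the attributes of relational tuples (vertical relationships) but, more importantly, the relationships between tuples (horizontal relationships), such as adjacency, intersection and overlap. By analyzing the storage model of spatial databases and borrowing the association rule mining methods of transaction databases, this paper gives a complete definition of spatial association rules and discusses interestingness measures for them. According to the mining direction, spatial data mining is classified into vertical mining, horizontal mining and bidirectional mining. For bidirectional mining, a new algorithm is proposed. It first constrains the search according to the mining task to narrow the mining space. It then uses spatial computation to convert spatial relationships into non-spatial ones and, after several iterations, obtains non-spatial itemsets from which the spatial association rules are mined. Based on this, a workflow for bidirectional spatial data mining is proposed and verified with a case study.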
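The step that converts spatial relationships into non-spatial items can be sketched as follows. The sketch assumes objects are axis-aligned rectangles and uses two hypothetical predicates, "intersects" and "near" (within a chosen distance). The abstract does not specify the paper's predicates or storage model. Each reference object becomes one transaction of ordinary items, which a standard association rule miner can then process.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical spatial object: a non-spatial type plus an axis-aligned bounding box."""
    kind: str
    x1: float
    y1: float
    x2: float
    y2: float


def intersects(a, b):
    """True if the two rectangles overlap or touch."""
    return a.x1 <= b.x2 and b.x1 <= a.x2 and a.y1 <= b.y2 and b.y1 <= a.y2


def gap(a, b):
    """Euclidean distance between two rectangles (0 if they touch or overlap)."""
    dx = max(b.x1 - a.x2, a.x1 - b.x2, 0.0)
    dy = max(b.y1 - a.y2, a.y1 - b.y2, 0.0)
    return (dx * dx + dy * dy) ** 0.5


def to_transactions(refs, others, near_dist):
    """One transaction of non-spatial items per reference object."""
    txns = []
    for r in refs:
        items = {f"is:{r.kind}"}
        for o in others:
            if intersects(r, o):
                items.add(f"intersects:{o.kind}")
            elif gap(r, o) <= near_dist:
                items.add(f"near:{o.kind}")
        txns.append(frozenset(items))
    return txns


refs = [Box("town", 0, 0, 2, 2), Box("town", 10, 10, 12, 12), Box("village", 20, 0, 21, 1)]
others = [Box("river", 1, -5, 1.5, 5), Box("road", 13, 10, 14, 20), Box("river", 30, 30, 31, 31)]
txns = to_transactions(refs, others, near_dist=2.0)
```

Repeating this conversion for other reference layers or predicates corresponds loosely to the "several iterations" mentioned in the abstract.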
6.
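An efficient association rule mining algorithm based on attribute grouping  Total citations: 6, self-citations: 0, citations by others: 6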
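Mining frequent itemsets plays an important role in data mining. Several algorithms have been proposed for this problem. Some of them can discover all frequent itemsets with a single database scan, but their efficiency drops quickly when the number of attributes is large. This paper is the first to propose attribute grouping as a tool for mining association rules. It gives a frequent itemset mining algorithm based on attribute grouping, which stores the information about the relationships among database attributes in a matrix and extracts frequent itemsets from it without generating candidate itemsets. Experiments show that the algorithm is fast and effective.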
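The abstract does not describe the attribute-grouping scheme itself. The sketch below therefore shows only the generic idea of a matrix filled in one scan. Each cell m[i][j] counts the transactions that contain both items, and the diagonal holds single-item counts. Frequent 1- and 2-itemsets can then be read off directly, without generating candidates. The function names and data are hypothetical.

```python
from itertools import combinations


def cooccurrence_matrix(transactions, items):
    """One database scan: m[i][j] = transactions containing both items[i] and items[j];
    the diagonal m[i][i] holds single-item counts."""
    index = {item: k for k, item in enumerate(items)}
    m = [[0] * len(items) for _ in items]
    for t in transactions:
        ids = sorted(index[x] for x in t if x in index)
        for i in ids:
            m[i][i] += 1
        for i, j in combinations(ids, 2):
            m[i][j] += 1
            m[j][i] += 1
    return m


def frequent_up_to_pairs(m, items, min_count):
    """Read frequent 1- and 2-itemsets straight off the matrix."""
    n = len(items)
    singles = [frozenset([items[i]]) for i in range(n) if m[i][i] >= min_count]
    pairs = [frozenset([items[i], items[j]])
             for i in range(n) for j in range(i + 1, n) if m[i][j] >= min_count]
    return singles + pairs


transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "d"}, {"a", "b"}]
items = ["a", "b", "c", "d"]
m = cooccurrence_matrix(transactions, items)
result = frequent_up_to_pairs(m, items, min_count=3)
```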
7.
Multi-objective genetic algorithms based automated clustering for fuzzy association rules mining  Total citations: 1, self-citations: 0, citations by others: 1
Researchers have realized the importance of integrating fuzziness into association rules mining in databases with binary and quantitative attributes. However, most of the earlier algorithms proposed for fuzzy association rules mining either assume that fuzzy sets are given or employ a clustering algorithm, like CURE, to decide on fuzzy sets; in both cases the number of fuzzy sets is pre-specified. In this paper, we propose an automated method to decide on the number of fuzzy sets and for the autonomous mining of both fuzzy sets and fuzzy association rules. We achieve this by developing an automated clustering method based on multi-objective Genetic Algorithms (GA); the aim of the proposed approach is to automatically cluster the values of a quantitative attribute in order to obtain a large number of large itemsets in less time. We compare the proposed multi-objective GA based approach with two other approaches, namely: 1) a CURE-based approach, CURE being known as one of the most efficient clustering algorithms; 2) the clustering approach of Chien et al., an automatic interval partition method based on variation of density. Experimental results on 100K transactions extracted from the adult data of the USA census in year 2000 show that the proposed automated clustering method outperforms both the CURE-based approach and Chien et al.’s work in terms of runtime, number of large itemsets and number of association rules.
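Multi-objective GAs usually compare candidate solutions by Pareto dominance rather than by a single score. The sketch below shows that notion for the two goals named in the abstract: maximize the number of large itemsets and minimize runtime. It is not the authors' selection scheme, which the abstract does not describe. The candidate names (k = number of fuzzy sets) and their scores are made up.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    """One clustering of a quantitative attribute, scored on two objectives."""
    name: str
    large_itemsets: int   # objective 1: maximize
    seconds: float        # objective 2: minimize


def dominates(a, b):
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a.large_itemsets >= b.large_itemsets and a.seconds <= b.seconds
    better = a.large_itemsets > b.large_itemsets or a.seconds < b.seconds
    return no_worse and better


def pareto_front(population):
    """Candidates that no other candidate dominates."""
    return [c for c in population if not any(dominates(o, c) for o in population)]


# Made-up scores for four hypothetical clusterings.
population = [
    Candidate("k=2", 120, 1.0),
    Candidate("k=3", 200, 2.5),
    Candidate("k=4", 180, 3.0),
    Candidate("k=5", 200, 2.0),
]
front = pareto_front(population)
```

Here "k=3" and "k=4" are dominated by "k=5". The survivors trade itemset count against time, and a GA would keep breeding from such a front.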
8.
Chang Hao, Computer Engineering and Design (计算机工程与设计), 2012, 33(8): 3224-3229
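To discover association rules in transaction databases, practical mining applications often have to use different criteria to judge the importance of different items, manage the taxonomic relationships among items, and handle quantitative data. This paper therefore proposes a multi-level fuzzy association rule mining algorithm that applies multiple minimum supports to quantitative transaction databases to extract the implicit knowledge in itemsets. The algorithm uses two kinds of support constraints and a top-down, progressively deepening approach to derive frequent itemsets, and it can also discover cross-level fuzzy association rules. An example shows that the multi-level fuzzy association rules derived under multiple minimum support constraints are easy to understand and meaningful. The same example indicates that the algorithm has good efficiency and scalability.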
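The top-down, progressively deepening idea with per-item minimum supports can be sketched in crisp form (the fuzzification step is omitted). The sketch assumes a two-level taxonomy and lets an itemset pass when its support reaches the smallest minimum support among its members, a convention borrowed from MS-Apriori. The paper's two support constraints may differ. Category level is mined first, and only children of frequent categories are examined afterwards. Taxonomy, thresholds and transactions are hypothetical.

```python
from itertools import combinations

# Hypothetical two-level taxonomy: category -> leaf items.
TAXONOMY = {
    "milk": ["skim milk", "whole milk"],
    "bread": ["white bread", "wheat bread"],
    "juice": ["apple juice"],
}
PARENT = {leaf: cat for cat, leaves in TAXONOMY.items() for leaf in leaves}


def support(itemset, txns):
    """Fraction of transactions that contain every item of the itemset."""
    return sum(itemset <= t for t in txns) / len(txns)


def frequent(txns, items, minsup, max_size=2):
    """Itemsets up to max_size over `items`; an itemset passes if its support reaches
    the smallest minimum support among its members (assumed combination rule)."""
    found = []
    for size in range(1, max_size + 1):
        for combo in combinations(sorted(items), size):
            s = frozenset(combo)
            if support(s, txns) >= min(minsup[i] for i in s):
                found.append(s)
    return found


def top_down(txns, minsup):
    """Mine the category level first, then descend only into frequent categories."""
    level1_txns = [frozenset(PARENT[i] for i in t) for t in txns]
    level1 = frequent(level1_txns, TAXONOMY, minsup)
    frequent_cats = {c for s in level1 if len(s) == 1 for c in s}
    leaves = [leaf for c in frequent_cats for leaf in TAXONOMY[c]]
    level2 = frequent(txns, leaves, minsup)
    return level1, level2


txns = [frozenset(t) for t in (
    {"skim milk", "white bread"}, {"whole milk", "white bread"},
    {"skim milk", "wheat bread"}, {"skim milk"}, {"apple juice"})]
minsup = {"milk": 0.6, "bread": 0.5, "juice": 0.5,
          "skim milk": 0.5, "whole milk": 0.3,
          "white bread": 0.4, "wheat bread": 0.4, "apple juice": 0.2}
level1, level2 = top_down(txns, minsup)
```

"apple juice" would meet its own low threshold, but it is never examined because its category "juice" is infrequent. That pruning is what the top-down strategy buys.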
9.
Mining interesting association rules from customer databases and transaction databases  Total citations: 1, self-citations: 0, citations by others: 1
In this paper, we examine a new data mining problem: mining association rules from customer databases and transaction databases. The problem is decomposed into two subproblems: identifying all the large itemsets from the transaction database, and mining association rules from the customer database and the identified large itemsets. For the first subproblem, we propose an efficient algorithm to discover all the large itemsets from the transaction database. Experimental results show that our approach significantly reduces the total execution time. For the second subproblem, a relationship graph is constructed according to the large itemsets identified from the transaction database and the priorities of condition attributes from the customer database. Based on the relationship graph, we present an efficient graph-based algorithm to discover interesting association rules embedded in the transaction database and the customer database.
10.
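Association rule mining is one of the important areas of data mining, and methods that use rough set theory to mine association rules have attracted wide attention. For incomplete information systems, a fast ORD association rule mining algorithm based on rough set theory is proposed. The algorithm first applies a rough-set-based attribute reduction algorithm to reduce the attributes. It then obtains association rules with ORD, a fast and efficient algorithm for pruning redundant itemsets and redundant rules. Compared experimentally with other popular algorithms of the same kind on four UCI datasets, the algorithm performs well.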
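The ORD pruning step is not described, so the sketch below covers only the attribute-reduction stage, using two standard techniques swapped in for illustration. The first is the tolerance relation for incomplete tables: two objects are indiscernible if, on each attribute, they agree or one value is missing (written "*"). The second is the greedy QuickReduct heuristic driven by the dependency degree |POS| / |U|. The table is hypothetical, and the decision attribute is assumed complete.

```python
MISSING = "*"


def tolerant(u, v, attrs):
    """Tolerance relation for incomplete tables: on every attribute the two objects
    agree or at least one value is missing."""
    return all(u[a] == v[a] or MISSING in (u[a], v[a]) for a in attrs)


def positive_region(rows, cond, dec):
    """Indices of objects whose whole tolerance class shares their decision value."""
    pos = set()
    for i, u in enumerate(rows):
        if all(v[dec] == u[dec] for v in rows if tolerant(u, v, cond)):
            pos.add(i)
    return pos


def dependency(rows, cond, dec):
    """Degree to which the decision depends on the condition attributes: |POS| / |U|."""
    return len(positive_region(rows, cond, dec)) / len(rows)


def quick_reduct(rows, cond, dec):
    """Greedy forward selection until the subset matches the full dependency degree."""
    target = dependency(rows, cond, dec)
    reduct = []
    while dependency(rows, reduct, dec) < target:
        best = max((a for a in cond if a not in reduct),
                   key=lambda a: dependency(rows, reduct + [a], dec))
        reduct.append(best)
    return reduct


rows = [
    {"a": 1, "b": 0, "c": "*", "d": "yes"},
    {"a": 1, "b": 1, "c": 0, "d": "no"},
    {"a": 0, "b": 0, "c": 1, "d": "no"},
    {"a": 0, "b": "*", "c": 1, "d": "no"},
]
reduct = quick_reduct(rows, ["a", "b", "c"], "d")
```

Attribute c adds nothing once a and b are known, so it is dropped before any itemsets are built.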
11.
Feng-Hsu Wang, Information Sciences, 2008, 178(7): 1848-1876
In item promotion applications, there is a strong need for tools that can help to unlock the hidden profit within each individual customer’s transaction history. Discovering association patterns based on the data mining technique is helpful for this purpose. However, the conventional association mining approach, while generating “strong” association rules, cannot detect potential profit-building opportunities that can be exposed by “soft” association rules, which recommend items with looser but significant enough associations. This paper proposes a novel mining method that automatically detects hidden profit-building opportunities through discovering soft associations among items from historical transactions. Specifically, this paper proposes a relaxation method of association mining with a new support measurement, called soft support, that can be used for mining soft association patterns expressed with the “most” fuzzy quantifier. In addition, a novel measure for validating the soft-associated rules is proposed based on the estimated possibility of a conditioned quantified fuzzy event. The new measure is shown to be effective by comparison with several existing measures. A new association mining algorithm based on modification of the FT-Tree algorithm is proposed to accommodate this new support measure. Finally, the mining algorithm is applied to several data sets to investigate its effectiveness in finding soft patterns and content recommendation.
12.
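Employment information data contain a large number of quantitative and categorical attributes. To handle them, a quantitative association rule mining method based on k-means is proposed. The method uses the k-means clustering algorithm to partition each quantitative attribute into reasonable intervals, converting it into Boolean attributes. It then applies an improved Boolean association rule method to mine association rules, finding the associations between students' educational attributes and their employment attributes. Finally, the mined rules are analyzed and applied. Experiments on employment information data show that the proposed method is effective and feasible for mining employment information.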
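The partitioning step can be sketched with plain one-dimensional k-means (Lloyd's algorithm) on a single quantitative attribute. The cut points are placed halfway between adjacent centroids, and each value is mapped to a Boolean item naming its interval. The deterministic initialization, the midpoint cuts and the item naming scheme ("gpa#1") are assumptions for illustration. The abstract does not give the paper's choices, and the GPA values are made up.

```python
def kmeans_1d(values, k, iters=100):
    """Lloyd's k-means on a single numeric attribute (k >= 2); returns sorted centroids."""
    assert k >= 2
    xs = sorted(values)
    # deterministic start: k evenly spaced order statistics
    centers = [xs[round(i * (len(xs) - 1) / (k - 1))] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda c: abs(x - centers[c]))
            groups[nearest].append(x)
        new = [sum(g) / len(g) if g else centers[c] for c, g in enumerate(groups)]
        if new == centers:
            break
        centers = new
    return sorted(centers)


def to_boolean_item(name, value, cuts):
    """Map a numeric value to the Boolean item of its interval, e.g. 'gpa#1'."""
    return f"{name}#{sum(value > c for c in cuts)}"


gpa = [2.0, 2.1, 2.2, 3.0, 3.1, 3.9, 4.0]
centers = kmeans_1d(gpa, 3)
cuts = [(a + b) / 2 for a, b in zip(centers, centers[1:])]   # interval boundaries
item = to_boolean_item("gpa", 3.2, cuts)
```

Once every quantitative attribute is mapped this way, each record becomes a set of Boolean items that an ordinary Boolean association rule miner can process.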
13.
An improved graph-based association rule algorithm  Total citations: 1, self-citations: 0, citations by others: 1
Huang Hongxing, Computer & Digital Engineering (计算机与数字工程), 2009, 37(12): 38-41, 162
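Association rule mining is one of the most important topics in data mining research. The graph-based association rule mining algorithm DLG builds an association graph with a single database scan and then traverses the graph to generate frequent itemsets, which effectively improves mining performance. After an analysis of the basic principle of DLG, an improved algorithm, DLG#, is proposed. While constructing the association graph, the improved algorithm also constructs an itemset association matrix. When generating candidate itemsets, it combines the association graph with the Apriori property to prune redundant itemsets, which reduces the number of candidates and simplifies their verification. Comparative experiments show that the improved algorithm finds frequent itemsets faster on different datasets and under different support thresholds. The gain is most pronounced when the average length of frequent itemsets is large.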
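A minimal sketch of the DLG core follows, with Python integers serving as bit vectors. One scan records, for each item, which transactions contain it. An edge i -> j (i < j) is added to the association graph when {i, j} is frequent. Itemsets are then extended only along outgoing edges of their last item, with support verified by AND-ing bit vectors. The extra check that every member already links to the new item is a simple Apriori-style pair pruning in the spirit of DLG#. The DLG# association matrix itself is not reproduced, and the data are hypothetical.

```python
def bit_vectors(transactions):
    """Single scan: bit t of vectors[item] is set iff transaction t contains item."""
    vectors = {}
    for t, items in enumerate(transactions):
        for item in items:
            vectors[item] = vectors.get(item, 0) | (1 << t)
    return vectors


def count(vector):
    """Number of set bits, i.e. the support count."""
    return bin(vector).count("1")


def dlg(transactions, min_count):
    bv = bit_vectors(transactions)
    items = [i for i in sorted(bv) if count(bv[i]) >= min_count]
    # association graph: edge i -> j (i < j) whenever {i, j} is frequent
    graph = {i: {j for j in items if j > i and count(bv[i] & bv[j]) >= min_count}
             for i in items}
    result = [(i,) for i in items]
    level = [((i,), bv[i]) for i in items]
    while level:
        nxt = []
        for itemset, vec in level:
            for j in sorted(graph[itemset[-1]]):         # extend only along graph edges
                if all(j in graph[i] for i in itemset):  # every pair with j must be frequent
                    v = vec & bv[j]
                    if count(v) >= min_count:
                        nxt.append((itemset + (j,), v))
        result += [s for s, _ in nxt]
        level = nxt
    return result


transactions = [{"A", "B", "C"}, {"A", "B", "C"}, {"A", "B"}, {"B", "C"}, {"A", "C", "D"}]
result = dlg(transactions, min_count=2)
```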
14.
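Association rule mining has been a research hotspot in recent years. Its classic algorithm, Apriori, must scan the database many times and generates a large number of candidate itemsets, which seriously limits mining efficiency. To address this, a weighted association rule mining algorithm based on matrix compression is proposed. It scans the database only once, converts it into a 0-1 matrix, and compresses the matrix according to related properties, which reduces the amount of computation during execution. To account for the importance of items, it also adopts a weighting scheme in which the weights of item attributes are set by computing probabilities. Compared with Apriori, the algorithm can find higher-order frequent itemsets directly during mining. Experimental results show that the algorithm effectively improves the efficiency of association rule mining.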
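The compression step can be illustrated with two pruning rules commonly used with 0-1 matrices, assumed here because the abstract only says the matrix is compressed "according to related properties". First, drop item columns whose column sum is below the minimum count. Second, when looking for k-itemsets, drop rows with fewer than k ones, since such rows cannot contain any k-itemset. k-itemsets can then be counted directly on the smaller matrix. The probability-based item weights are not reproduced, and the data are hypothetical.

```python
from itertools import combinations


def to_matrix(transactions, items):
    """Single scan: one row per transaction, one 0/1 column per item."""
    return [[int(item in t) for item in items] for t in transactions]


def compress(matrix, items, min_count, k):
    """Drop infrequent item columns, then rows holding fewer than k ones
    (such rows cannot contain any k-itemset)."""
    keep = [c for c in range(len(items)) if sum(row[c] for row in matrix) >= min_count]
    rows = [[row[c] for c in keep] for row in matrix]
    return [r for r in rows if sum(r) >= k], [items[c] for c in keep]


def frequent_k_itemsets(transactions, items, min_count, k):
    """Count k-itemsets directly on the compressed matrix, without mining levels 2..k-1."""
    rows, cols = compress(to_matrix(transactions, items), items, min_count, k)
    found = []
    for combo in combinations(range(len(cols)), k):
        if sum(all(r[c] for c in combo) for r in rows) >= min_count:
            found.append(frozenset(cols[c] for c in combo))
    return found, len(rows)


transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "d"}, {"a", "b", "c"}]
found, rows_left = frequent_k_itemsets(transactions, ["a", "b", "c", "d"], 2, 3)
```

Only two of the five rows survive compression for k = 3, so the final count touches a much smaller matrix.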
16.
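A novel frequent pattern mining algorithm is proposed that has clear advantages over existing algorithms. First, it does not need to generate candidate itemsets. Second, it requires fewer database scans: on small and medium-sized databases it scans the transaction database only once, and on large transaction databases at most twice. The algorithm is therefore more efficient than existing frequent pattern mining algorithms.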
17.
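Discovering frequent itemsets is the most basic and important problem in association rule mining. A frequent itemset mining algorithm based on binary representation is proposed, which uses properties of the binary representation to generate candidate itemsets and compute their support quickly. The overall performance of the algorithm is improved to some extent.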
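One way to use a binary representation is sketched below. Each item gets one bit, so every itemset and every transaction becomes an integer mask. A transaction supports an itemset s exactly when `mask & s == s`, and two frequent k-itemsets sharing k-1 items combine by OR into a (k+1)-candidate. This is an illustration of the general idea, not the paper's algorithm, whose details the abstract does not give. The data are hypothetical.

```python
from itertools import combinations


def popcount(x):
    """Number of set bits, i.e. the size of the itemset encoded by x."""
    return bin(x).count("1")


def mine_binary(transactions, items, min_count):
    """Apriori-style levels where every itemset and transaction is an integer bit mask."""
    bit = {item: 1 << k for k, item in enumerate(items)}
    masks = [sum(bit[i] for i in t) for t in transactions]   # one mask per transaction

    def support(s):
        # s is contained in a transaction mask m exactly when m & s == s
        return sum((m & s) == s for m in masks)

    level = [bit[i] for i in items if support(bit[i]) >= min_count]
    frequent = list(level)
    k = 1
    while level:
        # OR of two frequent k-masks sharing k-1 bits is a (k+1)-candidate
        candidates = {a | b for a, b in combinations(level, 2) if popcount(a | b) == k + 1}
        level = [c for c in sorted(candidates) if support(c) >= min_count]
        frequent += level
        k += 1
    return [frozenset(i for i in items if m & bit[i]) for m in frequent]


transactions = [{"A", "B", "C"}, {"A", "B", "C"}, {"A", "B"}, {"B", "C"}, {"A", "C", "D"}]
result = mine_binary(transactions, ["A", "B", "C", "D"], min_count=2)
```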
18.
One of the more well-studied problems in data mining is the search for association rules in market basket data. Association rules are intended to identify patterns of the type: A customer purchasing item A often also purchases item B. Motivated partly by the goal of generalizing beyond market basket data and partly by the goal of ironing out some problems in the definition of association rules, we develop the notion of dependence rules that identify statistical dependence in both the presence and absence of items in itemsets. We propose measuring significance of dependence via the chi-squared test for independence from classical statistics. This leads to a measure that is upward-closed in the itemset lattice, enabling us to reduce the mining problem to the search for a border between dependent and independent itemsets in the lattice. We develop pruning strategies based on the closure property and thereby devise an efficient algorithm for discovering dependence rules. We demonstrate our algorithm's effectiveness by testing it on census data, text data (wherein we seek term dependence), and synthetic data.
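The chi-squared test for a pair of items can be computed from the 2x2 table of presence and absence. For each cell, the expected count under independence is n · P(row) · P(column), and the statistic sums (observed - expected)² / expected. With one degree of freedom, the usual 95% cutoff is about 3.841. The tea/coffee counts below are made up for illustration.

```python
def chi_squared_2x2(transactions, a, b):
    """Pearson chi-squared statistic for the 2x2 presence/absence table of items a and b."""
    n = len(transactions)
    observed = {(x, y): 0 for x in (True, False) for y in (True, False)}
    for t in transactions:
        observed[(a in t, b in t)] += 1
    p_a = sum(v for (x, _), v in observed.items() if x) / n
    p_b = sum(v for (_, y), v in observed.items() if y) / n
    chi2 = 0.0
    for (x, y), o in observed.items():
        # expected count under independence of a and b
        e = n * (p_a if x else 1 - p_a) * (p_b if y else 1 - p_b)
        if e > 0:
            chi2 += (o - e) ** 2 / e
    return chi2


CHI2_95_DF1 = 3.841  # 95% critical value of chi-squared with 1 degree of freedom

transactions = ([{"tea", "coffee"}] * 20 + [{"tea"}] * 5
                + [{"coffee"}] * 70 + [set()] * 5)
stat = chi_squared_2x2(transactions, "tea", "coffee")
dependent = stat >= CHI2_95_DF1
```

The rule tea → coffee has confidence 0.8, which looks strong. Coffee alone, however, appears in 90% of baskets, so tea slightly lowers the chance of coffee. The statistic (about 3.70) falls just short of the 95% cutoff. A confidence-only view would miss this kind of case, which motivates dependence rules.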
19.
Genetic-Fuzzy Data Mining With Divide-and-Conquer Strategy  Total citations: 1, self-citations: 0, citations by others: 1
Tzung-Pei Hong, Chun-Hao Chen, Yeong-Chyi Lee, Yu-Lung Wu, IEEE Transactions on Evolutionary Computation, 2008, 12(2): 252-265
Data mining is most commonly used in attempts to induce association rules from transaction data. Most previous studies focused on binary-valued transaction data. Transaction data in real-world applications, however, usually consist of quantitative values. This paper, thus, proposes a fuzzy data-mining algorithm for extracting both association rules and membership functions from quantitative transactions. A genetic algorithm (GA)-based framework for finding membership functions suitable for mining problems is proposed. The fitness of each set of membership functions is evaluated by the fuzzy supports of the linguistic terms in the large 1-itemsets and by the suitability of the derived membership functions. The evaluation by the fuzzy supports of large 1-itemsets is much faster than that when considering all itemsets or interesting association rules. It also allows the derivation of the membership functions for different items to be divided and conquered. The proposed GA framework, thus, maintains multiple populations, each for one item's membership functions. The final best sets of membership functions in all the populations are then gathered together to be used for mining fuzzy association rules. Experiments are conducted to analyze different fitness functions and different settings of supports and confidences. Experiments are also conducted to compare the proposed algorithm with the one using uniform fuzzy partitions and with the existing one without divide-and-conquer, and the results validate the performance of the proposed algorithm.
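The per-item fitness evaluation can be sketched under stated assumptions. Each chromosome encodes an item's membership functions as (center, half-span) pairs of triangles, for example Low / Middle / High. The fuzzy support of a term is the mean membership over all transactions. Fitness is the total fuzzy support of the terms that reach the minimum support, divided by a suitability term. The suitability used here (1 + an overlap penalty) is a hypothetical stand-in, not the paper's formula. Because the score depends on one item's data only, each item can evolve in its own population.

```python
def triangle(x, center, half_span):
    """Membership of x in a triangular fuzzy set that peaks at `center`."""
    return max(0.0, 1.0 - abs(x - center) / half_span)


def fuzzy_supports(quantities, chromosome):
    """Fuzzy support of each linguistic term: mean membership over all transactions
    (a transaction without the item contributes quantity 0)."""
    return [sum(triangle(q, c, s) for q in quantities) / len(quantities)
            for c, s in chromosome]


def overlap_penalty(chromosome):
    """Hypothetical suitability term: neighbours whose half-spans just reach each
    other's centres cost nothing; wider overlaps are penalised."""
    ordered = sorted(chromosome)
    penalty = 0.0
    for (c1, s1), (c2, s2) in zip(ordered, ordered[1:]):
        if c2 <= c1:
            return float("inf")   # coinciding centres: treat as unusable
        penalty += max(0.0, (s1 + s2) / (c2 - c1) - 2.0)
    return penalty


def fitness(quantities, chromosome, min_support):
    """Total fuzzy support of the large 1-itemsets, divided by (1 + penalty)."""
    large = sum(s for s in fuzzy_supports(quantities, chromosome) if s >= min_support)
    return large / (1.0 + overlap_penalty(chromosome))


# One population per item (divide-and-conquer): each item is scored on its own data.
quantities = {"milk": [0, 2, 5, 8, 10]}
narrow = [(0, 5), (5, 5), (10, 5)]     # (centre, half-span) for Low / Middle / High
wide = [(0, 10), (5, 10), (10, 10)]
scores = {name: fitness(quantities["milk"], chrom, min_support=0.33)
          for name, chrom in (("narrow", narrow), ("wide", wide))}
```

The wide chromosome makes every term large, but its heavy overlap is penalised. The narrow partition therefore scores higher (0.36 against 0.336).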
20.
Research on weighted association rule mining algorithms  Total citations: 20, self-citations: 0, citations by others: 20
Lu Jianjiang, Journal of Computer Research and Development (计算机研究与发展), 2002, 39(10): 1281-1286
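Mining algorithms for weighted association rules are discussed. For Boolean attributes, an improved weighted association rule mining algorithm is given on the basis of the mining algorithms MINWAL(O) and MINWAL(W); it effectively takes into account both the importance of Boolean attributes and the number of attributes contained in a rule. For quantitative attributes, the competitive agglomeration algorithm is applied to partition each quantitative attribute into several fuzzy sets, and a mining algorithm for weighted fuzzy association rules is systematically proposed. This algorithm effectively takes into account both the importance of quantitative attributes and the number of attributes contained in a rule, and it is suitable for large databases.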
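A common way to define weighted support multiplies an itemset's ordinary support by the total weight of its items. A normalized variant uses the average weight instead, so that longer itemsets are not favoured just for containing more items, which is one way to account for the number of attributes in a rule. The sketch follows that general formulation. It is not checked against the exact definitions of MINWAL(O), MINWAL(W) or the improved algorithm, and the item weights are hypothetical.

```python
def weighted_support(itemset, transactions, weights, normalized=False):
    """support(X) times the total weight of X's items, or times their average weight
    when normalized=True (so longer itemsets are not favoured just for being longer)."""
    support = sum(itemset <= t for t in transactions) / len(transactions)
    w = sum(weights[i] for i in itemset)
    if normalized:
        w /= len(itemset)
    return w * support


weights = {"bread": 0.1, "milk": 0.2, "caviar": 0.9}   # hypothetical item weights
transactions = [{"bread", "milk"}, {"bread", "caviar"},
                {"bread", "milk", "caviar"}, {"milk"}]
ws_caviar = weighted_support({"bread", "caviar"}, transactions, weights)
ws_milk = weighted_support({"bread", "milk"}, transactions, weights)
```

Both pairs have the same plain support (0.5). The heavily weighted caviar pair scores 0.5 against 0.15, so it can clear a weighted threshold that the bread-milk pair misses.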