首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
In data mining applications, it is important to develop evaluation methods for selecting quality and profitable rules. This paper utilizes a non-parametric approach, Data Envelopment Analysis (DEA), to estimate and rank the efficiency of association rules with multiple criteria. The interestingness of association rules is conventionally measured based on support and confidence. For specific applications, domain knowledge can be further designed as measures to evaluate the discovered rules. For example, in market basket analysis, the product value and cross-selling profit associated with the association rule can serve as essential measures to rule interestingness. In this paper, these domain measures are also included in the rule ranking procedure for selecting valuable rules for implementation. An example of market basket analysis is applied to illustrate the DEA based methodology for measuring the efficiency of association rules with multiple criteria.  相似文献   

2.
In data mining applications, it is important to develop evaluation methods for selecting quality and profitable rules. This paper utilizes a non-parametric approach, Data Envelopment Analysis (DEA), to estimate and rank the efficiency of association rules with multiple criteria. The interestingness of association rules is conventionally measured based on support and confidence. For specific applications, domain knowledge can be further designed as measures to evaluate the discovered rules. For example, in market basket analysis, the product value and cross-selling profit associated with the association rule can serve as essential measures to rule interestingness. In this paper, these domain measures are also included in the rule ranking procedure for selecting valuable rules for implementation. An example of market basket analysis is applied to illustrate the DEA based methodology for measuring the efficiency of association rules with multiple criteria.  相似文献   

3.
影响关联规则挖掘的有趣性因素的研究   总被引:7,自引:2,他引:7  
关联规则挖掘是数据挖掘研究中的一个重要方面,而其中一个重要问题是对挖掘出的规则的感兴趣程度的评估。实际应用中可从数据源中挖掘出大量的规则,但这些规则中的大部分对用户来说是不一定感兴趣的。关联规则挖掘中的有趣性问题可从客观和主观两个方面对关联规则的兴趣度进行评测。利用模板将用户感兴趣的规则和不感兴趣的规则区分开,以此来完成关联规则有趣性的主观评测;在关联规则的置信度和支持度基础上对关联规则的有趣性的客观评测增加了约束。  相似文献   

4.
Recent research has shown that association rules are useful in gene expression data analysis. Interestingness measure plays an important role in the association rule mining on small sample size, high dimensionality, and noisy gene expression data. This work introduces two interestingness measures by exploring prior knowledge contained in open biological databases. They are Max-Pathway-Distance (MaxPD), which explores the gene’s relativity in Kyoto encyclopedia of genes and genomes Pathway, and Max-Chromosomal-Distance (MaxCD), which makes use of the distance among genes in the chromosome. The properties of our proposed interestingness measures are also explored to mine the interesting rules efficiently. Experimental results on four real-life gene expression datasets show the effectiveness of MaxPD and MaxCD in both classification accuracy and biological interpretability.  相似文献   

5.
Measures of interestingness play a crucial role in association rule mining. An important methodological problem, on which several papers appeared in the literature, is to provide a reasonable classification of the measures. In this paper, we explore Boolean factor analysis, which uses formal concepts corresponding to classes of measures as factors, for the purpose of clustering of the measures. Unlike the existing studies, our method reveals overlapping clusters of interestingness measures. We argue that the overlap between clusters is a desired feature of natural groupings of measures and that because formal concepts are used as factors in Boolean factor analysis, the resulting clusters have a clear meaning and are easy to interpret. We conduct three case studies on clustering of measures, provide interpretations of the resulting clusters and compare the results to those of the previous approaches reported in the literature.  相似文献   

6.
Bridging rules take the antecedent and action from different conceptual clusters. They are distinguished from association rules (frequent itemsets) because (1) they can be generated by the infrequent itemsets that are pruned in association rule mining, and (2) they are measured by their importance including the distance between two conceptual clusters, whereas frequent itemsets are measured only by their support. In this paper, we first design two algorithms for mining bridging rules between clusters, and then propose two non-linear metrics to measure their interestingness. We evaluate these algorithms experimentally and demonstrate that our approach is promising.  相似文献   

7.
关联规则挖掘是数据挖掘研究中的一个重要方面,而其中一个重要问题是对挖掘出的规则的兴趣度的评估,过去的研究发现,在实际应用中往往很容易从数据源中挖掘出大量的规则,但这些规则中的大部分对用户来说是不感兴趣的,本文对规则的兴趣度度量的两个方面作了讨论:一个是主观兴趣度度量,另一个是客观兴趣度度量,最后介绍了如何利用模板进行挖掘有趣的规则。  相似文献   

8.
关联规则兴趣度问题研究   总被引:5,自引:0,他引:5       下载免费PDF全文
梅志芳  王建 《计算机工程》2010,36(1):38-39,42
经典的关联规则都是使用基于支持度和可信度的度量标准,但经过实践应用证明存在很多问题。为此,引入兴趣度作为关联规则的新度量标准,阐述当前重点研究的客观兴趣度,对PS公式进行探讨,提出它的优点和不足,在此基础上进行相关改进,克服了可信度与支持度框架的缺陷,具有优化关联规则挖掘的作用。  相似文献   

9.
Many studies have shown the limits of the support/confidence framework used in Apriori ‐like algorithms to mine association rules. There are a lot of efficient implementations based on the antimonotony property of the support, but candidate set generation (e.g., frequent item set mining) is still costly. In addition, many rules are uninteresting or redundant and one can miss interesting rules like nuggets. We are thus facing a complexity issue and a quality issue. One solution is to not use frequent itemset mining and to focus as soon as possible on interesting rules using additional interestingness measures. We present here a formal framework that allows us to make a link between analytic and algorithmic properties of interestingness measures. We introduce the notion of optimonotony in relation with the optimal rule discovery framework. We then demonstrate a necessary and sufficient condition for the existence of optimonotony. This result can thus be applied to classify the measures. We study the case of 39 classical measures and show that 31 of them are optimonotone. These optimonotone measures can thus be used with an underlying pruning strategy. Empirical evaluations show that the pruning strategy is efficient and leads to the discovery of nuggets using an optimonotone measure and without the support constraint.  相似文献   

10.
11.
兴趣度量在关联规则挖掘中常用来发现那些潜在的令人感兴趣的模式,基于FP树结构的FP-growth算法是目前较高效的关联规则挖掘算法之一,如果挖掘潜在的有价值的低支持度模式,这种算法效率较低。为此,本文提出一种新的兴趣度量—项项正相关兴趣度量,该量度具有良好的反单调性,所得到的模式中任意一项在事务中的出现均可提升模式中其余项出现的可能性。同时,提出一种改进的FP挖掘算法,该算法采用一种压缩的FP树结构,并利用非递归调用方法来减少挖掘中建立额外条件模式树的开销。更为重要的是,在频繁项集挖掘中引入项项正相关兴趣度量剪枝策略,有效过滤掉非正相关长模式和无效项集,扩大了可挖掘支持度阈值范围。实验结果表明,该算法是有效和可行的。  相似文献   

12.
《国际计算机数学杂志》2012,89(11):2233-2245
A data mining algorithm, such as Apriori, discovers a huge number of association rules (ARs) and therefore efficiently ranking all these rules is an important issue. This paper suggests a data envelopment analysis (DEA) method for ranking the discovered ARs using a maximum discrimination between the interestingness criteria defined for all ARs. It is shown that the proposed DEA model has a unique optimal solution which can be computed efficiently when the maximum discrimination between the criteria, the difference between DEA weights, is considered. The contribution of this study can be explained as follows: First, we show that using the conventional DEA model for ranking ARs may produce an invalid result because the weights corresponding to interestingness criteria would not discriminate between the criteria. This is investigated for a dataset consisting of 46 ARs with four criteria, namely support, confidence, itemset value and cross-selling. The paper also introduces the maximum discrimination between the weights of the criteria and obtains the optimal solution of the corresponding DEA model efficiently without the need of solving the related mathematical models. On the other hand, this model concludes less number of useful rule(s). A comparative analysis is then used to show the advantage of the proposed DEA method.  相似文献   

13.
关联规则的鲁棒性分析与应用   总被引:1,自引:0,他引:1       下载免费PDF全文
首先对采用支持度、置信度和改善度阈值来选择关联规则的不足进行了分析,然后引入规则的鲁棒性分析,提出了基于lift 条件下的关联规则鲁棒性判断,能够在若干规则中选择出鲁棒性高的规则,提供决策参考,最后给出了实例应用。  相似文献   

14.
As the complexity of software systems grows, it becomes increasingly difficult for developers to be aware of all the dependencies that exist between artifacts (e.g., files or methods) of a system. Change recommendation has been proposed as a technique to overcome this problem, as it suggests to a developer relevant source-code artifacts related to her changes. Association rule mining has shown promise in deriving such recommendations by uncovering relevant patterns in the system’s change history. The strength of the mined association rules is captured using a variety of interestingness measures. However, state-of-the-art recommendation engines typically use only the rule with the highest interestingness value when more than one rule applies. In contrast, we argue that when multiple rules apply, this indicates collective evidence, and aggregating those rules (and their evidence) will lead to more accurate change recommendation. To investigate this hypothesis we conduct a large empirical study of 15 open source software systems and two systems from our industry partners. We evaluate association rule aggregation using four variants of the change history for each system studied, enabling us to compare two different levels of granularity in two different scenarios. Furthermore, we study 40 interestingness measures using the rules produced by two different mining algorithms. The results show that (1) between 13 and 90% of change recommendations can be improved by rule aggregation, (2) rule aggregation almost always improves change recommendation for both algorithms and all measures, and (3) fine-grained histories benefit more from rule aggregation.  相似文献   

15.
Numerous interestingness measures have been proposed in statistics and data mining to assess object relationships. This is especially important in recent studies of association or correlation pattern mining. However, it is still not clear whether there is any intrinsic relationship among many proposed measures, and which one is truly effective at gauging object relationships in large data sets. Recent studies have identified a critical property, null-(transaction) invariance, for measuring associations among events in large data sets, but many measures do not have this property. In this study, we re-examine a set of null-invariant interestingness measures and find that they can be expressed as the generalized mathematical mean, leading to a total ordering of them. Such a unified framework provides insights into the underlying philosophy of the measures and helps us understand and select the proper measure for different applications. Moreover, we propose a new measure called Imbalance Ratio to gauge the degree of skewness of a data set. We also discuss the efficient computation of interesting patterns of different null-invariant interestingness measures by proposing an algorithm, GAMiner, which complements previous studies. Experimental evaluation verifies the effectiveness of the unified framework and shows that GAMiner speeds up the state-of-the-art algorithm by an order of magnitude.  相似文献   

16.
17.
Assessing rules with interestingness measures is the pillar of successful application of association rules discovery. However, association rules discovered are normally large in number, some of which are not considered as interesting or significant for the application at hand. In this paper, we present a systematic approach to ascertain the discovered rules, and provide a precise statistical approach supporting this framework. The proposed strategy combines data mining and statistical measurement techniques, including redundancy analysis, sampling and multivariate statistical analysis, to discard the non- significant rules. Moreover, we consider real world datasets which are characterized by the uniform and non-uniform data/items distribution with a mixture of measurement levels throughout the data/items. The proposed unified framework is applied on these datasets to demonstrate its effectiveness in discarding many of the redundant or non-significant rules, while still preserving the high accuracy of the rule set as a whole.  相似文献   

18.
In the domain of association rules mining (ARM) discovering the rules for numerical attributes is still a challenging issue. Most of the popular approaches for numerical ARM require a priori data discretization to handle the numerical attributes. Moreover, in the process of discovering relations among data, often more than one objective (quality measure) is required, and in most cases, such objectives include conflicting measures. In such a situation, it is recommended to obtain the optimal trade-off between objectives. This paper deals with the numerical ARM problem using a multi-objective perspective by proposing a multi-objective particle swarm optimization algorithm (i.e., MOPAR) for numerical ARM that discovers numerical association rules (ARs) in only one single step. To identify more efficient ARs, several objectives are defined in the proposed multi-objective optimization approach, including confidence, comprehensibility, and interestingness. Finally, by using the Pareto optimality the best ARs are extracted. To deal with numerical attributes, we use rough values containing lower and upper bounds to show the intervals of attributes. In the experimental section of the paper, we analyze the effect of operators used in this study, compare our method to the most popular evolutionary-based proposals for ARM and present an analysis of the mined ARs. The results show that MOPAR extracts reliable (with confidence values close to 95%), comprehensible, and interesting numerical ARs when attaining the optimal trade-off between confidence, comprehensibility and interestingness.  相似文献   

19.
The discovery of association rules is a very efficient data mining technique that is especially suitable for large amounts of categorical data. This paper shows how the discovery of association rules can be of benefit for numeric data as well. Based on a review of previous approaches we introduce Q2, a faster algorithm for the discovery of multi-dimensional association rules over ordinal data. We experimentally compare the new algorithm with the previous approach, obtaining performance improvements of more than an order of magnitude on supermarket data. In addition, a new absolute measure for the interestingness of quantitative association rules is introduced. It is based on the view that quantitative association rules have to be interpreted with respect to their Boolean generalizations. This measure has two major benefits compared to the previously used relative interestingness measure; first, it speeds up rule extraction and evaluation and second, it is easier to interpret for a user. Finally we introduce a rule browser which supports the exploration of ordinal data with quantitative association rules.  相似文献   

20.
Knowledge discovery in databases is used to discover useful and understandable knowledge from large databases. A process of knowledge discovery consists of two steps, the data mining step and the evaluation step. In this paper, evaluating and ranking the interestingness of summaries generated from databases, which is a part of the second step, is studied using diversity measures. Sixteen previously analyzed diversity measures of interestingness are used along with three not previously considered ones, brought from different well-known areas. The latter three measures are evaluated theoretically according to five principles that a measure must satisfy to be qualified acceptable for ranking summaries. A theoretical correlation study between the eight measures that satisfy all five principles is presented based on mathematical proofs. An empirical evaluation is conducted using three real databases. Then, a classification of the eight measures is deduced. The resulting classification is used to reduce the number of measures to only two, which are the best over all criteria, and that produce non-similar results. This helps the user interpret the most important discovered knowledge in his decision making process.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号