Similar literature
 20 similar documents found (search time: 78 ms)
1.
In data mining applications, it is important to develop evaluation methods for selecting high-quality, profitable rules. This paper utilizes a non-parametric approach, Data Envelopment Analysis (DEA), to estimate and rank the efficiency of association rules with multiple criteria. The interestingness of association rules is conventionally measured based on support and confidence. For specific applications, domain knowledge can be further designed as measures to evaluate the discovered rules. For example, in market basket analysis, the product value and cross-selling profit associated with the association rule can serve as essential measures of rule interestingness. In this paper, these domain measures are also included in the rule ranking procedure for selecting valuable rules for implementation. An example of market basket analysis is applied to illustrate the DEA-based methodology for measuring the efficiency of association rules with multiple criteria.
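
The support and confidence measures mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not give code or data, so the function names and the toy baskets below are hypothetical; only the standard definitions are assumed (support is the fraction of transactions containing an itemset, confidence is the support of the whole rule divided by the support of its left-hand side).

```python
from typing import FrozenSet, Iterable, List


def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(lhs: Iterable[str], rhs: Iterable[str],
               transactions: List[FrozenSet[str]]) -> float:
    """Support of (lhs and rhs together) divided by support of lhs."""
    lhs_support = support(lhs, transactions)
    if lhs_support == 0:
        return 0.0  # rule never applies; report 0 rather than divide by zero
    return support(set(lhs) | set(rhs), transactions) / lhs_support


# Hypothetical toy baskets, not taken from the paper.
baskets = [frozenset(b) for b in (
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
)]

rule_support = support({"bread", "milk"}, baskets)
rule_confidence = confidence({"bread"}, {"milk"}, baskets)
print(rule_support, rule_confidence)
```

Domain measures such as product value or cross-selling profit would be additional per-rule numbers alongside these two; how they enter the DEA model is specific to the paper and is not sketched here.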

2.
3.
International Journal of Computer Mathematics, 2012, 89(11): 2233-2245
A data mining algorithm, such as Apriori, discovers a huge number of association rules (ARs), and therefore efficiently ranking all these rules is an important issue. This paper suggests a data envelopment analysis (DEA) method for ranking the discovered ARs using a maximum discrimination between the interestingness criteria defined for all ARs. It is shown that the proposed DEA model has a unique optimal solution which can be computed efficiently when the maximum discrimination between the criteria, the difference between DEA weights, is considered. The contribution of this study can be explained as follows: First, we show that using the conventional DEA model for ranking ARs may produce an invalid result because the weights corresponding to interestingness criteria would not discriminate between the criteria. This is investigated for a dataset consisting of 46 ARs with four criteria, namely support, confidence, itemset value and cross-selling. The paper also introduces the maximum discrimination between the weights of the criteria and obtains the optimal solution of the corresponding DEA model efficiently without the need to solve the related mathematical models. On the other hand, this model yields fewer useful rules. A comparative analysis is then used to show the advantage of the proposed DEA method.

4.
Market basket analysis is one of the typical applications in mining association rules. The valuable information discovered from data mining can be used to support decision making. Generally, support and confidence (objective) measures are used to evaluate the interestingness of association rules. However, in some cases, by using these two measures, the discovered rules may be neither profitable nor actionable (not interesting) to enterprises. Therefore, how to discover the patterns by considering both objective measures (e.g. probability) and subjective measures (e.g. profit) is a challenge in data mining, particularly in marketing applications. This paper focuses on pattern evaluation in the process of knowledge discovery by using the concept of profit mining. Data Envelopment Analysis is utilized to calculate the efficiency of discovered association rules with multiple objective and subjective measures. After evaluating the efficiency of association rules, they are categorized into two classes, relatively efficient (interesting) and relatively inefficient (uninteresting). To distinguish these two classes, a Decision Tree (DT)-based classifier is built by using the attributes of association rules. The DT classifier can be used to identify the characteristics of interesting association rules, and to classify unknown (new) association rules.

5.
A number of studies, theoretical, empirical, or both, have been conducted to provide insight into the properties and behavior of interestingness measures for association rule mining. While each has value in its own right, most are either limited in scope or, more importantly, ignore the purpose for which interestingness measures are intended, namely the ultimate ranking of discovered association rules. This paper, therefore, focuses on an analysis of the rule-ranking behavior of 61 well-known interestingness measures tested on the rules generated from 110 different datasets. By clustering based on ranking behavior, we highlight, and formally prove, previously unreported equivalences among interestingness measures. We also show that there appear to be distinct clusters of interestingness measures, but that there remain differences among clusters, confirming that domain knowledge is essential to the selection of an appropriate interestingness measure for a particular task and business objective.

6.
7.
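A study of interestingness factors affecting association rule mining   Cited by: 7 total (self-citations: 2, citations by others: 7)
Association rule mining is an important area of data mining research, and a key problem within it is evaluating how interesting the mined rules are. In practice, a large number of rules can be mined from a data source, but most of them are not necessarily of interest to the user. The interestingness of association rules can be assessed from both objective and subjective perspectives. Templates are used to separate the rules the user finds interesting from those the user does not, which provides the subjective assessment; for the objective assessment, constraints are added on top of the confidence and support of the association rules.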

8.
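Research on a multi-strategy method for mining rules of interest   Cited by: 20 total (self-citations: 1, citations by others: 19)
Data mining discovers a large number of rules from large databases, and selecting the rules of interest is an important research topic in knowledge discovery. This paper studies a method for measuring the subjective interestingness of rules using domain knowledge, gives a function for computing an objective interestingness that measures the conciseness and novelty of a rule, and proposes a multi-strategy method for selecting the rules that interest the user.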

9.
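Association rule mining is an important area of data mining research, and a key problem within it is evaluating the interestingness of the mined rules. Previous research has found that, in practical applications, it is often easy to mine a large number of rules from a data source, but most of these rules are of no interest to the user. This paper discusses two aspects of rule interestingness measurement: subjective interestingness measures and objective interestingness measures. Finally, it describes how to use templates to mine interesting rules.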

10.
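Existing association rule mining algorithms do not consider the non-uniform distribution of sessions in a data stream or the role of historical data, and they ignore the "sharp boundary" problem that arises when handling continuous attributes. To address these problems, this paper proposes a fuzzy session association rule mining algorithm based on a time decay model. First, to handle the non-uniform distribution of sessions in the data stream, sessions are partitioned by time slice, which fully preserves the correlation information among sessions within a time slice. Then, fuzzy sets are used to handle the continuous attributes of sessions, which increases the interestingness and comprehensibility of the rules. Finally, taking into account the role of historical data and an allowed error, critical frequent itemsets and fuzzy association rules are mined from the data stream based on the time decay model. Experimental results show that the proposed method has clear advantages in improving time efficiency, reducing the redundancy rate, and increasing rule interestingness.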
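
The idea of a time decay model over time slices can be sketched as follows. The abstract does not state the decay function, so this sketch assumes a simple exponential decay per time slice; the function name `decayed_support`, the `decay` parameter, and the sample counts are all hypothetical, and the fuzzy-set handling of continuous attributes is not covered.

```python
from typing import Sequence


def decayed_support(slice_counts: Sequence[int],
                    slice_sizes: Sequence[int],
                    decay: float = 0.9) -> float:
    """Time-decayed support of an itemset over time slices (oldest first).

    slice_counts[i] is the number of sessions in slice i containing the
    itemset, slice_sizes[i] is the total number of sessions in slice i.
    The newest slice has weight 1; each older slice is multiplied by
    `decay` once more (an assumed exponential decay, not the paper's).
    """
    n = len(slice_counts)
    weighted_hits = 0.0
    weighted_total = 0.0
    for i, (count, size) in enumerate(zip(slice_counts, slice_sizes)):
        weight = decay ** (n - 1 - i)
        weighted_hits += weight * count
        weighted_total += weight * size
    return weighted_hits / weighted_total if weighted_total else 0.0


# Hypothetical stream: the itemset is rare in the old slice, common in the new one.
plain = decayed_support([1, 3], [4, 4], decay=1.0)   # no decay: ordinary support
recent = decayed_support([1, 3], [4, 4], decay=0.5)  # older slice counts half
print(plain, recent)
```

With decay below 1, an itemset that has become frequent only recently scores higher than under ordinary support, which is the effect of discounting historical data.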

11.
The quality of discovered association rules is commonly evaluated by interestingness measures (commonly support and confidence) with the purpose of supplying indicators to the user in the understanding and use of the newly discovered knowledge. Low-quality datasets have a strongly negative impact on the quality of the discovered association rules, and one might legitimately wonder if a so-called “interesting” rule denoted LHS → RHS is meaningful when 30% of the LHS data are not up-to-date anymore, 20% of the RHS data are not accurate, and 15% of the LHS data come from a data source that is well-known for its bad credibility. This paper presents an overview of data quality characterization and management techniques that can be advantageously employed for improving the quality awareness of the knowledge discovery and data mining processes. We propose to integrate data quality indicators for quality-aware association rule mining. We propose a cost-based probabilistic model for selecting legitimately interesting rules. Experiments on the challenging KDD-Cup-98 datasets show that variations in data quality have a great impact on the cost and quality of discovered association rules and support our approach of integrating the management of data quality indicators into the KDD process to ensure the quality of data mining results.

12.
In a recent paper, Toloo et al. [Toloo, M., Sohrabi, B., & Nalchigar, S. (2009). A new method for ranking discovered rules from data mining by DEA. Expert Systems with Applications, 36, 8503–8508] proposed a new integrated data envelopment analysis model to find the most efficient association rule in data mining. Then, utilizing this model, an algorithm is developed for ranking association rules by considering multiple criteria. In this paper, we show that their model only selects one efficient association rule by chance, and that the selection depends entirely on the solution method or software used to solve the problem. In addition, it is shown that their proposed algorithm can only rank efficient rules randomly and will fail to rank inefficient DMUs. We also refer to some other drawbacks in that paper and propose another approach to set up a full ranking of the association rules. A numerical example illustrates some of the paper's content.

13.
As the complexity of software systems grows, it becomes increasingly difficult for developers to be aware of all the dependencies that exist between artifacts (e.g., files or methods) of a system. Change recommendation has been proposed as a technique to overcome this problem, as it suggests to a developer relevant source-code artifacts related to her changes. Association rule mining has shown promise in deriving such recommendations by uncovering relevant patterns in the system’s change history. The strength of the mined association rules is captured using a variety of interestingness measures. However, state-of-the-art recommendation engines typically use only the rule with the highest interestingness value when more than one rule applies. In contrast, we argue that when multiple rules apply, this indicates collective evidence, and aggregating those rules (and their evidence) will lead to more accurate change recommendation. To investigate this hypothesis we conduct a large empirical study of 15 open source software systems and two systems from our industry partners. We evaluate association rule aggregation using four variants of the change history for each system studied, enabling us to compare two different levels of granularity in two different scenarios. Furthermore, we study 40 interestingness measures using the rules produced by two different mining algorithms. The results show that (1) between 13 and 90% of change recommendations can be improved by rule aggregation, (2) rule aggregation almost always improves change recommendation for both algorithms and all measures, and (3) fine-grained histories benefit more from rule aggregation.

14.
Data envelopment analysis (DEA) is a mathematical approach for evaluating the efficiency of decision-making units (DMUs) that convert multiple inputs into multiple outputs. Traditional DEA models assume that all input and output data are known exactly. In many situations, however, some inputs and/or outputs take imprecise data. In this paper, we present optimistic and pessimistic perspectives for obtaining an efficiency evaluation for the DMU under consideration with imprecise data. Additionally, slacks-based measures of efficiency are used for direct assessment of efficiency in the presence of imprecise data with slack values. Finally, the geometric average of the two efficiency values is used to determine the DMU with the best performance. A ranking approach based on degree of preference is used for ranking the efficiency intervals of the DMUs. Two numerical examples are used to show the application of the proposed DEA approach.
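
The step of combining the optimistic and pessimistic efficiency values by their geometric average can be sketched as below. The abstract gives no data, so the DMU names and scores are hypothetical, and both scores are assumed here to lie on the same scale where larger is better. Solving the underlying DEA models and the degree-of-preference ranking of efficiency intervals are not covered.

```python
import math
from typing import Dict, Tuple


def geometric_average_efficiency(optimistic: float, pessimistic: float) -> float:
    """Geometric average of two non-negative efficiency scores."""
    return math.sqrt(optimistic * pessimistic)


# Hypothetical (optimistic, pessimistic) efficiency pairs per DMU.
scores: Dict[str, Tuple[float, float]] = {
    "DMU_A": (1.00, 0.64),
    "DMU_B": (0.90, 0.90),
    "DMU_C": (0.81, 0.49),
}

combined = {name: geometric_average_efficiency(opt, pes)
            for name, (opt, pes) in scores.items()}
best = max(combined, key=combined.get)
print(combined, best)
```

In this toy data DMU_A is the most efficient from the optimistic perspective alone, but DMU_B has the highest combined value, which shows why the two perspectives are merged before picking a best performer.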

15.
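Interestingness measures are commonly used in association rule mining to discover potentially interesting patterns. The FP-growth algorithm, based on the FP-tree structure, is currently one of the more efficient association rule mining algorithms, but it is inefficient when mining potentially valuable low-support patterns. This paper therefore proposes a new interestingness measure, the item-item positive correlation interestingness measure. The measure has a good anti-monotone property, and in the resulting patterns the occurrence of any one item in a transaction raises the likelihood that the remaining items in the pattern occur. An improved FP mining algorithm is also proposed; it uses a compressed FP-tree structure and a non-recursive method to reduce the overhead of building additional conditional pattern trees during mining. More importantly, a pruning strategy based on the item-item positive correlation interestingness measure is introduced into frequent itemset mining, which effectively filters out long patterns that are not positively correlated as well as invalid itemsets, and extends the range of support thresholds that can be mined. Experimental results show that the algorithm is effective and feasible.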

16.
In the domain of association rules mining (ARM) discovering the rules for numerical attributes is still a challenging issue. Most of the popular approaches for numerical ARM require a priori data discretization to handle the numerical attributes. Moreover, in the process of discovering relations among data, often more than one objective (quality measure) is required, and in most cases, such objectives include conflicting measures. In such a situation, it is recommended to obtain the optimal trade-off between objectives. This paper deals with the numerical ARM problem using a multi-objective perspective by proposing a multi-objective particle swarm optimization algorithm (i.e., MOPAR) for numerical ARM that discovers numerical association rules (ARs) in only one single step. To identify more efficient ARs, several objectives are defined in the proposed multi-objective optimization approach, including confidence, comprehensibility, and interestingness. Finally, by using the Pareto optimality the best ARs are extracted. To deal with numerical attributes, we use rough values containing lower and upper bounds to show the intervals of attributes. In the experimental section of the paper, we analyze the effect of operators used in this study, compare our method to the most popular evolutionary-based proposals for ARM and present an analysis of the mined ARs. The results show that MOPAR extracts reliable (with confidence values close to 95%), comprehensible, and interesting numerical ARs when attaining the optimal trade-off between confidence, comprehensibility and interestingness.
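
Extracting the best rules by Pareto optimality can be illustrated with a small sketch. This is only the non-dominated filtering step, not MOPAR or its particle swarm search; the rule names and the objective values (confidence, comprehensibility, interestingness) are hypothetical, and all three objectives are assumed to be maximized.

```python
from typing import Dict, Tuple

Objectives = Tuple[float, float, float]  # (confidence, comprehensibility, interestingness)


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if `a` is at least as good as `b` everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates: Dict[str, Objectives]) -> Dict[str, Objectives]:
    """Keep only the candidates that no other candidate dominates."""
    return {
        name: score
        for name, score in candidates.items()
        if not any(dominates(other, score)
                   for other_name, other in candidates.items()
                   if other_name != name)
    }


# Hypothetical rules scored on the three objectives.
rules: Dict[str, Objectives] = {
    "r1": (0.95, 0.60, 0.40),
    "r2": (0.90, 0.70, 0.50),
    "r3": (0.85, 0.55, 0.35),  # worse than r1 and r2 on every objective
    "r4": (0.70, 0.80, 0.30),
}

front = pareto_front(rules)
print(sorted(front))
```

The pairwise comparison is quadratic in the number of rules, which is fine for a small illustration; a real multi-objective optimizer would typically maintain the front incrementally.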

17.
Using association rules as texture features   Cited by: 1 total (self-citations: 0, citations by others: 1)
A new type of texture feature based on association rules is proposed in this paper. Association rules have been used in applications such as market basket analysis to capture relationships present among items in large data sets. It is shown that association rules can be adapted to capture frequently occurring local structures in images. Association rules capture both structural and statistical information, and automatically identify the structures that occur most frequently and relationships that have significant discriminative power. Methods for classification and segmentation of textured images using association rules as texture features are described. Simulation results using images consisting of man-made and natural textures show that association rule features perform well compared to other widely used texture features. It is shown that association rule features can distinguish texture pairs with identical first, second, and third order statistics, and texture pairs that are not easily discriminable visually.

18.
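1 Introduction. Data mining is a new commercial information processing technology whose main feature is to extract, transform, analyze, and otherwise model the large volume of business data in commercial databases in order to extract key data that supports business decisions. Typically, after mining with some data mining tool, for example the fast algorithm given in [1], we obtain a large number of association rules. For the user, finding the rules of interest among this large number of rules is very difficult; moreover,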

19.
The discovery of association rules is a very efficient data mining technique that is especially suitable for large amounts of categorical data. This paper shows how the discovery of association rules can be of benefit for numeric data as well. Based on a review of previous approaches we introduce Q2, a faster algorithm for the discovery of multi-dimensional association rules over ordinal data. We experimentally compare the new algorithm with the previous approach, obtaining performance improvements of more than an order of magnitude on supermarket data. In addition, a new absolute measure for the interestingness of quantitative association rules is introduced. It is based on the view that quantitative association rules have to be interpreted with respect to their Boolean generalizations. This measure has two major benefits compared to the previously used relative interestingness measure: first, it speeds up rule extraction and evaluation, and second, it is easier for a user to interpret. Finally, we introduce a rule browser which supports the exploration of ordinal data with quantitative association rules.

20.
Knowledge, 1999, 12(5-6): 309-315
This paper discusses several factors influencing the evaluation of the degree of interestingness of rules discovered by a data mining algorithm. This article aims at: (1) drawing attention to several factors related to rule interestingness that have been somewhat neglected in the literature; (2) showing some ways of modifying rule interestingness measures to take these factors into account; (3) introducing a new criterion to measure attribute surprisingness, as a factor influencing the interestingness of discovered rules.
