1.
《International Journal of Computer Mathematics》2012,89(1):69-80
Association rule mining is a data mining technique for discovering information that represents associations among data. Some data in a database appear infrequently yet are highly associated with specific other data. This paper proposes a technique for mining such significant rare data by introducing a second support when discovering their association rules. We show that the proposed approach performs better than standard association rule techniques.
2.
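An incremental update algorithm for association rules  Total citations: 22, self-citations: 0, citations by others: 22
To address the problem of updating association rules as the contents of a transaction database keep growing, this paper proposes a simple and efficient incremental association rule mining algorithm, SFUA, and analyzes and compares it with the existing FUP algorithm.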
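The incremental idea in this abstract can be illustrated with a minimal Python sketch. It is not SFUA or FUP, whose details the abstract does not give. It only assumes that support counts were kept from the previous run, so that just the newly added transactions have to be scanned. The function names and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def count_itemsets(transactions, max_size=2):
    """Count every itemset of up to max_size items across the transactions."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return counts


def update_counts(old_counts, new_transactions, max_size=2):
    """Merge counts from the increment only; the old database is not rescanned."""
    return old_counts + count_itemsets(new_transactions, max_size)


def frequent(counts, n_transactions, min_support):
    """Keep itemsets whose relative support meets the threshold."""
    return {s for s, c in counts.items() if c / n_transactions >= min_support}


# Hypothetical data: an existing database and a batch of new transactions.
old_db = [["a", "b"], ["a", "c"], ["a", "b", "c"]]
new_db = [["b", "c"], ["b", "c"]]

old_counts = count_itemsets(old_db)
all_counts = update_counts(old_counts, new_db)
```

This sketch keeps counts for every small itemset, which is only practical for toy data. Published incremental algorithms typically store counts for the frequent itemsets only and rescan the old database selectively for itemsets that become frequent after the update.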
3.
《Expert systems with applications》2014,41(5):2259-2268
Business rules are an effective way to control data quality. Business experts can enter the rules directly into appropriate software, without error-prone communication with programmers. However, not all business situations and possible data quality problems can be anticipated. Where business rules have not yet been defined, patterns of data handling may arise in practice. We apply data mining to accounting transactions in order to discover such patterns, which are represented in the form of association rules. Deviations from the discovered patterns can then be marked as potential data quality violations that need to be examined by humans. Data quality breaches can be expensive, but so is manual examination of many transactions. The goal is therefore to find a balance between marking too many and too few transactions as potentially erroneous. We apply appropriate procedures to evaluate the classification accuracy of the developed association rules, and we use economic principles to support the decision on how many deviations to examine manually.
4.
Daniel Sánchez José María Serrano Ignacio Blanco Maria Jose Martín-Bautista María-Amparo Vila 《Data mining and knowledge discovery》2008,16(3):313-348
In this paper we deal with the problem of mining for approximate dependencies (AD) in relational databases. We introduce a
definition of AD based on the concept of association rule, by means of suitable definitions of the concepts of item and transaction.
This definition allows us to measure both the accuracy and support of an AD. We provide an interpretation of the new measures
based on the complexity of the theory (set of rules) that describes the dependence, and we employ this interpretation to compare
the new measures with existing ones. A methodology to adapt existing association rule mining algorithms to the task of discovering
ADs is introduced. The adapted algorithms obtain the set of ADs that hold in a relation with accuracy and support greater
than user-defined thresholds. The experiments we have performed show that our approach performs reasonably well over large
databases with real-world data.
5.
In data mining applications, it is important to develop evaluation methods for selecting high-quality, profitable rules. This paper uses a non-parametric approach, Data Envelopment Analysis (DEA), to estimate and rank the efficiency of association rules against multiple criteria. The interestingness of association rules is conventionally measured by support and confidence. For specific applications, domain knowledge can additionally be turned into measures for evaluating the discovered rules. For example, in market basket analysis, the product value and cross-selling profit associated with an association rule can serve as essential measures of rule interestingness. In this paper, these domain measures are also included in the rule ranking procedure for selecting valuable rules for implementation. A market basket analysis example illustrates the DEA-based methodology for measuring the efficiency of association rules with multiple criteria.
6.
An evolutionary algorithm to discover quantitative association rules from huge databases without the need for an a priori discretization  Total citations: 1, self-citations: 0, citations by others: 1
Association rules are one of the most frequently used tools for finding relationships between different attributes in a database. There are various techniques for obtaining these rules, the most common of which produce categorical association rules. However, when the attributes to be related are numeric, we turn to methods that generate quantitative association rules, which have been studied far less. In addition, many of these tools cannot be used when the database is extremely large. In this paper, we present an evolutionary tool for finding association rules in databases (both small and large) comprising quantitative and categorical attributes without the need for an a priori discretization of the domain of the numeric attributes. Finally, we evaluate the tool using both real and synthetic databases.
7.
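An incrementally updatable association rule mining algorithm  Total citations: 3, self-citations: 0, citations by others: 3
This paper presents a novel and effective incremental association rule mining algorithm for updating association rules after the contents of a transaction database have grown. The algorithm closely examines the data storage structures used during association rule mining and makes full use of earlier mining results, which greatly reduces repeated scans of the data and improves the efficiency of the mining algorithm.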
8.
Multi-objective PSO algorithm for mining numerical association rules without a priori discretization
《Expert systems with applications》2014,41(9):4259-4273
In the domain of association rule mining (ARM), discovering rules for numerical attributes is still a challenging issue. Most of the popular approaches to numerical ARM require a priori data discretization to handle the numerical attributes. Moreover, discovering relations among data often involves more than one objective (quality measure), and in most cases these objectives conflict. In such a situation, it is recommended to obtain the optimal trade-off between objectives. This paper treats the numerical ARM problem from a multi-objective perspective by proposing a multi-objective particle swarm optimization algorithm (MOPAR) that discovers numerical association rules (ARs) in a single step. To identify more efficient ARs, several objectives are defined in the proposed multi-objective optimization approach, including confidence, comprehensibility, and interestingness. Finally, the best ARs are extracted using Pareto optimality. To deal with numerical attributes, we use rough values containing lower and upper bounds to represent the intervals of attributes. In the experimental section of the paper, we analyze the effect of the operators used in this study, compare our method with the most popular evolutionary proposals for ARM, and present an analysis of the mined ARs. The results show that MOPAR extracts reliable (with confidence values close to 95%), comprehensible, and interesting numerical ARs when attaining the optimal trade-off between confidence, comprehensibility, and interestingness.
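The Pareto-optimality step mentioned in this abstract can be sketched in a few lines of Python. This shows only the generic non-dominated filter, not MOPAR or its particle swarm search. The rule names and the three scores per rule (confidence, comprehensibility, interestingness) are hypothetical values chosen for illustration, and all objectives are assumed to be maximized.

```python
def dominates(a, b):
    """True if a is at least as good as b on every objective and better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(rules):
    """Return the rules whose objective vectors no other rule dominates."""
    return [
        (name, scores)
        for name, scores in rules
        if not any(dominates(other, scores) for _, other in rules)
    ]


# Hypothetical (confidence, comprehensibility, interestingness) scores.
rules = [
    ("r1", (0.95, 0.40, 0.30)),
    ("r2", (0.80, 0.70, 0.50)),
    ("r3", (0.75, 0.60, 0.45)),  # worse than r2 on every objective
    ("r4", (0.60, 0.90, 0.20)),
]
front = pareto_front(rules)
```

Here `r3` is dropped because `r2` beats it on all three objectives, while `r1`, `r2`, and `r4` each win on at least one objective and therefore remain as alternative trade-offs.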
9.
The number of ontologies and semantic annotations available on the Web is constantly growing. This new type of complex and heterogeneous graph-structured data raises new challenges for the data mining community. In this paper, we present a novel method for mining association rules from semantic instance data repositories expressed in RDF/(S) and OWL. We take advantage of the schema-level (i.e. Tbox) knowledge encoded in the ontology to derive appropriate transactions, which are then fed to traditional association rule algorithms. This process is guided by the analyst's requirements, expressed in the form of query patterns. Initial experiments performed on semantic data from a biomedical application show the usefulness and efficiency of the approach.
10.
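Mining association rules is an important aspect of data mining, and many algorithms have been proposed, including the classic Apriori algorithm. In practice, however, users need to keep adjusting the thresholds that describe their degree of interest: the minimum support and the minimum confidence. Maintaining the association rules already discovered therefore becomes critical. The GIUA algorithm proposed in this paper solves the problem of maintaining association rules when the minimum support and minimum confidence change while the database D stays the same. It reuses the earlier results as fully as possible, uses dynamic grouping to minimize the loops of the join and prune steps, and parallelizes the mining process as far as possible.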
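Why a threshold change does not always require mining from scratch can be shown with a small Python sketch. It is not GIUA, whose grouping and parallel steps the abstract does not detail. It assumes that support counts of the previously frequent itemsets were cached. The function name `refilter` and the cached counts are hypothetical.

```python
def refilter(cached_counts, n_transactions, old_min_support, new_min_support):
    """Reuse cached support counts after the minimum support changes.

    Returns (frequent_itemsets, needs_rescan).
    """
    frequent = {
        itemset
        for itemset, count in cached_counts.items()
        if count / n_transactions >= new_min_support
    }
    # Lowering the threshold can admit itemsets that were never cached.
    needs_rescan = new_min_support < old_min_support
    return frequent, needs_rescan


# Hypothetical cache from a run with min_support = 0.4 over 10 transactions.
cached = {("a",): 8, ("b",): 6, ("a", "b"): 5, ("c",): 4}

raised, rescan_raised = refilter(cached, 10, 0.4, 0.6)
lowered, rescan_lowered = refilter(cached, 10, 0.4, 0.3)
```

Raising the minimum support can only shrink the set of frequent itemsets, so the cache is sufficient. Lowering it keeps every cached itemset but may make uncached ones frequent, which is the harder case that maintenance algorithms have to handle.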
11.
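Association rule mining algorithms have been studied extensively by the database community, and they have been continually optimized and improved, steadily raising mining efficiency. This paper discusses IUA, a typical incremental algorithm for the case where the degree threshold decreases, along with some of its existing improvements, and analyzes their characteristics and shortcomings. On this basis it proposes an improved algorithm, QIUA, and verifies its feasibility and effectiveness both theoretically and experimentally.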
12.
13.
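An improved association rule mining algorithm  Total citations: 9, self-citations: 0, citations by others: 9
Many algorithms for mining association rules have been proposed, the best known being Apriori and its variants. Most of these traditional algorithms suffer from an itemset generation bottleneck and from the difficulty of choosing a suitable support threshold, and they ignore the differing importance of the items being analyzed in the database. To address these problems, this paper proposes a new association mining algorithm.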
14.
Association rule mining is widely studied in the data mining community. In this paper, we analyze the support–confidence framework for measuring association rules and find that it tends to produce many redundant or unrelated rules alongside the interesting ones. To improve the criterion, we propose a new measure, match, as a substitute for confidence, and analyze its properties in detail. Experimental results show that, compared with rules produced by the support–confidence framework, the rules generated by the improved method exhibit high correlation between the antecedent and the consequent. Furthermore, the improved method generates fewer redundant rules.
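The weakness of the support–confidence framework described here can be reproduced with a short Python sketch. The abstract does not define its match measure, so the sketch uses lift, a standard correlation measure, purely as a stand-in. The transactions are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent)."""
    return support(transactions, set(antecedent) | set(consequent)) / support(
        transactions, antecedent
    )


def lift(transactions, antecedent, consequent):
    """Confidence divided by the consequent's support; 1.0 means independence."""
    return confidence(transactions, antecedent, consequent) / support(
        transactions, consequent
    )


# Hypothetical baskets in which milk appears in every transaction.
transactions = [
    ["bread", "milk"],
    ["bread", "milk"],
    ["bread", "milk"],
    ["milk"],
    ["milk"],
]
conf = confidence(transactions, ["bread"], ["milk"])
lft = lift(transactions, ["bread"], ["milk"])
```

The rule bread → milk reaches the maximum confidence of 1.0 with support 0.6, yet its lift is 1.0: milk is bought in every basket, so knowing that bread was bought adds no information. A pure support–confidence filter would keep this rule.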
15.
16.
17.
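郭有强 (Guo Youqiang) 《计算机应用与软件》(Computer Applications and Software) 2010,27(3):97-99,130
Research on frequent itemset update algorithms has paid relatively little attention to decremental updates of the data set. This paper proposes an algorithm that, with the minimum support and confidence unchanged, efficiently generates the frequent itemsets of the changed transaction database after a set of transactions has been deleted from it. The algorithm studies how to make full use of information from earlier mining runs, how to avoid scanning the data set multiple times, and how to reduce the size of the candidate set, and an implementation is given. A comparative performance analysis of the experimental results shows that the algorithm is feasible and effective.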
18.
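Application of association rule-based data mining techniques in data warehouses  Total citations: 2, self-citations: 0, citations by others: 2
Introduces the basic concepts of association rules, along with their types and core algorithms. The main content of multi-level association rules is described in detail and, drawing on how real data is processed in data warehousing and data mining, the application of association rules in scientific database systems is discussed. Finally, the basic concepts of multidimensional association rule mining and methods for handling its key problems are introduced.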
19.
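Research on an efficient Apriori-based association rule mining algorithm  Total citations: 32, self-citations: 3, citations by others: 32
Building on the association rule mining algorithm Apriori, this article analyzes and discusses the AprioriTid algorithm, presents the idea behind its implementation, and illustrates the algorithm's execution with a worked example.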
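For orientation, the level-wise search that Apriori performs can be sketched in Python as follows. This is plain Apriori with its join and prune steps, not the AprioriTid variant the article analyzes, and it is written for clarity rather than efficiency. The function name, the absolute `min_count` threshold, and the toy database are assumptions made for illustration.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {frozenset: count} for all itemsets with count >= min_count."""
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    candidates = {frozenset([item]) for item in items}
    frequent = {}
    k = 1
    while candidates:
        # Count each candidate with one pass over the transactions.
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        joined = {a | b for a in level for b in level if len(a | b) == k + 1}
        # Prune step: every k-subset of a candidate must itself be frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k))
        }
        k += 1
    return frequent


# Hypothetical database of five transactions.
db = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
result = apriori(db, min_count=3)
```

With this data every single item and every pair is frequent, while the triple {a, b, c} survives pruning as a candidate but is rejected after counting because it occurs in only two transactions.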
20.