1.
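Discovering association rules among items in transaction datasets is a classic problem in data mining, but traditional association rule mining methods are relatively inefficient on large transaction datasets. Previous studies have shown that sampling techniques can effectively improve mining efficiency. Building on an analysis of existing sampling methods, this paper proposes ESMA, a new and efficient sampling-based association rule mining algorithm. ESMA adopts a more effective bidirectional sampling strategy. Experimental analysis shows that the algorithm markedly speeds up sampling in large transaction databases, thereby reducing CPU time, and that it scales well.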
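The abstract does not describe ESMA's bidirectional sampling strategy, so the sketch below shows only the generic sample-then-verify idea that sampling-based miners build on: mine a random sample with a slightly lowered threshold, then compute exact supports in one pass over the full data. The function names, the `slack` factor, and the itemset size cap are illustrative assumptions, not part of ESMA.

```python
import random
from collections import Counter
from itertools import combinations


def count_itemsets(transactions, max_size):
    """Count every itemset of size 1..max_size contained in each transaction."""
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, max_size + 1):
            counts.update(combinations(items, k))
    return counts


def sample_then_verify(transactions, min_support, sample_frac=0.3,
                       slack=0.8, max_size=2, seed=0):
    """Generic sampling-based miner (hypothetical, not ESMA): sample, mine, verify."""
    rng = random.Random(seed)
    n_sample = max(1, int(len(transactions) * sample_frac))
    sample = rng.sample(transactions, n_sample)
    # A lowered threshold on the sample reduces (but does not eliminate)
    # the risk of missing itemsets that are frequent in the full data.
    candidates = {s for s, c in count_itemsets(sample, max_size).items()
                  if c / n_sample >= slack * min_support}
    # One full pass computes exact supports, removing false positives.
    exact = Counter()
    for t in transactions:
        items = set(t)
        for cand in candidates:
            if items.issuperset(cand):
                exact[cand] += 1
    n = len(transactions)
    return {s: exact[s] / n for s in candidates if exact[s] / n >= min_support}
```

With `sample_frac=1.0` the result equals exact mining; smaller samples trade a possible miss for less counting work, which is the trade-off sampling-based algorithms try to manage.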
2.
Association rule mining is widely studied in the data mining community. In this paper, we analyze the measures used in the support–confidence framework for mining association rules and find that the framework tends to mine many redundant or unrelated rules in addition to the interesting ones. To improve the criterion, we propose a new measure, match, as a substitute for confidence, and analyze its properties in detail. Experimental results show that, compared with the rules produced by the support–confidence framework, the rules generated by the improved method reveal higher correlation between the antecedent and the consequent. Furthermore, the improved method generates fewer redundant rules.
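To make the criticism of the support–confidence framework concrete, here is a minimal sketch of the standard measures on a hypothetical tea/coffee dataset. The rule tea → coffee has high confidence (about 0.75) yet is negatively correlated, because coffee alone appears in 90% of transactions; lift, a standard correlation measure, exposes this. The paper's own replacement measure, match, is not defined in the abstract and is not reproduced here.

```python
def support(transactions, itemset):
    """Fraction of transactions containing every item in `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= set(t)) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


def lift(transactions, antecedent, consequent):
    """Values below 1 indicate negative correlation between the two sides."""
    return (confidence(transactions, antecedent, consequent)
            / support(transactions, consequent))


# Hypothetical data: 100 baskets, coffee in 90, tea in 20, both in 15.
baskets = [["tea", "coffee"]] * 15 + [["tea"]] * 5 + [["coffee"]] * 75 + [[]] * 5
```

Here the rule passes a typical confidence threshold, yet buying tea actually makes coffee slightly less likely (lift is about 0.83).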
3.
Chun-Hao Chen, Guo-Cheng Lan, Tzung-Pei Hong, Yui-Kai Lin. Expert Systems with Applications, 2013, 40(16): 6531-6537
Data mining has been studied for a long time. One of its goals is to help market managers find relationships among items in large databases and thereby increase sales volume. Association rule mining is one of the best-known and most commonly used techniques for this purpose, and the Apriori algorithm is an important method for the task. Many mining approaches based on Apriori have been proposed for diverse applications. Most of them focus on positive association rules such as “if milk is bought, then cookies are bought”. Such rules may, however, be misleading, since some customers may buy milk without buying cookies. This paper therefore takes the properties of propositional logic into consideration and proposes an algorithm for mining highly coherent rules. The derived association rules are expected to be more meaningful and reliable for business. Experiments on two datasets demonstrate the performance of the proposed approach.
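The abstract does not state the exact coherence test. A minimal sketch, assuming the formulation common in the coherent-rule literature: treat X ⇒ Y like the logical equivalence X ≡ Y and require the two "agreeing" cells of the 2×2 contingency table (X with Y, neither X nor Y) to outnumber both "disagreeing" cells. The function names are hypothetical, and the paper's "highly coherent" criterion may add further thresholds.

```python
def contingency(transactions, x, y):
    """Return counts of (X and Y, X and not Y, not X and Y, not X and not Y)."""
    q = [0, 0, 0, 0]
    for t in transactions:
        items = set(t)
        has_x, has_y = set(x) <= items, set(y) <= items
        # Index 0..3 follows the tuple order in the docstring.
        q[(0 if has_x else 2) + (0 if has_y else 1)] += 1
    return tuple(q)


def is_coherent(transactions, x, y):
    """Assumed coherence test: agreeing cells beat both disagreeing cells."""
    q1, q2, q3, q4 = contingency(transactions, x, y)
    return q1 > q2 and q1 > q3 and q4 > q2 and q4 > q3
```

A positive rule such as milk ⇒ cookies can have high support and fail this test when many customers buy milk without cookies, which is exactly the misleading case the abstract describes.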
4.
Association rules are a data mining technique used to discover frequent patterns in a data set. In this work, association rules are used in the medical domain, where data sets are generally high-dimensional and small. The chief disadvantage of mining association rules in a high-dimensional data set is the huge number of patterns discovered, most of which are irrelevant or redundant. Several constraints are proposed for filtering, since our aim is to discover only significant association rules and to accelerate the search process. A greedy algorithm is introduced to compute rule covers in order to summarize rules having the same consequent. The significance of association rules is evaluated using three metrics: support, confidence, and lift. Experiments focus on discovering association rules in a real data set to predict the absence or presence of heart disease. The constraints are shown to significantly reduce the number of discovered rules and improve running time. Rule covers summarize a large number of rules by producing a succinct set of rules with high-quality metrics.
Carlos Ordonez received a degree in applied mathematics (actuarial sciences) and an MS degree in computer science, both from UNAM, Mexico, in 1992 and 1996, respectively. He received a PhD in computer science from the Georgia Institute of Technology, USA, in 2000. Dr. Ordonez currently works for Teradata (NCR), conducting research on database and data mining technology. He has published more than 20 research articles and holds three patents.
Norberto Ezquerra obtained his undergraduate degree in mathematics and physics from the University of South Florida and his doctoral degree from Florida State University, USA. He is an associate professor in the College of Computing at the Georgia Institute of Technology and an adjunct faculty member in the Emory University School of Medicine. His research interests include computer graphics, computer vision in medicine, AI in medicine, modeling of physically based systems, medical informatics, and telemedicine. He is an associate editor of IEEE Transactions on Medical Imaging and a member of the American Medical Informatics Association and the IEEE Engineering in Medicine and Biology Society.
Cesar A. Santana received his MD degree in 1984 from the Institute of Medical Science in Havana, Cuba. In 1988 he finished his residency training in internal medicine, and in 1991 he completed a fellowship in nuclear medicine in Havana, Cuba. Dr. Santana received a PhD in nuclear cardiology in 1996 from the Department of Cardiology of the Vall d'Hebron University Hospital in Barcelona, Spain. He is an assistant professor at the Emory University School of Medicine and conducts research in the Radiology Department at Emory University Hospital.
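The rule-cover step in the abstract above is described only as a greedy algorithm. A minimal sketch, assuming a cover is a subset of rules with the same consequent that still covers every record covered by the full rule set, is the classic greedy set-cover heuristic below; the rule labels and record ids are hypothetical.

```python
def greedy_rule_cover(rules):
    """Greedily pick rules until every record covered by some rule is covered.

    `rules` maps a rule label to the set of record ids it covers; all rules
    are assumed to share the same consequent.
    """
    uncovered = set().union(*rules.values())
    remaining = dict(rules)
    cover = []
    while uncovered:
        # Choose the rule that covers the most still-uncovered records.
        # Ties go to the rule that appears first in `rules`.
        best = max(remaining, key=lambda r: len(remaining[r] & uncovered))
        cover.append(best)
        uncovered -= remaining.pop(best)
    return cover
```

Greedy set cover does not guarantee a minimum cover, but it is simple and usually yields a small summary set.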
5.
The basic goal of scene understanding is to organize a video into sets of events and to find their temporal dependencies. Such systems aim to automatically interpret activities in the scene and to detect unusual events that could be of particular interest, such as traffic violations and unauthorized entry. The objective of this work is therefore to learn the behaviors of multi-agent actions and interactions in a semi-supervised manner. Using tracked object trajectories, we group similar motion trajectories into clusters with spectral clustering. The resulting clusters depict the different paths or routes, i.e., the distinct events taking place at various locations in the scene. A temporal mining algorithm is then used to mine interval-based frequent temporal patterns occurring in the scene. A temporal pattern is a set of events linked by their relationships with other events in the set; we use Allen's interval-based temporal logic to describe these relations. The resulting frequent patterns are used to generate temporal association rules, which convey the semantic information contained in the scene. Our overall aim is to generate rules that govern the dynamics of the scene and to perform anomaly detection. We apply the proposed approach to two publicly available complex traffic datasets and demonstrate considerable improvements over existing techniques.
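The temporal patterns above are linked with Allen's interval relations. As a minimal sketch, the 13 relations between two intervals (for example, the time spans during which two trajectory clusters are active) can be classified by comparing endpoints. The function name and string labels are illustrative choices, not taken from the paper.

```python
def allen_relation(a, b):
    """Return the Allen relation of interval a to interval b.

    Intervals are (start, end) pairs with start < end.
    """
    (a1, a2), (b1, b2) = a, b
    if a2 < b1:
        return "before"
    if b2 < a1:
        return "after"
    if a2 == b1:
        return "meets"
    if b2 == a1:
        return "met-by"
    if (a1, a2) == (b1, b2):
        return "equals"
    if a1 == b1:
        return "starts" if a2 < b2 else "started-by"
    if a2 == b2:
        return "finishes" if a1 > b1 else "finished-by"
    if b1 < a1 and a2 < b2:
        return "during"
    if a1 < b1 and b2 < a2:
        return "contains"
    # Remaining case: the intervals partially overlap.
    return "overlaps" if a1 < b1 else "overlapped-by"
```

Checking the disjoint and touching cases first means the later branches only have to distinguish intervals that share an interior.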
6.
7.
Expert Systems with Applications, 2014, 41(5): 2259-2268
Business rules are an effective way to control data quality. Business experts can enter the rules directly into appropriate software without error-prone communication with programmers. However, not all business situations and possible data quality problems can be anticipated. Where business rules have not yet been defined, patterns of data handling may still arise in practice. We apply data mining to accounting transactions in order to discover such patterns, which are represented in the form of association rules. Deviations from the discovered patterns can then be marked as potential data quality violations to be examined by humans. Data quality breaches can be expensive, but manual examination of many transactions is also expensive. The goal is therefore to strike a balance between marking too many and too few transactions as potentially erroneous. We apply appropriate procedures to evaluate the classification accuracy of the developed association rules and support the decision on how many deviations to examine manually on the basis of economic principles.
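The flagging step described above can be sketched as follows, under stated assumptions: a transaction deviates from a rule when it matches the rule's antecedent but not its consequent, deviations from higher-confidence rules are ranked first, and a fixed `budget` stands in for the economically chosen number of manual reviews. The `Rule` type and the field names (`account`, `tax_code`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    antecedent: dict   # hypothetical, e.g. {"account": "4400"}
    consequent: dict   # hypothetical, e.g. {"tax_code": "V1"}
    confidence: float


def flag_deviations(transactions, rules, budget):
    """Return indices of up to `budget` transactions that break a mined rule."""
    flagged = []
    for idx, tx in enumerate(transactions):
        for rule in rules:
            matches = all(tx.get(k) == v for k, v in rule.antecedent.items())
            violates = any(tx.get(k) != v for k, v in rule.consequent.items())
            if matches and violates:
                flagged.append((rule.confidence, idx))
    # Breaking a high-confidence rule is the strongest hint of an error.
    flagged.sort(key=lambda p: (-p[0], p[1]))
    seen, ranked = set(), []
    for _, idx in flagged:
        if idx not in seen:
            seen.add(idx)
            ranked.append(idx)
    return ranked[:budget]
```

Raising `budget` catches more real errors at a higher review cost, which is the balance the abstract describes.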
8.
Emerging applications require novel association-rule mining algorithms that scale not only with the number of records (rows) but also with the size of the domain (columns). In this paper, we focus on cases where the items of a large domain correlate with each other so that small worlds are formed; that is, the domain is clustered into groups with many intra-group and few inter-group correlations. This property appears in several real-world settings, e.g., bioinformatics, e-commerce applications, and bibliographic analysis, and it can be exploited to prune the search space significantly and mine association rules efficiently. We develop an algorithm that partitions the domain of items according to their correlations, and we describe a mining algorithm that carefully combines partitions to improve efficiency. Our experiments show the superiority of the proposed method over existing algorithms and show that it overcomes the problems (e.g., increased CPU cost and possible I/O thrashing) that existing algorithms suffer when a large domain is combined with a large number of records.
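The abstract does not say how the domain is partitioned. As one simple stand-in, assumed rather than taken from the paper, items can be linked whenever they co-occur in at least `min_pair_count` transactions, and the connected components of that graph (found with union-find) treated as the "small worlds". A real implementation would more likely use a proper correlation measure than raw co-occurrence counts.

```python
from collections import Counter
from itertools import combinations


def partition_items(transactions, min_pair_count):
    """Group items into connected components of a co-occurrence graph."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    pair_counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for item in items:
            find(item)  # register every item, even isolated ones
        pair_counts.update(combinations(items, 2))
    # Only strongly co-occurring pairs join the same group.
    for (a, b), c in pair_counts.items():
        if c >= min_pair_count:
            union(a, b)
    groups = {}
    for item in list(parent):
        groups.setdefault(find(item), set()).add(item)
    return sorted(sorted(g) for g in groups.values())
```

Each resulting group can then be mined on its own, which is where the pruning benefit of a small-world domain comes from.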
9.
Yue Xu, Yuefeng Li, Gavin Shaw. Data & Knowledge Engineering, 2011, 70(6): 555-575
Association rule mining has contributed to many advances in the area of knowledge discovery. However, the quality of the discovered association rules is a major concern and has drawn increasing attention recently. One quality problem is the sheer size of the extracted rule set: a huge number of rules can often be extracted from a dataset, but many of them are redundant with respect to other rules and thus useless in practice. Mining non-redundant rules is a promising approach to this problem. In this paper, we first propose a definition of redundancy and then propose a concise representation, called the Reliable basis, for representing non-redundant association rules. The Reliable basis contains a set of non-redundant rules derived from frequent closed itemsets and their generators, rather than from the frequent itemsets used by traditional association rule mining approaches. An important contribution of this paper is the use of the certainty factor as the criterion for measuring the strength of the discovered association rules. With this criterion, we can eliminate as many redundant rules as possible without reducing the inference capacity of the remaining non-redundant rules. We prove that redundancy elimination based on the proposed Reliable basis does not reduce the strength of belief in the extracted rules. We also prove that all association rules, together with their supports and confidences, can be retrieved from the Reliable basis without accessing the dataset; the Reliable basis is therefore a lossless representation of association rules. Experimental results show that the proposed Reliable basis can significantly reduce the number of extracted rules. We also conduct experiments on applying association rules to product recommendation. The results show that the non-redundant association rules extracted by the proposed method retain the same inference capacity as the entire rule set, indicating that non-redundant rules alone are sufficient to solve real problems without the entire rule set.
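The certainty factor adopted above as the strength criterion is commonly defined in the association-rule literature from a rule's confidence and the support of its consequent, as in the sketch below. This sketch assumes that definition; the paper may use an equivalent variant. Positive values mean the antecedent raises belief in the consequent, negative values mean it lowers it.

```python
def certainty_factor(conf, sup_y):
    """Certainty factor of X -> Y from conf(X -> Y) and sup(Y) (assumed form)."""
    if conf > sup_y:
        # Normalized by the largest possible increase, 1 - sup(Y).
        return (conf - sup_y) / (1 - sup_y)
    if conf < sup_y:
        # Normalized by the largest possible decrease, sup(Y).
        return (conf - sup_y) / sup_y
    return 0.0  # X and Y are independent
```

For instance, a rule with confidence 0.8 on a consequent of support 0.5 gets a certainty factor of 0.6, while confidence 0.75 on a consequent of support 0.9 gets a negative value despite the high confidence.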
10.
Daniel Sánchez, José María Serrano, Ignacio Blanco, Maria Jose Martín-Bautista, María-Amparo Vila. Data Mining and Knowledge Discovery, 2008, 16(3): 313-348
In this paper we deal with the problem of mining approximate dependencies (ADs) in relational databases. We introduce a definition of AD based on the concept of association rule, by means of suitable definitions of the concepts of item and transaction. This definition allows us to measure both the accuracy and the support of an AD. We provide an interpretation of the new measures based on the complexity of the theory (set of rules) that describes the dependence, and we use this interpretation to compare the new measures with existing ones. We also introduce a methodology for adapting existing association rule mining algorithms to the task of discovering ADs. The adapted algorithms obtain the set of ADs that hold in a relation with accuracy and support greater than user-defined thresholds. Our experiments show that the approach performs reasonably well on large databases with real-world data.
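One natural reading of "suitable definitions of item and transaction", assumed here rather than taken from the paper, treats each pair of tuples as a transaction and each attribute on which the two tuples agree as an item in it. Under that assumption, an AD V → W becomes the association rule "agree on V ⇒ agree on W", and its support and confidence can be computed directly, as sketched below. The paper's own accuracy measure may differ from plain confidence, and the attribute names are hypothetical.

```python
from itertools import combinations


def ad_support_confidence(rows, lhs, rhs):
    """Support and confidence of the AD lhs -> rhs over all pairs of rows.

    `rows` is a list of dicts; `lhs` and `rhs` are lists of attribute names.
    """
    pairs = list(combinations(rows, 2))
    agree_lhs = agree_both = 0
    for r1, r2 in pairs:
        if all(r1[a] == r2[a] for a in lhs):
            agree_lhs += 1
            if all(r1[a] == r2[a] for a in rhs):
                agree_both += 1
    support = agree_both / len(pairs) if pairs else 0.0
    confidence = agree_both / agree_lhs if agree_lhs else 0.0
    return support, confidence
```

An exact functional dependency corresponds to confidence 1; an AD tolerates some pairs that agree on the left-hand side but not the right. Note that enumerating all pairs is quadratic in the number of rows, so a practical miner would count agreements per attribute-value group instead.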