20 similar documents found; search time 0 ms.
1.
Multi-objective PSO algorithm for mining numerical association rules without a priori discretization
《Expert systems with applications》2014,41(9):4259-4273
In the domain of association rule mining (ARM), discovering rules for numerical attributes is still a challenging issue. Most popular approaches to numerical ARM require a priori data discretization to handle numerical attributes. Moreover, discovering relations among data often requires more than one objective (quality measure), and in most cases these objectives conflict. In such a situation, the optimal trade-off between the objectives should be obtained. This paper approaches the numerical ARM problem from a multi-objective perspective by proposing a multi-objective particle swarm optimization algorithm (MOPAR) that discovers numerical association rules (ARs) in a single step. To identify more efficient ARs, several objectives are defined in the proposed approach, including confidence, comprehensibility, and interestingness. The best ARs are then extracted using Pareto optimality. To handle numerical attributes, we use rough values containing lower and upper bounds to represent attribute intervals. In the experimental section, we analyze the effect of the operators used in this study, compare our method with the most popular evolutionary ARM proposals, and present an analysis of the mined ARs. The results show that MOPAR extracts reliable (with confidence values close to 95%), comprehensible, and interesting numerical ARs while attaining the optimal trade-off between confidence, comprehensibility, and interestingness.
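To make "rough values with lower and upper bounds" and "Pareto optimality over several objectives" concrete, here is a minimal Python sketch. It is not MOPAR itself: there is no particle swarm, candidate rules are listed by hand, and the comprehensibility and interestingness formulas are assumptions borrowed from earlier multi-objective genetic ARM work, not necessarily the ones used in the paper. The names `Rule`, `objectives`, and `pareto_front` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Interval = Tuple[float, float]  # rough value: (lower bound, upper bound)
Row = Dict[str, float]


@dataclass
class Rule:
    antecedent: Dict[str, Interval]  # attributes assumed disjoint from the consequent's
    consequent: Dict[str, Interval]


def covers(row: Row, items: Dict[str, Interval]) -> bool:
    """True if every attribute in `items` lies inside its [lower, upper] interval."""
    return all(lo <= row[attr] <= hi for attr, (lo, hi) in items.items())


def count(rows: List[Row], items: Dict[str, Interval]) -> int:
    return sum(1 for r in rows if covers(r, items))


def objectives(rule: Rule, rows: List[Row]) -> Tuple[float, float, float]:
    """(confidence, comprehensibility, interestingness), all to be maximised."""
    n = len(rows)
    sup_a = count(rows, rule.antecedent)
    sup_c = count(rows, rule.consequent)
    sup_ac = count(rows, {**rule.antecedent, **rule.consequent})
    if sup_a == 0 or sup_c == 0:
        return (0.0, 0.0, 0.0)
    confidence = sup_ac / sup_a
    # Assumed comprehensibility: fewer antecedent attributes -> higher score.
    k_a, k_c = len(rule.antecedent), len(rule.consequent)
    comprehensibility = math.log(1 + k_c) / math.log(1 + k_a + k_c)
    # Assumed interestingness: high when each side predicts the other,
    # penalised when the rule covers most of the data.
    interestingness = (sup_ac / sup_a) * (sup_ac / sup_c) * (1 - sup_ac / n)
    return (confidence, comprehensibility, interestingness)


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Pareto dominance for maximisation: no worse everywhere, better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: List[Tuple[float, ...]]) -> List[int]:
    """Indices of the non-dominated score vectors."""
    return [
        i for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]
```

In a PSO setting, each particle would encode the interval bounds and the swarm would move them; the Pareto filter above is the step that keeps only the rules no other rule beats on every objective at once.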
2.
Discovering gene association networks by multi-objective evolutionary quantitative association rules
M. Martínez-Ballesteros I.A. Nepomuceno-Chamorro J.C. Riquelme 《Journal of Computer and System Sciences》2014
In the last decade, interest in microarray technology has increased exponentially due to its ability to monitor the expression of thousands of genes simultaneously. Reconstructing gene association networks from gene expression profiles is a relevant task, and several statistical techniques have been proposed to build them. The problem lies in discovering which genes are most relevant and identifying the direct regulatory relationships among them. We developed a multi-objective evolutionary algorithm for mining quantitative association rules to address this problem. We applied our methodology, named GarNet, to a well-known microarray dataset of the yeast cell cycle. The performance analysis of GarNet was organized in three steps, similar to the study performed by Gallo et al. GarNet outperformed the benchmark methods in most cases in terms of network quality metrics such as accuracy and precision, which were measured using the YeastNet database as the true network. Furthermore, the results were consistent with previous biological knowledge.
3.
4.
An evolutionary algorithm to discover quantitative association rules in multidimensional time series
M. Martínez-Ballesteros F. Martínez-Álvarez A. Troncoso J. C. Riquelme 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2011,15(10):2065-2084
An evolutionary approach for finding existing relationships among several variables of a multidimensional time series is presented in this work. The proposed model for discovering these relationships is based on quantitative association rules. The algorithm, called QARGA (Quantitative Association Rules by Genetic Algorithm), uses a particular encoding of the individuals that solves two basic problems: first, it does not require prior attribute discretization; second, it is not necessary to specify which variables belong to the antecedent or the consequent. Therefore, it may discover all underlying dependencies among different variables. Three experiments were carried out to evaluate the proposed algorithm. First, several public datasets were analyzed in order to compare it with other existing evolutionary approaches. Second, the algorithm was applied to synthetic time series (where the relationships are known) to analyze its potential for discovering rules in time series. Finally, a real-world multidimensional time series composed of several climatological variables was considered. All the results show a remarkable performance of QARGA.
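The abstract says QARGA's encoding avoids both prior discretization and a fixed antecedent/consequent split. One way such an encoding could look is sketched below in Python; the (role, bound, bound) gene layout and the helper names `decode`, `is_valid`, and `mutate_role` are hypothetical illustrations, not QARGA's published representation.

```python
from typing import Dict, List, Tuple

# Hypothetical gene layout: one (role, bound_a, bound_b) triple per variable.
ABSENT, ANTECEDENT, CONSEQUENT = 0, 1, 2
Gene = Tuple[int, float, float]
Interval = Tuple[float, float]


def decode(genes: List[Gene], names: List[str]) -> Tuple[Dict[str, Interval], Dict[str, Interval]]:
    """Turn an individual into (antecedent, consequent) interval conditions."""
    antecedent: Dict[str, Interval] = {}
    consequent: Dict[str, Interval] = {}
    for name, (role, a, b) in zip(names, genes):
        interval = (min(a, b), max(a, b))  # bounds evolve freely; order them here
        if role == ANTECEDENT:
            antecedent[name] = interval
        elif role == CONSEQUENT:
            consequent[name] = interval
    return antecedent, consequent


def is_valid(genes: List[Gene]) -> bool:
    """A usable rule needs at least one variable on each side."""
    roles = {g[0] for g in genes}
    return ANTECEDENT in roles and CONSEQUENT in roles


def mutate_role(genes: List[Gene], index: int, new_role: int) -> List[Gene]:
    """Move one variable to another side of the rule (or drop it)."""
    _, a, b = genes[index]
    return genes[:index] + [(new_role, a, b)] + genes[index + 1:]
```

Because the role is just another gene, mutation can move a variable between antecedent and consequent, which is what lets the search discover dependencies in any direction.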
5.
Bilal Alataş Erhan Akin 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2006,10(3):230-237
In this paper, a genetic algorithm (GA) is proposed as a search strategy for mining not only positive but also negative quantitative association rules (ARs) within databases. Unlike the usual methods, ARs are mined directly without generating frequent itemsets. The proposed GA is database-independent: it does not rely on minimum support and minimum confidence thresholds, which are hard to determine for each database. Instead of a randomly generated initial population, it uses a uniform population that keeps the initial population close to the solutions and distributes it uniformly over the feasible region. An adaptive mutation probability, a new operator called the uniform operator that ensures genetic diversity, and an efficient adjusted fitness function are used to mine all interesting ARs from the last population in a single run of the GA. The efficiency of the proposed GA is validated on synthetic and real databases.
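A negative quantitative item is simply the complement of an interval condition. The Python sketch below shows how support and confidence can be evaluated for rules mixing positive and negative interval items; it is only the evaluation step a GA fitness function would call, and the `Item` type and helper names are hypothetical.

```python
from typing import Dict, List, NamedTuple

Row = Dict[str, float]


class Item(NamedTuple):
    attr: str
    lo: float
    hi: float
    negated: bool = False  # True means "attr is NOT in [lo, hi]"


def holds(row: Row, item: Item) -> bool:
    inside = item.lo <= row[item.attr] <= item.hi
    return inside != item.negated  # XOR flips the test for negative items


def support(rows: List[Row], items: List[Item]) -> float:
    """Fraction of rows satisfying every item."""
    return sum(all(holds(r, it) for it in items) for r in rows) / len(rows)


def confidence(rows: List[Row], antecedent: List[Item], consequent: List[Item]) -> float:
    s_a = support(rows, antecedent)
    return support(rows, antecedent + consequent) / s_a if s_a else 0.0
```

For example, a rule such as "age in [20, 35] ⇒ NOT income in [70, 100]" is expressed by setting `negated=True` on the consequent item.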
6.
Most methods for mining association rules from tabular data mine simple rules that use only the equality operator “=” in their items. For quantitative attributes, approaches tend to discretize domain values by partitioning them into intervals. Limiting the operator to “=” means that many interesting frequent patterns may not be identified. Where there is an order between objects, operators such as greater than or less than a given value are as important as the equality operator. This motivates us to extend association rules from the simple equality operator to a more general set of operators. We address the problem of mining general association rules in tabular data, where rules can have all operators {<, >, ≠, =} in their antecedent part. The proposed algorithm, mining general rules (MGR), is applicable to datasets with discrete-ordered attributes and to discretized quantitative attributes. It stores candidate general itemsets in a tree structure in such a way that supports of complex itemsets can be recursively computed from supports of simpler itemsets. The algorithm is shown to have benefits in terms of time complexity and memory management, and it has good potential for parallelization.
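The claim that supports of general itemsets can be computed from simpler ones rests on identities such as supp(X ∧ A ≠ v) = supp(X) − supp(X ∧ A = v) and, for a totally ordered attribute, supp(X ∧ A < v) = supp(X) − supp(X ∧ A = v) − supp(X ∧ A > v). The Python sketch below checks these identities against a direct scan; it does not reproduce MGR's tree structure, the `base` predicate stands in for an arbitrary condition X, and `!=` is used in code for ≠.

```python
import operator
from collections import Counter
from typing import Callable, Dict, List

Row = Dict[str, int]
OPS: Dict[str, Callable[[int, int], bool]] = {
    "=": operator.eq, "!=": operator.ne, "<": operator.lt, ">": operator.gt,
}


def direct_support(rows: List[Row], attr: str, op: str, v: int,
                   base: Callable[[Row], bool] = lambda r: True) -> int:
    """Count rows satisfying base(r) AND (r[attr] op v) by scanning the data."""
    return sum(1 for r in rows if base(r) and OPS[op](r[attr], v))


def derived_supports(rows: List[Row], attr: str, v: int,
                     base: Callable[[Row], bool] = lambda r: True) -> Dict[str, int]:
    """Derive all four operator supports from equality counts under `base`."""
    eq_counts = Counter(r[attr] for r in rows if base(r))
    total = sum(eq_counts.values())  # supp(X)
    eq = eq_counts[v]                # supp(X and attr = v); 0 if v never occurs
    gt = sum(c for w, c in eq_counts.items() if w > v)
    # "<" relies on the attribute being totally ordered.
    return {"=": eq, "!=": total - eq, ">": gt, "<": total - eq - gt}
```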
7.
One of the major challenges in data mining is the extraction of comprehensible knowledge from recorded data. In this paper, a coevolutionary classification technique, the COevolutionary Rule Extractor (CORE), is proposed to discover classification rules in data mining. Unlike existing approaches, where candidate rules and rule sets are evolved at different stages of the classification process, CORE coevolves rules and rule sets concurrently in two cooperative populations to confine the search space and to produce good, comprehensive rule sets. The proposed technique is extensively validated on seven datasets obtained from the University of California, Irvine (UCI) machine learning repository, which are representative artificial and real-world data from various domains. Comparison results show that CORE produces comprehensive and good classification rules for most datasets, competitive with existing classifiers in the literature. Box plots of the simulation results also show that CORE is relatively robust and invariant to random partitioning of the datasets.
8.
Association rule mining is widely studied in the data mining community. In this paper, we analyze the measures of the support–confidence framework for mining association rules and find that it tends to mine many redundant or unrelated rules besides the interesting ones. To improve the criterion, we propose a new measure, match, as a substitute for confidence, and analyze its properties in detail. Experimental results show that, compared with the rules produced by the support–confidence framework, the rules generated by the improved method reveal a high correlation between the antecedent and the consequent. Furthermore, the improved method generates fewer redundant rules.
9.
Daniel Sánchez José María Serrano Ignacio Blanco Maria Jose Martín-Bautista María-Amparo Vila 《Data mining and knowledge discovery》2008,16(3):313-348
In this paper we deal with the problem of mining approximate dependencies (ADs) in relational databases. We introduce a definition of AD based on the concept of association rule, by means of suitable definitions of the concepts of item and transaction. This definition allows us to measure both the accuracy and the support of an AD. We provide an interpretation of the new measures based on the complexity of the theory (set of rules) that describes the dependence, and we use this interpretation to compare the new measures with existing ones. A methodology for adapting existing association rule mining algorithms to the task of discovering ADs is introduced. The adapted algorithms obtain the set of ADs that hold in a relation with accuracy and support greater than user-defined thresholds. Our experiments show that the approach performs reasonably well on large databases with real-world data.
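The abstract does not spell out its item and transaction definitions. A common construction, assumed here and not confirmed as the paper's exact one, treats attributes as items and unordered pairs of tuples as transactions, where a pair "contains" an attribute when both tuples agree on it. The dependency X → Y then becomes an association rule whose confidence is high when tuples that agree on X usually also agree on Y. This Python sketch uses plain confidence for accuracy (the paper may use a different accuracy measure); `ad_measures` is a hypothetical name.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, str]


def agree(a: Record, b: Record, attrs: Sequence[str]) -> bool:
    """A tuple pair 'contains' attrs if both tuples agree on every attribute."""
    return all(a[k] == b[k] for k in attrs)


def ad_measures(relation: List[Record], lhs: Sequence[str],
                rhs: Sequence[str]) -> Tuple[float, float]:
    """(support, confidence) of the approximate dependency lhs -> rhs over tuple pairs."""
    pairs = list(combinations(relation, 2))  # transactions = unordered pairs of tuples
    if not pairs:
        return 0.0, 0.0
    n_lhs = sum(1 for a, b in pairs if agree(a, b, lhs))
    n_both = sum(1 for a, b in pairs if agree(a, b, lhs) and agree(a, b, rhs))
    support = n_both / len(pairs)
    confidence = n_both / n_lhs if n_lhs else 0.0
    return support, confidence
```

An exact functional dependency yields confidence 1.0; each violating pair lowers it, which is what makes the dependency "approximate".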
10.
Mining fuzzy association rules for classification problems (total citations: 3; self-citations: 0; citations by others: 3)
The effective development of data mining techniques for discovering knowledge from training samples is necessary for classification problems in industrial engineering applications such as group technology. This paper proposes a learning algorithm, which can be viewed as a knowledge acquisition tool, to effectively discover fuzzy association rules for classification problems. The consequent of each rule is one class label. The proposed learning algorithm consists of two phases: the first generates large fuzzy grids from training samples by fuzzy partitioning of each attribute, and the second generates fuzzy association rules for classification from the large fuzzy grids. The algorithm scans the training samples stored in a database only once and applies a sequence of Boolean operations to generate fuzzy grids and fuzzy rules; therefore, it can easily be extended to discover other types of fuzzy association rules. Simulation results on the iris data demonstrate that the proposed learning algorithm can effectively derive fuzzy association rules for classification problems.
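The first phase (fuzzy partitioning of each attribute, then keeping "large" fuzzy grids) can be sketched in Python as follows. Assumptions, not taken from the paper: every attribute is split into `k` evenly spaced triangular fuzzy sets, a sample's degree in a grid is the product of its memberships, and a grid is large when its average degree reaches `min_support`. The paper's Boolean-operation implementation is not reproduced.

```python
from itertools import product
from typing import List, Sequence, Tuple


def memberships(x: float, lo: float, hi: float, k: int) -> List[float]:
    """Degrees of x in k evenly spaced triangular fuzzy sets over [lo, hi] (k >= 2)."""
    width = (hi - lo) / (k - 1)
    centers = [lo + i * width for i in range(k)]
    return [max(0.0, 1.0 - abs(x - c) / width) for c in centers]


def grid_support(samples: Sequence[Sequence[float]], ranges: Sequence[Tuple[float, float]],
                 k: int, grid: Sequence[int]) -> float:
    """Fuzzy support of a grid; grid[j] is the linguistic term chosen for attribute j."""
    total = 0.0
    for s in samples:
        degree = 1.0
        for j, term in enumerate(grid):
            lo, hi = ranges[j]
            degree *= memberships(s[j], lo, hi, k)[term]  # product t-norm (assumed)
        total += degree
    return total / len(samples)


def large_grids(samples, ranges, k: int, min_support: float) -> List[Tuple[int, ...]]:
    """All grids (one term per attribute) whose fuzzy support reaches min_support."""
    return [g for g in product(range(k), repeat=len(ranges))
            if grid_support(samples, ranges, k, g) >= min_support]
```

In the second phase, a large grid that also includes a class label would be turned into a rule whose consequent is that class.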
11.
We present a new distributed association rule mining (D-ARM) algorithm that demonstrates superlinear speed-up with the number of computing nodes. It is the first D-ARM algorithm to perform a single scan over the database; as such, its performance is unmatched by any previous algorithm. Scale-up experiments on standard synthetic benchmarks demonstrate stable run time regardless of the number of computers. Theoretical analysis reveals a tighter bound on error probability than the one shown for the corresponding sequential algorithm. As a result of this tighter bound, and by using the combined memory of several computers, the algorithm generates far fewer candidates than comparable sequential algorithms, on the same order of magnitude as the optimum.
12.
Yue Xu Yuefeng Li Gavin Shaw 《Data & Knowledge Engineering》2011,70(6):555-575
Association rule mining has contributed to many advances in the area of knowledge discovery. However, the quality of the discovered association rules is a big concern and has drawn more and more attention recently. One problem with the quality of the discovered association rules is the huge size of the extracted rule set. Often for a dataset, a huge number of rules can be extracted, but many of them can be redundant to other rules and thus useless in practice. Mining non-redundant rules is a promising approach to solve this problem. In this paper, we first propose a definition for redundancy, then propose a concise representation, called a Reliable basis, for representing non-redundant association rules. The Reliable basis contains a set of non-redundant rules which are derived using frequent closed itemsets and their generators instead of using frequent itemsets that are usually used by traditional association rule mining approaches. An important contribution of this paper is that we propose to use the certainty factor as the criterion to measure the strength of the discovered association rules. Using this criterion, we can ensure the elimination of as many redundant rules as possible without reducing the inference capacity of the remaining extracted non-redundant rules. We prove that the redundancy elimination, based on the proposed Reliable basis, does not reduce the strength of belief in the extracted rules. We also prove that all association rules, their supports and confidences, can be retrieved from the Reliable basis without accessing the dataset. Therefore the Reliable basis is a lossless representation of association rules. Experimental results show that the proposed Reliable basis can significantly reduce the number of extracted rules. We also conduct experiments on the application of association rules to the area of product recommendation. 
The experimental results show that the non-redundant association rules extracted by the proposed method retain the same inference capacity as the entire rule set. This indicates that using only the non-redundant rules is sufficient to solve real problems, without needing the entire rule set.
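The certainty factor mentioned above measures how much a rule X → Y changes belief in Y relative to Y's base rate. The Python function below uses the standard certainty-factor definition as commonly applied to association rules; whether Xu et al. use exactly this form is an assumption here.

```python
def certainty_factor(confidence: float, support_y: float) -> float:
    """Certainty factor of X -> Y from conf(X -> Y) and supp(Y), both in [0, 1].

    Positive: X raises the probability of Y; negative: X lowers it; 0: independent.
    """
    if confidence > support_y:
        # Implies support_y < 1, so the denominator is positive.
        return (confidence - support_y) / (1 - support_y)
    if confidence < support_y:
        # Implies support_y > 0, so the denominator is positive.
        return (confidence - support_y) / support_y
    return 0.0
```

Unlike raw confidence, a rule whose consequent is already very frequent gets a CF near zero, which is why CF is a useful criterion when deciding that one rule is redundant with respect to another.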
13.
14.
《Expert systems with applications》2014,41(5):2259-2268
Business rules are an effective way to control data quality. Business experts can enter the rules directly into appropriate software without error-prone communication with programmers. However, not all business situations and possible data quality problems can be anticipated. Where business rules have not yet been defined, patterns of data handling may still arise in practice. We apply data mining to accounting transactions in order to discover such patterns, which are represented in the form of association rules. Deviations from the discovered patterns can then be marked as potential data quality violations to be examined by humans. Data quality breaches can be expensive, but manual examination of many transactions is also expensive. The goal is therefore to find a balance between marking too many and too few transactions as potentially erroneous. We apply appropriate procedures to evaluate the classification accuracy of the developed association rules and, based on economic principles, support the decision on the number of deviations to be examined manually.
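Marking deviations can be sketched as follows: a transaction deviates from a rule when it matches the antecedent but not the consequent, and a higher-confidence rule makes the deviation more suspicious. In this Python sketch the field names, the max-confidence score, and the fixed review `budget` are hypothetical; the paper derives the number of transactions to examine from economic principles, which is not modeled here.

```python
from typing import Dict, List, Tuple

Transaction = Dict[str, str]
Rule = Tuple[Dict[str, str], Dict[str, str], float]  # (antecedent, consequent, confidence)


def matches(tx: Transaction, items: Dict[str, str]) -> bool:
    return all(tx.get(k) == v for k, v in items.items())


def flag_deviations(transactions: List[Transaction], rules: List[Rule],
                    budget: int) -> List[int]:
    """Indices of the `budget` most suspicious transactions, most suspicious first."""
    scores: Dict[int, float] = {}
    for i, tx in enumerate(transactions):
        for antecedent, consequent, conf in rules:
            if matches(tx, antecedent) and not matches(tx, consequent):
                # Score = confidence of the strongest rule the transaction violates.
                scores[i] = max(scores.get(i, 0.0), conf)
    ranked = sorted(scores, key=lambda i: scores[i], reverse=True)  # stable for ties
    return ranked[:budget]
```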
15.
I. De Falco A. Della Cioppa A. Iazzetta E. Tarantino 《Knowledge and Information Systems》2005,7(2):179-201
The process of automatically extracting novel, useful, and ultimately comprehensible information from large databases, known as data mining, has become very important due to the ever-increasing amounts of data collected by large organizations. In particular, emphasis is placed on heuristic search methods able to discover patterns that are hard or impossible to detect using standard query mechanisms and classical statistical techniques. In this paper, an evolutionary system capable of extracting explicit classification rules is presented. Special attention is given to finding easily interpretable rules that may be used to make crucial decisions. The results are compared with those achieved by other methods on a real problem, breast cancer diagnosis.
16.
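An artificial immune algorithm based on immunological principles is proposed for mining fuzzy association rules. The algorithm carries out its optimization by borrowing the clonal selection principle of the biological immune system; working directly from the given data, its optimization mechanism automatically determines the fuzzy sets corresponding to each attribute so that the number of derived fuzzy association rules satisfying the given conditions is maximized. Performance comparisons with related algorithms on real datasets demonstrate the effectiveness of the proposed algorithm.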
17.
Emerging applications introduce the requirement for novel association-rule mining algorithms that will be scalable not only with respect to the number of records (number of rows) but also with respect to the domain's size (number of columns). In this paper, we focus on the cases where the items of a large domain correlate with each other in a way that small worlds are formed, that is, the domain is clustered into groups with a large number of intra-group and a small number of inter-group correlations. This property appears in several real-world cases, e.g., in bioinformatics, e-commerce applications, and bibliographic analysis, and can help to significantly prune the search space so as to perform efficient association-rule mining. We develop an algorithm that partitions the domain of items according to their correlations and we describe a mining algorithm that carefully combines partitions to improve the efficiency. Our experiments show the superiority of the proposed method against existing algorithms, and that it overcomes the problems (e.g., increase in CPU cost and possible I/O thrashing) caused by existing algorithms due to the combination of a large domain and a large number of records.
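To illustrate the "small worlds" idea, the Python sketch below partitions items by linking any two items that co-occur in at least `min_cooccur` transactions and taking connected components with union-find. This is a simple stand-in chosen for illustration, not the paper's partitioning algorithm; the function name and threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, List


def partition_items(transactions: Iterable[Iterable[str]], min_cooccur: int) -> List[List[str]]:
    """Group items into connected components of the co-occurrence graph."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    pair_counts: Counter = Counter()
    for t in transactions:
        items = sorted(set(t))
        for it in items:
            find(it)  # register every item, even isolated ones
        pair_counts.update(combinations(items, 2))

    for (a, b), c in pair_counts.items():
        if c >= min_cooccur:  # strong enough correlation -> same partition
            union(a, b)

    groups: Dict[str, set] = {}
    for it in list(parent):
        groups.setdefault(find(it), set()).add(it)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])
```

Each partition can then be mined on its own, and only the few weak inter-group links need to be revisited when combining partitions.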
18.
In this paper, we propose an efficient method for mining all frequent inter-transaction patterns. The method consists of two phases. First, we devise two data structures: a dat-list, which stores the item information used to find frequent inter-transaction patterns, and an ITP-tree, which stores the discovered frequent inter-transaction patterns. In the second phase, we apply an algorithm called ITP-Miner (Inter-Transaction Patterns Miner) to mine all frequent inter-transaction patterns. By using the ITP-tree, the algorithm requires only one database scan and can localize joining, pruning, and support counting to a small number of dat-lists. The experimental results show that the ITP-Miner algorithm outperforms the FITI (First Intra Then Inter) algorithm by one order of magnitude.
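An inter-transaction pattern relates items across transactions, for example "`a` at time t and `b` at time t + 1". The Python sketch below keeps, for each item, the set of transaction positions where it occurs (a simplified stand-in for a dat-list) and counts a two-item pattern by shifting one list and intersecting. It handles only 2-item patterns within a maximum span and does not reproduce ITP-Miner's ITP-tree; all names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def dat_lists(db: List[Set[str]]) -> Dict[str, Set[int]]:
    """Map each item to the set of transaction positions containing it."""
    lists: Dict[str, Set[int]] = {}
    for tid, tx in enumerate(db):
        for item in tx:
            lists.setdefault(item, set()).add(tid)
    return lists


def pair_support(lists: Dict[str, Set[int]], a: str, b: str, offset: int) -> int:
    """Number of positions t where `a` occurs at t and `b` at t + offset."""
    return len(lists.get(a, set()) & {t - offset for t in lists.get(b, set())})


def frequent_pairs(db: List[Set[str]], maxspan: int,
                   min_count: int) -> Dict[Tuple[str, str, int], int]:
    """All (a, b, offset) patterns with offset <= maxspan reaching min_count."""
    lists = dat_lists(db)
    out: Dict[Tuple[str, str, int], int] = {}
    for a in lists:
        for b in lists:
            for d in range(maxspan + 1):
                if d == 0 and a >= b:
                    continue  # same-transaction pairs: count each unordered pair once
                c = pair_support(lists, a, b, d)
                if c >= min_count:
                    out[(a, b, d)] = c
    return out
```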
19.
Chun-Hao Chen Guo-Cheng Lan Tzung-Pei Hong Yui-Kai Lin 《Expert systems with applications》2013,40(16):6531-6537
Data mining has been studied for a long time. Its goal is to help market managers find relationships among items in large databases and thus increase sales volume. Association-rule mining is one of the well-known and commonly used techniques for this purpose, and the Apriori algorithm is an important method for this task. Many mining approaches based on the Apriori algorithm have been proposed for diverse applications. Many of these approaches focus on positive association rules such as “if milk is bought, then cookies are bought”. Such rules may, however, be misleading, since some customers may buy milk but not cookies. This paper therefore takes the properties of propositional logic into consideration and proposes an algorithm for mining highly coherent rules. The derived association rules are expected to be more meaningful and reliable for business. Experiments on two datasets are also conducted to show the performance of the proposed approach.
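The milk/cookies example motivates looking at all four cells of the contingency table for X and Y, not just the co-occurrence. In the coherent-rule framework this paper builds on, X ⇒ Y is commonly called coherent when supp(X, Y) and supp(¬X, ¬Y) both exceed supp(X, ¬Y) and supp(¬X, Y). The Python sketch below assumes that condition; the paper's exact "highly coherent" criterion (for example, any additional threshold) is not reproduced.

```python
from typing import FrozenSet, Iterable, Tuple


def contingency(transactions: Iterable[FrozenSet[str]], x: FrozenSet[str],
                y: FrozenSet[str]) -> Tuple[int, int, int, int]:
    """Counts (q1, q2, q3, q4) for (X,Y), (X,not Y), (not X,Y), (not X,not Y)."""
    q = [0, 0, 0, 0]
    for t in transactions:
        has_x, has_y = x <= t, y <= t  # subset test: itemset present in transaction
        q[(0 if has_x else 2) + (0 if has_y else 1)] += 1
    return q[0], q[1], q[2], q[3]


def is_coherent(q: Tuple[int, int, int, int]) -> bool:
    """Assumed coherence test: both 'agreeing' cells beat both 'disagreeing' cells."""
    q1, q2, q3, q4 = q
    return q1 > q2 and q1 > q3 and q4 > q2 and q4 > q3
```

A rule that looks good on confidence alone can fail this test when many customers buy X without Y, which is exactly the misleading case described in the abstract.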
20.
This paper addresses the integration of fuzziness with On-Line Analytical Processing (OLAP) based association rule mining. It contributes to ongoing research on multidimensional online association rule mining by proposing a general architecture that uses a fuzzy data cube for knowledge discovery. A data cube is mainly constructed to give users the flexibility to view data from different perspectives, as some dimensions of the cube contain multiple levels of abstraction. The first step of the process described in this paper introduces the fuzzy data cube as a remedy for the problem of handling quantitative values of dimensional attributes in a cube. This facilitates the online mining of fuzzy association rules at different levels within the constructed fuzzy data cube. We then investigate combining the concepts of weight and multiple levels to mine fuzzy weighted multi-cross-level association rules from the constructed fuzzy data cube. For this purpose, three methods are introduced: single-dimension, multidimensional, and hybrid (integrating the other two) fuzzy weighted association rule mining. Each of the three methods uses a fuzzy data cube constructed to suit that particular method. To the best of our knowledge, this is the first effort in this direction. We compared the proposed approach with an existing approach that does not use fuzziness. Experimental results obtained for each of the three methods on a synthetic dataset and on the adult data of the United States census in the year 2000 demonstrate the effectiveness and applicability of the proposed fuzzy OLAP based mining approach.
OLAP is one of the most popular tools for on-line, fast, and effective multidimensional data analysis. In the OLAP framework, data is mainly stored in data hypercubes (simply called cubes).