Found 20 similar documents (search time: 46 ms).
1.
The paper focuses on the adaptive relational association rule mining problem. Relational association rules are a particular type of association rule that describes frequent relations between the features characterizing the instances in a data set. We aim to re-mine a previously mined object set when the feature set characterizing the objects grows. We propose an adaptive method, called ARARM (Adaptive Relational Association Rule Mining), based on the discovery of interesting relational association rules. ARARM adapts the set of rules established by mining the data before the feature set changed, while preserving completeness. The goal is to reach the result more efficiently than by running the mining algorithm from scratch on the feature-extended object set. Experiments testing the method's performance on several case studies are also reported. The results highlight the efficiency of ARARM and confirm the potential of our proposal.
2.
Linh T. T. Nguyen Bay Vo Loan T. T. Nguyen Philippe Fournier-Viger Ali Selamat 《Applied Intelligence》2018,48(5):1148-1160
Mining association rules plays an important role in data mining and knowledge discovery since it can reveal strong associations between items in databases. Nevertheless, an important problem with traditional association rule mining methods is that they can generate a huge number of association rules depending on how parameters are set. Users, however, are often interested only in the strongest rules and do not want to go through a large number of rules or wait for them to be generated. To address those needs, algorithms have been proposed to mine the top-k association rules in databases, where users directly set a parameter k to obtain the k most frequent rules. A major issue with these techniques is that they remain very costly in terms of execution time and memory. To address this issue, this paper presents a novel algorithm named ETARM (Efficient Top-k Association Rule Miner) to efficiently find the complete set of top-k association rules. The proposed algorithm integrates two novel candidate pruning properties to reduce the search space more effectively. These properties are applied during the candidate selection process to identify items that should not be used to expand a rule based on its confidence, which reduces the number of candidates. An extensive experimental evaluation on six standard benchmark datasets shows that the proposed approach outperforms the state-of-the-art TopKRules algorithm in terms of both runtime and memory usage.
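To make the top-k setting concrete, the following is a minimal brute-force sketch in Python. It is not ETARM or TopKRules and uses none of their pruning properties. The function name `top_k_rules`, the restriction to single-item consequents, and the tie-breaking order are assumptions made purely for illustration.

```python
from itertools import combinations


def top_k_rules(transactions, k, min_conf):
    """Return the k rules with the highest support whose confidence >= min_conf.

    Naive baseline: enumerates every itemset, so it is only usable on toy data.
    Each rule is a tuple (support_count, confidence, antecedent, consequent).
    """
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))

    def count(itemset):
        # Number of transactions that contain every item of the itemset.
        return sum(1 for t in transactions if itemset <= t)

    rules = []
    for size in range(2, len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            support = count(itemset)
            if support == 0:
                continue
            # Assumption: only rules with a single-item consequent are generated.
            for consequent in combo:
                antecedent = itemset - {consequent}
                confidence = support / count(antecedent)
                if confidence >= min_conf:
                    rules.append(
                        (support, confidence, tuple(sorted(antecedent)), consequent)
                    )

    # Rank by support, then confidence; remaining keys only make ties deterministic.
    rules.sort(key=lambda r: (-r[0], -r[1], r[2], r[3]))
    return rules[:k]
```

Calling `top_k_rules(data, k=2, min_conf=0.6)` on a small list of item sets returns the two most frequent rules that meet the confidence threshold. The exhaustive enumeration is exactly the cost that pruning-based algorithms aim to avoid.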
3.
Quantitative association rule (QAR) mining has been recognized as an influential research problem over the last decade due to
the popularity of quantitative databases and the usefulness of association rules in real life. Unlike boolean association
rules (BARs), which only consider boolean attributes, QARs consist of quantitative attributes which contain much richer information
than the boolean attributes. However, the combination of these quantitative attributes and their value intervals always gives
rise to the generation of an explosively large number of itemsets, thereby severely degrading the mining efficiency. In this
paper, we propose an information-theoretic approach to avoid unrewarding combinations of both the attributes and their value
intervals being generated in the mining process. We study the mutual information between the attributes in a quantitative
database and devise a normalization on the mutual information to make it applicable in the context of QAR mining. To indicate
the strong informative relationships among the attributes, we construct a mutual information graph (MI graph), whose edges
are attribute pairs that have normalized mutual information no less than a predefined information threshold. We find that
the cliques in the MI graph represent a majority of the frequent itemsets. We also show that frequent itemsets that do not
form a clique in the MI graph are those whose attributes are not informatively correlated to each other. By utilizing the
cliques in the MI graph, we devise an efficient algorithm that significantly reduces the number of value intervals of the
attribute sets to be joined during the mining process. Extensive experiments show that our algorithm speeds up the mining
process by up to two orders of magnitude. Most importantly, we are able to obtain most of the high-confidence QARs, whereas
the QARs that are not returned by MIC are shown to be less interesting.
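The MI-graph construction described in this abstract can be sketched in Python as follows. The abstract does not state how the mutual information is normalized, so dividing by the entropy of the first attribute is an assumption here, as are the function names and the representation of the database as a dictionary of already discretized columns.

```python
from collections import Counter
from itertools import permutations
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a discrete column."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equally long discrete columns."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def normalized_mi(xs, ys):
    # Assumption: normalize by H(X); the paper's own normalization may differ.
    h = entropy(xs)
    return mutual_information(xs, ys) / h if h > 0 else 0.0


def mi_graph_edges(table, threshold):
    """Directed edges (a, b) whose normalized MI is no less than the threshold."""
    return {
        (a, b)
        for a, b in permutations(table, 2)
        if normalized_mi(table[a], table[b]) >= threshold
    }
```

Because this normalization is asymmetric, the sketch returns directed edges. Finding cliques in the resulting graph, which is the step the abstract relies on for pruning, is left out.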
4.
Loan T.T. Nguyen Bay Vo Tzung-Pei Hong Hoang Chi Thanh 《Expert systems with applications》2012,39(13):11357-11366
Classification plays an important role in decision support systems. Many methods for mining classification rules, such as C4.5 and ILA, have been developed in recent years. These methods, however, rely on heuristics and greedy approaches and generate rule sets that are either too general or overfitted to a given dataset. They thus often yield high error ratios. Recently, a new method for classification from data mining, called Classification Based on Associations (CBA), has been proposed for mining class-association rules (CARs). Compared with the heuristic and greedy methods, CBA can more easily remove noise, and its accuracy is thus higher. It can additionally generate a rule set that is more complete than those of C4.5 and ILA. One weakness of mining CARs is that it consumes more time than C4.5 and ILA because each generated rule has to be checked against the set of the other rules. We thus propose an efficient pruning approach to build a classifier quickly. First, we design a lattice structure and propose an algorithm for fast mining of CARs using this lattice. Second, we develop some theorems and, based on them, propose an algorithm for quickly pruning redundant rules. Experimental results show that the proposed approach is more efficient than those used previously.
5.
6.
《Computers & chemistry》1986,10(2):153-161
We present a package of FORTRAN modules to perform general and efficient I/O operations in external sort, bin sort and related environments. A partition of a user logical record of length IRLU into NREC records of length IRL is carried out when IRLU exceeds the maximum permissible record length for a given computer and file organization. Efficient random direct access to a file requires that IRL be a multiple of the smallest addressable unit on a disk. Overcoming implied DO lists in I/O statements becomes significant in tight memory environments. Corresponding gains in execution times are discussed.
7.
In the rapidly changing financial market, investors always have difficulty in deciding the right time to trade. To enhance investment profitability, investors desire a decision support system. The proposed artificial intelligence methodology enables investors to learn the associations among different parameters. After the associations are extracted, investors can apply the rules in their decision support systems. In this work, the model is built with the ultimate goal of predicting the level of the Hang Seng Index in Hong Kong. The proposed method learns the movement of the Hang Seng Index, which is associated with other economic indices including the gross domestic product (GDP) index, the consumer price index (CPI), the interest rate, and the export value of goods from Hong Kong. The case study shows that the proposed method is a feasible way to provide decision support for investors who may not be able to identify the hidden rules between the Hang Seng Index and other economic indices.
8.
Jeffrey Xu Yu Zhiheng Li Guimei Liu 《The VLDB Journal The International Journal on Very Large Data Bases》2008,17(4):947-970
Data mining has attracted a lot of research efforts during the past decade. However, little work has been reported on the
efficiency of supporting a large number of users who issue different data mining queries periodically when there are new needs
and when data is updated. Our work is motivated by the fact that the pattern-growth method is one of the most efficient methods
for frequent pattern mining which constructs an initial tree and mines frequent patterns on top of the tree. In this paper,
we present a data mining proxy approach that can reduce the I/O costs to construct an initial tree by utilizing the trees that have already been resident in memory. The tree we construct
is the smallest for a given data mining query. In addition, our proxy approach can also reduce CPU cost in mining patterns,
because the cost of mining relies on the sizes of trees. The focus of the work is to construct an initial tree efficiently.
We propose three tree operations to construct a tree. With a unique coding scheme, we can efficiently project subtrees from
on-disk trees or in-memory trees. Our performance study indicated that the data mining proxy significantly reduces the I/O cost to construct trees and CPU cost to mine patterns over the trees constructed.
9.
Recently, a utility-based mining approach has emerged as an alternative mechanism to frequency-based mining in an attempt to reflect not only the statistical correlation but also the semantic significance (e.g., price and quantity) of items. However, existing mining trajectories utilizing high-utility itemsets may not offer firms sufficient business insights unless they can precisely assess the value of association rules, which may vary substantially depending on many business parameters included in the assessment. In this study, we propose a utility-based association-rule mining method that valuates association rules by measuring their specific business benefits accruing to firms. Based on previous studies, three key elements (opportunity, effectiveness, and probability) are identified to define and operationalize a user’s preference as a utility function. To apply the utility-based mechanism to the processing of large transaction databases, we constructed functional algorithms, paying particular attention to their pruning strategies, and evaluated them on real-world databases. Experimental results show that the proposed approach can provide users with greater business benefits than the high-utility itemset mining approach, suggesting several important strategic implications for both research and practice.
10.
《Journal of Network and Computer Applications》2007,30(3):1216-1227
This paper introduces a new approach to the problem of sharing data among multiple parties without disclosing the data between the parties. Our focus is data sharing among parties involved in a data mining task. We study how to share private or confidential data in the following scenario: multiple parties, each having a private data set, want to collaboratively conduct association rule mining without disclosing their private data to each other or any other parties. To tackle this demanding problem, we develop a secure protocol for multiple parties to conduct the desired computation. The solution is distributed, i.e., there is no central, trusted party having access to all the data. Instead, we define a protocol using homomorphic encryption techniques to exchange the data while keeping it private.
11.
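With the development of the tourism industry, mining the intrinsic, implicit correlations between traveler types and environmental factors from massive travel data is an effective way to analyze the state of the tourism market and to predict its impact on related industries. Considering the characteristics of travel data and the limitations of existing constraint methods, this paper proposes a parallel association rule mining algorithm based on relationship-extension path constraints. The algorithm makes effective use of the MapReduce parallel mechanism: it generates the transaction set under the relationship-extension path constraint, which improves the efficiency of the subsequent parallel stage, and it uses a parallel method to improve the level-wise search of the Apriori algorithm, yielding a "second" efficiency gain. This makes it possible to track tourism development trends better and faster and to adjust macro-level tourism policy.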
12.
The most computationally demanding aspect of Association Rule Mining is the identification and counting of support of the frequent sets of items that occur together sufficiently often to be the basis of potentially interesting rules. The task increases in difficulty with the scale of the data and also with its density. The greatest challenge is posed by data that is too large to be contained in primary memory, especially when high data density and/or low support thresholds give rise to very large numbers of candidates that must be counted. In this paper, we consider strategies for partitioning the data to deal effectively with such cases. We describe a partitioning approach which organises the data into tree structures that can be processed independently. We present experimental results that show the method scales well for increasing dimensions of data and performs significantly better than alternatives, especially when dealing with dense data and low support thresholds.
Shakil Ahmed received a first class BSc (Hons) degree from Dhaka University, Bangladesh, in 1990, and an MSc (first class), also from Dhaka University, in 1992. He received his PhD from The University of Liverpool, UK, in 2005. Since 2000 he has been a member of the Data Mining Group at the Department of Computer Science of the University of Liverpool, UK. His research interests include data mining, Association Rule Mining and pattern recognition.
Frans Coenen has been working in the field of Data Mining for many years and has written widely on the subject. He received his PhD from Liverpool Polytechnic in 1989, after which he took up a post as an RA within the Department of Computer Science at the University of Liverpool. In 1997, he took up a lecturing post within the same department. His current Data Mining research interests include Association Rule Mining, classification algorithms and text mining. He is on the programme committee for ICDM'05 and was the chair for the UK KDD symposium (UKKDD'05).
Paul Leng is professor of e-Learning at the University of Liverpool and director of the e-Learning Unit, which is responsible for overseeing the University's online degree programmes, leading to degrees of MSc in IT and MBA. Along with e-Learning, his main research interests are in Data Mining, especially in methods of discovering Association Rules. In collaboration with Frans Coenen, he has developed efficient new algorithms for finding frequent sets and is exploring applications in text mining and classification.
13.
In this paper, we present an alternative approach for mining regular association rules and maximal association rules from transactional datasets using soft set theory. The approach starts by transforming a transactional dataset into a Boolean-valued information system. Since the “standard” soft set deals with such information systems, a transactional dataset can be represented as a soft set. Using the concept of parameter co-occurrence in a transaction, we define regular and maximal association rules between two sets of parameters, together with their support and confidence and their maximal support and maximal confidence, respectively, using soft set theory. The results show that the soft regular and soft maximal association rules are identical to the regular and maximal association rules.
14.
15.
Bilal Sowan Keshav Dahal M.A. Hossain Li Zhang Linda Spencer 《Expert systems with applications》2013,40(17):6928-6937
This paper presents an investigation into two fuzzy association rule mining models for enhancing prediction performance. The first model (the FCM–Apriori model) integrates Fuzzy C-Means (FCM) and the Apriori approach for road traffic performance prediction. FCM is used to define the membership functions of fuzzy sets and the Apriori approach is employed to identify the Fuzzy Association Rules (FARs). The proposed model extracts knowledge from a database for a Fuzzy Inference System (FIS) that can be used in prediction of a future value. The knowledge extraction process and the performance of the model are demonstrated through two case studies of road traffic data sets with different sizes. The experimental results show the merits and capability of the proposed KD model in FARs based knowledge extraction. The second model (the FCM–MSapriori model) integrates FCM and a Multiple Support Apriori (MSapriori) approach to extract the FARs. These FARs provide the knowledge base to be utilized within the FIS for prediction evaluation. Experimental results have shown that the FCM–MSapriori model predicted the future values effectively and outperformed the FCM–Apriori model and other models reported in the literature.
16.
17.
A fuzzy association rule mining algorithm based on TD-FP-growth (total citations: 1; self-citations: 0; citations by others: 1)
An algorithm based on TD-FP-growth is proposed for mining fuzzy association rules. It uses three kinds of t-norm operators to calculate the support of fuzzy frequent items and the corresponding implication operators to measure the implication degree of fuzzy association rules, so the mined rules can express the logical semantics of both certainty and graduality between fuzzy items. The membership degree of each transaction with respect to the fuzzy item denoted by each FP-tree node is stored in a hash table keyed by the transaction's unique identifier, which adapts TD-FP-growth to mining fuzzy frequent items. The time and space complexity of the algorithm are analyzed. Experimental results show that the algorithm is more time-efficient than the Apriori-based fuzzy frequent item mining algorithm.
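As a rough illustration of t-norm-based support, the Python sketch below averages the t-norm of the membership degrees over all transactions. The abstract does not name its three t-norms, so the minimum, product, and Łukasiewicz t-norms used here are assumptions, as are the function name `fuzzy_support` and the dictionary-based transaction layout. The implication operators and the TD-FP-growth tree itself are not covered.

```python
from functools import reduce

# Assumption: three widely used t-norms; the paper may use a different trio.
T_NORMS = {
    "minimum": min,
    "product": lambda a, b: a * b,
    "lukasiewicz": lambda a, b: max(0.0, a + b - 1.0),
}


def fuzzy_support(transactions, itemset, t_norm):
    """Average, over all transactions, of the t-norm of the membership degrees.

    transactions: list of dicts mapping a fuzzy item to its membership in [0, 1]
    itemset: non-empty sequence of fuzzy items; missing items count as 0.0
    """
    total = 0.0
    for memberships in transactions:
        degrees = [memberships.get(item, 0.0) for item in itemset]
        total += reduce(t_norm, degrees)
    return total / len(transactions)
```

For example, `fuzzy_support(data, ["age.young", "income.high"], T_NORMS["product"])` gives the product-based support of that pair of hypothetical fuzzy items. Swapping the t-norm changes how strictly partial memberships are combined.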
18.
Recent research has shown that association rules are useful in gene expression data analysis. Interestingness measures play an important role in association rule mining on gene expression data, which are characterized by small sample sizes, high dimensionality, and noise. This work introduces two interestingness measures that exploit prior knowledge contained in open biological databases: Max-Pathway-Distance (MaxPD), which explores the relatedness of genes in Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, and Max-Chromosomal-Distance (MaxCD), which makes use of the distance among genes on the chromosome. The properties of the proposed interestingness measures are also explored in order to mine the interesting rules efficiently. Experimental results on four real-life gene expression datasets show the effectiveness of MaxPD and MaxCD in both classification accuracy and biological interpretability.
19.
Wei Ding Christoph F. Eick Xiaojing Yuan Jing Wang Jean-Philippe Nicot 《GeoInformatica》2011,15(1):1-28
The motivation for regional association rule mining and scoping is driven by the facts that global statistics seldom provide
useful insight and that most relationships in spatial datasets are geographically regional, rather than global. Furthermore,
when using traditional association rule mining, regional patterns frequently fail to be discovered due to insufficient global
confidence and/or support. In this paper, we systematically study this problem and address the unique challenges of regional
association mining and scoping: (1) region discovery: how to identify interesting regions from which novel and useful regional
association rules can be extracted; (2) regional association rule scoping: how to determine the scope of regional association
rules. We investigate the duality between regional association rules and regions where the associations are valid: interesting
regions are identified to seek novel regional patterns, and a regional pattern has a scope of a set of regions in which the
pattern is valid. In particular, we present a reward-based region discovery framework that employs a divisive grid-based supervised
clustering for region discovery. We evaluate our approach in a real-world case study to identify spatial risk patterns from
arsenic in the Texas water supply. Our experimental results confirm and validate research results in the study of arsenic
contamination, and our work leads to the discovery of novel findings to be further explored by domain scientists.
20.
Chiraz Latiri Hatem Haddad Tarek Hamrouni 《Journal of Intelligent Information Systems》2012,39(1):209-247
The steady growth in the size of textual document collections is a key progress-driver for modern information retrieval techniques, whose effectiveness and efficiency are constantly challenged. Given a user query, the number of retrieved documents can be overwhelmingly large, hampering their efficient exploitation by the user. In addition, retaining only relevant documents in a query answer is of paramount importance for effectively meeting the user's needs. In this situation, the query expansion technique offers an interesting solution for obtaining a complete answer while preserving the quality of retained documents. This mainly relies on an accurate choice of the terms added to an initial query. Interestingly enough, query expansion takes advantage of large text volumes by extracting statistical information about index term co-occurrences and using it to make user queries better fit the real information needs. In this respect, a promising track consists in the application of data mining methods to the extraction of dependencies between terms. In this paper, we present a novel approach for mining knowledge supporting query expansion that is based on association rules. The key feature of our approach is a better trade-off between the size of the mining result and the conveyed knowledge. Thus, our association rule mining method implements results from Galois connection theory and compact representations of rule sets in order to reduce the huge number of potentially useful associations. An experimental study has examined the application of our approach to some real collections, whereby automatic query expansion has been performed. The results of the study show a significant improvement in the performance of the information retrieval system, in terms of both recall and precision, as highlighted by significance testing using the Wilcoxon test.