首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 0 毫秒
1.
Mining interesting imperfectly sporadic rules   总被引:1,自引:0,他引:1  
Detecting association rules with low support but high confidence is a difficult data mining problem. To find such rules using approaches like the Apriori algorithm, minimum support must be set very low, which results in a large number of redundant rules. We are interested in sporadic rules; i.e. those that fall below a maximum support level but above the level of support expected from random coincidence. There are two types of sporadic rules: perfectly sporadic and imperfectly sporadic. Here we are more concerned about finding imperfectly sporadic rules, where the support of the antecedent as a whole falls below maximum support, but where items may have quite high support individually. In this paper, we introduce an algorithm called Mining Interesting Imperfectly Sporadic Rules (MIISR) to find imperfectly sporadic rules efficiently, e.g. fever, headache, stiff neckmeningitis. Our proposed method uses item constraints and coincidence pruning to discover these rules in reasonable time. This paper is an expanded version of Koh et al. [Advances in knowledge discovery and data mining: 10th Pacific-Asia Conference (PAKDD 2006), Singapore. Lecture Notes in Computer Science 3918, Springer, Berlin, pp 473–482]. Yun Sing Koh is currently a Ph.D. student at the Department of Computer Science, University of Otago, New Zealand. Her main research interest is in association rule mining with particular interest in generating hard-to-find association rules and interestingness measures. She holds a B.Sc. (Honours) degree in computer science and a Master’s degree in software engineering, both from the University of Malaya, Malaysia. Nathan Rountree has been a faculty member of the Department of Computer Science at the University of Otago, Dunedin, since 1999. His research interests are in the fields of data mining, artificial neural networks, and computer science education. He is also a consulting software engineer for Profiler Corporation, a Dunedin-based company specialising in data mining and knowledge discovery. Richard A. O’Keefe holds a B.Sc. (Honours) degree in mathematics and physics, majoring in statistics, and an M.Sc. degree in physics (underwater acoustics), both obtained from the University of Auckland, New Zealand. He received his Ph.D. degree in artificial intelligence from the University of Edinburgh. He is the author of “The Craft of Prolog’’ (MIT Press). Dr. O’Keefe is now a lecturer at the University of Otago, New Zealand. His computing interests include declarative programming languages, especially Prolog and Erlang; statistical applications, including data mining and information retrieval; and applications of logic. He is also a member of the editorial board of theory and practice of logic programming.  相似文献   

2.
We develop techniques for discovering patterns with periodicity in this work. Patterns with periodicity are those that occur at regular time intervals, and therefore there are two aspects to the problem: finding the pattern, and determining the periodicity. The difficulty of the task lies in the problem of discovering these regular time intervals, i.e., the periodicity. Periodicities in the database are usually not very precise and have disturbances, and might occur at time intervals in multiple time granularities. To overcome these difficulties and to be able to discover the patterns with fuzzy periodicity, we propose the fuzzy periodic calendar which defines fuzzy periodicities. Furthermore, we develop algorithms for mining fuzzy periodicities and the fuzzy periodic association rules within them. Experimental results have shown that our method is effective in discovering fuzzy periodic association rules.  相似文献   

3.
Mining dynamic association rules with comments   总被引:2,自引:2,他引:0  
In this paper, we study a new problem of mining dynamic association rules with comments (DAR-C for short). A DAR-C contains not only rule itself, but also its comments that specify when to apply the rule. In order to formalize this problem, we first present the expression method of candidate effective time slots, and then propose several definitions concerning DAR-C. Subsequently, two algorithms, namely ITS2 and EFP-Growth2, are developed for handling the problem of mining DAR-C. In particular, ITS2 is an improved two-stage dynamic association rule mining algorithm, while EFP-Growth2 is based on the EFP-tree structure and is suitable for mining high-density mass data. Extensive experimental results demonstrate that the efficiency and scalability of our proposed two algorithms (i.e., ITS2 and EFP-Growth2) on DAR-C mining tasks, and their practicability on real retail dataset.  相似文献   

4.
Van  Trang  Le  Bac 《Applied Intelligence》2021,51(10):7208-7220
Applied Intelligence - Mining sequential rules from a sequence database usually returns a set of rules with great cardinality. However, in real world applications, the end-users are often...  相似文献   

5.
Mining optimized gain rules for numeric attributes   总被引:7,自引:0,他引:7  
Association rules are useful for determining correlations between attributes of a relation and have applications in the marketing, financial, and retail sectors. Furthermore, optimized association rules are an effective way to focus on the most interesting characteristics involving certain attributes. Optimized association rules are permitted to contain uninstantiated attributes and the problem is to determine instantiations such that either the support, confidence, or gain of the rule is maximized. In this paper, we generalize the optimized gain association rule problem by permitting rules to contain disjunctions over uninstantiated numeric attributes. Our generalized association rules enable us to extract more useful information about seasonal and local patterns involving the uninstantiated attribute. For rules containing a single numeric attribute, we present an algorithm with linear complexity for computing optimized gain rules. Furthermore, we propose a bucketing technique that can result in a significant reduction in input size by coalescing contiguous values without sacrificing optimality. We also present an approximation algorithm based on dynamic programming for two numeric attributes. Using recent results on binary space partitioning trees, we show that the approximations are within a constant factor of the optimal optimized gain rules. Our experimental results with synthetic data sets for a single numeric attribute demonstrate that our algorithm scales up linearly with the attribute's domain size as well as the number of disjunctions. In addition, we show that applying our optimized rule framework to a population survey real-life data set enables us to discover interesting underlying correlations among the attributes.  相似文献   

6.
Association rule mining is an important data analysis method that can discover associations within data. There are numerous previous studies that focus on finding fuzzy association rules from precise and certain data. Unfortunately, real-world data tends to be uncertain due to human errors, instrument errors, recording errors, and so on. Therefore, a question arising immediately is how we can mine fuzzy association rules from uncertain data. To this end, this paper proposes a representation scheme to represent uncertain data. This representation is based on possibility distributions because the possibility theory establishes a close connection between the concepts of similarity and uncertainty, providing an excellent framework for handling uncertain data. Then, we develop an algorithm to mine fuzzy association rules from uncertain data represented by possibility distributions. Experimental results from the survey data show that the proposed approach can discover interesting and valuable patterns with high certainty.  相似文献   

7.
A class-bridge rule is the rule whose antecedent and consequent belong to different conceptual classes. This kind of rules stands for the correlation between conceptual classes. The study on class-bridge rules can benefit many domains such as criminal detection, chemical synthesis, biological grafting, etc. Class-bridge rules are different from other association rules as follows: (1) class-bridge rules can be generated from infrequent itemsets; (2) measurements of class-bridge rules include the distance between conceptual classes and the relation between the antecedents/consequents and their affiliated conceptual classes. This paper proposes an algorithm based on rough sets to mine and evaluate class-bridge rules. The experiment demonstrates its effectiveness.  相似文献   

8.
Mining multiple-level association rules in large databases   总被引:2,自引:0,他引:2  
A top-down progressive deepening method is developed for efficient mining of multiple-level association rules from large transaction databases based on the a priori principle. A group of variant algorithms is proposed based on the ways of sharing intermediate results, with the relative performance tested and analyzed. The enforcement of different interestingness measurements to find more interesting rules, and the relaxation of rule conditions for finding “level-crossing” association rules, are also investigated. The study shows that efficient algorithms can be developed from large databases for the discovery of interesting and strong multiple-level association rules  相似文献   

9.
Conceptual and logical database design are complex tasks for non-expert designers. Currently, the popular data models for conceptual and logical database design are the entity–relationship (ER) and the relational model, respectively. Logical design methodologies for relational databases have relied on mathematically rigorous approaches which are impractical, or textbook approaches which do not provide the rich constructs to capture real applications. Consequently, designers have to use their intuition to develop their own rules and heuristics. There is a need, therefore, to develop practical rules and heuristics that can be used to handle the complexity of design in real applications. This paper proposes a realistic and detailed approach for conceptual design using the ER model for relational databases. The approach is based on four rules that specify the order in which various types of relationships must be modelled, three rules that pertain to detection of derived relationships, and three heuristics based on observation of constructs in real applications. The approach is illustrated by many examples.  相似文献   

10.
In association rule mining, the trade-off between avoiding harmful spurious rules and preserving authentic ones is an ever critical barrier to obtaining reliable and useful results. The statistically sound technique for evaluating statistical significance of association rules is superior in preventing spurious rules, yet can also cause severe loss of true rules in presence of data error. This study presents a new and improved method for statistical test on association rules with uncertain erroneous data. An original mathematical model was established to describe data error propagation through computational procedures of the statistical test. Based on the error model, a scheme combining analytic and simulative processes was designed to correct the statistical test for distortions caused by data error. Experiments on both synthetic and real-world data show that the method significantly recovers the loss in true rules (reduces type-2 error) due to data error occurring in original statistically sound method. Meanwhile, the new method maintains effective control over the familywise error rate, which is the distinctive advantage of the original statistically sound technique. Furthermore, the method is robust against inaccurate data error probability information and situations not fulfilling the commonly accepted assumption on independent error probabilities of different data items. The method is particularly effective for rules which were most practically meaningful yet sensitive to data error. The method proves promising in enhancing values of association rule mining results and helping users make correct decisions.  相似文献   

11.
Mining spatial association rules in image databases   总被引:2,自引:0,他引:2  
In this paper, we propose a novel spatial mining algorithm, called 9DLT-Miner, to mine the spatial association rules from an image database, where every image is represented by the 9DLT representation. The proposed method consists of two phases. First, we find all frequent patterns of length one. Next, we use frequent k-patterns (k ? 1) to generate all candidate (k + 1)-patterns. For each candidate pattern generated, we scan the database to count the pattern’s support and check if it is frequent. The steps in the second phase are repeated until no more frequent patterns can be found. Since our proposed algorithm prunes most of impossible candidates, it is more efficient than the Apriori algorithm. The experiment results show that 9DLT-Miner runs 2-5 times faster than the Apriori algorithm.  相似文献   

12.
Association rule mining is one of most popular data analysis methods that can discover associations within data. Association rule mining algorithms have been applied to various datasets, due to their practical usefulness. Little attention has been paid, however, on how to apply the association mining techniques to analyze questionnaire data. Therefore, this paper first identifies the various data types that may appear in a questionnaire. Then, we introduce the questionnaire data mining problem and define the rule patterns that can be mined from questionnaire data. A unified approach is developed based on fuzzy techniques so that all different data types can be handled in a uniform manner. After that, an algorithm is developed to discover fuzzy association rules from the questionnaire dataset. Finally, we evaluate the performance of the proposed algorithm, and the results indicate that our method is capable of finding interesting association rules that would have never been found by previous mining algorithms.  相似文献   

13.
宫雨 《计算机工程与设计》2007,28(24):5838-5840
约束关联规则是关联规则研究中的重要问题,目前的研究大多集中在单变量约束,对双变量约束的研究较少,而双变量约束在实际中也有重要作用.针对这种情况,提出了双变量约束中具有下界约束的关联规则问题.在此基础上,给出了下界约束的定义,然后分析了满足下界约束频繁集的性质,并给出了相关的证明.最后提出了基于FP-Tree的下界约束算法,采用了预先测试的方法,降低了需要测试项集的数量和计算成本.实验结果表明,该算法具有较高的效率.  相似文献   

14.
Data mining is most commonly used in attempts to induce association rules from databases which can help decision-makers easily analyze the data and make good decisions regarding the domains concerned. Different studies have proposed methods for mining association rules from databases with crisp values. However, the data in many real-world applications have a certain degree of imprecision. In this paper we address this problem, and propose a new data-mining algorithm for extracting interesting knowledge from databases with imprecise data. The proposed algorithm integrates imprecise data concepts and the fuzzy apriori mining algorithm to find interesting fuzzy association rules in given databases. Experiments for diagnosing dyslexia in early childhood were made to verify the performance of the proposed algorithm.  相似文献   

15.
《Intelligent Data Analysis》1998,2(1-4):165-185
Classification, which involves finding rules that partition a given dataset into disjoint groups, is one class of data mining problems. Approaches proposed so far for mining classification rules from databases are mainly decision tree based on symbolic learning methods. In this paper, we combine artificial neural network and genetic algorithm to mine classification rules. Some experiments have demonstrated that our method generates rules of better performance than the decision tree approach and the number of extracted rules is fewer than that of C4.5.  相似文献   

16.
Mining optimized association rules with categorical and numericattributes   总被引:1,自引:0,他引:1  
Mining association rules on large data sets has received considerable attention in recent years. Association rules are useful for determining correlations between attributes of a relation and have applications in marketing, financial, and retail sectors. Furthermore, optimized association rules are an effective way to focus on the most interesting characteristics involving certain attributes. Optimized association rules are permitted to contain uninstantiated attributes and the problem is to determine instantiations such that either the support or confidence of the rule is maximized. In this paper, we generalize the optimized association rules problem in three ways: (1) association rules are allowed to contain disjunctions over uninstantiated attributes, (2) association rules are permitted to contain an arbitrary number of uninstantiated attributes, and (3) uninstantiated attributes can be either categorical or numeric. Our generalized association rules enable us to extract more useful information about seasonal and local patterns involving multiple attributes. We present effective techniques for pruning the search space when computing optimized association rules for both categorical and numeric attributes. Finally, we report the results of our experiments that indicate that our pruning algorithms are efficient for a large number of uninstantiated attributes, disjunctions, and values in the domain of the attributes  相似文献   

17.
18.
Mining fuzzy association rules in a bank-account database   总被引:1,自引:0,他引:1  
This paper describes how we applied a fuzzy technique to a data-mining task involving a large database that was provided by an international bank with offices in Hong Kong. The database contains the demographic data of over 320,000 customers and their banking transactions, which were collected over a six-month period. By mining the database, the bank would like to be able to discover interesting patterns in the data. The bank expected that the hidden patterns would reveal different characteristics about different customers so that they could better serve and retain them. To help the bank achieve its goal, we developed a fuzzy technique, called fuzzy association rule mining II (FARM II). FARM II is able to handle both relational and transactional data. It can also handle fuzzy data. The former type of data allows FARM II to discover multidimensional association rules, whereas the latter data allows some of the patterns to be more easily revealed and expressed. To effectively uncover the hidden associations in the bank-account database, FARM II performs several steps which are described in detail in this paper. With FARM II, the bank discovered that they had identified some interesting characteristics about the customers who had once used the bank's loan services but then decided later to cease using them. The bank translated what they discovered into actionable items by offering some incentives to retain their existing customers.  相似文献   

19.
Journal of Intelligent Information Systems - Social Media have enabled users to keep inter-personal relationships, but also to voice personal sensations, emotions and feelings. The recent...  相似文献   

20.
Temporal data mining is still one of important research topic since there are application areas that need knowledge from temporal data such as sequential patterns, similar time sequences, cyclic and temporal association rules, and so on. Although there are many studies for temporal data mining, they do not deal with discovering knowledge from temporal interval data such as patient histories, purchaser histories, and web logs etc. We propose a new temporal data mining technique that can extract temporal interval relation rules from temporal interval data by using Allen’s theory: a preprocessing algorithm designed for the generalization of temporal interval data and a temporal relation algorithm for mining temporal relation rules from the generalized temporal interval data. This technique can provide more useful knowledge in comparison with conventional data mining techniques.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号