Similar literature
1.
Although the shared memory abstraction is gaining ground as a programming abstraction for parallel computing, the main platforms that support it, small-scale symmetric multiprocessors (SMPs) and hardware cache-coherent distributed shared memory systems (DSMs), seem to lie inherently at the extremes of the cost-performance spectrum for parallel systems. In this paper we examine whether shared virtual memory (SVM) clusters can bridge this gap by studying how application performance scales on a state-of-the-art SVM cluster. We find that: (i) The level of application restructuring needed is quite high compared to applications that perform well on a DSM system of the same scale, and larger problem sizes are needed for good performance. (ii) However, surprisingly, SVM performs quite well for a fairly wide range of applications, achieving at least half the parallel efficiency of a high-end DSM system at the same scale and often much more.

2.
Mining interesting imperfectly sporadic rules   Total citations: 1 (self-citations: 0, citations by others: 1)
Detecting association rules with low support but high confidence is a difficult data mining problem. To find such rules using approaches like the Apriori algorithm, minimum support must be set very low, which results in a large number of redundant rules. We are interested in sporadic rules, i.e. those that fall below a maximum support level but above the level of support expected from random coincidence. There are two types of sporadic rules: perfectly sporadic and imperfectly sporadic. Here we are more concerned with finding imperfectly sporadic rules, where the support of the antecedent as a whole falls below maximum support, but where items may have quite high support individually. In this paper, we introduce an algorithm called Mining Interesting Imperfectly Sporadic Rules (MIISR) to find imperfectly sporadic rules efficiently, e.g. {fever, headache, stiff neck} → {meningitis}. Our proposed method uses item constraints and coincidence pruning to discover these rules in reasonable time. This paper is an expanded version of Koh et al. [Advances in knowledge discovery and data mining: 10th Pacific-Asia Conference (PAKDD 2006), Singapore. Lecture Notes in Computer Science 3918, Springer, Berlin, pp 473–482]. Yun Sing Koh is currently a Ph.D. student at the Department of Computer Science, University of Otago, New Zealand. Her main research interest is in association rule mining, with particular interest in generating hard-to-find association rules and interestingness measures. She holds a B.Sc. (Honours) degree in computer science and a Master’s degree in software engineering, both from the University of Malaya, Malaysia. Nathan Rountree has been a faculty member of the Department of Computer Science at the University of Otago, Dunedin, since 1999. His research interests are in the fields of data mining, artificial neural networks, and computer science education.
He is also a consulting software engineer for Profiler Corporation, a Dunedin-based company specialising in data mining and knowledge discovery. Richard A. O’Keefe holds a B.Sc. (Honours) degree in mathematics and physics, majoring in statistics, and an M.Sc. degree in physics (underwater acoustics), both obtained from the University of Auckland, New Zealand. He received his Ph.D. degree in artificial intelligence from the University of Edinburgh. He is the author of “The Craft of Prolog” (MIT Press). Dr. O’Keefe is now a lecturer at the University of Otago, New Zealand. His computing interests include declarative programming languages, especially Prolog and Erlang; statistical applications, including data mining and information retrieval; and applications of logic. He is also a member of the editorial board of Theory and Practice of Logic Programming.
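
The sporadic-rule criterion in the abstract above (support below a maximum level but above what random coincidence would give) can be illustrated with a small sketch. This is not the MIISR algorithm: the independence baseline (product of the individual item supports), the function names, and the toy data are all assumptions made for illustration.

```python
# Illustrative sketch only; not the MIISR algorithm from the paper.
# Assumption: "support expected from random coincidence" is modelled as the
# product of the individual item supports (statistical independence).

def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(items <= t for t in transactions) / len(transactions)

def is_sporadic(itemset, transactions, max_support):
    """True if the itemset's support is above the coincidence level
    and below `max_support`."""
    observed = support(itemset, transactions)
    expected = 1.0
    for item in itemset:
        expected *= support([item], transactions)
    return expected < observed < max_support

# Hypothetical toy data: ten transactions, each a set of items.
transactions = [
    {"fever", "headache", "stiff neck", "meningitis"},
    {"fever", "headache"},
    {"fever"},
    {"headache"},
    {"fever", "cough"},
    {"headache", "cough"},
    {"cough"}, {"cough"}, {"cough"}, {"cough"},
]
```

In this toy data "fever" and "headache" each appear in 4 of 10 transactions, yet the three-item antecedent appears only once, which is the imperfectly sporadic situation the abstract describes: individually common items whose combination is rare but not coincidental.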

3.
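With fuzzy concepts and fuzzy operations, time intervals are easy to describe. For a given calendar pattern, different time intervals can carry different weights according to their membership degrees. Building on fuzzy calendar algebra and combining the ideas of incremental mining and progressive counting, a method for mining fuzzy temporal association rules based on fuzzy calendars is proposed. Both theoretical analysis and experimental results show that the algorithm is efficient and feasible.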

4.
In this work we develop techniques for discovering patterns with periodicity. Patterns with periodicity are those that occur at regular time intervals, so the problem has two aspects: finding the pattern and determining the periodicity. The difficulty of the task lies in discovering these regular time intervals, i.e., the periodicity. Periodicities in the database are usually not very precise, contain disturbances, and may occur at time intervals in multiple time granularities. To overcome these difficulties and to discover patterns with fuzzy periodicity, we propose the fuzzy periodic calendar, which defines fuzzy periodicities. Furthermore, we develop algorithms for mining fuzzy periodicities and the fuzzy periodic association rules within them. Experimental results show that our method is effective in discovering fuzzy periodic association rules.

5.
Abstract: The concept of fuzzy sets is one of the most fundamental and influential tools in the development of computational intelligence. In this paper the fuzzy pincer search algorithm is proposed. It generates fuzzy association rules by combining top-down and bottom-up approaches. A fuzzy grid representation is used to reduce the number of scans of the database, and our algorithm trims down the number of candidate fuzzy grids at each level. It has been observed that fuzzy association rules provide a more realistic visualization of the knowledge extracted from databases.

6.
Mining high-confidence association rules   Total citations: 2 (self-citations: 1, citations by others: 2)
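Traditional association rules and utility-based association rules overlook rules whose support or utility is low but whose confidence is very high; such high-confidence rules can help people meet their expectation of avoiding risk and raising success rates. To mine these low-support (or low-utility), high-confidence rules, the HCARM algorithm is proposed. HCARM uses partitioning to handle large data sets and a new pruning strategy to compress the search space. In addition, by setting a length threshold minlen, HCARM is made suitable for mining long patterns. Experimental results show that the method is effective for high-confidence long patterns.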

7.
Mining dynamic association rules in databases   Total citations: 7 (self-citations: 0, citations by others: 7)
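Association rules can reveal dependencies among variables but cannot reflect how the rules themselves change over time. This paper therefore proposes dynamic association rules. The whole data set to be mined is first divided by time into several subsets; each rule mined from each subset yields one support value and one confidence value, so each rule corresponds to a support vector and a confidence vector over the full set. By analysing the support vector and the confidence vector, one can both discover how a rule changes over time and predict its trend. The paper also proposes two algorithms for mining dynamic association rules and compares them, and it presents two ways to analyse the two vectors: histograms and time series. Finally, an application example of mining dynamic association rules is given.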
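
The support and confidence vectors described in the abstract above can be sketched as follows. This is a minimal illustration, not either of the paper's two algorithms; it assumes each time slice's support is measured relative to that slice's own size (the paper may normalise differently), and the function name and toy data are hypothetical.

```python
# Illustrative sketch only; names, data, and the per-slice normalisation
# are assumptions, not taken from the paper.

def support_confidence_vectors(partitions, antecedent, consequent):
    """Return (support vector, confidence vector) of the rule
    antecedent -> consequent, with one entry per time partition."""
    antecedent, consequent = set(antecedent), set(consequent)
    both = antecedent | consequent
    supports, confidences = [], []
    for transactions in partitions:
        n = len(transactions)
        n_ante = sum(antecedent <= t for t in transactions)
        n_both = sum(both <= t for t in transactions)
        supports.append(n_both / n if n else 0.0)
        confidences.append(n_both / n_ante if n_ante else 0.0)
    return supports, confidences

# Hypothetical data: two time slices of four transactions each.
partitions = [
    [{"bread", "milk"}, {"bread"}, {"milk"}, {"bread", "milk"}],
    [{"bread", "milk"}, {"bread", "milk"}, {"tea"}, {"bread", "milk"}],
]
supports, confidences = support_confidence_vectors(partitions, {"bread"}, {"milk"})
```

Here both vectors rise from the first slice to the second, which is the kind of trend that the histogram or time-series analysis mentioned in the abstract is meant to expose.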

8.
Data mining is the process of extracting desirable knowledge or interesting patterns from existing databases for specific purposes. Most conventional data-mining algorithms identify the relationships among transactions using binary values; however, transactions with quantitative values are commonly seen in real-world applications. This paper thus proposes a new data-mining algorithm for extracting interesting knowledge from transactions stored as quantitative values. The proposed algorithm integrates fuzzy set concepts and the Apriori mining algorithm to find interesting fuzzy association rules in given transaction data sets. Experiments with student grades at I-Shou University were also conducted to verify the performance of the proposed algorithm.
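
To make the idea of fuzzy support over quantitative values concrete, here is a minimal sketch. It is not the paper's algorithm: the triangular membership function, the single "high" region with corners (60, 100, 140), the use of the minimum as the fuzzy intersection, and all names and data are assumptions chosen for illustration.

```python
# Illustrative sketch only; the fuzzy region, the min operator, and the
# grade records are assumptions, not taken from the paper.

def triangular(x, left, peak, right):
    """Membership of x in a triangular fuzzy set with the given corners."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

# Hypothetical fuzzy region for grades on a 0-100 scale.
REGIONS = {"high": (60, 100, 140)}

def fuzzy_support(records, items):
    """Average, over all records, of the minimum membership among
    the (attribute, region) pairs in `items`."""
    total = 0.0
    for record in records:
        total += min(triangular(record[attr], *REGIONS[region])
                     for attr, region in items)
    return total / len(records)

records = [
    {"math": 80, "physics": 90},
    {"math": 100, "physics": 100},
    {"math": 60, "physics": 90},
    {"math": 70, "physics": 50},
]
```

An Apriori-style miner would then keep only the fuzzy itemsets whose fuzzy support reaches a chosen minimum, in the same way that crisp Apriori filters on ordinary support.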

9.
Mining dynamic association rules with comments   Total citations: 2 (self-citations: 2, citations by others: 0)
In this paper, we study a new problem of mining dynamic association rules with comments (DAR-C for short). A DAR-C contains not only the rule itself but also comments that specify when to apply the rule. To formalize this problem, we first present a method for expressing candidate effective time slots, and then propose several definitions concerning DAR-C. Subsequently, two algorithms, ITS2 and EFP-Growth2, are developed for mining DAR-C. In particular, ITS2 is an improved two-stage dynamic association rule mining algorithm, while EFP-Growth2 is based on the EFP-tree structure and is suitable for mining high-density mass data. Extensive experimental results demonstrate the efficiency and scalability of the two proposed algorithms on DAR-C mining tasks, and their practicability on a real retail dataset.

10.
Van  Trang  Le  Bac 《Applied Intelligence》2021,51(10):7208-7220
Applied Intelligence - Mining sequential rules from a sequence database usually returns a set of rules with great cardinality. However, in real world applications, the end-users are often...

11.
Conceptual and logical database design are complex tasks for non-expert designers. Currently, the popular data models for conceptual and logical database design are the entity–relationship (ER) and the relational model, respectively. Logical design methodologies for relational databases have relied on mathematically rigorous approaches which are impractical, or textbook approaches which do not provide the rich constructs to capture real applications. Consequently, designers have to use their intuition to develop their own rules and heuristics. There is a need, therefore, to develop practical rules and heuristics that can be used to handle the complexity of design in real applications. This paper proposes a realistic and detailed approach for conceptual design using the ER model for relational databases. The approach is based on four rules that specify the order in which various types of relationships must be modelled, three rules that pertain to detection of derived relationships, and three heuristics based on observation of constructs in real applications. The approach is illustrated by many examples.

12.
Mining optimized gain rules for numeric attributes   Total citations: 7 (self-citations: 0, citations by others: 7)
Association rules are useful for determining correlations between attributes of a relation and have applications in the marketing, financial, and retail sectors. Furthermore, optimized association rules are an effective way to focus on the most interesting characteristics involving certain attributes. Optimized association rules are permitted to contain uninstantiated attributes and the problem is to determine instantiations such that either the support, confidence, or gain of the rule is maximized. In this paper, we generalize the optimized gain association rule problem by permitting rules to contain disjunctions over uninstantiated numeric attributes. Our generalized association rules enable us to extract more useful information about seasonal and local patterns involving the uninstantiated attribute. For rules containing a single numeric attribute, we present an algorithm with linear complexity for computing optimized gain rules. Furthermore, we propose a bucketing technique that can result in a significant reduction in input size by coalescing contiguous values without sacrificing optimality. We also present an approximation algorithm based on dynamic programming for two numeric attributes. Using recent results on binary space partitioning trees, we show that the approximations are within a constant factor of the optimal optimized gain rules. Our experimental results with synthetic data sets for a single numeric attribute demonstrate that our algorithm scales up linearly with the attribute's domain size as well as the number of disjunctions. In addition, we show that applying our optimized rule framework to a population survey real-life data set enables us to discover interesting underlying correlations among the attributes.
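
As a rough illustration of optimizing gain over a single numeric attribute, the sketch below finds the one interval with maximum gain in a single linear pass. It is not the paper's algorithm, which also handles several disjunctions. It assumes, without confirmation from the abstract, that the gain of a rule is the number of tuples satisfying both sides minus a minimum-confidence threshold times the number satisfying the antecedent; gain is counted in tuples rather than support fractions, and the names and bucket data are hypothetical.

```python
# Illustrative sketch only. Assumed gain definition (not confirmed by the
# abstract): gain = hits - min_conf * total, summed over the chosen interval.
# Finding the best single interval is then a maximum-subarray scan.

def best_gain_interval(buckets, min_conf):
    """buckets: list of (value, n_total, n_hit) sorted by value.
    Returns (gain, low_value, high_value) of the best single interval,
    or None if `buckets` is empty."""
    best = None
    cur_gain, cur_start = 0.0, None
    for value, n_total, n_hit in buckets:
        g = n_hit - min_conf * n_total
        if cur_start is None or cur_gain <= 0:
            # A non-positive running gain never helps; start a new interval.
            cur_gain, cur_start = g, value
        else:
            cur_gain += g
        if best is None or cur_gain > best[0]:
            best = (cur_gain, cur_start, value)
    return best

# Hypothetical buckets: (attribute value, tuples with that value,
# tuples that also satisfy the rule's consequent).
buckets = [(1, 10, 2), (2, 10, 8), (3, 10, 9), (4, 10, 3), (5, 10, 7)]
result = best_gain_interval(buckets, 0.5)
```

With these numbers the per-bucket gains are -3, 3, 4, -2, 2, so the scan settles on the interval from value 2 to value 3. Coalescing adjacent buckets whose gains share a sign would shrink the input without changing the answer, which is in the spirit of the bucketing technique the abstract mentions.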

13.
Association rule mining is an important data analysis method that can discover associations within data. Numerous previous studies focus on finding fuzzy association rules from precise and certain data. Unfortunately, real-world data tends to be uncertain due to human errors, instrument errors, recording errors, and so on. Therefore, a question arising immediately is how we can mine fuzzy association rules from uncertain data. To this end, this paper proposes a scheme to represent uncertain data. This representation is based on possibility distributions because possibility theory establishes a close connection between the concepts of similarity and uncertainty, providing an excellent framework for handling uncertain data. Then, we develop an algorithm to mine fuzzy association rules from uncertain data represented by possibility distributions. Experimental results from the survey data show that the proposed approach can discover interesting and valuable patterns with high certainty.

14.
Mining multiple-level association rules in large databases   Total citations: 2 (self-citations: 0, citations by others: 2)
A top-down progressive deepening method is developed for efficient mining of multiple-level association rules from large transaction databases based on the a priori principle. A group of variant algorithms is proposed based on the ways of sharing intermediate results, with the relative performance tested and analyzed. The enforcement of different interestingness measurements to find more interesting rules, and the relaxation of rule conditions for finding “level-crossing” association rules, are also investigated. The study shows that efficient algorithms can be developed for discovering interesting and strong multiple-level association rules from large databases.

15.
《Information Systems》2001,26(6):425-444
Mining association rules on large data sets has received considerable attention in recent years. Association rules are useful for determining correlations between attributes of a relation and have applications in marketing, financial and retail sectors. Furthermore, optimized association rules are an effective way to focus on the most interesting characteristics involving certain attributes. Optimized association rules are permitted to contain uninstantiated attributes and the problem is to determine instantiations such that either the support, confidence or gain of the rule is maximized. In this paper, we generalize the optimized support association rule problem by permitting rules to contain disjunctions over uninstantiated numeric attributes. Our generalized association rules enable us to extract more useful information about seasonal and local patterns involving the uninstantiated attribute. For rules containing a single numeric attribute, we present a dynamic programming algorithm for computing optimized association rules. Furthermore, we propose a bucketing technique for reducing the input size, and a divide and conquer strategy that improves the performance significantly without sacrificing optimality. We also present approximation algorithms based on dynamic programming for two numeric attributes. Our experimental results for a single numeric attribute indicate that our bucketing and divide and conquer enhancements are very effective in reducing the execution times and memory requirements of our dynamic programming algorithm. Furthermore, they show that our algorithms scale up almost linearly with the attribute's domain size as well as the number of disjunctions.

16.
A class-bridge rule is a rule whose antecedent and consequent belong to different conceptual classes. This kind of rule stands for the correlation between conceptual classes. The study of class-bridge rules can benefit many domains such as criminal detection, chemical synthesis, biological grafting, etc. Class-bridge rules differ from other association rules as follows: (1) class-bridge rules can be generated from infrequent itemsets; (2) measurements of class-bridge rules include the distance between conceptual classes and the relation between the antecedents/consequents and their affiliated conceptual classes. This paper proposes an algorithm based on rough sets to mine and evaluate class-bridge rules. The experiment demonstrates its effectiveness.

17.
Association rule mining is one of the most popular data analysis methods that can discover associations within data. Association rule mining algorithms have been applied to various datasets, due to their practical usefulness. Little attention has been paid, however, to how to apply association mining techniques to analyze questionnaire data. Therefore, this paper first identifies the various data types that may appear in a questionnaire. Then, we introduce the questionnaire data mining problem and define the rule patterns that can be mined from questionnaire data. A unified approach is developed based on fuzzy techniques so that all different data types can be handled in a uniform manner. After that, an algorithm is developed to discover fuzzy association rules from the questionnaire dataset. Finally, we evaluate the performance of the proposed algorithm, and the results indicate that our method is capable of finding interesting association rules that would never have been found by previous mining algorithms.

18.
Data mining is most commonly used in attempts to induce association rules from databases, which can help decision-makers easily analyze the data and make good decisions regarding the domains concerned. Different studies have proposed methods for mining association rules from databases with crisp values. However, the data in many real-world applications have a certain degree of imprecision. In this paper we address this problem, and propose a new data-mining algorithm for extracting interesting knowledge from databases with imprecise data. The proposed algorithm integrates imprecise data concepts and the fuzzy apriori mining algorithm to find interesting fuzzy association rules in given databases. Experiments for diagnosing dyslexia in early childhood were conducted to verify the performance of the proposed algorithm.

19.
Gong Yu (宫雨) 《Computer Engineering and Design》 (计算机工程与设计) 2007,28(24):5838-5840
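Constrained association rules are an important topic in association rule research. Most current work focuses on single-variable constraints and little addresses two-variable constraints, although two-variable constraints also matter in practice. To address this, the problem of association rules with lower-bound constraints, a kind of two-variable constraint, is posed. On this basis, a definition of the lower-bound constraint is given, the properties of frequent itemsets satisfying the lower-bound constraint are analysed, and the corresponding proofs are provided. Finally, an FP-Tree-based lower-bound constraint algorithm is proposed; it uses pre-testing to reduce the number of itemsets that must be tested and the computational cost. Experimental results show that the algorithm is highly efficient.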

20.
《Information Systems》2002,27(5):345-362
The problem addressed in this paper is to discover frequently occurring sequential patterns from databases. Existing studies on finding sequential patterns can be roughly classified into two main categories. In the first category, the discovered patterns are continuous patterns, where all the elements in the pattern appear in consecutive positions in transactions. The second category mines discontinuous patterns, where the adjacent elements in the pattern need not appear consecutively in transactions. Although there is much research on finding either kind of pattern, no previous work can find both of them, nor can it find discontinuous patterns formed of several continuous sub-patterns. Therefore, we define a new kind of pattern, called a hybrid pattern, which is the combination of continuous patterns and discontinuous patterns. In this paper, two algorithms are developed to mine hybrid patterns; the first is simple but slow, while the second is more complicated but much faster. Finally, the simulation results show that our second algorithm is as fast as the currently best algorithm for mining sequential patterns.
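
The three kinds of pattern distinguished in the abstract above can be illustrated by checking whether a given pattern occurs in one sequence. This sketch only tests occurrence; it is not either of the paper's mining algorithms. The function names, the representation of a hybrid pattern as an ordered list of continuous blocks, and the choice to allow a gap of zero or more elements between blocks are assumptions.

```python
# Illustrative sketch only; names and the hybrid-pattern representation
# are assumptions, not taken from the paper.

def occurs_contiguously(pattern, sequence):
    """Continuous pattern: all elements appear in consecutive positions."""
    n, m = len(sequence), len(pattern)
    return any(list(sequence[i:i + m]) == list(pattern)
               for i in range(n - m + 1))

def occurs_as_subsequence(pattern, sequence):
    """Discontinuous pattern: elements appear in order, gaps allowed."""
    it = iter(sequence)
    return all(item in it for item in pattern)

def occurs_as_hybrid(blocks, sequence):
    """Hybrid pattern: each block is continuous, blocks appear in order,
    and any number of elements may lie between consecutive blocks."""
    pos = 0
    for block in blocks:
        m = len(block)
        for i in range(pos, len(sequence) - m + 1):
            if list(sequence[i:i + m]) == list(block):
                pos = i + m  # the earliest match leaves the most room
                break
        else:
            return False
    return True

seq = list("abxcdyef")
```

For the sequence above, "ab" occurs contiguously, "ace" occurs only as a subsequence, and the hybrid pattern made of the blocks "ab" and "cd" matches because each block is contiguous while "x" separates them.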
