1.
We develop techniques for discovering patterns with periodicity in this work. Patterns with periodicity are those that occur at regular time intervals, and therefore there are two aspects to the problem: finding the pattern, and determining the periodicity. The difficulty of the task lies in the problem of discovering these regular time intervals, i.e., the periodicity. Periodicities in the database are usually not very precise and have disturbances, and might occur at time intervals in multiple time granularities. To overcome these difficulties and to be able to discover the patterns with fuzzy periodicity, we propose the fuzzy periodic calendar which defines fuzzy periodicities. Furthermore, we develop algorithms for mining fuzzy periodicities and the fuzzy periodic association rules within them. Experimental results have shown that our method is effective in discovering fuzzy periodic association rules.
2.
Mining dynamic association rules with comments  Total citations: 2, self-citations: 2, citations by others: 0
In this paper, we study a new problem of mining dynamic association rules with comments (DAR-C for short). A DAR-C contains not only the rule itself, but also its comments, which specify when to apply the rule. In order to formalize this problem, we first present the expression method of candidate effective time slots, and then propose several definitions concerning DAR-C. Subsequently, two algorithms, namely ITS2 and EFP-Growth2, are developed for handling the problem of mining DAR-C. In particular, ITS2 is an improved two-stage dynamic association rule mining algorithm, while EFP-Growth2 is based on the EFP-tree structure and is suitable for mining high-density mass data. Extensive experimental results demonstrate the efficiency and scalability of our two proposed algorithms (i.e., ITS2 and EFP-Growth2) on DAR-C mining tasks, and their practicability on a real retail dataset.
3.
In association rule mining, the trade-off between avoiding harmful spurious rules and preserving authentic ones is an ever-critical barrier to obtaining reliable and useful results. The statistically sound technique for evaluating the statistical significance of association rules is superior in preventing spurious rules, yet can also cause severe loss of true rules in the presence of data error. This study presents a new and improved method for statistical testing of association rules with uncertain erroneous data. An original mathematical model was established to describe data error propagation through the computational procedures of the statistical test. Based on the error model, a scheme combining analytic and simulative processes was designed to correct the statistical test for distortions caused by data error. Experiments on both synthetic and real-world data show that the method significantly recovers the loss in true rules (reduces type-2 error) that data error causes in the original statistically sound method. Meanwhile, the new method maintains effective control over the familywise error rate, which is the distinctive advantage of the original statistically sound technique. Furthermore, the method is robust against inaccurate data error probability information and against situations that do not fulfill the commonly accepted assumption of independent error probabilities for different data items. The method is particularly effective for rules that are most practically meaningful yet sensitive to data error. The method proves promising in enhancing the value of association rule mining results and helping users make correct decisions.
4.
Mining spatial association rules in image databases  Total citations: 2, self-citations: 0, citations by others: 2
In this paper, we propose a novel spatial mining algorithm, called 9DLT-Miner, to mine the spatial association rules from an image database, where every image is represented by the 9DLT representation. The proposed method consists of two phases. First, we find all frequent patterns of length one. Next, we use frequent k-patterns (k ≥ 1) to generate all candidate (k + 1)-patterns. For each candidate pattern generated, we scan the database to count the pattern’s support and check if it is frequent. The steps in the second phase are repeated until no more frequent patterns can be found. Since our proposed algorithm prunes most of the impossible candidates, it is more efficient than the Apriori algorithm. The experimental results show that 9DLT-Miner runs 2-5 times faster than the Apriori algorithm.
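The two-phase, level-wise search described in this abstract can be illustrated with a minimal sketch over plain itemsets. It is not 9DLT-Miner: it ignores the 9DLT spatial representation and the algorithm's own pruning, and the function name `frequent_itemsets` and the `min_support` parameter are assumptions made only for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Generic level-wise search (illustrative, not 9DLT-Miner).

    Returns a dict mapping each frequent itemset (a frozenset) to its
    absolute support count.
    """
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        # One pass over the database per candidate.
        return sum(1 for t in transactions if itemset <= t)

    # Phase 1: frequent patterns of length one.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items
               if support(frozenset([i])) >= min_support}

    result = {}
    k = 1
    while current:
        for itemset in current:
            result[itemset] = support(itemset)
        # Phase 2: join frequent k-itemsets into candidate (k+1)-itemsets.
        candidates = {a | b for a in current for b in current
                      if len(a | b) == k + 1}
        # Drop candidates that have an infrequent k-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k))}
        # Count support and keep only the frequent candidates.
        current = {c for c in candidates if support(c) >= min_support}
        k += 1
    return result
```

Calling `frequent_itemsets([["a", "b"], ["a", "b", "c"], ["a", "c"]], 2)` would repeat the second phase until no candidate reaches the threshold, which mirrors the stopping rule stated in the abstract.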
5.
Association rule mining is one of the most popular data analysis methods that can discover associations within data. Association rule mining algorithms have been applied to various datasets, due to their practical usefulness. Little attention has been paid, however, to how to apply association mining techniques to analyze questionnaire data. Therefore, this paper first identifies the various data types that may appear in a questionnaire. Then, we introduce the questionnaire data mining problem and define the rule patterns that can be mined from questionnaire data. A unified approach is developed based on fuzzy techniques so that all different data types can be handled in a uniform manner. After that, an algorithm is developed to discover fuzzy association rules from the questionnaire dataset. Finally, we evaluate the performance of the proposed algorithm, and the results indicate that our method is capable of finding interesting association rules that would have never been found by previous mining algorithms.
6.
Association rule mining is an important data analysis method that can discover associations within data. There are numerous
previous studies that focus on finding fuzzy association rules from precise and certain data. Unfortunately, real-world data
tends to be uncertain due to human errors, instrument errors, recording errors, and so on. Therefore, a question arising immediately
is how we can mine fuzzy association rules from uncertain data. To this end, this paper proposes a representation scheme to
represent uncertain data. This representation is based on possibility distributions because the possibility theory establishes
a close connection between the concepts of similarity and uncertainty, providing an excellent framework for handling uncertain
data. Then, we develop an algorithm to mine fuzzy association rules from uncertain data represented by possibility distributions.
Experimental results from the survey data show that the proposed approach can discover interesting and valuable patterns with
high certainty.
7.
Mining multiple-level association rules in large databases  Total citations: 2, self-citations: 0, citations by others: 2
Jiawei Han Yongjian Fu 《Knowledge and Data Engineering, IEEE Transactions on》1999,11(5):798-805
A top-down progressive deepening method is developed for efficient mining of multiple-level association rules from large transaction databases based on the a priori principle. A group of variant algorithms is proposed based on the ways of sharing intermediate results, with the relative performance tested and analyzed. The enforcement of different interestingness measurements to find more interesting rules, and the relaxation of rule conditions for finding “level-crossing” association rules, are also investigated. The study shows that efficient algorithms can be developed from large databases for the discovery of interesting and strong multiple-level association rules.
8.
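Gong Yu (宫雨) 《Computer Engineering and Design (计算机工程与设计)》2007,28(24):5838-5840
Constrained association rules are an important problem in association rule research. Most current work concentrates on single-variable constraints, and two-variable constraints have received little study even though they also matter in practice. To address this, the paper poses the problem of association rules with a lower-bound constraint among two-variable constraints. It gives a definition of the lower-bound constraint, then analyzes the properties of frequent itemsets that satisfy the constraint and provides the corresponding proofs. Finally, it proposes an FP-Tree-based lower-bound constraint algorithm that uses a pre-testing approach to reduce the number of itemsets that need to be tested and the computational cost. Experimental results show that the algorithm is highly efficient.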
9.
A. M. Palacios M. J. Gacto J. Alcalá-Fdez 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2012,16(5):883-901
Data mining is most commonly used in attempts to induce association rules from databases which can help decision-makers easily
analyze the data and make good decisions regarding the domains concerned. Different studies have proposed methods for mining
association rules from databases with crisp values. However, the data in many real-world applications have a certain degree
of imprecision. In this paper we address this problem, and propose a new data-mining algorithm for extracting interesting
knowledge from databases with imprecise data. The proposed algorithm integrates imprecise data concepts and the fuzzy apriori
mining algorithm to find interesting fuzzy association rules in given databases. Experiments for diagnosing dyslexia in early
childhood were made to verify the performance of the proposed algorithm.
10.
James J.H. Liou Ching-Hui Tang Wen-Chien Yeh Chieh-Yuan Tsai 《Expert systems with applications》2011,38(11):13723-13730
The passenger’s perception of the airport’s level of service (LOS) may have a significant impact on promoting or discouraging future tourism and business activities. In this study, we take a look at this problem, but unlike in traditional statistical analysis, we apply a new method, the dominance-based rough set approach (DRSA), to an airport service survey. A set of “if … then … ” decision rules is used in the preference model. The passengers indicate their perception of airport LOS by rating a set of criteria/attributes. The proposed method provides practical information that should be of help to airport planners, designers, operators, and managers to develop LOS improvement strategies. The model was implemented using survey data from a large sample of customers from an international airport in Taiwan.
11.
Mining fuzzy association rules in a bank-account database  Total citations: 1, self-citations: 0, citations by others: 1
This paper describes how we applied a fuzzy technique to a data-mining task involving a large database that was provided by an international bank with offices in Hong Kong. The database contains the demographic data of over 320,000 customers and their banking transactions, which were collected over a six-month period. By mining the database, the bank would like to be able to discover interesting patterns in the data. The bank expected that the hidden patterns would reveal different characteristics about different customers so that it could better serve and retain them. To help the bank achieve its goal, we developed a fuzzy technique, called fuzzy association rule mining II (FARM II). FARM II is able to handle both relational and transactional data. It can also handle fuzzy data. The former type of data allows FARM II to discover multidimensional association rules, whereas the latter data allows some of the patterns to be more easily revealed and expressed. To effectively uncover the hidden associations in the bank-account database, FARM II performs several steps which are described in detail in this paper. With FARM II, the bank identified some interesting characteristics of the customers who had once used the bank's loan services but later decided to cease using them. The bank translated what it discovered into actionable items by offering some incentives to retain its existing customers.
12.
Emerging applications introduce the requirement for novel association-rule mining algorithms that will be scalable not only with respect to the number of records (number of rows) but also with respect to the domain's size (number of columns). In this paper, we focus on the cases where the items of a large domain correlate with each other in a way that small worlds are formed, that is, the domain is clustered into groups with a large number of intra-group and a small number of inter-group correlations. This property appears in several real-world cases, e.g., in bioinformatics, e-commerce applications, and bibliographic analysis, and can help to significantly prune the search space so as to perform efficient association-rule mining. We develop an algorithm that partitions the domain of items according to their correlations and we describe a mining algorithm that carefully combines partitions to improve the efficiency. Our experiments show the superiority of the proposed method against existing algorithms, and that it overcomes the problems (e.g., increase in CPU cost and possible I/O thrashing) caused by existing algorithms due to the combination of a large domain and a large number of records.
13.
Mining association rules on large data sets has received considerable attention in recent years. Association rules are useful for determining correlations between attributes of a relation and have applications in marketing, financial, and retail sectors. Furthermore, optimized association rules are an effective way to focus on the most interesting characteristics involving certain attributes. Optimized association rules are permitted to contain uninstantiated attributes and the problem is to determine instantiations such that either the support or confidence of the rule is maximized. In this paper, we generalize the optimized association rules problem in three ways: (1) association rules are allowed to contain disjunctions over uninstantiated attributes, (2) association rules are permitted to contain an arbitrary number of uninstantiated attributes, and (3) uninstantiated attributes can be either categorical or numeric. Our generalized association rules enable us to extract more useful information about seasonal and local patterns involving multiple attributes. We present effective techniques for pruning the search space when computing optimized association rules for both categorical and numeric attributes. Finally, we report the results of our experiments that indicate that our pruning algorithms are efficient for a large number of uninstantiated attributes, disjunctions, and values in the domain of the attributes.
14.
15.
Data mining is the process of extracting desirable knowledge or interesting patterns from existing databases for specific purposes. In real-world applications, transactions may contain quantitative values and each item may have a lifespan from a temporal database. In this paper, we thus propose a data mining algorithm for deriving fuzzy temporal association rules. It first transforms each quantitative value into a fuzzy set using the given membership functions. Meanwhile, item lifespans are collected and recorded in a temporal information table through a transformation process. The algorithm then calculates the scalar cardinality of each linguistic term of each item. A mining process based on fuzzy counts and item lifespans is then performed to find fuzzy temporal association rules. Experiments are finally performed on two simulation datasets and the foodmart dataset to show the effectiveness and the efficiency of the proposed approach.
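The first steps named in this abstract, turning a quantity into a fuzzy set and summing memberships into a scalar cardinality per linguistic term, can be sketched as follows. The triangular membership shape, the three terms and their breakpoints in `TERMS`, and the function names are all assumptions for illustration; the abstract does not specify them, and the item-lifespan handling is left out.

```python
def triangular(x, a, b, c):
    """Triangular membership with feet at a and c and peak at b (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical linguistic terms for a purchased quantity.
TERMS = {
    "low": (0, 1, 6),
    "middle": (1, 6, 11),
    "high": (6, 11, 16),
}


def fuzzify(quantity):
    """Map one quantitative value to a membership degree per linguistic term."""
    return {term: triangular(quantity, *abc) for term, abc in TERMS.items()}


def scalar_cardinality(quantities):
    """Sum the membership degrees of each term over all transactions of one item."""
    totals = {term: 0.0 for term in TERMS}
    for q in quantities:
        for term, mu in fuzzify(q).items():
            totals[term] += mu
    return totals
```

Under these assumed breakpoints, a quantity halfway between two peaks contributes half a count to each neighbouring term, so the totals act as fuzzy counts that a later mining step could compare with a support threshold.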
16.
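Research on association rules with grouped multiple supports (分组多支持度关联规则研究)  Total citations: 3, self-citations: 1, citations by others: 3
Gong Yu (宫雨) 《Computer Engineering and Design (计算机工程与设计)》2007,28(5):1205-1207
Association rule mining is one of the important tasks in data mining. Traditional association rule algorithms use a single minimum support, which assumes that items occur with roughly the same frequency; in practice this is not the case, which gives rise to the multiple-support association rule problem. That problem assigns a different support to each item, whereas in real applications the items can be divided into several groups, each with its own support. This leads to the grouped multiple-support association rule problem, for which the paper gives a method of grouping items based on the properties of multiple supports. The method can reduce the number of candidate 2-itemsets. On this basis, the paper further presents a corresponding algorithm for discovering multiple-support association rules and demonstrates its effectiveness through experiments.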
17.
Mining association rules using inverted hashing and pruning  Total citations: 2, self-citations: 0, citations by others: 2
John D. Holt Soon M. Chung 《Information Processing Letters》2002,83(4):211-220
In this paper, we propose a new algorithm named Inverted Hashing and Pruning (IHP) for mining association rules between items in transaction databases. The performance of the IHP algorithm was evaluated for various cases and compared with those of two well-known mining algorithms, Apriori algorithm [Proc. 20th VLDB Conf., 1994, pp. 487-499] and Direct Hashing and Pruning algorithm [IEEE Trans. on Knowledge Data Engrg. 9 (5) (1997) 813-825]. It has been shown that the IHP algorithm has better performance for databases with long transactions.
18.
Eliseo Clementini Paolino Di Felice Krzysztof Koperski 《Data & Knowledge Engineering》2000,34(3):251-270
Spatial data mining, i.e., mining knowledge from large amounts of spatial data, is a demanding field since huge amounts of spatial data have been collected in various applications, ranging from remote sensing to geographical information systems (GIS), computer cartography, environmental assessment and planning. The collected data far exceeds people's ability to analyze it. Thus, new and efficient methods are needed to discover knowledge from large spatial databases. Most of the spatial data mining methods do not take into account the uncertainty of spatial information. In our work we use objects with broad boundaries, the concept that absorbs all the uncertainty by which spatial data is commonly affected and allows computations in the presence of uncertainty without rough simplifications of the reality. The topological relations between objects with a broad boundary can be organized into a three-level concept hierarchy. We developed and implemented a method for an efficient determination of such topological relations. Based on the hierarchy of topological relations we present a method for mining spatial association rules for objects with uncertainty. The progressive refinement approach is used for the optimization of the mining process.
19.
20.
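Association rule mining is one of the core tasks in data mining, and in recent years considerable progress has been made, in China and abroad, on improving association rule algorithms. A concept lattice is a formal tool derived from a binary relation. It embodies the unity of a concept's intent and extent and is well suited to discovering latent relationships in data, so association rule extraction is also one of its main application areas, where it greatly improves mining efficiency. However, because guidance from domain knowledge is lacking, some of the mined rules are meaningless or fail to meet users' needs, so domain knowledge needs to be introduced into rule extraction. A domain ontology is a clear, structured representation of domain knowledge. The paper therefore proposes using a domain ontology to adjust the generated concept lattice, thereby guiding rule extraction so as to discover high-level association rules and association rules across multiple levels that meet users' needs.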