Similar Literature
20 similar documents retrieved.
1.
《Information Systems》2001,26(1):1-14
In this paper, we examine the two issues of mining association rules and mining sequential patterns in a large database of sales transactions. The problems of mining association rules and mining sequential patterns focus on discovering large itemsets and large sequences, respectively. We present PSI and PSI_seq for efficient generation of large itemsets and large sequences, respectively. The main idea of both algorithms is to use prestored information to minimize the number of candidate itemsets and candidate sequences counted in each database scan. The prestored information for PSI and PSI_seq consists of the itemsets and the sequences, respectively, along with their support counts found in the previous mining run. Typically, a user may need to tune the value of the minimum support several times before a useful set of association rules can be obtained from the transaction database. By reusing the prestored information, the total computation time is reduced effectively. Empirical results show that our approaches outperform previous methods by an order of magnitude while using little storage space for the prestored information.
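A minimal sketch of the prestored-information idea, assuming a simple Apriori-style counting pass (the function and variable names are hypothetical, not the authors' PSI implementation): candidates whose support counts were saved from an earlier mining run are not recounted, so a rerun with a different minimum support only scans the database for genuinely new candidates.

```python
def count_with_prestore(transactions, candidates, prestored):
    """Count candidate itemsets, skipping those whose support count was
    prestored from an earlier mining run (the core idea behind PSI)."""
    counts = {}
    to_scan = []
    for c in candidates:
        if c in prestored:                  # support already known: no rescan
            counts[c] = prestored[c]
        else:
            counts[c] = 0
            to_scan.append(c)
    for t in transactions:                  # one database scan for the rest
        t = set(t)
        for c in to_scan:
            if c <= t:
                counts[c] += 1
    return counts

# Usage: lowering the minimum support in a later run reuses earlier counts.
transactions = [{"a", "b", "c"}, {"a", "c"}, {"b", "c"}, {"a", "b"}]
prestored = {frozenset({"a"}): 3, frozenset({"c"}): 3}     # from the last mining
candidates = [frozenset({"a"}), frozenset({"b"}), frozenset({"c"}),
              frozenset({"a", "c"})]
print(count_with_prestore(transactions, candidates, prestored))
# e.g. {frozenset({'a'}): 3, frozenset({'b'}): 3, frozenset({'c'}): 3, frozenset({'a', 'c'}): 2}
```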

2.

This article explores the combined application of inductive learning algorithms and causal inference techniques to the problem of discovering causal rules among the attributes of a relational database. Given some relational data, each field can be considered a random variable, and a hybrid graph can be built by detecting conditional independencies among the variables. The induced graph represents genuine and potential causal relations as well as spurious associations. When the variables are discrete, or have been discretized to test conditional independencies, supervised induction algorithms can be used to learn causal rules, that is, conditional statements in which causes appear as antecedents and effects as consequences. The approach is illustrated by means of experiments conducted on different data sets.
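As a rough illustration of the conditional-independence testing step, the sketch below (a hypothetical helper, assuming pandas and SciPy) runs a chi-square test of X against Y within every stratum of Z on discrete data; the article's actual procedure and induction algorithms are more elaborate.

```python
import pandas as pd
from scipy.stats import chi2_contingency

def cond_independent(df, x, y, z, alpha=0.05):
    """Crude test of X independent of Y given Z on discrete data: run a
    chi-square test within each stratum of Z and require every stratum to pass."""
    for _, group in df.groupby(z):
        table = pd.crosstab(group[x], group[y])
        if table.shape[0] < 2 or table.shape[1] < 2:
            continue                        # stratum too small to test
        _, p_value, _, _ = chi2_contingency(table)
        if p_value < alpha:
            return False                    # dependence detected in some stratum
    return True
```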

3.
The discovery of dependencies between attributes in databases is an important problem in data mining and can be applied to facilitate future decision-making. In the present paper, some properties of branching dependencies are examined. We define a minimal branching dependency and propose an algorithm for finding all minimal branching dependencies between a given set of attributes and a given attribute in a relation of a database. Our examination of branching dependencies is motivated by their application in a database storing realized sales of products. For example, discovering that any p products have jointly attracted at most q new users can prove crucial in supporting decision-making. In addition, we also consider the fractional and the fractional branching dependencies and examine some of their properties. An algorithm for finding all fractional dependencies between a given set of attributes and a given attribute in a database relation is proposed. We examine the general case of an arbitrary relation, as well as a particular case in which the problem of discovering fractional dependencies is considerably simplified.
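The following sketch checks one possible reading of such a (p, q) condition on a small relation resembling the sales example above; the helper name and this exact formalization are illustrative assumptions, not the paper's formal definition of a branching dependency.

```python
from itertools import combinations

def holds_branching(rows, attr, target, p, q):
    """Check whether every p distinct values of `attr` are associated, in total,
    with at most q distinct values of `target` (rows is a list of dicts)."""
    by_value = {}
    for r in rows:
        by_value.setdefault(r[attr], set()).add(r[target])
    for combo in combinations(by_value, p):
        reached = set().union(*(by_value[v] for v in combo))
        if len(reached) > q:
            return False
    return True

# e.g. do any 2 products attract at most 3 new users in total?
sales = [{"product": "A", "new_user": 1}, {"product": "A", "new_user": 2},
         {"product": "B", "new_user": 2}, {"product": "B", "new_user": 3},
         {"product": "C", "new_user": 4}]
print(holds_branching(sales, "product", "new_user", p=2, q=3))   # True
```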

4.
Sequential pattern mining is one of the most important data mining techniques. Previous research on mining sequential patterns discovered patterns from point-based event data, interval-based event data, and hybrid event data. In many real-life applications, however, an event may involve many statuses; it may not occur only at a single point in time or over a single period of time. In this work, we propose a generalized representation of temporal events. We treat events as multi-label events with many statuses, and introduce an algorithm called MLTPM to discover multi-label temporal patterns from temporal databases. The experimental results show that the efficiency and scalability of the MLTPM algorithm are satisfactory. We also discuss interesting multi-label temporal patterns discovered when MLTPM was applied to historical Nasdaq data.

5.
Chen, Guisheng; Li, Zhanshan. 《Applied Intelligence》 2022, 52(13):15387-15404
Applied Intelligence - Since periodic events are very common everywhere, periodic pattern mining is increasingly important in today's data mining domain. However, there is currently no...

6.
In the context of a federated database containing similar information in local component databases, we consider the problem of discovering the different local representations which database designers might use to map the same entity. An extension of the normal SQL system catalog to hold additional domain metadata is discussed. This enables common local mappings to be discovered and allows SQL retrieval queries expressed in terms of the logical federated schema to be transformed to match the appropriate local database schema. The result is a practical means of coping with differing mappings in federated databases without having to resort to special higher order languages or a layer of information mediation middleware. Copyright © 1999 John Wiley & Sons, Ltd.
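A minimal sketch of the catalog-driven idea, assuming hypothetical table/column names and a naive string rewrite (a real system would parse the SQL and consult the extended catalog properly): the extra domain metadata maps federated (logical) names onto each component database's local names, and incoming queries are rewritten before being shipped to that component.

```python
# Hypothetical extended-catalog metadata for two component databases.
CATALOG = {
    "siteA": {"table": {"Employee": "EMP"},   "column": {"salary": "SAL_EUR"}},
    "siteB": {"table": {"Employee": "staff"}, "column": {"salary": "pay_usd"}},
}

def rewrite(sql, site):
    """Naive token substitution from federated names to local names."""
    meta = CATALOG[site]
    for fed, local in {**meta["table"], **meta["column"]}.items():
        sql = sql.replace(fed, local)
    return sql

print(rewrite("SELECT salary FROM Employee WHERE salary > 50000", "siteA"))
# SELECT SAL_EUR FROM EMP WHERE SAL_EUR > 50000
```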

7.
We consider the problem of discovering the functional and inclusion dependencies that a given database instance satisfies. This technique is used in a database design tool that uses example databases to give feedback to the designer. If the examples show deficiencies in the design, the designer can directly modify the examples. The tool then infers new dependencies, and the database schema can be modified if necessary. The discovery of functional and inclusion dependencies can also be used in analyzing an existing database. The problem of inferring functional dependencies has several connections to other topics in knowledge discovery and machine learning. In this article we discuss the use of examples in the design of databases, and give an overview of the complexity results and algorithms that have been developed for this problem. © 1992 John Wiley & Sons, Inc.
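To make the dependency-inference step concrete, here is a toy sketch of discovering functional dependencies that hold in a small example instance (helper names are hypothetical; inclusion dependencies and the complexity-aware algorithms surveyed in the article are not covered).

```python
from itertools import combinations

def fd_holds(rows, lhs, rhs):
    """Check whether the functional dependency lhs -> rhs holds in an instance
    given as a list of dicts."""
    seen = {}
    for r in rows:
        key = tuple(r[a] for a in lhs)
        val = tuple(r[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True

def discover_fds(rows, attrs):
    """Enumerate FDs with single-attribute right-hand sides and small left-hand sides."""
    fds = []
    for rhs in attrs:
        for size in (1, 2):
            for lhs in combinations([a for a in attrs if a != rhs], size):
                if fd_holds(rows, lhs, (rhs,)):
                    fds.append((lhs, rhs))
    return fds

rows = [{"emp": 1, "dept": "A", "mgr": "x"},
        {"emp": 2, "dept": "A", "mgr": "x"},
        {"emp": 3, "dept": "B", "mgr": "y"}]
print(discover_fds(rows, ["emp", "dept", "mgr"]))   # includes ('dept',) -> 'mgr'
```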

8.
Discovering fuzzy time-interval sequential patterns in sequence databases.   Cited by: 1 (self-citations: 0, by others: 1)
Given a sequence database and a minimum support threshold, the task of sequential pattern mining is to discover the complete set of sequential patterns in the database. From the discovered sequential patterns, we can know which items are frequently bought together and in what order they appear. However, they cannot tell us the time gaps between successive items in the patterns. Accordingly, Chen et al. have proposed a generalization of sequential patterns, called time-interval sequential patterns, which reveals not only the order of items but also the time intervals between successive items. An example of a time-interval sequential pattern has the form (A, I2, B, I1, C), meaning that we buy A first, then after an interval of I2 we buy B, and finally after an interval of I1 we buy C, where I2 and I1 are predetermined time ranges. Although this new type of pattern alleviates the above concern, it causes the sharp boundary problem: when a time interval is near the boundary of two predetermined time ranges, we either ignore or overemphasize it. Therefore, this paper uses the concept of fuzzy sets to extend the original research so that fuzzy time-interval sequential patterns are discovered from databases. Two efficient algorithms, the fuzzy time-interval (FTI)-Apriori algorithm and the FTI-PrefixSpan algorithm, are developed for mining fuzzy time-interval sequential patterns. In our simulation results, we find that the second algorithm outperforms the first, not only in computing time but also in scalability with respect to various parameters.
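To illustrate how fuzzy sets soften the sharp boundary problem, here is a small sketch with hypothetical triangular membership functions for the time ranges (not the paper's actual parameters): a gap that falls near a crisp boundary now belongs partially to both neighbouring ranges instead of being forced into one.

```python
def triangular(x, a, b, c):
    """Triangular fuzzy membership function peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

# Hypothetical fuzzy time ranges replacing crisp boundaries such as
# "short" = [0, 7) days and "medium" = [7, 14) days.
FUZZY_INTERVALS = {
    "short":  lambda d: triangular(d, -1, 0, 8),
    "medium": lambda d: triangular(d, 5, 10, 16),
}

gap = 6.5   # days between two purchases, near the crisp 7-day boundary
print({name: round(mu(gap), 2) for name, mu in FUZZY_INTERVALS.items()})
# {'short': 0.19, 'medium': 0.3}
```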

9.
We address the problem of outlying aspects mining: given a query object and a reference multidimensional data set, how can we discover what aspects (i.e., subsets of features or subspaces) make the query object most outlying? Outlying aspects mining can be used to explain any data point of interest, which itself might be an inlier or an outlier. In this paper, we investigate several open challenges faced by existing outlying aspects mining techniques and propose novel solutions, including (a) how to design effective scoring functions that are unbiased with respect to dimensionality yet computationally efficient, and (b) how to efficiently search through the exponentially large space of all possible subspaces. We formalize the concept of dimensionality unbiasedness, a desirable property of outlyingness measures. We then characterize existing scoring measures, as well as our novel proposed ones, in terms of efficiency, dimensionality unbiasedness and interpretability. Finally, we evaluate the effectiveness of different methods for outlying aspects discovery and demonstrate the utility of our proposed approach on both large real and synthetic data sets.
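As a rough illustration only (a toy stand-in, not the paper's proposed measures), the sketch below scores a query point in every low-dimensional subspace with a k-NN distance and rank-normalizes that score against the data, so that subspaces of different dimensionality remain comparable; the helper names are hypothetical and NumPy is assumed.

```python
from itertools import combinations
import numpy as np

def knn_score(data, point, dims, k=5):
    """Outlyingness of `point` in subspace `dims`: distance to its k-th nearest
    neighbour (larger = more outlying)."""
    d = np.sort(np.linalg.norm(data[:, dims] - point[dims], axis=1))
    return d[min(k, len(d) - 1)]

def outlying_aspects(data, query, max_dim=2, k=5):
    """Rank-normalize the query's score in each subspace against all data points,
    then sort subspaces by how outlying the query looks in them."""
    results = []
    for size in range(1, max_dim + 1):
        for dims in combinations(range(data.shape[1]), size):
            dims = list(dims)
            scores = [knn_score(data, row, dims, k) for row in data]
            q = knn_score(data, query, dims, k)
            rank = np.mean([q >= s for s in scores])   # fraction of points it exceeds
            results.append((tuple(dims), rank))
    return sorted(results, key=lambda r: -r[1])

data = np.random.RandomState(0).normal(size=(200, 3))
query = np.array([3.5, 0.0, 0.0])         # unusual only in feature 0
print(outlying_aspects(data, query)[:3])  # subspaces containing feature 0 rank first
```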

10.
Mining very large databases   Cited by: 1 (self-citations: 0, by others: 1)
Ganti, V.; Gehrke, J.; Ramakrishnan, R. 《Computer》 1999, 32(8):38-45
Established companies have had decades to accumulate masses of data about their customers, suppliers, products and services, and employees. Data mining, also known as knowledge discovery in databases, gives organizations the tools to sift through these vast data stores to find the trends, patterns, and correlations that can guide strategic decision making. Traditionally, algorithms for data analysis assume that the input data contain relatively few records. Current databases, however, are much too large to be held in main memory. To be efficient, the data mining techniques applied to very large databases must be highly scalable. An algorithm is said to be scalable if, given a fixed amount of main memory, its runtime increases linearly with the number of records in the input database. Recent work has focused on scaling data mining algorithms to very large data sets. The authors describe a broad range of algorithms that address three classical data mining problems: market basket analysis, clustering, and classification.

11.
Mining constrained gradients in large databases   Cited by: 1 (self-citations: 0, by others: 1)
Many data analysis tasks can be viewed as search or mining in a multidimensional space (MDS). In such MDSs, dimensions capture potentially important factors for given applications, and cells represent combinations of values for the factors. To systematically analyze data in an MDS, an interesting notion called "cubegrade" was recently introduced by Imielinski et al. [2002], which focuses on notable changes in measures in the MDS by comparing a cell (which we refer to as the probe cell) with its gradient cells, namely its ancestors, descendants, and siblings. We call such queries gradient analysis queries (GQs). Since an MDS can contain billions of cells, it is important to answer GQs efficiently. We focus on developing efficient methods for mining GQs constrained by certain (weakly) antimonotone constraints. Instead of conducting an independent gradient-cell search once per probe cell, which is inefficient due to much repeated work, we propose an efficient algorithm, LiveSet-Driven. This algorithm finds all good gradient-probe cell pairs in one search pass. It utilizes measure-value analysis and dimension-match analysis in a set-oriented manner to achieve bidirectional pruning between the set of hopeful probe cells and the set of hopeful gradient cells. Moreover, it adopts a hypertree structure and an H-cubing method to compress data and to maximize the sharing of computation. Our performance study shows that this algorithm is efficient and scalable. In addition to data cubes, we extend our study to another important scenario: mining constrained gradients in transactional databases where each item is associated with some measures, such as price. Such transactional databases can be viewed as sparse MDSs where items represent dimensions, although they have significantly different characteristics from data cubes. We outline efficient mining methods for this problem.
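A toy sketch of a gradient analysis query over a minimal cube (hypothetical data and threshold; it covers only ancestor cells and ignores descendants, siblings, and the LiveSet-Driven pruning described above): the probe cell's measure is compared with each ancestor cell, i.e. a cell where some dimensions are generalized to '*', and pairs whose relative change exceeds a threshold are kept.

```python
from itertools import product

cube = {
    ("2023", "laptop", "NY"): 120.0,
    ("2023", "laptop", "*"):  100.0,
    ("2023", "*",      "*"):   80.0,
    ("*",    "*",      "*"):   60.0,
}

def ancestors(cell):
    """All cells obtainable by replacing some coordinates with '*'."""
    options = [(v, "*") if v != "*" else ("*",) for v in cell]
    return {c for c in product(*options) if c != cell}

def gradient_pairs(cube, probe, min_ratio=1.3):
    base = cube[probe]
    return [(anc, round(base / cube[anc], 2))
            for anc in ancestors(probe)
            if anc in cube and base / cube[anc] >= min_ratio]

print(gradient_pairs(cube, ("2023", "laptop", "NY")))
# e.g. [(('2023', '*', '*'), 1.5), (('*', '*', '*'), 2.0)]  (order may vary)
```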

12.
Query-by-example and query-by-keyword both suffer from the problem of "aliasing," meaning that example images and keywords potentially have variable interpretations or multiple semantics. To discern which semantics are appropriate for a given query, we have established that combining active learning with kernel methods is a very effective approach. In this work, we first examine active-learning strategies, and then focus on addressing the challenges of two scalability issues: scalability in concept complexity and in dataset size. We present remedies, explain limitations, and discuss future directions that research might take.

13.
This paper extends the work on discovering fuzzy association rules with degrees of support and implication (ARsi). The effort is twofold: one aim is to discover ARsi with hierarchy, so as to express more semantics, because hierarchical relationships usually exist among the fuzzy sets associated with the attribute concerned; the other is to generate a "core" set of rules, namely the rule cover set, which is of particular interest in the sense that all other rules can be derived from it. Corresponding algorithms for ARsi with hierarchy and for the cover set are proposed, with pruning strategies incorporated to improve computational efficiency. Experiments on data are also conducted to show the effectiveness of the approach.

14.
Mining multiple-level association rules in large databases   Cited by: 2 (self-citations: 0, by others: 2)
A top-down progressive deepening method is developed for efficient mining of multiple-level association rules from large transaction databases, based on the Apriori principle. A group of variant algorithms is proposed based on the ways of sharing intermediate results, and their relative performance is tested and analyzed. The enforcement of different interestingness measures to find more interesting rules, and the relaxation of rule conditions for finding "level-crossing" association rules, are also investigated. The study shows that efficient algorithms can be developed for the discovery of interesting and strong multiple-level association rules from large databases.
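A toy sketch of the top-down progressive deepening idea under an assumed two-level item taxonomy (hypothetical data; not the paper's algorithms): only categories found frequent at the higher level are expanded and re-counted at the lower level.

```python
# Hypothetical item taxonomy: leaf item -> high-level category.
TAXONOMY = {"milk": "dairy", "cheese": "dairy", "apple": "fruit", "pear": "fruit"}

def level_counts(transactions, level):
    """Support counts at level 1 (categories) or level 2 (leaf items)."""
    counts = {}
    for t in transactions:
        items = {TAXONOMY[i] for i in t} if level == 1 else set(t)
        for i in items:
            counts[i] = counts.get(i, 0) + 1
    return counts

transactions = [{"milk", "apple"}, {"milk", "cheese"}, {"pear"}, {"milk"}]
min_sup = 2
top = {i for i, c in level_counts(transactions, 1).items() if c >= min_sup}
# deepen only into categories that are frequent at the higher level
deep = {i: c for i, c in level_counts(transactions, 2).items()
        if TAXONOMY[i] in top and c >= min_sup}
print(top, deep)   # e.g. {'dairy', 'fruit'} {'milk': 3}
```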

15.
Structural equation models with latent variables are used widely in psychometrics, econometrics, and sociology to explore the causal relations among latent variables. Since such models often involve dozens of variables, the number of theoretically feasible alternatives can be astronomical. Without computational aids with which to search such a space, researchers can explore only a handful of alternative models. We describe a procedure that can find information about the causal structure among latent, or unmeasured, variables. The procedure is asymptotically reliable, is feasible on data sets with as many as a hundred variables, and has already proved useful in modeling an empirical data set collected by the U.S. Navy. © 1992 John Wiley & Sons, Inc.

16.
In this paper, we present an innovative system, coined DISTROD (a.k.a. DISTRibuted Outlier Detector), for detecting outliers, namely abnormal instances or observations, from multiple large distributed databases. DISTROD is able to effectively detect so-called global outliers from distributed databases that are consistent with those produced by the centralized detection paradigm. DISTROD is equipped with a number of optimization/boosting strategies which empower it to significantly enhance its speed and reduce its communication overhead. Experimental evaluation demonstrates the good performance of DISTROD in terms of speed and communication overhead.

17.
Nowadays, most fingerprint sensors capture partial fingerprint images. Incomplete, fragmentary, or partial fingerprint identification in large databases is an attractive research topic and remains an important and challenging problem. Accordingly, conventional fingerprint identification systems are not capable of providing convincing results. To overcome this problem, we need a fast and accurate identification strategy. In this context, fingerprint indexing is commonly used to speed up the identification process. This paper proposes a robust and fast identification system that combines two indexing algorithms. One of the indexing algorithms uses minutiae triplets, and the other uses the orientation field (OF) to index and retrieve fingerprints. Furthermore, the proposal applies partial fingerprint matching methods to the final candidate list obtained from the indexing stage. The proposal is evaluated over two National Institute of Standards and Technology (NIST) datasets and four Fingerprint Verification Competition (FVC) datasets, leading to low identification times with no accuracy loss.
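A rough illustration of minutiae-triplet indexing under assumed design choices (quantized, sorted triangle side lengths as keys; the names are hypothetical and this is not the paper's exact scheme): fingerprints that share many triplet keys with the query are returned as candidates for the subsequent partial matching stage.

```python
from itertools import combinations
from collections import defaultdict
from math import dist

def triplet_keys(minutiae, step=5):
    """Reduce each minutiae triplet to a translation/rotation-invariant key:
    its sorted, quantized triangle side lengths."""
    keys = set()
    for a, b, c in combinations(minutiae, 3):
        sides = sorted((dist(a, b), dist(b, c), dist(a, c)))
        keys.add(tuple(int(s // step) for s in sides))
    return keys

def build_index(gallery):
    index = defaultdict(set)
    for fid, minutiae in gallery.items():
        for key in triplet_keys(minutiae):
            index[key].add(fid)
    return index

def candidates(index, query_minutiae):
    votes = defaultdict(int)
    for key in triplet_keys(query_minutiae):
        for fid in index.get(key, ()):
            votes[fid] += 1
    return sorted(votes, key=votes.get, reverse=True)

gallery = {"f1": [(10, 10), (40, 10), (25, 40), (60, 60)],
           "f2": [(5, 80), (90, 5), (50, 50), (20, 20)]}
index = build_index(gallery)
print(candidates(index, gallery["f1"][:3]))   # ['f1'] for this partial query
```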

18.
《Pattern recognition》2014,47(2):588-602
Fingerprint matching has emerged as an effective tool for human recognition due to the uniqueness, universality and invariability of fingerprints. Many different approaches have been proposed in the literature to determine faithfully whether two fingerprint images belong to the same person. Among them, minutiae-based matchers stand out as the most relevant techniques because of their discriminative capabilities, providing precise results. However, performing a fingerprint identification over a large database can be an inefficient task due to the lack of scalability and high computing times of fingerprint matching algorithms. In this paper, we propose a distributed framework for fingerprint matching to tackle large databases in a reasonable time. It provides a general scheme for any kind of matcher, so that its precision is preserved and its response time can be reduced. To test the proposed system, we conduct an extensive study that involves both synthetic and captured fingerprint databases with different characteristics, analyzing the performance of three well-known minutiae-based matchers within the designed framework. With the available hardware resources, our distributed model is able to address up to 400 000 fingerprints in approximately half a second. Additional details are provided at http://sci2s.ugr.es/ParallelMatching.

19.
Faudemay, P.; Mhiri, M. 《Micro, IEEE》 1991, 11(6):22-34
The RAPID-1 (relational access processor for intelligent data), an associative accelerator that recognizes tuples and logical formulas, is presented. It evaluates logical formulas instantiated by the current tuple, or record, and operates on whole relations or on hashing buckets. RAPID-1 uses a reduced instruction set and hardwired control, and executes all comparisons in a bit-parallel mode. It speeds up the database by a significant factor and will adapt to future generations of microprocessors. The principal design issues, data structures, instruction set, architecture, environments, and performance are discussed.

20.
International Journal of Parallel Programming - A time decomposition technique is suggested for large-database (DB) models. The problem of network aggregation is studied and the results used to...
