Similar Literature
20 similar documents retrieved.
1.
Many kinds of association rule mining (ARM) algorithms, e.g., Apriori and FP-tree, have been proposed to discover meaningful frequent patterns from large datasets. However, such ARM algorithms are difficult to apply to temporal databases that change continuously over time, because they generally rely on repeating time-consuming tasks, e.g., scanning the database. To deal with this problem, in this paper we propose a constraint graph-based method for maintaining the frequent patterns (FP) discovered from temporal databases. The constraint graph, represented as a set of constraints between pairs of items, is established from the temporal persistency of the patterns: a pattern can be used to build the constraint graph once it has appeared in the set of FPs. The constraints are of two types, generated either by users or by adaptation. Based on our scheme, we find that the data handled during the mining process, as well as the information gathered while updating, is efficiently reduced.
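As context for the repeated-scan cost this abstract refers to, here is a minimal sketch of Apriori-style level-wise mining; the toy database and threshold are illustrative, and nothing here reproduces the paper's constraint-graph maintenance.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Level-wise frequent itemset mining; each level costs one full database
    scan, which is exactly the repeated work that incremental maintenance
    schemes try to avoid on temporal databases."""
    transactions = [frozenset(t) for t in transactions]
    items = {i for t in transactions for i in t}
    frequent, current = {}, []
    for i in sorted(items):
        support = sum(1 for t in transactions if i in t)
        if support >= min_support:
            frequent[frozenset([i])] = support
            current.append(frozenset([i]))
    k = 2
    while current:
        # candidate generation: join frequent (k-1)-itemsets
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # prune candidates with an infrequent (k-1)-subset (anti-monotonicity)
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        current = []
        for c in candidates:                      # one full database scan per level
            support = sum(1 for t in transactions if c <= t)
            if support >= min_support:
                frequent[c] = support
                current.append(c)
        k += 1
    return frequent

db = [{'a', 'b', 'c'}, {'a', 'b'}, {'a', 'c'}, {'b', 'c'}, {'a', 'b', 'c'}]
print(apriori(db, min_support=3))
```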

2.
The goal of analyzing a time series database is to find whether, and how frequently, a periodic pattern is repeated within the series. Periodic pattern mining is thus a problem of temporal regularity. However, most existing algorithms have a major limitation in mining patterns of the user's interest: they can mine only patterns of a specific length, with all events occurring sequentially in exact positions within the pattern. Yet there are scenarios where a pattern can be flexible, that is, it may still be interesting when any number of unimportant events between the important ones are ignored and the length of the pattern is allowed to vary. Moreover, existing algorithms can detect only specific types of periodicity in time series databases and require user interaction to determine the periodicity. In this paper, we propose an algorithm for periodic pattern mining in time series databases that does not rely on the user for the period value or period type and can detect all types of periodic patterns at the same time; these flexibilities are missing in existing algorithms. The proposed algorithm lets the user generate different kinds of patterns by skipping intermediate events in the time series and then determines the periodicity of those patterns within the database. It improves on pattern generation using suffix trees, since suffix tree-based algorithms are weak in precisely this respect. Compared with existing algorithms, the proposed algorithm generates a richer variety of interesting patterns and detects whether a generated pattern is periodic. We tested the performance of our algorithm on both synthetic and real-life data from different domains, found a large number of interesting event sequences that existing algorithms miss, and observed that the proposed algorithm generates and detects the periodicity of flexible patterns efficiently on both types of data.
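To make the periodicity notion concrete, here is a small sketch that scores how confidently a pattern repeats with a given period; the wildcard symbol and the confidence ratio are my simplifications, not the proposed algorithm.

```python
def periodicity_confidence(series, pattern, start, period):
    """Confidence of `pattern` recurring every `period` symbols from `start`:
    occurrences at start, start+period, ... divided by the number of chances.
    '*' is a wildcard, a crude stand-in for the flexible 'skip unimportant
    events' idea in the abstract (my simplification, not the paper's)."""
    def matches(pos):
        if pos + len(pattern) > len(series):
            return False
        return all(p == '*' or p == s
                   for p, s in zip(pattern, series[pos:pos + len(pattern)]))

    chances = (len(series) - start - len(pattern)) // period + 1
    if chances <= 0:
        return 0.0
    hits = sum(matches(start + k * period) for k in range(chances))
    return hits / chances

series = "abcabdabcabcabd"
print(periodicity_confidence(series, "ab*", start=0, period=3))  # -> 1.0
```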

3.
Maximal biclique (also known as complete bipartite) subgraphs can model many applications in Web mining, business, and bioinformatics. Enumerating maximal biclique subgraphs from a graph is a computationally challenging problem, as the size of the output can become exponentially large with respect to the vertex number when the graph grows. In this paper, we efficiently enumerate them through the use of closed patterns of the adjacency matrix of the graph. For an undirected graph G without self-loops, we prove that 1) the number of closed patterns in the adjacency matrix of G is even, 2) the number of the closed patterns is precisely double the number of maximal biclique subgraphs of G, and 3) for every maximal biclique subgraph, there always exists a unique pair of closed patterns that matches the two vertex sets of the subgraph. Therefore, the problem of enumerating maximal bicliques can be solved by using efficient algorithms for mining closed patterns, which are algorithms extensively studied in the data mining field. However, this direct use of existing algorithms causes a duplicated enumeration. To achieve high efficiency, we propose an O(mn) time delay algorithm for a nonduplicated enumeration, in particular, for enumerating those maximal bicliques with a large size, where m and n are the number of edges and vertices of the graph, respectively. We evaluate the high efficiency of our algorithm by comparing it to state-of-the-art algorithms on three categories of graphs: randomly generated graphs, benchmarks, and a real-life protein interaction network. In this paper, we also prove that if self-loops are allowed in a graph, then the number of closed patterns in the adjacency matrix is not necessarily even, but the maximal bicliques are exactly the same as those of the graph after removing all the self-loops.
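The closure correspondence can be demonstrated by brute force on a toy graph: taking common neighbours twice yields a closed pair of vertex sets, i.e., a maximal biclique. The sketch below is exponential and purely illustrative; it is not the paper's O(mn) delay algorithm.

```python
from itertools import combinations

def maximal_bicliques(adj):
    """Brute-force enumeration of maximal bicliques (X, Y) of an undirected,
    self-loop-free graph via the closure property from the abstract:
    X is the set of common neighbours of Y and vice versa."""
    vertices = sorted(adj)
    def common_neighbours(s):
        result = set(vertices)
        for v in s:
            result &= adj[v]
        return result
    found = set()
    for r in range(1, len(vertices) + 1):
        for ys in combinations(vertices, r):
            x = common_neighbours(set(ys))       # closure step 1
            if not x:
                continue
            y_closed = common_neighbours(x)      # closure step 2: closed pattern
            # (X, Y) and (Y, X) are the same biclique -- the "pair" of
            # closed patterns the abstract counts twice
            found.add(frozenset((frozenset(x), frozenset(y_closed))))
    return found

# 4-cycle a-b, b-c, c-d, d-a: its unique maximal biclique is ({a,c},{b,d})
adj = {'a': {'b', 'd'}, 'b': {'a', 'c'}, 'c': {'b', 'd'}, 'd': {'a', 'c'}}
for bic in maximal_bicliques(adj):
    print(sorted(sorted(side) for side in bic))
```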

4.
Various data mining methods have been developed over the last few years for hepatitis study, using a large temporal and relational database made available to the research community. In this work we introduce a novel temporal abstraction method to this study by detecting and exploiting temporal patterns and relations between events in viral hepatitis, such as "event A happened slightly before event B and B ended simultaneously with event C". We developed algorithms first to detect significant temporal patterns in temporal sequences and then to identify the temporal relations between these patterns. Many findings obtained by applying data mining methods to transactions/graphs of temporal relations were shown to be significant by physician evaluation and by matching against results published in Medline.
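A sketch of the kind of interval relation quoted above; the tolerance eps for "slightly"/"simultaneously" is an assumed parameter, not taken from the paper.

```python
def temporal_relation(a, b, eps=1.0):
    """Classify the relation between two event intervals a=(start, end) and
    b=(start, end), in the spirit of 'A slightly happened before B' and
    'B simultaneously ended with C'. The tolerance `eps` is my assumption."""
    a_s, a_e = a
    b_s, b_e = b
    rel = []
    if abs(a_s - b_s) <= eps:
        rel.append("started simultaneously")
    elif 0 < b_s - a_s <= eps:           # a starts only a little earlier
        rel.append("slightly before")
    elif b_s - a_s > eps:
        rel.append("before")
    if abs(a_e - b_e) <= eps:
        rel.append("ended simultaneously")
    return ", ".join(rel) if rel else "other"

print(temporal_relation((0.0, 10.0), (0.5, 10.2)))  # started & ended simultaneously
print(temporal_relation((0.0, 5.0), (2.0, 10.0)))   # before
```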

5.
An important usage of time sequences is to discover temporal patterns. The discovery process usually starts with a user-specified skeleton, called an event structure, which consists of a number of variables representing events and temporal constraints among these variables; the goal of the discovery is to find temporal patterns, i.e., instantiations of the variables in the structure that appear frequently in the time sequence. The paper introduces event structures that have temporal constraints with multiple granularities, defines the pattern discovery problem with these structures, and studies effective algorithms to solve it. The basic components of the algorithms include timed automata with granularities (TAGs) and a number of heuristics. The TAGs are for testing whether a specific temporal pattern, called a candidate complex event type, appears frequently in a time sequence. Since there are often a huge number of candidate event types for a usual event structure, heuristics are presented aiming at reducing the number of candidate event types and reducing the time spent by the TAGs testing whether a candidate type does appear frequently in the sequence. These heuristics exploit the information provided by explicit and implicit temporal constraints with granularity in the given event structure. The paper also gives the results of an experiment to show the effectiveness of the heuristics on a real data set.

6.
Mining sequential patterns aims to discover the sequential purchasing behaviours of most customers from a large number of customer transactions. The strategy focuses on discovering frequent sequences, where a frequent sequence is an ordered list of itemsets purchased by a sufficient number of customers. Previous approaches to mining sequential patterns need to scan the database repeatedly, so they take a large amount of computation time to find frequent sequences. Customer transactions grow rapidly in a short time, and some of them may become outdated. Consequently, the frequent sequences may change as new customer transactions are inserted into, or old ones deleted from, the database, and rediscovering all the patterns would require scanning the entire updated customer transaction database. In this paper, we propose an incremental updating technique to maintain the discovered sequential patterns when transactions are inserted into or deleted from the database. Our approach partitions the database into segments and scans the database segment by segment. For each segment scan, our approach prunes those sequences that can no longer be frequent, accelerating the search for frequent sequences. The number of database scans can therefore be significantly reduced by our approach. The experimental results show that our algorithms are more efficient than other algorithms for the maintenance of mined sequential patterns.
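For reference, here is the core support test behind sequential pattern mining, under the usual definition that a pattern's itemsets must be covered, in order, by a customer's transactions; the paper's segment-by-segment incremental pruning is not shown.

```python
def is_subsequence(pattern, customer_seq):
    """Does the customer's transaction sequence contain `pattern`? Each
    element of the pattern is an itemset that must be covered, in order,
    by some later transaction of the customer."""
    it = iter(customer_seq)
    return all(any(step <= tx for tx in it) for step in pattern)

def support(pattern, db):
    """Number of customers whose transaction sequence contains the pattern."""
    return sum(is_subsequence(pattern, seq) for seq in db)

# Each customer is a time-ordered list of purchased itemsets.
db = [
    [{'bread'}, {'milk', 'eggs'}, {'beer'}],
    [{'bread', 'milk'}, {'beer'}],
    [{'milk'}, {'bread'}],
]
pattern = [{'bread'}, {'beer'}]
print(support(pattern, db))   # 2: customers 1 and 2 buy bread, later beer
```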

7.
Compression of trajectory data: a comprehensive evaluation and new approach
GPS-equipped mobile devices such as smart phones and in-car navigation units are collecting enormous amounts of spatial and temporal information that traces a moving object’s path. The exponential increase in the amount of such trajectory data has caused three major problems. First, transmission of large amounts of data is expensive and time-consuming. Second, queries on large amounts of trajectory data require computationally expensive operations to extract useful patterns and information. Third, GPS trajectories often contain large amounts of redundant data that waste storage and cause increased disk I/O time. These issues can be addressed by algorithms that reduce the size of trajectory data. A key requirement for these algorithms is to minimize the loss of information essential to location-based applications. This paper presents a new compression method called SQUISH-E (Spatial QUalIty Simplification Heuristic - Extended) that provides improved run-time performance and usability. A comprehensive comparison of SQUISH-E with other algorithms is carried out through an empirical study across three types of real-world datasets and a variety of error metrics.
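A greedy point-removal sketch based on synchronized Euclidean distance (SED), in the spirit of the SQUISH family; the priority-queue policy and the data are illustrative assumptions, not the SQUISH-E algorithm evaluated in the paper.

```python
import heapq
import math

def sed(p, a, b):
    """Synchronized Euclidean distance: distance from point p to where the
    segment a->b says the object should be at p's timestamp."""
    (t, x, y), (ta, xa, ya), (tb, xb, yb) = p, a, b
    r = (t - ta) / (tb - ta) if tb != ta else 0.0
    return math.hypot(x - (xa + r * (xb - xa)), y - (ya + r * (yb - ya)))

def compress(traj, target):
    """Repeatedly drop the interior point whose removal introduces the
    smallest SED error until only `target` points remain. A simplified
    cousin of SQUISH, not SQUISH-E. traj: time-ordered list of (t, x, y)."""
    pts = list(traj)
    keep = set(range(len(pts)))
    prev = {i: i - 1 for i in keep}
    nxt = {i: i + 1 for i in keep}
    heap = [(sed(pts[i], pts[i - 1], pts[i + 1]), i) for i in range(1, len(pts) - 1)]
    heapq.heapify(heap)
    while len(keep) > target and heap:
        err, i = heapq.heappop(heap)
        if i not in keep:
            continue
        cur = sed(pts[i], pts[prev[i]], pts[nxt[i]])
        if abs(err - cur) > 1e-12:       # stale entry: re-queue with fresh error
            heapq.heappush(heap, (cur, i))
            continue
        keep.discard(i)
        p, n = prev[i], nxt[i]
        nxt[p], prev[n] = n, p
        for j in (p, n):                 # neighbours' errors changed
            if 0 < j < len(pts) - 1 and j in keep:
                heapq.heappush(heap, (sed(pts[j], pts[prev[j]], pts[nxt[j]]), j))
    return [pts[i] for i in sorted(keep)]

traj = [(0, 0, 0), (1, 1, 0.1), (2, 2, 0), (3, 3, 5), (4, 4, 0)]
print(compress(traj, target=4))   # drops the near-collinear second point first
```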

8.
The ability to model the temporal dimension is essential to many applications. Furthermore, the rate of increase in database size and the stringency of response-time requirements have outpaced advances in processor and mass storage technology, leading to the need for parallel temporal database management systems. In this paper, we introduce a variety of parallel temporal aggregation algorithms for the shared-nothing architecture; these algorithms are based on the sequential Aggregation Tree algorithm. We are particularly interested in developing parallel algorithms that can maximally exploit available memory to quickly compute large-scale temporal aggregates without intermediate disk writes and reads. Via an empirical study, we found that the number of processing nodes, the partitioning of the data, the placement of results, and the degree of data reduction effected by the aggregation all impact the performance of the algorithms. For distributed result placement, we discovered that Greedy Time Division Merge was the obvious choice. For centralized results and high data reduction, Pairwise Merge was preferred for a large number of processing nodes; for low data reduction, it performed well only up to 32 nodes. This led us to a centralized variant of Greedy Time Division Merge, which was best in the remaining cases. We present a cost model that closely predicts the running time of Greedy Time Division Merge.
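A sequential endpoint-sweep sketch of the temporal COUNT aggregate that such algorithms parallelize; the Aggregation Tree itself and the merge strategies are not reproduced here.

```python
from collections import Counter

def temporal_count(intervals):
    """Temporal COUNT aggregate: for tuples valid over [start, end), report
    how many are valid in each constant interval of the timeline. An
    endpoint sweep standing in for the sequential Aggregation Tree that the
    paper's parallel algorithms build on."""
    deltas = Counter()
    for start, end in intervals:
        deltas[start] += 1          # tuple becomes valid
        deltas[end] -= 1            # tuple stops being valid
    result, active, prev = [], 0, None
    for t in sorted(deltas):
        if prev is not None and active:
            result.append(((prev, t), active))
        active += deltas[t]
        prev = t
    return result

# Three tuples with overlapping validity periods.
print(temporal_count([(1, 5), (3, 9), (4, 6)]))
# -> [((1, 3), 1), ((3, 4), 2), ((4, 5), 3), ((5, 6), 2), ((6, 9), 1)]
```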

9.
High-utility pattern mining is a fundamental research direction in data mining, and ever more algorithms target top-k high-utility patterns, where k is the number of high-utility patterns the user wants mined. They fall into two classes: two-phase top-k algorithms and one-phase top-k algorithms. The main difference is that the former generate a large number of candidate patterns during mining, which is the main factor limiting their performance, whereas the latter generate no candidate patterns. To mine the k highest-utility patterns more efficiently, the one-phase algorithm TKHUP was proposed; it reduces time and space consumption during mining chiefly through four effective strategies. Extensive experimental data show that TKHUP outperforms other top-k high-utility pattern mining algorithms in running time.
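To fix the definitions, here is a brute-force (two-phase-style) top-k high-utility miner on a toy database; TKHUP's four pruning strategies are not reproduced.

```python
from itertools import combinations
import heapq

def top_k_high_utility(db, k):
    """Brute-force top-k high-utility itemset mining: each transaction maps
    item -> utility (e.g., quantity * unit profit), and an itemset's utility
    is summed over every transaction containing it. Exponential enumeration,
    only to illustrate the problem TKHUP solves efficiently."""
    items = sorted({i for tx in db for i in tx})
    scored = []
    for r in range(1, len(items) + 1):
        for subset in combinations(items, r):
            s = set(subset)
            utility = sum(sum(tx[i] for i in s)
                          for tx in db if s <= tx.keys())
            scored.append((utility, subset))
    return heapq.nlargest(k, scored)

db = [
    {'a': 5, 'b': 2, 'c': 1},
    {'a': 10, 'c': 6},
    {'b': 4, 'c': 3},
]
print(top_k_high_utility(db, k=3))   # [(22, ('a','c')), (15, ('a',)), ...]
```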

10.
This paper studies the problem of mining frequent itemsets along with their temporal patterns from large transaction sets. A model is proposed in which users define a large set of temporal patterns that are interesting or meaningful to them. A temporal pattern defines the set of time points where the user expects a discovered itemset to be frequent. The model is general in that (i) no constraints are placed on the interesting patterns given by the users, and (ii) two measures—inclusiveness and exclusiveness—are used to capture how well the temporal patterns match the time points given by the discovered itemsets. Intuitively, these measures indicate to what extent a discovered itemset is frequent at time points included in a temporal pattern p, but not at time points outside p. Using these two measures, one is able to model many temporal data mining problems that have appeared in the literature, as well as ones that have not yet been studied. By exploiting the relationships within and between the itemset space and the pattern space simultaneously, a series of pruning techniques are developed to speed up the mining process. Experiments show that these pruning techniques yield performance benefits of up to 100 times over a direct extension of non-temporal data mining algorithms.
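A sketch of plausible inclusiveness/exclusiveness measures; the exact formulas below are my reading of the abstract's description, not the paper's definitions.

```python
def match_measures(frequent_times, pattern_times):
    """Two measures of how well an itemset's frequent time points F match a
    temporal pattern P: inclusiveness is the fraction of P covered by F,
    exclusiveness the fraction of F lying inside P (assumed formulas)."""
    f, p = set(frequent_times), set(pattern_times)
    inclusiveness = len(f & p) / len(p) if p else 0.0
    exclusiveness = len(f & p) / len(f) if f else 0.0
    return inclusiveness, exclusiveness

# Itemset frequent at times F; user pattern P expects "every 2nd time point".
F = {0, 2, 4, 6, 7}
P = {0, 2, 4, 6, 8}
print(match_measures(F, P))   # (0.8, 0.8)
```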

11.
Mining very large databases
Ganti V., Gehrke J., Ramakrishnan R. Computer, 1999, 32(8): 38-45
Established companies have had decades to accumulate masses of data about their customers, suppliers, products and services, and employees. Data mining, also known as knowledge discovery in databases, gives organizations the tools to sift through these vast data stores to find the trends, patterns, and correlations that can guide strategic decision making. Traditionally, algorithms for data analysis assume that the input data contain relatively few records. Current databases, however, are much too large to be held in main memory. To be efficient, the data mining techniques applied to very large databases must be highly scalable. An algorithm is said to be scalable if, given a fixed amount of main memory, its runtime increases linearly with the number of records in the input database. Recent work has focused on scaling data mining algorithms to very large data sets. The authors describe a broad range of algorithms that address three classical data mining problems: market basket analysis, clustering, and classification.

12.
Mining frequent jump patterns from graph databases
刘勇, 李建中, 高宏. 《软件学报》 (Journal of Software), 2010, 21(10): 2477-2493
Many frequent subgraph mining algorithms have been proposed. However, these algorithms produce too many frequent subgraphs for users to exploit effectively. To address this, a new research problem is proposed: mining frequent jump patterns from graph databases. Mining frequent jump patterns both drastically reduces the number of output patterns and keeps the meaningful graph patterns in the mining result; in addition, jump patterns are robust against noise. However, because jump patterns lack the anti-monotone property, mining them is very challenging. By studying the properties of jump patterns themselves, two novel pruning techniques are proposed: pruning based on internal extension and pruning based on external extension. On this basis, an efficient mining algorithm, GraphJP (an algorithm for mining jump patterns from graph databases), is presented, and the correctness of the pruning techniques and of GraphJP is rigorously proven. Experimental results show that the proposed pruning techniques effectively prune the search space of graph patterns and that GraphJP is efficient and scalable.

13.
吴军, 欧阳艾嘉, 张琳. 《计算机工程》 (Computer Engineering), 2021, 47(8): 45-53, 61
Traditional contrast sequential pattern mining algorithms output a certain number of false-positive contrast sequential patterns, and the erroneous information they provide can mislead decisions in downstream tasks. An algorithm called IEP-DSP is designed to filter out such false positives. It uses the SPADE method together with the WRAcc contrast measure to find candidate contrast sequential patterns and the contrast sequential patterns in all permuted datasets. By simulating the permutation process, an independent exact permutation test builds an independent exact null distribution for patterns of each length and computes an exact p-value for each candidate contrast sequential pattern; a false discovery rate measure then controls the number of false-positive contrast sequential patterns of each length at statistical significance level α. Experimental results on real and synthetic datasets show that IEP-DSP filters out a large number of false-positive contrast sequential patterns and, compared with methods based on standard statistical significance tests, retains more true contrast sequential patterns, confirming the advantage of the independent exact permutation test over the standard permutation test.
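A sketch of generic permutation p-values plus Benjamini-Hochberg FDR control, with WRAcc as the contrast measure; this is the standard machinery only, not the paper's independent exact permutation test.

```python
import random

def permutation_p_value(score, null_scores):
    """Permutation p-value: fraction of null scores at least as extreme as
    the observed score (with the usual +1 correction)."""
    ge = sum(1 for s in null_scores if s >= score)
    return (ge + 1) / (len(null_scores) + 1)

def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure: indices of hypotheses rejected
    while controlling the false discovery rate at level alpha."""
    order = sorted(range(len(p_values)), key=lambda i: p_values[i])
    m, k = len(p_values), 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * alpha / m:
            k = rank
    return sorted(order[:k])

def wracc(support_pos, n_pos, support_neg, n_neg):
    """Weighted relative accuracy: how much more often a pattern occurs in
    the positive class than chance would predict."""
    n = n_pos + n_neg
    p = (support_pos + support_neg) / n          # overall pattern frequency
    return p * (support_pos / (support_pos + support_neg) - n_pos / n)

# Toy example: one pattern's observed WRAcc vs. WRAcc under label permutations.
random.seed(0)
observed = wracc(support_pos=40, n_pos=50, support_neg=10, n_neg=50)
null = [wracc(random.randint(15, 35), 50, random.randint(15, 35), 50)
        for _ in range(999)]
p = permutation_p_value(observed, null)
print(p, benjamini_hochberg([p, 0.2, 0.01], alpha=0.05))
```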

14.
Mining association rules is among the most common techniques for knowledge discovery in databases (KDD); it is used to discover relationships among items or itemsets. Temporal data mining, in turn, is concerned with the analysis of temporal data and the discovery of temporal patterns and regularities. In this paper, a new concept of up-to-date patterns is proposed, a hybrid of association rule and temporal mining. An itemset may not be frequent (large) over an entire database yet may be large up to date, since items that seldom occurred early on may occur often later. An up-to-date pattern is thus composed of an itemset and its up-to-date lifetime, within which the user-defined minimum-support threshold must be satisfied. The proposed approach can mine more useful large itemsets than the conventional approaches, which discover only itemsets that are large over the entire database. Experimental results show that the proposed algorithm is more effective than traditional ones in discovering such up-to-date temporal patterns, especially when the minimum-support threshold is high.
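A sketch of finding an itemset's up-to-date lifetime under one plausible reading: the longest suffix of the time-ordered database over which the itemset still meets minimum support. The threshold semantics here are assumptions, not the paper's exact definition.

```python
def up_to_date_lifetime(transactions, itemset, min_support_ratio):
    """Find the longest 'up-to-date' lifetime [t, now] over which `itemset`
    meets the minimum support, following the idea that an itemset infrequent
    over the whole database can still be frequent recently.
    `transactions` is a time-ordered list of item sets (index = time)."""
    itemset = set(itemset)
    n = len(transactions)
    # earliest start t such that support within transactions[t:] meets the ratio
    for t in range(n):
        window = transactions[t:]
        support = sum(1 for tx in window if itemset <= set(tx))
        if support >= min_support_ratio * len(window):
            return t, n - 1, support
    return None

# 'e' rarely occurs early but often lately: large up to date, not overall.
db = [{'a'}, {'b'}, {'a', 'b'}, {'a', 'e'}, {'b', 'e'}, {'a', 'b', 'e'}, {'e'}]
print(up_to_date_lifetime(db, {'e'}, min_support_ratio=0.8))  # -> (2, 6, 4)
```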

15.
Biological neural networks are high-dimensional nonlinear systems, which present complex dynamical phenomena such as chaos. Thus, the study of coupled dynamical systems is important for understanding the functional mechanisms of real neural networks, and also for modeling more realistic artificial neural networks. In this direction, the study of a ring of phase oscillators has proved useful for pattern recognition. Such an approach has at least three nontrivial advantages over traditional dynamical neural networks: first, each input pattern can be encoded in a vector instead of a matrix; second, the connection weights can be determined analytically; third, due to its dynamical nature, it has the ability to capture temporal patterns. In previous studies of this topic, all patterns were encoded as stable periodic solutions of the oscillator network. In this paper, we continue to explore the oscillator ring for pattern recognition. Specifically, we propose algorithms that use the chaotic dynamics of closed loops of Stuart–Landau oscillators as artificial neurons to recognize randomly generated fractal patterns. We manipulate the number of neurons and the initial conditions of the oscillator ring to encode fractal patterns. It is worth noting that fractal pattern recognition is a challenging problem due to the discontinuous nature and complex forms of fractal patterns. Computer simulations confirm the good performance of the proposed algorithms.
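A minimal Euler integration of a Stuart-Landau ring; the diffusive coupling form and all parameter values are illustrative assumptions, and the paper's fractal-pattern encoding is not shown.

```python
import numpy as np

def stuart_landau_ring(n=8, coupling=0.5, lam=1.0, omega=2.0,
                       dt=0.01, steps=5000, seed=0):
    """Euler integration of a ring of diffusively coupled Stuart-Landau
    oscillators: dz_j/dt = (lam + i*omega - |z_j|^2) z_j
                           + coupling * (z_{j-1} - 2 z_j + z_{j+1})."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n) + 1j * rng.standard_normal(n)  # initial state
    for _ in range(steps):
        neigh = np.roll(z, 1) - 2 * z + np.roll(z, -1)        # ring Laplacian
        dz = (lam + 1j * omega - np.abs(z) ** 2) * z + coupling * neigh
        z = z + dt * dz
    return z

z = stuart_landau_ring()
print(np.round(np.abs(z), 3))  # amplitudes settle near sqrt(lam) if synchronized
```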

16.
In the late 1980s it was shown that juggling patterns can be described by strings of numbers with fascinating combinatorial properties that have since then been studied by many mathematicians and computer scientists. In this paper we propose to study juggling patterns from a pattern matching point of view. Inspired by pattern matching algorithms based on convolution, we propose a fast algorithm for finding transitions between juggling states. Apart from being a fun application of pattern matching theory, it provides a practical tool in the experimental design of (large) juggling patterns.
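For small heights, the juggling-state transitions in question can be enumerated directly; a brute-force bitmask sketch follows (the paper's contribution is the fast convolution-based matcher, which is not shown).

```python
def juggling_transitions(state, max_height):
    """Enumerate one-beat transitions between juggling states. A state is a
    bitmask over the next `max_height` beats; bit i set means a ball is
    scheduled to land i+1 beats from now."""
    landing = state & 1          # does a ball land on this beat?
    rest = state >> 1            # the remaining schedule, shifted one beat
    if not landing:
        return [(0, rest)]       # nothing in hand: throw 0 (a beat of rest)
    transitions = []
    for throw in range(1, max_height + 1):
        slot = 1 << (throw - 1)
        if not rest & slot:      # can only throw to an unoccupied beat
            transitions.append((throw, rest | slot))
    return transitions

# Ground state for 3 balls, height 5: balls land on the next 3 beats (0b00111).
for throw, nxt in juggling_transitions(0b00111, max_height=5):
    print(f"throw {throw}: next state {nxt:05b}")   # throws 3, 4, 5 are legal
```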

17.
The paper presents the implementation of a new class of massively parallel algorithms for solving certain time-dependent partial differential equations (PDEs) on massively parallel supercomputers. Such PDEs are usually solved numerically, by discretization in time and space, and by applying a time-stepping procedure to data and algorithms potentially parallelized in the spatial domain. In a radical departure from such a strictly sequential temporal paradigm, we have developed a concept of time-parallel algorithms, which allows the marching in time to be fully parallelized. This is achieved by using a set of transformations based on eigenvalue-eigenvector decomposition of the matrices involved in the discrete formalism. Our time-parallel algorithms possess a highly decoupled structure, and can therefore be efficiently implemented on emerging, massively parallel, high-performance supercomputers, with a minimum of communication and synchronization overhead. We have successfully carried out a proof-of-concept demonstration of the basic ideas using a two-dimensional heat equation example implemented on the Intel Touchstone Delta supercomputer. Our results indicate that linear, and even superlinear, speed-up can be achieved and maintained for a very large number of processor nodes.
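The time-decoupling idea in miniature: with A = V diag(w) V^{-1}, every step u_n = A^n u_0 = V diag(w^n) V^{-1} u_0 can be computed independently of the other steps. Below is a toy dense NumPy sketch on a one-dimensional explicit-Euler heat equation; the matrix sizes and parameters are illustrative, not those of the paper's 2-D demonstration.

```python
import numpy as np

def time_parallel_steps(A, u0, steps):
    """Compute u_n = A^n u0 for all n at once via eigendecomposition
    A = V diag(w) V^-1, so A^n = V diag(w**n) V^-1. Each n is independent,
    which is the decoupling that lets the marching in time be parallelized."""
    w, V = np.linalg.eig(A)
    c = np.linalg.solve(V, u0)                   # coefficients in eigenbasis
    # one column per time step; all columns could be computed in parallel
    return np.real(V @ (np.power.outer(w, np.arange(1, steps + 1)) * c[:, None]))

# Explicit-Euler heat equation on 5 interior points: A = I + dt * Lap (dx = 1)
n, dt = 5, 0.1
lap = -2 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
A = np.eye(n) + dt * lap
u0 = np.zeros(n); u0[2] = 1.0                    # initial heat spike
print(np.round(time_parallel_steps(A, u0, steps=3), 4))  # columns: u_1, u_2, u_3
```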

18.
Scalable cache invalidation algorithms for mobile data access   总被引:2,自引:0,他引:2  
In this paper, we address the problem of cache invalidation in mobile and wireless client/server environments. We present cache invalidation techniques that can scale not only to a large number of mobile clients, but also to a large number of data items that can be cached in the mobile clients. We propose two scalable algorithms: the Multidimensional Bit-Sequence (MD-BS) algorithm and the Multilevel Bit-Sequence (ML-BS) algorithm. Both algorithms are based on our prior work on the Basic Bit-Sequences (BS) algorithm. Our study shows that the proposed algorithms are effective for a large number of cached data items with low update rates. The study also illustrates that the algorithms can be used with other complementary techniques to address the problem of cache invalidation for data items with varied update and access rates.
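For contrast, a minimal timestamp-window invalidation baseline of the kind bit-sequence schemes improve on; the item ids, window size, and report format are assumptions, and MD-BS/ML-BS themselves are not reproduced.

```python
class InvalidationServer:
    """Timestamp baseline: the server periodically broadcasts the ids of
    items updated within the last `window` time units."""
    def __init__(self, window):
        self.window = window
        self.last_update = {}            # item id -> last update time

    def update(self, item, now):
        self.last_update[item] = now

    def invalidation_report(self, now):
        return {i for i, t in self.last_update.items() if now - t <= self.window}

class MobileClient:
    def __init__(self):
        self.cache = {}                  # item id -> value
        self.last_report = None

    def apply_report(self, report, now, window):
        # A client disconnected longer than the window cannot trust its
        # cache and must drop it entirely -- the scalability problem that
        # motivates bit-sequence reports.
        if self.last_report is not None and now - self.last_report > window:
            self.cache.clear()
        else:
            for item in report:
                self.cache.pop(item, None)
        self.last_report = now

server = InvalidationServer(window=10)
client = MobileClient()
client.cache = {'x': 1, 'y': 2}
client.last_report = 0
server.update('x', now=5)
client.apply_report(server.invalidation_report(now=8), now=8, window=10)
print(client.cache)                      # {'y': 2} -- 'x' was invalidated
```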

19.
In this paper, we present compiler algorithms for detecting references to stale data in shared-memory multiprocessors. The algorithm consists of two key analysis techniques, stale reference detection and locality preserving analysis. While the stale reference detection finds the memory reference patterns that may violate cache coherence, the locality preserving analysis minimizes the number of such stale references by analyzing both temporal and spatial reuses. By computing the regions referenced by arrays inside loops, we extend the previous scalar algorithms for more precise analysis. We develop a full interprocedural array data-flow algorithm, which performs both bottom-up side-effect analysis and top-down context analysis on the procedure call graph to further exploit locality across procedure boundaries. The interprocedural algorithm eliminates cache invalidations at procedure boundaries, which were assumed in the previous compiler algorithms. We have fully implemented the algorithm in the Polaris parallelizing compiler. Using execution-driven simulations on Perfect Club benchmarks, we demonstrate how unnecessary cache misses can be eliminated by the automatic stale reference detection. The algorithm can be used to implement cache coherence in shared-memory multiprocessors that do not have hardware directories, such as the Cray T3D.

20.
This paper proposes a fast algorithm for integrating connected-component labeling and Euler number computation. Based on graph theory, the Euler number of a binary image in the proposed algorithm is calculated by counting the occurrences of four patterns of the mask for processing foreground pixels in the first scan of a connected-component labeling process, where these four patterns can be found directly without any additional calculation; thus, connected-component labeling and Euler number computation can be integrated more efficiently. Moreover, when computing the Euler number, unlike other conventional algorithms, the proposed algorithm does not need to process background pixels. Experimental results demonstrate that the proposed algorithm is much more efficient than conventional algorithms either for calculating the Euler number alone or simultaneously calculating the Euler number and labeling connected components.  相似文献   
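A single-scan bit-quad count (Gray's classic method) shows how the Euler number falls out of counting 2x2 patterns; the paper's integration of this counting with the labeling scan is not reproduced here.

```python
import numpy as np

def euler_number(img, connectivity=4):
    """Euler number of a binary image by counting 2x2 'bit-quad' patterns in
    a single scan: Q1 = quads with one foreground pixel, Q3 = quads with
    three, Qd = quads whose two foreground pixels sit on a diagonal."""
    img = np.pad(np.asarray(img, dtype=int), 1)   # pad so border quads count
    q1 = q3 = qd = 0
    for i in range(img.shape[0] - 1):
        for j in range(img.shape[1] - 1):
            quad = img[i:i + 2, j:j + 2]
            s = quad.sum()
            if s == 1:
                q1 += 1
            elif s == 3:
                q3 += 1
            elif s == 2 and quad[0, 0] == quad[1, 1]:
                qd += 1                           # diagonal foreground pair
    if connectivity == 4:
        return (q1 - q3 + 2 * qd) // 4
    return (q1 - q3 - 2 * qd) // 4                # 8-connectivity

# One 4-connected component with one hole: Euler number 1 - 1 = 0.
ring = [[1, 1, 1],
        [1, 0, 1],
        [1, 1, 1]]
print(euler_number(ring, connectivity=4))   # -> 0
```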
