Found 20 similar documents; search took 15 ms.
1.
Sequential pattern mining is one of the most important data mining techniques. Previous research on mining sequential patterns discovered patterns from point-based event data, interval-based event data, and hybrid event data. In many real-life applications, however, an event may involve many statuses; it might not occur only at one certain point in time or over a period of time. In this work, we propose a generalized representation of temporal events. We treat events as multi-label events with many statuses, and introduce an algorithm called MLTPM to discover multi-label temporal patterns from temporal databases. The experimental results show that the efficiency and scalability of the MLTPM algorithm are satisfactory. We also discuss interesting multi-label temporal patterns discovered when MLTPM was applied to historical Nasdaq data.
2.
This paper studies the problem of mining frequent itemsets along with their temporal patterns from large transaction sets. A model is proposed in which users define a large set of temporal patterns that are interesting or meaningful to them. A temporal pattern defines the set of time points where the user expects a discovered itemset to be frequent. The model is general in that (i) no constraints are placed on the interesting patterns given by the users, and (ii) two measures, inclusiveness and exclusiveness, are used to capture how well the temporal patterns match the time points given by the discovered itemsets. Intuitively, these measures indicate to what extent a discovered itemset is frequent at time points included in a temporal pattern p, but not at time points not in p. Using these two measures, one is able to model many temporal data mining problems that have appeared in the literature, as well as those that have not been studied. By exploiting the relationship within and between itemset space and pattern space simultaneously, a series of pruning techniques is developed to speed up the mining process. Experiments show that these pruning techniques yield performance gains of up to 100 times over a direct extension of non-temporal data mining algorithms.
3.
Yong Joon Lee 《Journal of Systems and Software》2009,82(1):155-167
Temporal data mining remains an important research topic because many application areas need knowledge from temporal data, such as sequential patterns, similar time sequences, and cyclic and temporal association rules. Although there are many studies on temporal data mining, they do not deal with discovering knowledge from temporal interval data such as patient histories, purchaser histories, and web logs. We propose a new temporal data mining technique that extracts temporal interval relation rules from temporal interval data by using Allen’s theory. It consists of a preprocessing algorithm that generalizes temporal interval data and a temporal relation algorithm that mines temporal relation rules from the generalized temporal interval data. This technique can provide more useful knowledge than conventional data mining techniques.
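The abstract above relies on Allen's interval theory, which names thirteen possible relations between two time intervals. The following is a minimal sketch of how one such relation could be determined for a pair of intervals; it is not the paper's algorithm. The `(start, end)` tuple representation and the function name `allen_relation` are assumptions made for illustration.

```python
from typing import Tuple

# Hypothetical representation: an interval is (start, end) with start < end.
Interval = Tuple[int, int]


def allen_relation(a: Interval, b: Interval) -> str:
    """Return the name of the Allen relation that holds from a to b."""
    (a_start, a_end), (b_start, b_end) = a, b
    if a_start >= a_end or b_start >= b_end:
        raise ValueError("intervals must satisfy start < end")
    # Disjoint or touching intervals.
    if a_end < b_start:
        return "before"
    if a_end == b_start:
        return "meets"
    if b_end < a_start:
        return "after"
    if b_end == a_start:
        return "met-by"
    # Intervals that share at least one endpoint.
    if a_start == b_start and a_end == b_end:
        return "equals"
    if a_start == b_start:
        return "starts" if a_end < b_end else "started-by"
    if a_end == b_end:
        return "finishes" if a_start > b_start else "finished-by"
    # Strict containment in either direction.
    if b_start < a_start and a_end < b_end:
        return "during"
    if a_start < b_start and b_end < a_end:
        return "contains"
    # Only partial overlap remains.
    return "overlaps" if a_start < b_start else "overlapped-by"
```

For instance, with made-up patient-history intervals, a treatment spanning days 3 to 5 inside a hospital stay spanning days 1 to 8 would be classified as `during`. A rule miner could count how often each relation holds between pairs of event types.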
4.
Ira Assent Ralph Krieger Boris Glavic Thomas Seidl 《Knowledge and Information Systems》2008,16(1):29-51
Many environmental, scientific, technical or medical database applications require effective and efficient mining of time
series, sequences or trajectories of measurements taken at different time points and positions forming large temporal or spatial
databases. Particularly the analysis of concurrent and multidimensional sequences poses new challenges in finding clusters
of arbitrary length and varying number of attributes. We present a novel algorithm capable of finding parallel clusters in
different subspaces and demonstrate our results for temporal and spatial applications. Our analysis of structural quality
parameters in rivers is successfully used by hydrologists to develop measures for river quality improvements.
5.
In the UK alone there are currently over 4.2 million operational CCTV cameras, roughly one camera for every 14 people, and this figure is increasing at a fast rate throughout the world (especially after the tragic events of 9/11 and 7/7) (Norris, McCahill, & Wood, 2004). Security concerns are not the only factor driving the rapid growth of CCTV cameras. Another important reason is access to hidden knowledge extracted from CCTV footage that can be used for effective business decision making, such as store design, customer service, product marketing, and reducing store shrinkage. Events occurring in observed scenes are among the most important semantic entities that can be extracted from videos (Anwar & Naftel, 2008). Most past work is based on finding frequent event patterns or deals with discovering already known abnormal events. In contrast, in this paper we present a framework to discover unknown anomalous events associated with a frequent sequence of events (AEASP), that is, events that are unlikely to follow a frequent sequence of events. This information can be very useful for discovering unknown abnormal events and can provide early actionable intelligence to redeploy resources to specific areas of view (such as a PTZ camera or the attention of a CCTV user). Discovery of anomalous events against a sequential pattern can also provide business intelligence for store management in the retail sector. The proposed event mining framework extends our previous research presented in Anwar et al. (2010) and also takes into consideration the temporal aspect of anomalous events against a frequent sequence of events; that is, it discovers anomalous events that hold only for a specific time interval and might not be anomalous against the frequent sequence of events over the whole time spectrum, and vice versa.
To address the memory-intensive process of searching for all instances of multiple sequential patterns in each data sequence, an efficient dynamic sequential pattern search mechanism is introduced. Several experiments are conducted to evaluate the accuracy and performance of the proposed algorithm for mining anomalous events against frequent sequences of events.
6.
Jung-Hsien Chiang Hsiao-Sheng Liu Shih-Yi Chao Cheng-Yu Chen 《Expert systems with applications》2007,33(4):1036-1041
In this paper, we develop a gene–gene relation browser (DiGG) that integrates sequential pattern mining with an information-extraction model to extract knowledge on gene–gene interactions from the biomedical literature. DiGG uses an efficient mining technique to enable the discovery of frequent gene–gene sequences even in very long sentences. Our approach aims to detect associated gene relations that are often discussed in documents. Integrating the related relations leads to an individual gene relation network. A graphical presentation is used to show the relationships between gene products. A salient feature of this approach is that it incrementally outputs new frequent gene relations in an online visualization fashion.
7.
Seo-Young Noh Shashi K. Gadia 《Journal of Systems and Software》2008,81(11):1931-1943
Since the mid-1980s, there has been a debate about which data model is most appropriate for temporal databases. A fundamental choice is whether to use intervals of time or temporal elements to timestamp objects and events with their periods of validity. The advantage of interval timestamps is that Start and End columns can be added to relations so that they can be treated within the framework of classical databases, leading to quick implementation. Temporal elements are finite unions of intervals. The advantage of temporal elements is that timestamps become implicitly associated with values, tuples, and relations. Furthermore, since temporal elements are, by design, closed under set-theoretic operations such as union, intersection, and complementation, they lead to query languages that are natural. Here, we investigate the ease of use as well as the system performance of the two approaches to help settle the debate.
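The closure property mentioned above can be illustrated with a small sketch. It assumes a temporal element is stored as a list of half-open integer intervals `[start, end)` and that complementation is taken relative to a finite universe interval; both choices, and all function names, are assumptions for illustration rather than the model used in the paper.

```python
from typing import List, Tuple

Interval = Tuple[int, int]          # half-open [start, end); an assumption
TemporalElement = List[Interval]    # finite union of intervals


def normalize(intervals: TemporalElement) -> TemporalElement:
    """Sort intervals and merge any that overlap or touch."""
    merged: TemporalElement = []
    for start, end in sorted(i for i in intervals if i[0] < i[1]):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def union(a: TemporalElement, b: TemporalElement) -> TemporalElement:
    return normalize(a + b)


def intersection(a: TemporalElement, b: TemporalElement) -> TemporalElement:
    result: TemporalElement = []
    for a_start, a_end in normalize(a):
        for b_start, b_end in normalize(b):
            start, end = max(a_start, b_start), min(a_end, b_end)
            if start < end:
                result.append((start, end))
    return normalize(result)


def complement(a: TemporalElement, universe: Interval) -> TemporalElement:
    """Return the part of the universe interval not covered by a."""
    lo, hi = universe
    result: TemporalElement = []
    cursor = lo
    for start, end in normalize(a):
        start, end = max(start, lo), min(end, hi)
        if start >= end:
            continue  # interval lies outside the universe
        if cursor < start:
            result.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < hi:
        result.append((cursor, hi))
    return result
```

Every operation returns another list of disjoint intervals, so results can be fed back into further operations. A single Start/End pair cannot express, for example, the complement of one interval inside a larger universe, which generally needs two intervals.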
8.
9.
In this paper, given a set of sequence databases across multiple domains, we aim at mining multi-domain sequential patterns, where a multi-domain sequential pattern is a sequence of events whose occurrence time is within a pre-defined time window. We first propose algorithm Naive in which multiple sequence databases are joined as one sequence database for utilizing traditional sequential pattern mining algorithms (e.g., PrefixSpan). Due to the nature of join operations, algorithm Naive is costly and is developed for comparison purposes. Thus, we propose two algorithms without any join operations for mining multi-domain sequential patterns. Explicitly, algorithm IndividualMine derives sequential patterns in each domain and then iteratively combines sequential patterns among sequence databases of multiple domains to derive candidate multi-domain sequential patterns. However, not all sequential patterns mined in the sequence database of each domain are able to form multi-domain sequential patterns. To avoid the mining cost incurred in algorithm IndividualMine, algorithm PropagatedMine is developed. Algorithm PropagatedMine first performs one sequential pattern mining from one sequence database. In light of sequential patterns mined, algorithm PropagatedMine propagates sequential patterns mined to other sequence databases. Furthermore, sequential patterns mined are represented as a lattice structure for further reducing the number of sequential patterns to be propagated. In addition, we develop some mechanisms to allow some empty sets in multi-domain sequential patterns. Performance of the proposed algorithms is comparatively analyzed and sensitivity analysis is conducted. Experimental results show that by exploring propagation and lattice structures, algorithm PropagatedMine outperforms algorithm IndividualMine in terms of efficiency (i.e., the execution time).
10.
11.
Tzung-Pei Hong Kuei-Ying Lin Shyue-Liang Wang 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2006,10(10):925-932
Many researchers in the database and machine learning fields are primarily interested in data mining because it offers opportunities to discover useful information and important relevant patterns in large databases. Most previous studies have shown how binary-valued transaction data may be handled. Transaction data in real-world applications usually consist of quantitative values, so designing a sophisticated data-mining algorithm able to deal with various types of data presents a challenge to workers in this research field. In the past, we proposed a fuzzy data-mining algorithm to find association rules. Since sequential patterns are also very important for real-world applications, this paper focuses on finding fuzzy sequential patterns from quantitative data. A new mining algorithm is proposed that integrates fuzzy-set concepts and the AprioriAll algorithm. It first transforms quantitative values in transactions into linguistic terms, then filters them to find sequential patterns by modifying the AprioriAll mining algorithm. Each quantitative item uses only the linguistic term with the maximum cardinality in later mining processes, so the number of fuzzy regions to be processed is the same as the number of original items. The mined patterns thus exhibit the sequential quantitative regularity in databases and can be used to provide suggestions to the appropriate supervisors.
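The transformation step described above (quantities to linguistic terms, then keeping only the term with the maximum cardinality) might look roughly like the sketch below. The membership functions, their breakpoints (1, 6, 11), the term names, and the use of the summed membership values as the cardinality are all assumptions chosen for illustration; the abstract does not specify them.

```python
from typing import Dict, List


def memberships(quantity: float) -> Dict[str, float]:
    """Map a purchased quantity to fuzzy membership values.

    Hypothetical membership functions: Low is 1 up to quantity 1 and falls
    to 0 at 6, Middle peaks at 6, High rises from 6 and is 1 from 11 on.
    """
    low = min(1.0, max(0.0, (6 - quantity) / 5))
    middle = max(0.0, 1 - abs(quantity - 6) / 5)
    high = min(1.0, max(0.0, (quantity - 6) / 5))
    return {"Low": low, "Middle": middle, "High": high}


def cardinalities(quantities: List[float]) -> Dict[str, float]:
    """Sum the membership values of each term over all transactions."""
    totals = {"Low": 0.0, "Middle": 0.0, "High": 0.0}
    for quantity in quantities:
        for term, value in memberships(quantity).items():
            totals[term] += value
    return totals


def dominant_term(quantities: List[float]) -> str:
    """Pick the single linguistic term that represents the item later on."""
    totals = cardinalities(quantities)
    return max(totals, key=totals.get)
```

Keeping one term per item means a later AprioriAll-style pass works with as many fuzzy regions as there are original items, which is the size reduction the abstract points out.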
12.
In this paper, we propose an efficient algorithm, called CMP-Miner, to mine closed patterns in a time-series database where each record in the database, also called a transaction, contains multiple time-series sequences. Our proposed algorithm consists of three phases. First, we transform each time-series sequence in a transaction into a symbolic sequence. Second, we scan the transformed database to find frequent patterns of length one. Third, for each frequent pattern found in the second phase, we recursively enumerate frequent patterns by a frequent pattern tree in a depth-first search manner. During the process of enumeration, we apply several efficient pruning strategies to remove frequent but non-closed patterns. Thus, the CMP-Miner algorithm can efficiently mine the closed patterns from a time-series database. The experimental results show that our proposed algorithm outperforms the modified Apriori and BIDE algorithms.
13.
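Wang Bingxue (王炳雪) 《Computer Engineering and Applications》(计算机工程与应用) 2010,46(11):142-144
To study the evolution characteristics of temporal sequence patterns, this paper first defines pattern evolution fragments, sets of pattern evolution fragments, and frequent pattern evolution fragments. Based on Takens' theorem, it then demonstrates the equivalence between pattern evolution in the reconstructed space and pattern evolution in the original space. A method for mining frequent pattern evolution paradigms after reconstruction is given, together with a method for generating rules from those paradigms. The effectiveness of the approach is studied on three different types of sequence data: periodic, chaotic, and interest-rate series.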
14.
Discovering patterns of great significance is an important problem in data mining. An episode is defined to be a partially ordered set of events for consecutive and fixed-time intervals in a sequence. Most previous studies on episodes consider only frequent episodes in a sequence of events (called a simple sequence). In the real world, we may find a set of events at each time slot in terms of various intervals (hours, days, weeks, etc.). We refer to such sequences as complex sequences. Mining frequent episodes in complex sequences has more extensive applications than mining them in simple sequences. In this paper, we discuss the problem of mining frequent episodes in a complex sequence. We extend the previous algorithm MINEPI to MINEPI+ for episode mining from complex sequences. Furthermore, a memory-anchored algorithm called EMMA is introduced for the mining task. Experimental evaluation on both real-world and synthetic data sets shows that EMMA is more efficient than MINEPI+.
15.
16.
As the total amount of traffic data in networks has been growing at an alarming rate, there is currently a substantial body of research that attempts to mine traffic data with the purpose of obtaining useful information. For instance, there are some investigations into the detection of Internet worms and intrusions by discovering abnormal traffic patterns. However, since network traffic data contain information about the Internet usage patterns of users, network users’ privacy may be compromised during the mining process. In this paper, we propose an efficient and practical method that preserves privacy during sequential pattern mining on network traffic data. In order to discover frequent sequential patterns without violating privacy, our method uses the N-repository server model, which operates as a single mining server, and the retention replacement technique, which changes the answer to a query probabilistically. In addition, our method accelerates the overall mining process by maintaining meta tables at each site so as to determine quickly whether candidate patterns have ever occurred at the site. Extensive experiments with real-world network traffic data demonstrated the correctness and efficiency of the proposed method.
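The idea of changing a query answer probabilistically can be sketched as follows. This is a generic retention-replacement randomizer for yes/no answers, not the paper's protocol: it assumes each site answers whether a candidate pattern occurred, retains the true answer with probability `p`, and otherwise replaces it with a fair coin flip. The function names and the choice of a fair coin are assumptions.

```python
import random


def randomize(answer: bool, p: float, rng: random.Random) -> bool:
    """Keep the true answer with probability p, else return a fair coin flip."""
    if rng.random() < p:
        return answer
    return rng.random() < 0.5


def estimate_true_fraction(observed_fraction: float, p: float) -> float:
    """Invert E[observed] = p * true + (1 - p) * 0.5 to estimate the true rate."""
    return (observed_fraction - (1 - p) * 0.5) / p


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical data: 30% of 20,000 sites really contain the pattern.
    truth = [i % 10 < 3 for i in range(20_000)]
    reported = [randomize(t, 0.8, rng) for t in truth]
    observed = sum(reported) / len(reported)
    print(round(estimate_true_fraction(observed, 0.8), 3))
```

No individual reported answer can be trusted, which is what protects a single site, while the aggregate support of a pattern can still be estimated because the distortion is known.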
17.
In this paper, we explore a new data mining capability that involves mining calling path patterns in global system for mobile communication (GSM) networks. Our proposed method consists of two phases. First, we devise a data structure to convert the original calling paths in the log file into a frequent calling path graph. Second, we design an algorithm to mine the calling path patterns from the frequent calling path graph obtained. By using the frequent calling path graph to mine the calling path patterns, our proposed algorithm does not generate unnecessary candidate patterns and requires fewer database scans. If the corresponding calling path graph of the GSM network fits in main memory, our proposed algorithm scans the database only once. Otherwise, the cellular structure of the GSM network is divided into several partitions so that the corresponding calling path sub-graph of each partition fits in main memory. The number of database scans in this case is equal to the number of partitioned sub-graphs. Therefore, our proposed algorithm is more efficient than the PrefixSpan and Apriori-like approaches. The experimental results show that our proposed algorithm outperforms the Apriori-like and PrefixSpan approaches by several orders of magnitude.
18.
Vincent S. Tseng Kawuu Weicheng Lin Jeng-Chuan Chang 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2008,12(2):157-163
Advances in data mining technologies have enabled intelligent Web capabilities in various applications by utilizing the
hidden user behavior patterns discovered from Web logs. Intelligent methods for discovering and predicting users’ patterns
are important in supporting intelligent Web applications such as personalized services. Although numerous studies have been done
on Web usage mining, few of them consider temporal evolution when discovering Web users’ patterns. In this paper, we propose
a novel data mining algorithm named Temporal N-Gram (TN-Gram) for constructing prediction models of Web user navigation by considering the temporality property in Web usage evolution.
Moreover, three new kinds of measures are proposed for evaluating the temporal evolution of navigation patterns under different
time periods. Through experimental evaluation on both real-life and simulated datasets, the proposed TN-Gram model is shown to outperform other approaches such as N-gram modeling in terms of prediction precision, in particular when
Web users’ navigation behavior changes significantly over time.
19.
Sequential pattern mining is essential in many applications, including computational biology, consumer behavior analysis, and web log analysis. Although sequential patterns can tell us which items are frequently purchased together and in what order, they cannot provide information about the time span between items for decision support. Previous studies dealing with this problem either set time constraints to restrict the patterns discovered or define time intervals between two successive items to provide time information. The first approach falls short in providing clear time-interval information, while the second cannot discover time-interval information between two non-successive items in a sequential pattern. To provide more time-related knowledge, we define a new variant of time-interval sequential patterns, called multi-time-interval sequential patterns, which can reveal the time intervals between all pairs of items in a pattern. We develop two efficient algorithms, called the MI-Apriori and MI-PrefixSpan algorithms, to solve this problem. The experimental results show that the MI-PrefixSpan algorithm is faster than the MI-Apriori algorithm, but the MI-Apriori algorithm has better scalability on long sequence data.
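To make the notion of time intervals between all pairs of items concrete, the sketch below labels every pair of items in one timestamped sequence with a predefined interval range. It only illustrates the pattern representation, not MI-Apriori or MI-PrefixSpan. The boundaries (3 and 7 days), the label names `ti0` to `ti2`, and the assumption that each item appears once per sequence are all made up for the example.

```python
from bisect import bisect_right
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical boundaries in days: ti0 = [0, 3), ti1 = [3, 7), ti2 = [7, inf).
BOUNDARIES = [3, 7]


def interval_label(gap: float) -> str:
    """Map a time gap to the name of the predefined range it falls in."""
    return f"ti{bisect_right(BOUNDARIES, gap)}"


def pairwise_intervals(
    sequence: List[Tuple[str, float]]
) -> Dict[Tuple[str, str], str]:
    """Label the gap between every ordered pair of items in one sequence.

    Assumes each item occurs at most once, so item pairs can serve as keys.
    """
    ordered = sorted(sequence, key=lambda event: event[1])
    return {
        (first, second): interval_label(t2 - t1)
        for (first, t1), (second, t2) in combinations(ordered, 2)
    }
```

With a customer who buys Bread on day 1, Milk on day 3, and Jam on day 10, the result also records the Bread-to-Jam gap, which a representation that stores only the intervals between successive items would leave implicit.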
20.
Tony Cheng-Kui Huang 《Information Sciences》2010,180(17):3316-418
Mining sequential patterns to find ordered events or subsequence patterns is essential in many applications, such as analysis of consumer shopping data, web clickstreams, and biological sequences. Traditional patterns reveal which items are frequently purchased together and in what order. However, information about the time intervals between purchases is missing. Therefore, Yang proposed using multi-time-interval sequential patterns to consider the time intervals between each pair of items in a pattern. For example, 〈Bread, ti1, Milk, (ti2, ti1), Jam〉 means that Bread is bought before Milk within an interval of ti1, and Jam is bought after Bread and Milk within intervals of ti2 and ti1, respectively, where ti1 and ti2 are predefined time intervals. Although this new type of pattern considers the intervals between all pairs of items, it contains a sharp boundary problem; that is, when the time interval between two purchases is near the boundary of two predetermined time ranges, we either ignore or overemphasize it. In this study, we applied the concept of fuzzy sets to solve the sharp boundary problem. The discovered patterns, called fuzzy multi-time-interval sequential patterns, describe time intervals in linguistic terms for better understanding. Two algorithms, FuzzMI-Apriori and FuzzMI-PrefixSpan, were developed for mining fuzzy multi-time-interval patterns. Experiments using synthetic and real datasets showed the algorithms’ computational efficiency, scalability, and effectiveness.