首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
Sequential pattern mining is essential in many applications, including computational biology, consumer behavior analysis, web log analysis, etc. Although sequential patterns can tell us what items are frequently to be purchased together and in what order, they cannot provide information about the time span between items for decision support. Previous studies dealing with this problem either set time constraints to restrict the patterns discovered or define time-intervals between two successive items to provide time information. Accordingly, the first approach falls short in providing clear time-interval information while the second cannot discover time-interval information between two non-successive items in a sequential pattern. To provide more time-related knowledge, we define a new variant of time-interval sequential patterns, called multi-time-interval sequential patterns, which can reveal the time-intervals between all pairs of items in a pattern. Accordingly, we develop two efficient algorithms, called the MI-Apriori and MI-PrefixSpan algorithms, to solve this problem. The experimental results show that the MI-PrefixSpan algorithm is faster than the MI-Apriori algorithm, but the MI-Apriori algorithm has better scalability in long sequence data.  相似文献   

2.
闫伟  张浩  陆剑峰 《计算机应用》2005,25(7):1584-1586
采用数据挖掘中的时间序列模式对流程企业中的运行数据进行分析,首先采用模糊理论对实际数据进行处理,找出偏离常规运行状态但未到报警界限的参数点,然后采用时间窗对参数离散处理,划分时间间隔得到时间序列数据库。采用TimeSeq_PrefixSpan算法并编程实现,得到了按次序排列且有时间间隔的异常参数点对设备故障影响的规则,起到了对设备故障预警监控的作用。  相似文献   

3.
Weighted sequential pattern mining has recently been discussed in the field of data mining. Different from traditional sequential pattern mining, this kind of mining considers different significances of items in real applications, such as cost or profit. Most of the related studies adopt the maximum weighted upper-bound model to find weighted sequential patterns, but they generate a large number of unpromising candidate subsequences. In this study, we thus propose an efficient approach for finding weighted sequential patterns from sequence databases. In particular, a tightening strategy in the proposed approach is proposed to obtain more accurate weighted upper-bounds for subsequences in mining. Through the experimental evaluation, the results also show the proposed approach has good performance in terms of pruning effectiveness and execution efficiency.  相似文献   

4.
Discovering fuzzy time-interval sequential patterns in sequence databases.   总被引:1,自引:0,他引:1  
Given a sequence database and minimum support threshold, the task of sequential pattern mining is to discover the complete set of sequential patterns in databases. From the discovered sequential patterns, we can know what items are frequently brought together and in what order they appear. However, they cannot tell us the time gaps between successive items in patterns. Accordingly, Chen et al. have proposed a generalization of sequential patterns, called time-interval sequential patterns, which reveals not only the order of items, but also the time intervals between successive items. An example of time-interval sequential pattern has a form like (A, I2, B, I1, C), meaning that we buy A first, then after an interval of I2 we buy B, and finally after an interval of I1 we buy C, where I2 and I1 are predetermined time ranges. Although this new type of pattern can alleviate the above concern, it causes the sharp boundary problem. That is, when a time interval is near the boundary of two predetermined time ranges, we either ignore or overemphasize it. Therefore, this paper uses the concept of fuzzy sets to extend the original research so that fuzzy time-interval sequential patterns are discovered from databases. Two efficient algorithms, the fuzzy time interval (FTI)-Apriori algorithm and the FTI-PrefixSpan algorithm, are developed for mining fuzzy time-interval sequential patterns. In our simulation results, we find that the second algorithm outperforms the first one, not only in computing time but also in scalability with respect to various parameters.  相似文献   

5.
Unil Yun  Keun Ho Ryu 《Knowledge》2011,24(1):73-82
In data mining area, weighted frequent pattern mining has been suggested to find important frequent patterns by considering the weights of patterns. More extensions with weight constraints have been proposed such as mining weighted association rules, weighted sequential patterns, weighted closed patterns, frequent patterns with dynamic weights, weighted graphs, and weighted sub-trees or sub structures. In previous approaches of weighted frequent pattern mining, weighted supports of patterns were exactly matched to prune weighted infrequent patterns. However, in the noisy environment, the small change in weights or supports of items affects the result sets seriously. This may make the weighted frequent patterns less useful in the noisy environment. In this paper, we propose the robust concept of mining approximate weighted frequent patterns. Based on the framework of weight based pattern mining, an approximate factor is defined to relax the requirement for exact equality between weighted supports of patterns and a minimum threshold. After that, we address the concept of mining approximate weighted frequent patterns to find important patterns with/without the noisy data. We analyze characteristics of approximate weighted frequent patterns and run extensive performance tests.  相似文献   

6.
Finding correlated sequential patterns in large sequence databases is one of the essential tasks in data mining since a huge number of sequential patterns are usually mined, but it is hard to find sequential patterns with the correlation. According to the requirement of real applications, the needed data analysis should be different. In previous mining approaches, after mining the sequential patterns, sequential patterns with the weak affinity are found even with a high minimum support. In this paper, a new framework is suggested for mining weighted support affinity patterns in which an objective measure, sequential ws-confidence is developed to detect correlated sequential patterns with weighted support affinity patterns. To efficiently prune the weak affinity patterns, it is proved that ws-confidence measure satisfies the anti-monotone and cross weighted support properties which can be applied to eliminate sequential patterns with dissimilar weighted support levels. Based on the framework, a weighted support affinity pattern mining algorithm (WSMiner) is suggested. The performance study shows that WSMiner is efficient and scalable for mining weighted support affinity patterns.  相似文献   

7.
In this paper, given a set of sequence databases across multiple domains, we aim at mining multi-domain sequential patterns, where a multi-domain sequential pattern is a sequence of events whose occurrence time is within a pre-defined time window. We first propose algorithm Naive in which multiple sequence databases are joined as one sequence database for utilizing traditional sequential pattern mining algorithms (e.g., PrefixSpan). Due to the nature of join operations, algorithm Naive is costly and is developed for comparison purposes. Thus, we propose two algorithms without any join operations for mining multi-domain sequential patterns. Explicitly, algorithm IndividualMine derives sequential patterns in each domain and then iteratively combines sequential patterns among sequence databases of multiple domains to derive candidate multi-domain sequential patterns. However, not all sequential patterns mined in the sequence database of each domain are able to form multi-domain sequential patterns. To avoid the mining cost incurred in algorithm IndividualMine, algorithm PropagatedMine is developed. Algorithm PropagatedMine first performs one sequential pattern mining from one sequence database. In light of sequential patterns mined, algorithm PropagatedMine propagates sequential patterns mined to other sequence databases. Furthermore, sequential patterns mined are represented as a lattice structure for further reducing the number of sequential patterns to be propagated. In addition, we develop some mechanisms to allow some empty sets in multi-domain sequential patterns. Performance of the proposed algorithms is comparatively analyzed and sensitivity analysis is conducted. Experimental results show that by exploring propagation and lattice structures, algorithm PropagatedMine outperforms algorithm IndividualMine in terms of efficiency (i.e., the execution time).  相似文献   

8.
Sequential Pattern Mining in Multi-Databases via Multiple Alignment   总被引:2,自引:0,他引:2  
To efficiently find global patterns from a multi-database, information in each local database must first be mined and summarized at the local level. Then only the summarized information is forwarded to the global mining process. However, conventional sequential pattern mining methods based on support cannot summarize the local information and is ineffective for global pattern mining from multiple data sources. In this paper, we present an alternative local mining approach for finding sequential patterns in the local databases of a multi-database. We propose the theme of approximate sequential pattern mining roughly defined as identifying patterns approximately shared by many sequences. Approximate sequential patterns can effectively summerize and represent the local databases by identifying the underlying trends in the data. We present a novel algorithm, ApproxMAP, to mine approximate sequential patterns, called consensus patterns, from large sequence databases in two steps. First, sequences are clustered by similarity. Then, consensus patterns are mined directly from each cluster through multiple alignment. We conduct an extensive and systematic performance study over synthetic and real data. The results demonstrate that ApproxMAP is effective and scalable in mining large sequences databases with long patterns. Hence, ApproxMAP can efficiently summarize a local database and reduce the cost for global mining. Furthremore, we present an elegant and uniform model to identify both high vote sequential patterns and exceptional sequential patterns from the collection of these consensus patterns from each local databases.  相似文献   

9.
在加权序列模式挖掘中,基于候选码生成-测试方法的MWSP是目前应用性最好的算法之一,然而在挖掘过程中容易出现候选组合爆炸的情况,为此文章提出了一种高效的加权序列模式挖掘算法(PWSM)。PWSM算法引入k-最小加权支持数概念并利用前缀投影数据库原理有效地避免了候选组合爆炸的发生,并且在挖掘的过程中充分利用最小加权支持数,再次对算法进行优化。实验表明,该算法较MWSP算法能更加有效地从序列数据库中挖掘加权序列模式。  相似文献   

10.
高校图书管理系统经过多年运行产生了大量借阅数据,为从借阅数据中发现读者借阅图书的行为模式和借阅规律,提出使用PrefixSpan算法对借阅数据进行序列模式挖掘。为平衡序列模式中支持度和长度各自的重要性,将挖掘结果进行规范化处理,得到带有权值的序列模式。通过对带有权值序列模式进行分析,可得到读者借阅图书的前后衔接关系和借阅规律,根据这些借阅规律可对读者进行借阅指导。  相似文献   

11.
Given a time stamped transaction database and a user-defined reference sequence of interest over time, similarity-profiled temporal association mining discovers all associated item sets whose prevalence variations over time are similar to the reference sequence. The similar temporal association patterns can reveal interesting relationships of data items which co-occur with a particular event over time. Most works in temporal association mining have focused on capturing special temporal regulation patterns such as cyclic patterns and calendar scheme-based patterns. However, our model is flexible in representing interesting temporal patterns using a user-defined reference sequence. The dissimilarity degree of the sequence of support values of an item set to the reference sequence is used to capture how well its temporal prevalence variation matches the reference pattern. By exploiting interesting properties such as an envelope of support time sequence and a lower bounding distance for early pruning candidate item sets, we develop an algorithm for effectively mining similarity-profiled temporal association patterns. We prove the algorithm is correct and complete in the mining results and provide the computational analysis. Experimental results on real data as well as synthetic data show that the proposed algorithm is more efficient than a sequential method using a traditional support-pruning scheme.  相似文献   

12.
为了发现网络流量的规律,本文引入了一种有效的网络流量挖掘算法。网络流量模式是一种反映网络访问频率规律的序列模式,引入了一种扩展的prefixspan算法,将这些序列作为前缀去递归挖掘,并构造一个投影数据库,该算法改进了候选子序列生成效率,前缀投影减少了投影数据库的大小,从而改进了处理效率。  相似文献   

13.
序列模式挖掘就是在时序数据库中挖掘相对时间或其他模式出现频率高的模式.序列模式发现是最重要的数据挖掘任务之一,并有着广阔的应用前景.针对静态数据库,序列模式挖掘已经被深入的研究.近年来,出现了一种新的数据形式:数据流.针对基于数据流的序列模式挖掘的研究还不是十分深入.提出一个有效的基于数据流的挖掘频繁序列模式的算法SSPM,利用到2个数据结构(F-list和Tatree)来处理基于数据流的序列模式挖掘的复杂性问题.SSPM的优点是可以最大限度地降低负正例的产生,实验表明SSPM具有较高的准确率.  相似文献   

14.
序列模式在基因分析、金融预测等方面有着重要的应用,是数据挖掘的一个主要分支,鉴于数据流应用的日益增多。本文在研究传统序列模式挖掘算法的基础上,提出了一种基于可扩展滑动窗口和贝叶斯概率过滤的面向数据流的序列模式挖掘算法(BMSP—DS算法),目的是简化序列模式发现的中间结果,提高挖掘效率.以便在小的存储空间和低的运算时间内快速发现流数据的频繁序列模式,同时算法也减少了因主观支持度取值不当对模式发现造成的负面影响,实验结果表明,该算法是可行、较优的.  相似文献   

15.
Comprehending changes of customer behavior is an essential problem that must be faced for survival in a fast-changing business environment. Particularly in the management of electronic commerce (EC), many companies have developed on-line shopping stores to serve customers and immediately collect buying logs in databases. This trend has led to the development of data-mining applications. Fuzzy time-interval sequential pattern mining is one type of serviceable data-mining technique that discovers customer behavioral patterns over time. To take a shopping example, (Bread, Short, Milk, Long, Jam), means that Bread is bought before Milk in a Short period, and Jam is bought after Milk in a Long period, where Short and Long are predetermined linguistic terms given by managers. This information shown in this example reveals more general and concise knowledge for managers, allowing them to make quick-response decisions, especially in business. However, no studies, to our knowledge, have yet to address the issue of changes in fuzzy time-interval sequential patterns. The fuzzy time-interval sequential pattern, (Bread, Short, Milk, Long, Jam), became available in last year; however, is not a trend this year, and has been substituted by (Bread, Short, Yogurt, Short, Jam). Without updating this knowledge, managers might map out inappropriate marketing plans for products or services and dated inventory strategies with respect to time-intervals. To deal with this problem, we propose a novel change mining model, MineFuzzChange, to detect the change in fuzzy time-interval sequential patterns. Using a brick-and-mortar transactional dataset collected from a retail chain in Taiwan and a B2C EC dataset, experiments are carried out to evaluate the proposed model. We empirically demonstrate how the model helps managers to understand the changing behaviors of their customers and to formulate timely marketing and inventory strategies.  相似文献   

16.
《Information Systems》2002,27(5):345-362
The problem addressed in this paper is to discover the frequently occurred sequential patterns from databases. Basically, the existing studies on finding sequential patterns can be roughly classified into two main categories. In the first category, the discovered patterns are continuous patterns, where all the elements in the pattern appear in consecutive positions in transactions. The second category is to mine discontinuous patterns, where the adjacent elements in the pattern need not appear consecutively in transactions. Although there are many researches on finding either kind of patterns, no previous researches can find both of them. Neither can they find the discontinuous patterns formed of several continuous sub-patterns. Therefore, we define a new kind of patterns, called hybrid pattern, which is the combination of continuous patterns and discontinuous patterns. In this paper, two algorithms are developed to mine hybrid patterns, where the first algorithm is easy but slow while the second complicated but much faster than the first one. Finally, the simulation result shows that our second algorithm is as fast as the currently best algorithm for mining sequential patterns.  相似文献   

17.
The number of patterns discovered by data mining can become tremendous, in some cases exceeding the size of the original database. Therefore, there is a requirement for querying previously generated mining results or for querying the database against discovered patters. In this paper, we focus on developing methods for the storage and querying of large collections of sequential patterns. We describe a family of algorithms, which address the problem of considering the ordering among elements, that is crucial when dealing with sequential patterns. Moreover, we take into account the fact that the distribution of elements within sequential patterns is highly skewed, to propose a novel approach for the effective encoding of patterns. Experimental results, which examine a variety of factors, illustrate the efficiency of the proposed method.  相似文献   

18.
The goal of analyzing a time series database is to find whether and how frequent a periodic pattern is repeated within the series. Periodic pattern mining is the problem that regards temporal regularity. However, most of the existing algorithms have a major limitation in mining interesting patterns of users interest, that is, they can mine patterns of specific length with all the events sequentially one after another in exact positions within this pattern. Though there are certain scenarios where a pattern can be flexible, that is, it may be interesting and can be mined by neglecting any number of unimportant events in between important events with variable length of the pattern. Moreover, existing algorithms can detect only specific type of periodicity in various time series databases and require the interaction from user to determine periodicity. In this paper, we have proposed an algorithm for the periodic pattern mining in time series databases which does not rely on the user for the period value or period type of the pattern and can detect all types of periodic patterns at the same time, indeed these flexibilities are missing in existing algorithms. The proposed algorithm facilitates the user to generate different kinds of patterns by skipping intermediate events in a time series database and find out the periodicity of the patterns within the database. It is an improvement over the generating pattern using suffix tree, because suffix tree based algorithms have weakness in this particular area of pattern generation. Comparing with the existing algorithms, the proposed algorithm improves generating different kinds of interesting patterns and detects whether the generated pattern is periodic or not. We have tested the performance of our algorithm on both synthetic and real life data from different domains and found a large number of interesting event sequences which were missing in existing algorithms and the proposed algorithm was efficient enough in generating and detecting periodicity of flexible patterns on both types of data.  相似文献   

19.
To capture the dynamic nature of data addition and deletion, we propose a general model of sequential pattern mining with a progressive database while the data in the database may be static, inserted or deleted. In addition, we present a progressive algorithm Pisa, standing for Progressive mIning of Sequential pAtterns, to progressively discover sequential patterns in defined time period of interest. The period of interest is a sliding window continuously advancing as the time goes by. Pisa utilizes a progressive sequential tree to efficiently maintain the latest data sequences, discover the complete set of up-to-date sequential patterns, and delete obsolete data and patterns accordingly. The height of the sequential pattern tree proposed is bounded by the length of period of interest, thereby effectively limiting the memory space required by Pisa that is significantly smaller than the memory needed by alternative methods. Note that the sequential pattern mining with a static database and with an incremental database are special cases of the progressive sequential pattern mining. By changing Start time and End time of the period of interest, Pisa can easily deal with a static database or an incremental database as well. Complexity of algorithms proposed is analyzed.  相似文献   

20.
Inter-sequence pattern mining can find associations across several sequences in a sequence database, which can discover both a sequential pattern within a transaction and sequential patterns across several different transactions. However, inter-sequence pattern mining algorithms usually generate a large number of recurrent frequent patterns. We have observed mining closed inter-sequence patterns instead of frequent ones can lead to a more compact yet complete result set. Therefore, in this paper, we propose a model of closed inter-sequence pattern mining and an efficient algorithm called CISP-Miner for mining such patterns, which enumerates closed inter-sequence patterns recursively along a search tree in a depth-first search manner. In addition, several effective pruning strategies and closure checking schemes are designed to reduce the search space and thus accelerate the algorithm. Our experiment results demonstrate that the proposed CISP-Miner algorithm is very efficient and outperforms a compared EISP-Miner algorithm in most cases.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号