首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
《Information Systems》2002,27(5):345-362
The problem addressed in this paper is to discover the frequently occurred sequential patterns from databases. Basically, the existing studies on finding sequential patterns can be roughly classified into two main categories. In the first category, the discovered patterns are continuous patterns, where all the elements in the pattern appear in consecutive positions in transactions. The second category is to mine discontinuous patterns, where the adjacent elements in the pattern need not appear consecutively in transactions. Although there are many researches on finding either kind of patterns, no previous researches can find both of them. Neither can they find the discontinuous patterns formed of several continuous sub-patterns. Therefore, we define a new kind of patterns, called hybrid pattern, which is the combination of continuous patterns and discontinuous patterns. In this paper, two algorithms are developed to mine hybrid patterns, where the first algorithm is easy but slow while the second complicated but much faster than the first one. Finally, the simulation result shows that our second algorithm is as fast as the currently best algorithm for mining sequential patterns.  相似文献   

2.
The main task of mining sequential patterns is to analyze the transaction database of a company in order to find out the priorities of items that most customers take when consuming. In this article, we propose a new method—the ISP Algorithm. With this method, we can find out not only the order of consumer items of each customer, but also offer the periodic interval of consumer items of each customer. Compared with other previous periodic association rules, the difference is that the period the algorithm provides is not the repeated purchases in a regular time, but the possible repurchases within a certain time frame. The algorithm utilizes the transaction time interval of individual customers and that of all the customers to find out when and who will buy goods, and what items of goods they will buy. © 2005 Wiley Periodicals, Inc. Int J Int Syst 20: 359–373, 2005.  相似文献   

3.
In this paper we aim at extending the non-derivable condensed representation in frequent itemset mining to sequential pattern mining. We start by showing a negative example: in the context of frequent sequences, the notion of non-derivability is meaningless. Therefore, we extend our focus to the mining of conjunctions of sequences. Besides of being of practical importance, this class of patterns has some nice theoretical properties. Based on a new unexploited theoretical definition of equivalence classes for sequential patterns, we are able to extend the notion of a non-derivable itemset to the sequence domain. We present a new depth-first approach to mine non-derivable conjunctive sequential patterns and show its use in mining association rules for sequences. This approach is based on a well known combinatorial theorem: the Möbius inversion. A performance study using both synthetic and real datasets illustrates the efficiency of our mining algorithm. These new introduced patterns have a high-potential for real-life applications, especially for network monitoring and biomedical fields with the ability to get sequential association rules with all the classical statistical metrics such as confidence, conviction, lift etc.  相似文献   

4.
Mining non-redundant time-gap sequential patterns   总被引:1,自引:1,他引:0  
Mining sequential patterns is to discover sequential purchasing behaviors for most of the customers from a large amount of customer transactions. An example of such a pattern is that most of the customers purchased item B after purchasing item A, and then they purchased item C after using item B. The manager can use this information to promote item B and item C when a customer purchased item A and item B, respectively. However, the manager cannot know what time the customers will need these products if we only discover the sequential patterns without any extra information. In this paper, we develop a new algorithm to discover not only the sequential patterns but also the time interval between any two items in the pattern. We call this information the time-gap sequential patterns. An example of time-gap sequential pattern is that most of the customers purchased item A, and then they bought item B after m to n days, and then after p to q days, they bought item C. When a customer bought item A, the information about item B can be sent to this customer after m to n days, that is, we can provide the product information in which the customer is interested on the appropriate date.  相似文献   

5.
Sequential pattern mining is essential in many applications, including computational biology, consumer behavior analysis, web log analysis, etc. Although sequential patterns can tell us what items are frequently to be purchased together and in what order, they cannot provide information about the time span between items for decision support. Previous studies dealing with this problem either set time constraints to restrict the patterns discovered or define time-intervals between two successive items to provide time information. Accordingly, the first approach falls short in providing clear time-interval information while the second cannot discover time-interval information between two non-successive items in a sequential pattern. To provide more time-related knowledge, we define a new variant of time-interval sequential patterns, called multi-time-interval sequential patterns, which can reveal the time-intervals between all pairs of items in a pattern. Accordingly, we develop two efficient algorithms, called the MI-Apriori and MI-PrefixSpan algorithms, to solve this problem. The experimental results show that the MI-PrefixSpan algorithm is faster than the MI-Apriori algorithm, but the MI-Apriori algorithm has better scalability in long sequence data.  相似文献   

6.
7.
对比序列模式可以用来表征不同类别数据集之间的差异。在生物信息、物流管理、电子商务等领域,对比序列模式有着广泛的应用。Top-k对比序列模式挖掘的目标是发现数据集中对比度最高的前k个序列模式。在Top-k对比序列模式挖掘中,可能挖掘出冗余的序列模式。目前,虽然有Top-k对比序列模式发现算法被提出,但这些算法并未考虑冗余序列模式的问题。为此,本文提出了基于广度优先生成树的去冗余Top-k对比序列模式挖掘算法BFM(breadth-first miner)。使用BFM算法可以有效地解决冗余问题,得到去冗余的Top-k对比序列模式。在BFM算法的基础上,提出了性能更好的算法PBFM(pruning breadth-first miner)。通过在真实数据集上的实验分析与对比 ,验证了本文算法的有效性。  相似文献   

8.
针对序列模式挖掘,提出频繁2序列图(F2SG)来表示数据库中的序列信息,通过扫描一次数据库,将与挖掘任务相关的信息映射到F2SG中,并在此基础上提出一种新的序列模式发现算法——GBSP。GBSP算法充分利用F2SG中表示的项目之间的次序关系进行频繁序列挖掘,提高了其生成效率。理论分析与实验表明,该算法较传统的序列模式发现算法在时间和空间性能上具有优越性。  相似文献   

9.
10.
传统的数据挖掘方法会生成大量的模式和规则,且难以理解,而实际上用户感兴趣的只是其中的一小部分.针对该问题,在挖掘序列模式的PrefixSpan算法基础上提出一种带数据项约束的序列模式挖掘方法,通过数据项约束,减少了搜索空间.实验结果表明,该方法可以有效地挖掘出满足数据项约束的序列模式.  相似文献   

11.
Mining sequential patterns is used to discover all the frequent sequences in a sequence database. However, the mining may return a huge number of patterns, while the users are only interested in a particular subset of these. In this paper, we consider the problem of mining sequential patterns with itemset constraints. In order to solve this problem, we propose a new algorithm named MSPIC-DBV, which is a pattern-growth algorithm that uses prefixes and dynamic bit vectors. This algorithm prunes the search space at the beginning and during the mining process. Moreover, it reduces the number of candidates that need to be checked. The experimental results show that the proposed algorithm outperforms the previous methods.  相似文献   

12.
Knowledge and Information Systems - This paper considers the problem of sequential pattern mining (SPM) in probabilistic databases. Specifically, we consider SPM in situations where there is...  相似文献   

13.
提出一种基于最大频繁模式、模式相似与属性描述相结合的多维序列模式挖掘算法MSP,该算法包括3个步骤:挖掘数据集中的最大频繁模式,每个频繁模式成为一个模式类;比较数据中各序列项序列与各模式类的包含与相似关系;按照一定的规则抽取与各模式类相关的属性,给出以属性为前件、模式类为后件的多维序列规则为形式的多维序列模式挖掘结果....  相似文献   

14.
Mining sequential patterns from multidimensional sequence data   总被引:1,自引:0,他引:1  
The problem addressed in This work is to discover the frequently occurred sequential patterns from databases. Although much work has been devoted to this subject, to the best of our knowledge, no previous research was able to find sequential patterns from d-dimensional sequence data, where d>2. Without such a capability, many practical data would be impossible to mine. For example, an online stock-trading site may have a customer database, where each customer may visit a Web site in a series of days; each day takes a series of sessions and each session visits a series of Web pages. Then, the data for each customer forms a 3-dimensional list, where the first dimension is days, the second is sessions, and the third is visited pages. To mine sequential patterns from this kind of sequence data, two efficient algorithms have been developed in This work.  相似文献   

15.
Mining frequent patterns with periodic wildcard gaps is a critical data mining problem to deal with complex real-world problems. This problem can be described as follows: given a subject sequence, a pre-specified threshold, and a variable gap-length with wildcards between each two consecutive letters. The task is to gain all frequent patterns with periodic wildcard gaps. State-of-the-art mining algorithms which use matrices or other linear data structures to solve the problem not only consume a large amount of memory but also run slowly. In this study, we use an Incomplete Nettree structure (the last layer of a Nettree which is an extension of a tree) of a sub-pattern P to efficiently create Incomplete Nettrees of all its super-patterns with prefix pattern P and compute the numbers of their supports in a one-way scan. We propose two new algorithms, MAPB (Mining sequentiAl Pattern using incomplete Nettree with Breadth first search) and MAPD (Mining sequentiAl Pattern using incomplete Nettree with Depth first search), to solve the problem effectively with low memory requirements. Furthermore, we design a heuristic algorithm MAPBOK (MAPB for tOp-K) based on MAPB to deal with the Top-K frequent patterns for each length. Experimental results on real-world biological data demonstrate the superiority of the proposed algorithms in running time and space consumption and also show that the pattern matching approach can be employed to mine special frequent patterns effectively.  相似文献   

16.
王华东  杨杰  李亚娟 《计算机应用》2014,34(9):2612-2616
研究这样一个问题:给定多序列、支持度阈值和间隔约束,从多序列中挖掘所有出现次数不小于支持度阈值的频繁序列模式,这里要求模式中任意两个相邻元素在序列中的出现都要满足用户自定义的间隔约束,并且模式在序列中的出现要满足one-off条件。在解决该问题上,已有算法M-OneOffMine在计算模式的支持度时,只考虑模式的每个字符在序列中的首次出现,导致计算的模式支持度远小于其真实支持度,以致许多频繁的模式没有被挖掘出来。为此,设计了一个有效的带有间隔约束的多序列模式挖掘算法--MMSP算法:首先,通过采用二维表保存模式的候选位置;然后,根据候选位置采用最左最优的思想选择匹配位置。通过生物DNA序列进行实验,多序列中元素序列数目不变而序列长度变化时,MMSP挖掘出的频繁模式总数是同类算法M-OneOffMine的3.23倍;在元素序列个数变化时,MMSP挖掘出的频繁模式个数平均是M-OneOffMine的4.11倍;这两种情况下MMSP都有更好的时间性能。在模式长度变化时,MMSP挖掘出的频繁模式个数分别平均是M-OneOffMine的2.21倍和MPP的5.24倍。同时还验证了M-OneOffMine挖掘到的模式是MMSP挖掘到的频繁的子集。实验结果表明,MMSP算法不仅可以挖掘到更多的频繁模式,而且时间花费更少,更适合于实际的应用。  相似文献   

17.
在Chen等人提出的时间间隔序列模式概念的基础上,给出了一种利用有向图搜索时间间隔序列模式的算法。实验表明所提出的算法较I-Apriori算法更加快速和高效。  相似文献   

18.
In this paper, given a set of sequence databases across multiple domains, we aim at mining multi-domain sequential patterns, where a multi-domain sequential pattern is a sequence of events whose occurrence time is within a pre-defined time window. We first propose algorithm Naive in which multiple sequence databases are joined as one sequence database for utilizing traditional sequential pattern mining algorithms (e.g., PrefixSpan). Due to the nature of join operations, algorithm Naive is costly and is developed for comparison purposes. Thus, we propose two algorithms without any join operations for mining multi-domain sequential patterns. Explicitly, algorithm IndividualMine derives sequential patterns in each domain and then iteratively combines sequential patterns among sequence databases of multiple domains to derive candidate multi-domain sequential patterns. However, not all sequential patterns mined in the sequence database of each domain are able to form multi-domain sequential patterns. To avoid the mining cost incurred in algorithm IndividualMine, algorithm PropagatedMine is developed. Algorithm PropagatedMine first performs one sequential pattern mining from one sequence database. In light of sequential patterns mined, algorithm PropagatedMine propagates sequential patterns mined to other sequence databases. Furthermore, sequential patterns mined are represented as a lattice structure for further reducing the number of sequential patterns to be propagated. In addition, we develop some mechanisms to allow some empty sets in multi-domain sequential patterns. Performance of the proposed algorithms is comparatively analyzed and sensitivity analysis is conducted. Experimental results show that by exploring propagation and lattice structures, algorithm PropagatedMine outperforms algorithm IndividualMine in terms of efficiency (i.e., the execution time).  相似文献   

19.
Mining sequential patterns to find ordered events or subsequence patterns is essential in many applications, such as analysis of consumer shopping data, web clickstreams, and biological sequences. Traditional patterns reveal which items are frequently purchased together and in what order. However, information about the time intervals between purchases is missing. Therefore, Yang proposed using multi-time-interval sequential patterns to consider the time intervals between each pair of items in a pattern. For example, 〈Bread, ti1, Milk, (ti2ti1), Jam〉 means that Bread is bought before Milk within an interval of ti1, and Jam is bought after Bread and Milk within intervals of ti2 and ti1, respectively, where ti1 and ti2 are predefined time intervals. Although this new type of pattern considers the intervals between all pairs of items, it contains a sharp boundary problem; that is, when the time interval between two purchases is near the boundary of two predetermined time ranges, we either ignore or overemphasize it. In this study, we applied the concept of fuzzy sets to solve the sharp boundary problem. The discovered patterns, called fuzzy multi-time-interval sequential patterns, describe time intervals in linguistic terms for better understanding. Two algorithms, FuzzMI-Apriori and FuzzMI-PrefixSpan, were developed for mining fuzzy multi-time-interval patterns. Experiments using synthetic and real datasets showed the algorithms’ computational efficiency, scalability, and effectiveness.  相似文献   

20.
Ranked models in the form of linear transformations of multivariate feature vectors on a line can be found on the basis of a priori given order within particular pairs of objects or events. Such ranked transformations are designed to preserve given sequential order. In this way, the sequential patterns inside sets of the feature vectors can be discovered and modelled. Attention is paid here to combining problems of sequential patterns modelling and recognition with feature selection. The feature selection problem is aimed at the best representation of the sequential patterns. The convex and piecewise linear (CPL) criterion functions are used here both for designing ranked linear models and for feature selection.
Leon BobrowskiEmail:
  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号