首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 0 毫秒
1.
Discovering fuzzy time-interval sequential patterns in sequence databases.   总被引:1,自引:0,他引:1  
Given a sequence database and minimum support threshold, the task of sequential pattern mining is to discover the complete set of sequential patterns in databases. From the discovered sequential patterns, we can know what items are frequently brought together and in what order they appear. However, they cannot tell us the time gaps between successive items in patterns. Accordingly, Chen et al. have proposed a generalization of sequential patterns, called time-interval sequential patterns, which reveals not only the order of items, but also the time intervals between successive items. An example of time-interval sequential pattern has a form like (A, I2, B, I1, C), meaning that we buy A first, then after an interval of I2 we buy B, and finally after an interval of I1 we buy C, where I2 and I1 are predetermined time ranges. Although this new type of pattern can alleviate the above concern, it causes the sharp boundary problem. That is, when a time interval is near the boundary of two predetermined time ranges, we either ignore or overemphasize it. Therefore, this paper uses the concept of fuzzy sets to extend the original research so that fuzzy time-interval sequential patterns are discovered from databases. Two efficient algorithms, the fuzzy time interval (FTI)-Apriori algorithm and the FTI-PrefixSpan algorithm, are developed for mining fuzzy time-interval sequential patterns. In our simulation results, we find that the second algorithm outperforms the first one, not only in computing time but also in scalability with respect to various parameters.  相似文献   

2.
Comprehending changes of customer behavior is an essential problem that must be faced for survival in a fast-changing business environment. Particularly in the management of electronic commerce (EC), many companies have developed on-line shopping stores to serve customers and immediately collect buying logs in databases. This trend has led to the development of data-mining applications. Fuzzy time-interval sequential pattern mining is one type of serviceable data-mining technique that discovers customer behavioral patterns over time. To take a shopping example, (Bread, Short, Milk, Long, Jam), means that Bread is bought before Milk in a Short period, and Jam is bought after Milk in a Long period, where Short and Long are predetermined linguistic terms given by managers. This information shown in this example reveals more general and concise knowledge for managers, allowing them to make quick-response decisions, especially in business. However, no studies, to our knowledge, have yet to address the issue of changes in fuzzy time-interval sequential patterns. The fuzzy time-interval sequential pattern, (Bread, Short, Milk, Long, Jam), became available in last year; however, is not a trend this year, and has been substituted by (Bread, Short, Yogurt, Short, Jam). Without updating this knowledge, managers might map out inappropriate marketing plans for products or services and dated inventory strategies with respect to time-intervals. To deal with this problem, we propose a novel change mining model, MineFuzzChange, to detect the change in fuzzy time-interval sequential patterns. Using a brick-and-mortar transactional dataset collected from a retail chain in Taiwan and a B2C EC dataset, experiments are carried out to evaluate the proposed model. We empirically demonstrate how the model helps managers to understand the changing behaviors of their customers and to formulate timely marketing and inventory strategies.  相似文献   

3.
Mining sequential patterns from multidimensional sequence data   总被引:1,自引:0,他引:1  
The problem addressed in This work is to discover the frequently occurred sequential patterns from databases. Although much work has been devoted to this subject, to the best of our knowledge, no previous research was able to find sequential patterns from d-dimensional sequence data, where d>2. Without such a capability, many practical data would be impossible to mine. For example, an online stock-trading site may have a customer database, where each customer may visit a Web site in a series of days; each day takes a series of sessions and each session visits a series of Web pages. Then, the data for each customer forms a 3-dimensional list, where the first dimension is days, the second is sessions, and the third is visited pages. To mine sequential patterns from this kind of sequence data, two efficient algorithms have been developed in This work.  相似文献   

4.
In this paper, given a set of sequence databases across multiple domains, we aim at mining multi-domain sequential patterns, where a multi-domain sequential pattern is a sequence of events whose occurrence time is within a pre-defined time window. We first propose algorithm Naive in which multiple sequence databases are joined as one sequence database for utilizing traditional sequential pattern mining algorithms (e.g., PrefixSpan). Due to the nature of join operations, algorithm Naive is costly and is developed for comparison purposes. Thus, we propose two algorithms without any join operations for mining multi-domain sequential patterns. Explicitly, algorithm IndividualMine derives sequential patterns in each domain and then iteratively combines sequential patterns among sequence databases of multiple domains to derive candidate multi-domain sequential patterns. However, not all sequential patterns mined in the sequence database of each domain are able to form multi-domain sequential patterns. To avoid the mining cost incurred in algorithm IndividualMine, algorithm PropagatedMine is developed. Algorithm PropagatedMine first performs one sequential pattern mining from one sequence database. In light of sequential patterns mined, algorithm PropagatedMine propagates sequential patterns mined to other sequence databases. Furthermore, sequential patterns mined are represented as a lattice structure for further reducing the number of sequential patterns to be propagated. In addition, we develop some mechanisms to allow some empty sets in multi-domain sequential patterns. Performance of the proposed algorithms is comparatively analyzed and sensitivity analysis is conducted. Experimental results show that by exploring propagation and lattice structures, algorithm PropagatedMine outperforms algorithm IndividualMine in terms of efficiency (i.e., the execution time).  相似文献   

5.
Weighted sequential pattern mining has recently been discussed in the field of data mining. Different from traditional sequential pattern mining, this kind of mining considers different significances of items in real applications, such as cost or profit. Most of the related studies adopt the maximum weighted upper-bound model to find weighted sequential patterns, but they generate a large number of unpromising candidate subsequences. In this study, we thus propose an efficient approach for finding weighted sequential patterns from sequence databases. In particular, a tightening strategy in the proposed approach is proposed to obtain more accurate weighted upper-bounds for subsequences in mining. Through the experimental evaluation, the results also show the proposed approach has good performance in terms of pruning effectiveness and execution efficiency.  相似文献   

6.
Mining sequential patterns is used to discover all the frequent sequences in a sequence database. However, the mining may return a huge number of patterns, while the users are only interested in a particular subset of these. In this paper, we consider the problem of mining sequential patterns with itemset constraints. In order to solve this problem, we propose a new algorithm named MSPIC-DBV, which is a pattern-growth algorithm that uses prefixes and dynamic bit vectors. This algorithm prunes the search space at the beginning and during the mining process. Moreover, it reduces the number of candidates that need to be checked. The experimental results show that the proposed algorithm outperforms the previous methods.  相似文献   

7.
传统的数据挖掘方法会生成大量的模式和规则,且难以理解,而实际上用户感兴趣的只是其中的一小部分.针对该问题,在挖掘序列模式的PrefixSpan算法基础上提出一种带数据项约束的序列模式挖掘方法,通过数据项约束,减少了搜索空间.实验结果表明,该方法可以有效地挖掘出满足数据项约束的序列模式.  相似文献   

8.
In this paper we aim at extending the non-derivable condensed representation in frequent itemset mining to sequential pattern mining. We start by showing a negative example: in the context of frequent sequences, the notion of non-derivability is meaningless. Therefore, we extend our focus to the mining of conjunctions of sequences. Besides of being of practical importance, this class of patterns has some nice theoretical properties. Based on a new unexploited theoretical definition of equivalence classes for sequential patterns, we are able to extend the notion of a non-derivable itemset to the sequence domain. We present a new depth-first approach to mine non-derivable conjunctive sequential patterns and show its use in mining association rules for sequences. This approach is based on a well known combinatorial theorem: the Möbius inversion. A performance study using both synthetic and real datasets illustrates the efficiency of our mining algorithm. These new introduced patterns have a high-potential for real-life applications, especially for network monitoring and biomedical fields with the ability to get sequential association rules with all the classical statistical metrics such as confidence, conviction, lift etc.  相似文献   

9.
Mining frequent patterns with periodic wildcard gaps is a critical data mining problem to deal with complex real-world problems. This problem can be described as follows: given a subject sequence, a pre-specified threshold, and a variable gap-length with wildcards between each two consecutive letters. The task is to gain all frequent patterns with periodic wildcard gaps. State-of-the-art mining algorithms which use matrices or other linear data structures to solve the problem not only consume a large amount of memory but also run slowly. In this study, we use an Incomplete Nettree structure (the last layer of a Nettree which is an extension of a tree) of a sub-pattern P to efficiently create Incomplete Nettrees of all its super-patterns with prefix pattern P and compute the numbers of their supports in a one-way scan. We propose two new algorithms, MAPB (Mining sequentiAl Pattern using incomplete Nettree with Breadth first search) and MAPD (Mining sequentiAl Pattern using incomplete Nettree with Depth first search), to solve the problem effectively with low memory requirements. Furthermore, we design a heuristic algorithm MAPBOK (MAPB for tOp-K) based on MAPB to deal with the Top-K frequent patterns for each length. Experimental results on real-world biological data demonstrate the superiority of the proposed algorithms in running time and space consumption and also show that the pattern matching approach can be employed to mine special frequent patterns effectively.  相似文献   

10.
Mining sequential patterns with regular expression constraints   总被引:5,自引:0,他引:5  
Discovering sequential patterns is an important problem in data mining with a host of application domains including medicine, telecommunications, and the World Wide Web. Conventional sequential pattern mining systems provide users with only a very restricted mechanism (based on minimum support) for specifying patterns of interest. As a consequence, the pattern mining process is typically characterized by lack of focus and users often end up paying inordinate computational costs just to be inundated with an overwhelming number of useless results. We propose the use of Regular Expressions (REs) as a flexible constraint specification tool that enables user-controlled focus to be incorporated into the pattern mining process. We develop a family of novel algorithms (termed SPIRIT-Sequential Pattern mining with Regular expression consTraints) for mining frequent sequential patterns that also satisfy user-specified RE constraints. The main distinguishing factor among the proposed schemes is the degree to which the RE constraints are enforced to prune the search space of patterns during computation. Our solutions provide valuable insights into the trade-offs that arise when constraints that do not subscribe to nice properties (like anti monotonicity) are integrated into the mining process  相似文献   

11.
Mining sequential patterns is to discover sequential purchasing behaviors for most of the customers from a large amount of customer transactions. An example of such a pattern is that most of the customers purchased item B after purchasing item A, and then they purchased item C after using item B. The manager can use this information to promote item B and item C when a customer purchased item A and item B, respectively. However, the manager cannot know what time the customers will need these products if we only discover the sequential patterns without any extra information. In this paper, we develop a new algorithm to discover not only the sequential patterns but also the time interval between any two items in the pattern. We call this information the time-gap sequential patterns. An example of time-gap sequential pattern is that most of the customers purchased item A, and then they bought item B after m to n days, and then after p to q days, they bought item C. When a customer bought item A, the information about item B can be sent to this customer after m to n days, that is, we can provide the product information in which the customer is interested on the appropriate date.  相似文献   

12.
13.
为解决加权图遍历模式的挖掘问题,提出了一种从加权有向图中挖掘加权频繁模式算法.在该算法中,利用图全局拓扑结构和顶点权值信息评估遍历模式的权支持度,从而将剪枝问题转化成模式可扩展性问题,再利用可扩展模式产生候选模式集.本算法把图,顶点权值融合进来,提高了挖掘结果的准确度.实验结果表明,该算法可以有效地进行基于加权向图的权频繁模式挖掘.  相似文献   

14.
Knowledge and Information Systems - This paper considers the problem of sequential pattern mining (SPM) in probabilistic databases. Specifically, we consider SPM in situations where there is...  相似文献   

15.
《Knowledge》2007,20(1):86-97
Frequent pattern mining is one of main concerns in data mining tasks. In frequent pattern mining, closed frequent pattern mining and weighted frequent pattern mining are two main approaches to reduce the search space. Although many related studies have been suggested, no mining algorithm considers both paradigms. Even if closed frequent pattern mining represents exactly the same knowledge and weighted frequent pattern mining provides a way to discover more important patterns, the incorporation of closed frequent pattern mining and weight frequent pattern mining may loss information. Based on our analysis of joining orders, we propose closed weighted frequent pattern mining, and present how to discover succinct but lossless closed frequent pattern with weight constraints. To our knowledge, ours is the first work specifically to consider both constraints. An extensive performance study shows that our algorithm outperforms previous algorithms. In addition, it is efficient and scalable.  相似文献   

16.
Sequential pattern mining has been studied extensively in the data mining community. Most previous studies require the specification of a min_support threshold for mining a complete set of sequential patterns satisfying the threshold. However, in practice, it is difficult for users to provide an appropriate min_support threshold. To overcome this difficulty, we propose an alternative mining task: mining top-k frequent closed sequential patterns of length no less than min_, where k is the desired number of closed sequential patterns to be mined and min_ is the minimal length of each pattern. We mine the set of closed patterns because it is a compact representation of the complete set of frequent patterns. An efficient algorithm, called TSP, is developed for mining such patterns without min_support. Starting at (absolute) min_support=1, the algorithm makes use of the length constraint and the properties of top-k closed sequential patterns to perform dynamic support raising and projected database pruning. Our extensive performance study shows that TSP has high performance. In most cases, it outperforms the efficient closed sequential pattern-mining algorithm, CloSpan, even when the latter is running with the best tuned min_support threshold. Thus, we conclude that, for sequential pattern mining, mining top-k frequent closed sequential patterns without min_support is more preferable than the traditional min_support-based mining.  相似文献   

17.
Many researchers in database and machine learning fields are primarily interested in data mining because it offers opportunities to discover useful information and important relevant patterns in large databases. Most previous studies have shown how binary valued transaction data may be handled. Transaction data in real-world applications usually consist of quantitative values, so designing a sophisticated data-mining algorithm able to deal with various types of data presents a challenge to workers in this research field. In the past, we proposed a fuzzy data-mining algorithm to find association rules. Since sequential patterns are also very important for real-world applications, this paper thus focuses on finding fuzzy sequential patterns from quantitative data. A new mining algorithm is proposed, which integrates the fuzzy-set concepts and the AprioriAll algorithm. It first transforms quantitative values in transactions into linguistic terms, then filters them to find sequential patterns by modifying the AprioriAll mining algorithm. Each quantitative item uses only the linguistic term with the maximum cardinality in later mining processes, thus making the number of fuzzy regions to be processed the same as the number of the original items. The patterns mined out thus exhibit the sequential quantitative regularity in databases and can be used to provide some suggestions to appropriate supervisors.  相似文献   

18.
Mining sequential patterns from data streams: a centroid approach   总被引:1,自引:0,他引:1  
In recent years, emerging applications introduced new constraints for data mining methods. These constraints are typical of a new kind of data: the data streams. In data stream processing, memory usage is restricted, new elements are generated continuously and have to be considered in a linear time, no blocking operator can be performed and the data can be examined only once. At this time, only a few methods has been proposed for mining sequential patterns in data streams. We argue that the main reason is the combinatory phenomenon related to sequential pattern mining. In this paper, we propose an algorithm based on sequences alignment for mining approximate sequential patterns in Web usage data streams. To meet the constraint of one scan, a greedy clustering algorithm associated to an alignment method is proposed. We will show that our proposal is able to extract relevant sequences with very low thresholds.  相似文献   

19.
Unil Yun 《Information Sciences》2007,177(17):3477-3499
Most algorithms for frequent pattern mining use a support constraint to prune the combinatorial search space but support-based pruning is not enough. After mining datasets to obtain frequent patterns, the resulting patterns can have weak affinity. Although the minimum support can be increased, it is not effective for finding correlated patterns with increased weight and/or support affinity. Interesting measures have been proposed to detect correlated patterns but any approach does not consider both support and weight. In this paper, we present a new strategy, Weighted interesting pattern mining (WIP) in which a new measure, weight-confidence, is suggested to mine correlated patterns with the weight affinity. A weight range is used to decide weight boundaries and an h-confidence serves to identify support affinity patterns. In WIP, without additional computation cost, original h-confidence is used instead of the upper bound of h-confidence for performance improvement. WIP not only gives a balance between the two measures of weight and support, but also considers weight affinity and/or support affinity between items within patterns so more correlated patterns can be detected. To our knowledge, ours is the first work specifically to consider weight affinity between items of patterns. A comprehensive performance study shows that WIP is efficient and scalable for finding affinity patterns. Moreover, it generates fewer but more valuable patterns with the correlation. To decrease the number of thresholds, w-confidence, h-confidence and weighted support can be used selectively according to requirement of applications.  相似文献   

20.
为解决加权遍历模式挖掘问题,概括了加权有向图的种类,提出一种边加权有向图与顶点加权有向图间的变换模型,并基于该模型提出一种基于图遍历的加权序列模式挖掘算法GTWSPMiner.该算法根据遍历模式中的项的连续性特点,采用一种加权前缀投影序列模式增长方法,将原挖掘序列数据库的任务分解成一组挖掘局部投影数据库的小任务.对比实验结果表明,该算法能快速有效地挖掘加权频繁遍历模式.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号