Similar literature
Found 20 similar documents (search time: 15 ms)
1.
Sequential pattern mining, including weighted sequential pattern mining, has attracted much attention because it is one of the essential data mining tasks with broad applications. Weighted sequential pattern mining aims to find more interesting sequential patterns by considering the different significance of each data element in a sequence database. In conventional weighted sequential pattern mining, pre-assigned weights of data elements, derived from their quantitative information and their importance in real-world application domains, are usually used to express this significance. General sequential pattern mining considers the generation order of data elements to find sequential patterns. However, their generation times and time-intervals are also important in real-world application domains, so time-interval information of data elements can help in finding more interesting sequential patterns. This paper presents a new framework for finding time-interval weighted sequential (TiWS) patterns in a sequence database, together with the time-interval weighted support (TiW-support) used to find them. In addition, a new method of mining TiWS patterns in a sequence database is presented. In the proposed framework, the weight of each sequence in a sequence database is first obtained from the time-intervals of the elements in the sequence, and TiWS patterns are then found by taking this weight into account. A series of evaluation results shows that TiWS pattern mining is efficient and helpful in finding more interesting sequential patterns.
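
The abstract above does not state how a sequence weight is computed from its time-intervals, so the following is only a minimal sketch under assumed definitions: the weight function (`decay ** (gap / unit)`, averaged over consecutive gaps) and all function names are hypothetical, not the paper's. It illustrates the general idea that sequences with short intervals contribute more to a weighted support than sequences with long intervals.

```python
from typing import List, Sequence, Tuple

TimedSeq = List[Tuple[float, str]]  # (timestamp, item), sorted by timestamp


def sequence_weight(seq: TimedSeq, decay: float = 0.5, unit: float = 1.0) -> float:
    """Hypothetical weight: mean of decay**(gap/unit) over consecutive time gaps."""
    gaps = [b[0] - a[0] for a, b in zip(seq, seq[1:])]
    if not gaps:
        return 1.0  # assumption: a single-element sequence gets full weight
    return sum(decay ** (g / unit) for g in gaps) / len(gaps)


def contains(seq: TimedSeq, pattern: Sequence[str]) -> bool:
    """True if the pattern's items occur in seq in order (as a subsequence)."""
    items = iter(item for _, item in seq)
    return all(p in items for p in pattern)


def weighted_support(db: List[TimedSeq], pattern: Sequence[str]) -> float:
    """Share of the total sequence weight held by sequences containing the pattern."""
    weights = [sequence_weight(s) for s in db]
    matched = sum(w for s, w in zip(db, weights) if contains(s, pattern))
    return matched / sum(weights)


db = [
    [(0, "a"), (1, "b")],    # short interval, high weight
    [(0, "a"), (10, "b")],   # long interval, low weight
    [(0, "b"), (1, "a")],    # does not contain <a, b>
]
print(weighted_support(db, ["a", "b"]))
```

With an unweighted count, two of the three sequences support the pattern; under the assumed weighting, the long-interval sequence contributes almost nothing.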

2.
Mining sequential patterns from multidimensional sequence data   Cited by: 1 (self-citations: 0, citations by others: 1)
The problem addressed in this work is to discover frequently occurring sequential patterns from databases. Although much work has been devoted to this subject, to the best of our knowledge, no previous research was able to find sequential patterns from d-dimensional sequence data, where d>2. Without such a capability, much practical data would be impossible to mine. For example, an online stock-trading site may have a customer database in which each customer visits a Web site over a series of days; each day consists of a series of sessions, and each session visits a series of Web pages. The data for each customer then forms a 3-dimensional list, where the first dimension is days, the second is sessions, and the third is visited pages. To mine sequential patterns from this kind of sequence data, two efficient algorithms have been developed in this work.
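
The 3-dimensional list described above can be pictured as a nested list. The sketch below is illustrative only: the page names and the helper functions `depth` and `flatten` are assumptions, not part of the paper's algorithms.

```python
# One customer's data: days -> sessions -> visited pages (hypothetical page names).
customer = [
    [["home", "quotes"], ["login", "trade"]],  # day 1: two sessions
    [["home", "news", "quotes"]],              # day 2: one session
]


def depth(x):
    """Nesting depth of a list structure (0 for a non-list element)."""
    if not isinstance(x, list):
        return 0
    return 1 + max((depth(e) for e in x), default=0)


def flatten(x):
    """Yield the pages in visiting order, discarding day and session boundaries."""
    if isinstance(x, list):
        for e in x:
            yield from flatten(e)
    else:
        yield x


print(depth(customer))          # number of dimensions
print(list(flatten(customer)))  # the 1-dimensional view a conventional miner would see
```

Flattening loses the day and session boundaries, which is the information a multidimensional miner is meant to keep.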

3.
Discovering fuzzy time-interval sequential patterns in sequence databases   Cited by: 1 (self-citations: 0, citations by others: 1)
Given a sequence database and a minimum support threshold, the task of sequential pattern mining is to discover the complete set of sequential patterns in the database. From the discovered sequential patterns, we can know what items are frequently brought together and in what order they appear. However, the patterns cannot tell us the time gaps between successive items. Accordingly, Chen et al. have proposed a generalization of sequential patterns, called time-interval sequential patterns, which reveals not only the order of items but also the time intervals between successive items. An example of a time-interval sequential pattern has the form (A, I2, B, I1, C), meaning that we buy A first, then after an interval of I2 we buy B, and finally after an interval of I1 we buy C, where I2 and I1 are predetermined time ranges. Although this new type of pattern can alleviate the above concern, it causes the sharp boundary problem: when a time interval is near the boundary of two predetermined time ranges, we either ignore or overemphasize it. Therefore, this paper uses the concept of fuzzy sets to extend the original research so that fuzzy time-interval sequential patterns are discovered from databases. Two efficient algorithms, the fuzzy time interval (FTI)-Apriori algorithm and the FTI-PrefixSpan algorithm, are developed for mining fuzzy time-interval sequential patterns. In our simulation results, we find that the second algorithm outperforms the first one, not only in computing time but also in scalability with respect to various parameters.
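
The sharp boundary problem can be illustrated with a small sketch. The two ranges ("short", "long"), the boundary at 3 time units, and the linear membership function below are assumptions chosen for illustration; the abstract does not specify the membership functions used by FTI-Apriori or FTI-PrefixSpan.

```python
def crisp_label(gap: float, boundary: float = 3.0) -> str:
    """Crisp ranges: a gap belongs to exactly one predetermined range."""
    return "short" if gap <= boundary else "long"


def fuzzy_memberships(gap: float, lo: float = 2.0, hi: float = 4.0) -> dict:
    """Hypothetical fuzzy ranges: 'short' falls linearly from 1 at lo to 0 at hi."""
    if gap <= lo:
        short = 1.0
    elif gap >= hi:
        short = 0.0
    else:
        short = (hi - gap) / (hi - lo)
    return {"short": short, "long": 1.0 - short}


for gap in (2.9, 3.1):
    print(gap, crisp_label(gap), fuzzy_memberships(gap))
```

Two nearly identical gaps on either side of the boundary receive different crisp labels, while their fuzzy memberships differ only slightly.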

4.
Many researchers in the database and machine learning fields are primarily interested in data mining because it offers opportunities to discover useful information and important relevant patterns in large databases. Most previous studies have shown how binary-valued transaction data may be handled. Transaction data in real-world applications usually consist of quantitative values, so designing a sophisticated data-mining algorithm able to deal with various types of data presents a challenge to workers in this research field. In the past, we proposed a fuzzy data-mining algorithm to find association rules. Since sequential patterns are also very important for real-world applications, this paper focuses on finding fuzzy sequential patterns from quantitative data. A new mining algorithm is proposed that integrates fuzzy-set concepts and the AprioriAll algorithm. It first transforms quantitative values in transactions into linguistic terms, then filters them to find sequential patterns by modifying the AprioriAll mining algorithm. Each quantitative item uses only the linguistic term with the maximum cardinality in later mining processes, so the number of fuzzy regions to be processed equals the number of original items. The mined patterns thus exhibit the sequential quantitative regularity in databases and can be used to provide suggestions to the appropriate supervisors.
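
The step of turning quantities into linguistic terms and keeping only the term with maximum cardinality might look like the following sketch. The three regions ("Low", "Middle", "High"), their membership functions, and the use of summed membership as cardinality are assumptions for illustration; the paper's actual membership functions are not given in the abstract.

```python
def memberships(q: float) -> dict:
    """Hypothetical fuzzy regions over purchase quantities (peaks at 1, 6, and 11)."""
    low = 1.0 if q <= 1 else max(0.0, (6 - q) / 5)
    high = 1.0 if q >= 11 else max(0.0, (q - 6) / 5)
    middle = max(0.0, 1 - abs(q - 6) / 5)
    return {"Low": low, "Middle": middle, "High": high}


def best_region(quantities):
    """Sum memberships per linguistic term and keep the term with the largest total."""
    totals = {}
    for q in quantities:
        for term, mu in memberships(q).items():
            totals[term] = totals.get(term, 0.0) + mu
    term = max(totals, key=totals.get)
    return term, totals[term]


# Quantities of one item across three transactions (made-up values).
print(best_region([2, 3, 8]))
```

Keeping one region per item is what holds the number of fuzzy regions equal to the number of original items.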

5.
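Consumers have different price preferences for different categories of products, whereas traditional sequential pattern mining algorithms consider only the order in which different items occur in a sequence, so the mined sequential patterns lack important information such as product price and category. To overcome this shortcoming and capture more user-behavior information in sequential patterns, this paper proposes, based on fuzzy set theory, a cross-category fuzzy price sequential pattern mining algorithm that operates on the product-category dimension. Experimental results show that, compared with traditional sequential pattern mining algorithms, the proposed algorithm effectively addresses the sparsity of sequence data and mines more personalized sequential patterns.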

6.
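王华东, 杨杰, 李亚娟 《计算机应用》 (Journal of Computer Applications) 2014, 34(9): 2612-2616
This work studies the following problem: given multiple sequences, a support threshold, and gap constraints, mine from the multiple sequences all frequent sequential patterns whose number of occurrences is no less than the support threshold, where the occurrences of any two adjacent elements of a pattern must satisfy the user-defined gap constraints and the occurrences of a pattern in a sequence must satisfy the one-off condition. When computing the support of a pattern, the existing algorithm M-OneOffMine considers only the first occurrence of each character of the pattern in the sequence, so the computed support is far smaller than the true support and many frequent patterns are not mined. To address this, an effective multi-sequence pattern mining algorithm with gap constraints, MMSP, is designed: first, a two-dimensional table is used to store the candidate positions of a pattern; then, matching positions are selected from the candidate positions following a leftmost-optimal strategy. In experiments on biological DNA sequences, when the number of element sequences in the multi-sequence is fixed and the sequence length varies, the total number of frequent patterns mined by MMSP is 3.23 times that of the comparable algorithm M-OneOffMine; when the number of element sequences varies, the number of frequent patterns mined by MMSP is on average 4.11 times that of M-OneOffMine; in both cases MMSP also has better time performance. When the pattern length varies, the number of frequent patterns mined by MMSP is on average 2.21 times that of M-OneOffMine and 5.24 times that of MPP. It is also verified that the patterns mined by M-OneOffMine are a subset of the frequent patterns mined by MMSP. The experimental results show that MMSP not only mines more frequent patterns but also takes less time, making it more suitable for practical applications.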
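
To make the gap constraint and the one-off condition concrete, here is a simplified sketch. It is not the MMSP algorithm: it uses a plain leftmost greedy search with backtracking on a single sequence, and the function name, the gap definition (number of characters between two matched positions), and the example are all assumptions.

```python
def one_off_occurrences(seq: str, pattern: str, min_gap: int, max_gap: int):
    """Greedily collect occurrences of pattern in seq such that
    (1) the number of characters between adjacent matched positions lies in
        [min_gap, max_gap], and
    (2) no sequence position is used by more than one occurrence (one-off)."""
    used = [False] * len(seq)
    found = []

    def extend(positions):
        if len(positions) == len(pattern):
            return positions
        last = positions[-1]
        target = pattern[len(positions)]
        first = last + 1 + min_gap
        stop = min(last + 1 + max_gap, len(seq) - 1)
        for nxt in range(first, stop + 1):  # try the leftmost candidate first
            if not used[nxt] and seq[nxt] == target:
                result = extend(positions + [nxt])
                if result is not None:
                    return result
        return None

    for start, ch in enumerate(seq):
        if pattern and ch == pattern[0] and not used[start]:
            occ = extend([start])
            if occ is not None:
                for p in occ:
                    used[p] = True
                found.append(occ)
    return found


print(one_off_occurrences("AABB", "AB", min_gap=0, max_gap=2))
```

In this example both A positions start an occurrence, so the pattern is counted twice. The abstract attributes M-OneOffMine's undercounting to its considering only the first occurrence of each pattern character.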

7.
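Traditional data mining methods generate a large number of patterns and rules that are hard to interpret, while users are actually interested in only a small portion of them. To address this problem, a sequential pattern mining method with item constraints is proposed on the basis of the PrefixSpan sequential pattern mining algorithm; the item constraints reduce the search space. Experimental results show that the method effectively mines the sequential patterns that satisfy the item constraints.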

8.
Sequential pattern mining is essential in many applications, including computational biology, consumer behavior analysis, and web log analysis. Although sequential patterns can tell us which items are frequently purchased together and in what order, they cannot provide information about the time span between items for decision support. Previous studies dealing with this problem either set time constraints to restrict the patterns discovered or define time-intervals between two successive items to provide time information. The first approach falls short in providing clear time-interval information, while the second cannot discover time-interval information between two non-successive items in a sequential pattern. To provide more time-related knowledge, we define a new variant of time-interval sequential patterns, called multi-time-interval sequential patterns, which can reveal the time-intervals between all pairs of items in a pattern. Accordingly, we develop two efficient algorithms, called the MI-Apriori and MI-PrefixSpan algorithms, to solve this problem. The experimental results show that the MI-PrefixSpan algorithm is faster than the MI-Apriori algorithm, but the MI-Apriori algorithm has better scalability on long sequence data.
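
The difference between intervals for successive items and intervals for all pairs of items can be shown in a few lines. This is only an illustration of the idea on one occurrence of a pattern; the function names and timestamps are assumptions, and no part of MI-Apriori or MI-PrefixSpan is reproduced.

```python
from itertools import combinations


def successive_intervals(occurrence):
    """Time gaps between neighbouring items only."""
    return {(i, i + 1): occurrence[i + 1][1] - occurrence[i][1]
            for i in range(len(occurrence) - 1)}


def pairwise_intervals(occurrence):
    """Time gaps between every pair of items, including non-successive ones."""
    return {(i, j): occurrence[j][1] - occurrence[i][1]
            for i, j in combinations(range(len(occurrence)), 2)}


occurrence = [("A", 0), ("B", 3), ("C", 10)]  # (item, timestamp), made-up values
print(successive_intervals(occurrence))
print(pairwise_intervals(occurrence))
```

The pairwise view additionally records the gap between A and C, which the successive view leaves implicit.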

9.
In this paper, given a set of sequence databases across multiple domains, we aim at mining multi-domain sequential patterns, where a multi-domain sequential pattern is a sequence of events whose occurrence times fall within a pre-defined time window. We first propose algorithm Naive, in which multiple sequence databases are joined into one sequence database so that traditional sequential pattern mining algorithms (e.g., PrefixSpan) can be used. Because of the join operations, algorithm Naive is costly and is developed for comparison purposes only. We therefore propose two algorithms that mine multi-domain sequential patterns without any join operations. Specifically, algorithm IndividualMine derives sequential patterns in each domain and then iteratively combines sequential patterns among the sequence databases of multiple domains to derive candidate multi-domain sequential patterns. However, not all sequential patterns mined in the sequence database of each domain are able to form multi-domain sequential patterns. To avoid the mining cost incurred in algorithm IndividualMine, algorithm PropagatedMine is developed. Algorithm PropagatedMine first mines sequential patterns from one sequence database and then propagates the mined patterns to the other sequence databases. Furthermore, the mined sequential patterns are represented as a lattice structure to further reduce the number of sequential patterns to be propagated. In addition, we develop some mechanisms to allow some empty sets in multi-domain sequential patterns. The performance of the proposed algorithms is comparatively analyzed and a sensitivity analysis is conducted. Experimental results show that by exploring propagation and lattice structures, algorithm PropagatedMine outperforms algorithm IndividualMine in terms of efficiency (i.e., execution time).

10.

Grouping individual tourists who have the same or similar tourist routes over the same time period makes travel more convenient for the tourists at a low cost, because transportation and services such as regular or occasional tour buses, drivers, and tourism guides can be provided. In this paper, we propose a mathematical formulation for the tour routes clustering problem and a two-phase sequential pattern algorithm for clustering similar or identical routes according to the tourist routes of individual tourists, with illustrative examples. The first phase constructs a site-by-site frequency matrix and prunes infrequent tour route patterns from the matrix. The second phase clusters the tour routes to determine the tour route using a sequential pattern mining algorithm. We compare and evaluate the performance of our algorithms in terms of execution time and memory used. The proposed algorithm is efficient in both runtime and memory usage as the number of transactions increases.
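
A first phase of this kind can be sketched as counting site-to-site transitions and dropping the infrequent ones. The exact definition of the paper's frequency matrix is not given in the abstract, so the transition-count interpretation, the function names, and the sample routes below are assumptions.

```python
from collections import Counter


def transition_counts(routes):
    """Count how often site a is immediately followed by site b across all routes."""
    counts = Counter()
    for route in routes:
        for a, b in zip(route, route[1:]):
            counts[(a, b)] += 1
    return counts


def prune(counts, min_count):
    """Keep only the site pairs that occur at least min_count times."""
    return {pair: n for pair, n in counts.items() if n >= min_count}


routes = [["A", "B", "C"], ["A", "B", "D"], ["B", "C"]]  # made-up tourist routes
frequent = prune(transition_counts(routes), min_count=2)
print(frequent)
```

The surviving pairs are the entries a second, clustering phase would then work from.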


11.
Li Yan, Zhang Shuai, Guo Lei, Liu Jing, Wu Youxi, Wu Xindong 《Applied Intelligence》 2022, 52(9): 9861-9884
Nonoverlapping sequential pattern mining, as a kind of repetitive sequential pattern mining with gap constraints, can find more valuable patterns. Traditional algorithms...

12.
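Contrast sequential patterns can be used to characterize the differences between datasets of different classes. They are widely applied in fields such as bioinformatics, logistics management, and e-commerce. The goal of top-k contrast sequential pattern mining is to discover the k sequential patterns with the highest contrast in a dataset. Top-k contrast sequential pattern mining may yield redundant sequential patterns. Although top-k contrast sequential pattern discovery algorithms have been proposed, they do not consider the problem of redundant patterns. This paper therefore proposes BFM (breadth-first miner), a redundancy-free top-k contrast sequential pattern mining algorithm based on a breadth-first generation tree. BFM effectively resolves the redundancy problem and yields redundancy-free top-k contrast sequential patterns. Building on BFM, an algorithm with better performance, PBFM (pruning breadth-first miner), is proposed. Experimental analysis and comparison on real datasets verify the effectiveness of the proposed algorithms.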

13.
Mining useful information and helpful knowledge from large databases has evolved into an important research area in recent years. Among the classes of knowledge derived, finding sequential patterns in temporal transaction databases is very important since it can help model customer behavior. In the past, researchers usually assumed databases were static to simplify data-mining problems. In real-world applications, new transactions may be added to databases frequently. Designing an efficient and effective mining algorithm that can maintain sequential patterns as a database grows is thus important. In this paper, we propose a novel incremental mining algorithm for maintaining sequential patterns based on the concept of pre-large sequences to reduce the need for rescanning original databases. Pre-large sequences are defined by a lower support threshold and an upper support threshold; they act as a buffer that prevents sequences from moving directly from large to small and vice versa. The proposed algorithm does not require rescanning original databases until the accumulated number of newly added customer sequences exceeds a safety bound, which depends on database size. Thus, as databases grow larger, the number of new transactions allowed before database rescanning is required also grows. The proposed approach thus becomes increasingly efficient as databases grow.
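
The role of the two thresholds can be sketched as a three-way classification plus a rescan check. The inclusive comparisons, the function names, and the example thresholds are assumptions; the safety bound is taken as an input here because the abstract does not give its formula.

```python
def classify(count: int, n_sequences: int, lower: float, upper: float) -> str:
    """Classify a candidate sequence by its support ratio.
    Assumes lower < upper and treats both thresholds as inclusive."""
    ratio = count / n_sequences
    if ratio >= upper:
        return "large"
    if ratio >= lower:
        return "pre-large"
    return "small"


def needs_rescan(new_sequences_since_scan: int, safety_bound: int) -> bool:
    """Rescan the original database only once the accumulated number of newly
    added customer sequences exceeds the safety bound (supplied by the caller)."""
    return new_sequences_since_scan > safety_bound


for count in (6, 4, 2):
    print(count, classify(count, n_sequences=10, lower=0.3, upper=0.5))
```

Tracking the pre-large band lets an algorithm update counts for sequences that may become large without rescanning, as long as `needs_rescan` stays false.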

14.
Weighted sequential pattern mining has recently been discussed in the field of data mining. Unlike traditional sequential pattern mining, this kind of mining considers the different significance of items in real applications, such as cost or profit. Most related studies adopt the maximum weighted upper-bound model to find weighted sequential patterns, but they generate a large number of unpromising candidate subsequences. In this study, we therefore propose an efficient approach for finding weighted sequential patterns from sequence databases. In particular, the approach includes a tightening strategy that obtains more accurate weighted upper-bounds for subsequences during mining. The experimental evaluation shows that the proposed approach performs well in terms of pruning effectiveness and execution efficiency.

15.
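Research on concurrent sequential pattern mining methods   Cited by: 1 (self-citations: 0, citations by others: 1)
张洋, 陈未如, 陈珊珊 《计算机应用》 (Journal of Computer Applications) 2009, 29(11): 3096-3099
The concept of a concurrency relation is proposed, on which the concept of concurrency degree is defined, leading to the concept of concurrent sequential patterns. A method for mining concurrent sequential patterns is given: a support-vector-based concurrent sequential pattern mining method. The method derives 2-branch concurrent sequential patterns and their support vectors by generating the support vectors of sequential patterns; it then generates k-branch concurrent sequential patterns and their support vectors from the support vectors of (k-1)-branch concurrent sequential patterns and those of sequential patterns, thereby obtaining all k-branch concurrent sequential patterns. In the experiments, the algorithm was implemented and validated on synthetic data produced by the IBM data generator. The experiments show that the algorithm is effective and feasible, and that, under different support and minimum concurrency degree settings, the total number of concurrent sequential patterns mined decreases exponentially as the minimum concurrency degree increases.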

16.
Inter-sequence pattern mining can find associations across several sequences in a sequence database; it can discover both sequential patterns within a transaction and sequential patterns across several different transactions. However, inter-sequence pattern mining algorithms usually generate a large number of recurrent frequent patterns. We have observed that mining closed inter-sequence patterns instead of frequent ones can lead to a more compact yet complete result set. Therefore, in this paper, we propose a model of closed inter-sequence pattern mining and an efficient algorithm called CISP-Miner for mining such patterns, which enumerates closed inter-sequence patterns recursively along a search tree in a depth-first manner. In addition, several effective pruning strategies and closure checking schemes are designed to reduce the search space and thus accelerate the algorithm. Our experimental results demonstrate that the proposed CISP-Miner algorithm is very efficient and outperforms the EISP-Miner algorithm it is compared against in most cases.

17.
Mining sequential patterns from data streams: a centroid approach   Cited by: 1 (self-citations: 0, citations by others: 1)
In recent years, emerging applications have introduced new constraints for data mining methods. These constraints are typical of a new kind of data: data streams. In data stream processing, memory usage is restricted, new elements are generated continuously and have to be processed in linear time, no blocking operator can be performed, and the data can be examined only once. At this time, only a few methods have been proposed for mining sequential patterns in data streams. We argue that the main reason is the combinatory phenomenon related to sequential pattern mining. In this paper, we propose an algorithm based on sequence alignment for mining approximate sequential patterns in Web usage data streams. To meet the one-scan constraint, a greedy clustering algorithm associated with an alignment method is proposed. We show that our proposal is able to extract relevant sequences with very low thresholds.

18.
In this paper we consider the problem of discovering sequential patterns by handling time constraints as defined in the GSP algorithm. While sequential patterns can be seen as temporal relationships between facts embedded in the database, where the facts considered are merely characteristics of individuals or observations of individual behavior, generalized sequential patterns aim to provide the end user with more flexible handling of the transactions embedded in the database. We thus propose a new efficient algorithm, called GTC (Graph for Time Constraints), for mining such patterns in very large databases. It is based on the idea that handling time constraints in the earlier stage of the data mining process can be highly beneficial. One of the most significant new features of our approach is that the handling of time constraints can easily be taken into account in traditional levelwise approaches, since it is carried out prior to and separately from the counting step of a data sequence. Our tests show that the proposed algorithm performs significantly faster than a state-of-the-art sequence mining algorithm.

19.
Mining sequential patterns means discovering the sequential purchasing behaviors of most customers from a large number of customer transactions. Past transaction data can be analyzed to discover customer purchasing behaviors so that the quality of business decisions can be improved. However, the transaction database can be very large. It is very time consuming to find all the sequential patterns in a large database, and users may be interested in only some of them. Moreover, different users may have different criteria for the sequential patterns to be discovered. Traditional mining methods can therefore generate many sequential patterns that are of no interest to the user. Hence, a data mining language needs to be provided so that users can query only the knowledge of interest to them from a large database of customer transactions. In this article, a data mining language is presented. With this language, users can specify the items of interest and the criteria of the sequential patterns to be discovered. Also, an efficient data mining technique is proposed to extract the sequential patterns according to the users' requests. © 2005 Wiley Periodicals, Inc. Int J Int Syst 20: 73–87, 2005.

20.
In response to the thriving development of electronic commerce (EC), many on-line retailers have developed Web-based information systems to handle enormous numbers of transactions on the Internet. These systems can automatically capture data on the browsing histories and purchasing records of individual customers. This capability has motivated the development of data-mining applications. Sequential pattern mining (SPM) is a useful data-mining method for discovering customers’ purchasing patterns over time. We incorporate the recency, frequency, and monetary (RFM) concept presented in the marketing literature to define the RFM sequential pattern and develop a novel algorithm for generating all RFM sequential patterns from customers’ purchasing data. Using the algorithm, we propose a pattern segmentation framework to generate valuable information on customer purchasing behavior for managerial decision-making. Extensive experiments are carried out, using synthetic datasets and a transactional dataset collected by a retail chain in Taiwan, to evaluate the proposed algorithm and empirically demonstrate the benefits of using RFM sequential patterns in analyzing customers’ purchasing data.
