Similar literature
Found 20 similar documents; search took 15 ms.
1.
Sequential pattern mining is an important data mining problem with broad applications. While current methods induce sequential patterns within a single attribute, the proposed method can detect them across different attributes. By incorporating the additional attributes, the sequential patterns found are richer and more informative to the user. This paper proposes a new method for inducing multi-dimensional sequential patterns using the Hellinger entropy measure. A number of theorems are proposed to reduce the computational complexity of the sequential pattern systems. The proposed method is tested on some synthesized transaction databases. Dr. Chang-Hwan Lee has been a full professor in the Department of Information and Communications at DongGuk University, Seoul, Korea, since 1996. He received his B.Sc. and M.Sc. in Computer Science and Statistics from Seoul National University in 1982 and 1988, respectively, and his Ph.D. in Computer Science and Engineering from the University of Connecticut in 1994. Prior to joining DongGuk University, he worked for AT&T Bell Laboratories, Middletown, USA (1994-1995). He was also a visiting professor at the University of Illinois at Urbana-Champaign (2000-2001). He is the author or co-author of more than 50 refereed articles on topics such as machine learning, data mining, artificial intelligence, pattern recognition, and bioinformatics.

2.
This paper proposes a flexible sequence alignment approach for pattern mining and matching in human activity recognition. During pattern mining, the proposed sequence alignment algorithm extracts from the training patterns the representative patterns that denote a person's specific activities. It offers high performance and is robust to pattern diversity. In addition, the algorithm uses the appearance probability of each pattern as a weight and adapts pattern length to different human activities; both features improve recognition accuracy. In pattern matching, the proposed algorithm adopts a dynamic-programming-based strategy to evaluate the degree of correlation between each representative activity pattern and the observed activity sequence. This avoids having to segment the observed sequence and allows recognition results to be obtained continuously. The matching algorithm also supports recognition of concurrent human activities through parallel matching. Experimental results confirm the high accuracy of human activity recognition with the proposed approach.

3.
In this paper, given a set of sequence databases across multiple domains, we aim to mine multi-domain sequential patterns, where a multi-domain sequential pattern is a sequence of events whose occurrence times fall within a pre-defined time window. We first propose algorithm Naive, in which multiple sequence databases are joined into one sequence database so that traditional sequential pattern mining algorithms (e.g., PrefixSpan) can be applied. Because of the join operations, algorithm Naive is costly and is developed for comparison purposes only. We therefore propose two algorithms that mine multi-domain sequential patterns without any join operations. Algorithm IndividualMine derives sequential patterns in each domain and then iteratively combines sequential patterns among the sequence databases of multiple domains to derive candidate multi-domain sequential patterns. However, not all sequential patterns mined in the sequence database of each domain can form multi-domain sequential patterns. To avoid the mining cost incurred by algorithm IndividualMine, algorithm PropagatedMine is developed. Algorithm PropagatedMine first mines sequential patterns from one sequence database and then propagates the mined patterns to the other sequence databases. Furthermore, the mined sequential patterns are represented as a lattice structure to further reduce the number of patterns to be propagated. In addition, we develop mechanisms that allow some empty sets in multi-domain sequential patterns. The performance of the proposed algorithms is comparatively analyzed and a sensitivity analysis is conducted. Experimental results show that by exploiting propagation and lattice structures, algorithm PropagatedMine outperforms algorithm IndividualMine in terms of efficiency (i.e., execution time).

4.
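Research on pattern matching algorithms for network processing*   Total citations: 1, self-citations: 1, citations by others: 1
This paper studies 15 classic single-pattern and 7 classic multi-pattern matching algorithms from several angles. Using a programmable network processor as the test platform, it benchmarks 5 of the single-pattern and 4 of the multi-pattern algorithms on matching time, memory footprint, and preprocessing time. The best algorithm in each test is identified from the results.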

5.
Constraint-based sequential pattern mining: the pattern-growth methods   Total citations: 4, self-citations: 0, citations by others: 4
Constraints are essential for many sequential pattern mining applications. However, there is no systematic study on constraint-based sequential pattern mining. In this paper, we investigate this issue and point out that the framework developed for constrained frequent-pattern mining does not fit our mission well. An extended framework is developed based on a sequential pattern growth methodology. Our study shows that constraints can be effectively and efficiently pushed deep into the sequential pattern mining under this new framework. Moreover, this framework can be extended to constraint-based structured pattern mining as well. This research is supported in part by NSERC Grant 312194-05, NSF Grants IIS-0308001, IIS-0513678, BDI-0515813 and National Science Foundation of China (NSFC) grants No. 60303008 and 69933010. All opinions, findings, conclusions and recommendations in this paper are those of the authors and do not necessarily reflect the views of the funding agencies.

6.
Sequential rule mining is an important data mining task used in a wide range of applications. However, current algorithms for discovering sequential rules common to several sequences use very restrictive definitions of sequential rules, which make them unable to recognize that similar rules can describe the same phenomenon. This can have many undesirable effects, such as (1) similar rules that are rated differently, (2) rules that are not found because they are considered uninteresting when taken individually, and (3) rules that are too specific, which makes them less likely to be used for making predictions. In this paper, we address these problems by proposing a more general form of sequential rules such that items in the antecedent and in the consequent of each rule are unordered. We propose an algorithm named CMRules for mining this form of rules. The algorithm proceeds by first finding association rules to prune the search space for items that occur jointly in many sequences. Then it eliminates association rules that do not meet the minimum confidence and support thresholds according to the sequential ordering. We evaluate the performance of CMRules in three different ways. First, we provide an analysis of its time complexity. Second, we compare its performance (in terms of execution time, memory usage and scalability) with an adaptation of an algorithm from the literature that we name CMDeo. For this comparison, we use three real-life public datasets, which have different characteristics and represent three kinds of data. In many cases, results show that CMRules is faster and scales better than CMDeo for low support thresholds. Lastly, we report a successful application of the algorithm in a tutoring agent.
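
As a rough illustration of how a rule with an unordered antecedent and consequent might be scored against the sequential ordering, the sketch below counts a rule X ⇒ Y as occurring in a sequence when every item of X appears before every item of Y. This occurrence test, the function names, and the toy database are assumptions made for illustration; the abstract does not spell out the exact definitions used by CMRules.

```python
def rule_occurs(sequence, antecedent, consequent):
    """Return True if all antecedent items appear, in any order, in some
    prefix of the sequence and all consequent items appear after that prefix.

    `sequence` is a list of itemsets (sets of items), in time order.
    """
    seen = set()
    for k, itemset in enumerate(sequence):
        seen |= set(itemset)
        if antecedent <= seen:
            # Earliest prefix covering the antecedent: it leaves the largest
            # possible remainder, so checking it alone is sufficient.
            rest = set().union(*sequence[k + 1:])
            return consequent <= rest
    return False


def sequential_support_confidence(database, antecedent, consequent):
    """Assumed definitions: support is the fraction of sequences in which the
    rule occurs; confidence divides the same count by the number of sequences
    that contain every antecedent item."""
    occurrences = sum(rule_occurs(s, antecedent, consequent) for s in database)
    with_antecedent = sum(antecedent <= set().union(*s) for s in database)
    support = occurrences / len(database) if database else 0.0
    confidence = occurrences / with_antecedent if with_antecedent else 0.0
    return support, confidence


# Toy database (hypothetical): four sequences of itemsets.
toy_db = [
    [{"a"}, {"b"}, {"c"}],
    [{"b"}, {"a"}, {"c"}],
    [{"c"}, {"a", "b"}],
    [{"a"}, {"c"}],
]
print(sequential_support_confidence(toy_db, {"a", "b"}, {"c"}))
```

In the toy database the first two sequences both support {a, b} ⇒ {c} even though a and b appear in different orders, which is the kind of generality the abstract describes; the third sequence contains all three items but in the wrong order, so it lowers confidence without adding support.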

7.
This paper investigates the correspondence matching of point-sets using spectral graph analysis. In particular, we are interested in the problem of how the modal analysis of point-sets can be rendered robust to contamination and drop-out. We make three contributions. First, we show how the modal structure of point-sets can be embedded within the framework of the EM algorithm. Second, we present several methods for computing the probabilities of point correspondences from the modes of the point proximity matrix. Third, we consider alternatives to the Gaussian proximity matrix. We evaluate the new method on both synthetic and real-world data. Here we show that the method can be used to compute useful correspondences even when the level of point contamination is as large as 50%. We also provide some examples on deformed point-set tracking.

8.
An active research topic in data mining is the discovery of sequential patterns, which finds all frequent subsequences in a sequence database. The generalized sequential pattern (GSP) algorithm was proposed to solve the mining of sequential patterns with time constraints, such as time gaps and sliding time windows. Recent studies indicate that the pattern-growth methodology could speed up sequence mining. However, the capabilities to mine sequential patterns with time constraints were previously available only within the Apriori framework. Therefore, we propose the DELISP (delimited sequential pattern) approach to provide these capabilities within the pattern-growth methodology. DELISP reduces the size of projected databases through bounded and windowed projection techniques. Bounded projection keeps only subsequences that satisfy the time-gap constraints, and windowed projection saves nonredundant subsequences satisfying the sliding time-window constraint. Furthermore, the delimited growth technique directly generates constraint-satisfying patterns and speeds up the pattern-growing process. Comprehensive experiments show that DELISP has good scalability and outperforms the well-known GSP algorithm in the discovery of sequential patterns with time constraints.

9.
Mining sequential patterns to find ordered events or subsequence patterns is essential in many applications, such as analysis of consumer shopping data, web clickstreams, and biological sequences. Traditional patterns reveal which items are frequently purchased together and in what order. However, information about the time intervals between purchases is missing. Therefore, Yang proposed using multi-time-interval sequential patterns to consider the time intervals between each pair of items in a pattern. For example, 〈Bread, ti1, Milk, (ti2, ti1), Jam〉 means that Bread is bought before Milk within an interval of ti1, and Jam is bought after Bread and Milk within intervals of ti2 and ti1, respectively, where ti1 and ti2 are predefined time intervals. Although this new type of pattern considers the intervals between all pairs of items, it suffers from a sharp boundary problem: when the time interval between two purchases is near the boundary of two predetermined time ranges, it is either ignored or overemphasized. In this study, we applied the concept of fuzzy sets to solve the sharp boundary problem. The discovered patterns, called fuzzy multi-time-interval sequential patterns, describe time intervals in linguistic terms for better understanding. Two algorithms, FuzzMI-Apriori and FuzzMI-PrefixSpan, were developed for mining fuzzy multi-time-interval patterns. Experiments using synthetic and real datasets showed the algorithms' computational efficiency, scalability, and effectiveness.
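
The sharp boundary problem can be made concrete with a small sketch. The linguistic terms (Short, Middle, Long), their breakpoints in days, and the trapezoidal shape of the membership functions are all hypothetical choices for illustration; the abstract does not say which membership functions the FuzzMI algorithms use.

```python
import math


def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: 0 outside [a, d], 1 on [b, c], linear between."""
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


# Hypothetical linguistic terms over a time interval measured in days.
TERMS = {
    "Short": (0, 0, 3, 7),
    "Middle": (3, 7, 10, 14),
    "Long": (10, 14, math.inf, math.inf),
}


def fuzzy_interval(days):
    """Degree to which an interval belongs to each linguistic term."""
    return {name: trapezoid(days, *params) for name, params in TERMS.items()}


def crisp_interval(days):
    """Crisp alternative with hard cut-offs at 5 and 12 days (also hypothetical)."""
    if days <= 5:
        return "Short"
    return "Middle" if days <= 12 else "Long"


for gap in (3, 5, 6):
    print(gap, crisp_interval(gap), fuzzy_interval(gap))
```

With hard cut-offs, gaps of 5 and 6 days land in different ranges even though they are nearly identical. Under the fuzzy terms, a 5-day gap belongs equally to Short and Middle, so neither reading is discarded outright.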

10.
We propose a sequential classification algorithm for on-line analysis of signals. The detection of successive classes is based on the detection of transition sequences, each being associated with an evolution.

11.
Linear relations have been found to be valuable in rule discovery for stocks, for example: if stock X goes up by a, stock Y will go down by b. Traditional linear regression models the linear relation between two sequences faithfully. However, if a user requires stocks to be clustered into groups whose sequences have high linearity or similarity with each other, it is prohibitively expensive to compare sequences one by one. In this paper, we present a generalized regression model (GRM) to match the linearity of multiple sequences at a time. GRM also gives strong heuristic support for graceful and efficient clustering. Experiments on stocks in the NASDAQ market mined interesting clusters of stock trends efficiently. Hansheng Lei received his BE from Ocean University of China in 1998, MS from the University of Science and Technology of China in 2001, and Ph.D. from the University at Buffalo, the State University of New York in February 2006, all in computer science. He is currently an assistant professor in the CS/CIS Department, University of Texas at Brownsville. His research interests include biometrics, pattern recognition, machine learning, and data mining. Venu Govindaraju is a professor of Computer Science and Engineering at the University at Buffalo (UB), State University of New York. He received his B.Tech. (Honors) from the Indian Institute of Technology (IIT), Kharagpur, India in 1986, and his Ph.D. degree in Computer Science from UB in 1992. His research is focused on pattern recognition applications in the areas of biometrics and digital libraries.

12.
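A survey of sequential pattern mining   Total citations: 4, self-citations: 0, citations by others: 4
This paper surveys the state of research on sequential pattern mining. It first introduces the background and related concepts of sequential pattern mining; it then summarizes the general approaches and presents and analyzes the most representative sequential pattern mining algorithms; finally, it discusses future research directions. The survey is intended to help researchers improve existing algorithms and propose new sequential pattern mining algorithms with better performance.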

13.
High utility sequential pattern mining has recently become a popular topic because it takes into account the quantities, profits, and time order of items. In the existing approach, the utilities of subsequences are difficult to calculate because three kinds of utility calculation are involved. To simplify the calculation, this work presents a maximum utility measure, derived from the principle in traditional sequential pattern mining that a subsequence is counted only once per sequence. An effective upper-bound model is designed to avoid information loss during mining, and a projection-based pruning strategy is designed to obtain more accurate sequence-utility upper bounds for subsequences. An indexing strategy is also developed to quickly find the relevant sequences for prefixes, reducing unnecessary search time. Experimental results on several datasets show that the proposed approach performs well in both pruning effectiveness and execution efficiency.

14.
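Exploiting the affine invariance and local controllability of the convex hull of a point set, and addressing the difficulty that spectral graph methods have in accurately matching images with large rotation angles, this paper proposes a convex-hull-sequence spectral method for image point pattern matching that remains stable under large rotations. A new graph model (the convex hull) is built on the image feature point set, an improved spectral method is used to match the hulls, the original feature point set is then reduced, and the process is iterated. By constructing a sequence of convex hulls, matching proceeds step by step from the periphery of the feature point set toward its interior, yielding more accurate matching pairs. Experimental results show that the method not only accurately matches images with small rotation angles but also achieves high matching accuracy for images with large rotation angles and for multispectral images.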

15.
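CloSpan mines closed sequential patterns in two phases; its first phase must retain candidate sequences, does not fully exploit item position information, and repeatedly scans the database and computes sizes. To address these shortcomings, the posCloSpan algorithm is proposed. The algorithm performs forward pruning by searching a two-level index structure, which avoids repeated database scans, and prunes non-closed sequences by checking a super-sequence index table and a sub-sequence index table, so candidate sequences need not be stored. Experimental results show that the algorithm effectively reduces time overhead when processing longer sequences and data sources with many duplicate projected databases.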

16.
Recently, considerable attention has focused on compound sequence classification methods which integrate multiple data mining techniques. Among these methods, sequential pattern mining (SPM) based sequence classifiers are considered to be efficient for solving complex sequence classification problems. Although previous studies have demonstrated the strength of SPM-based sequence classification methods, the challenges of pattern redundancy, inappropriate sequence similarity measures, and hard-to-classify sequences remain unsolved. This paper proposes an efficient two-stage SPM-based sequence classification method to address these three problems. In the first stage, during the sequential pattern mining process, redundant sequential patterns are identified if the pattern is a sub-sequence of other sequential patterns. A list of compact sequential patterns is generated excluding redundant patterns and used as representative features for the second stage. In the second stage, a sequence similarity measurement is used to evaluate partial similarity between sequences and patterns. Finally, a particle swarm optimization-AdaBoost (PSO-AB) sequence classifier is developed to improve sequence classification accuracy. In the PSO-AB sequence classifier, the PSO algorithm is used to optimize the weights in the individual sequence classifier, while the AdaBoost strategy is used to adaptively change the distribution of patterns that are hard to classify. The experiments show that the proposed two-stage SPM-based sequence classification method is efficient and superior to other approaches.

17.
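To solve the weighted traversal pattern mining problem, this paper summarizes the types of weighted directed graphs, proposes a transformation model between edge-weighted and vertex-weighted directed graphs, and, based on that model, proposes GTWSPMiner, a graph-traversal-based weighted sequential pattern mining algorithm. Exploiting the contiguity of items in traversal patterns, the algorithm uses a weighted prefix-projection pattern-growth method to decompose the task of mining the original sequence database into a set of smaller tasks that mine local projected databases. Comparative experiments show that the algorithm mines weighted frequent traversal patterns quickly and effectively.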

18.
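An improved algorithm for BM string matching   Total citations: 5, self-citations: 0, citations by others: 5
Based on an analysis of the BM algorithm and reference [12], this paper presents an improved BM string matching algorithm. The algorithm has two important properties: (1) in the worst case, it effectively reduces the number of repeated character comparisons, improving matching efficiency; (2) it generalizes readily to two-dimensional matching and approximate matching.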
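
For readers unfamiliar with the BM (Boyer-Moore) family, the sketch below shows the Horspool simplification, which keeps only the bad-character shift. It is a baseline for orientation and is not the improved algorithm of this entry, whose details the abstract does not give; the function name is an assumption.

```python
def horspool_search(text, pattern):
    """Return the start indices of all occurrences of `pattern` in `text`
    using the Boyer-Moore-Horspool bad-character shift."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # For every character except the last one in the pattern, record how far
    # its rightmost occurrence is from the pattern's end.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    hits = []
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            hits.append(pos)
        # Shift by the text character aligned with the pattern's last position;
        # characters absent from the table allow a full-length jump.
        pos += shift.get(text[pos + m - 1], m)
    return hits


print(horspool_search("abracadabra", "abra"))
```

The window comparison here uses a slice for brevity; a classical BM implementation compares characters right to left and adds a good-suffix rule, which is where worst-case repeated comparisons of the kind mentioned in the abstract arise.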

19.
This paper addresses the problem of timestamped event sequence matching, a new type of similar sequence matching that retrieves the occurrences of interesting patterns from timestamped sequence databases. The sequential-scan-based method, the trie-based method, and the method based on the iso-depth index are well-known approaches to this problem. In this paper, we point out their shortcomings and propose a new method that effectively overcomes them. The proposed method employs an R-tree, a widely accepted multi-dimensional index structure that efficiently supports timestamped event sequence matching. To build the R-tree, this method extracts time windows from every item in a timestamped event sequence and represents them as rectangles in n-dimensional space by considering the first and last occurring times of each event type. Here, n is the total number of distinct event types that may occur in a target application. To resolve the curse of dimensionality when n is large, we suggest an algorithm for reducing the dimensionality by grouping the event types. Our R-tree-based sequence matching method proceeds in two steps. First, it efficiently identifies a small number of candidates by searching the R-tree. Second, it picks out true answers from the set of candidates. We prove its robustness formally, and also show its effectiveness via extensive experiments.

20.
International Journal of Computer Mathematics, 2012, 89(3-4): 149-153
The Aho-Corasick algorithm is a well-known method of determining the occurrences of one of several given pattern strings in a given text string. We address the question of augmenting the pattern matching machine constructed by this algorithm with a new pattern string, both on-line and off-line. We show that augmenting a machine of N nodes with a new pattern string of length m takes Θ(mN) time on-line and Θ(N) time off-line.
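
A compact sketch of the standard Aho-Corasick pattern matching machine may help place the result. The function names are assumptions, and `add_pattern` below simply rebuilds the machine from scratch; it does not implement the paper's on-line or off-line augmentation procedures, which the abstract does not describe.

```python
from collections import deque


def build_machine(patterns):
    """Build goto, failure, and output tables for the given patterns."""
    goto, fail, out = [{}], [0], [[]]
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pat)
    # Breadth-first pass: depth-1 states fail to the root; deeper states
    # follow their parent's failure links until the character can be matched.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_all(text, machine):
    """Return (start index, pattern) for every occurrence in `text`."""
    goto, fail, out = machine
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        hits.extend((i - len(pat) + 1, pat) for pat in out[state])
    return hits


def add_pattern(patterns, new_pattern):
    """Naive augmentation (assumption): rebuild the whole machine."""
    return build_machine(list(patterns) + [new_pattern])


machine = build_machine(["he", "she", "his", "hers"])
print(find_all("ushers", machine))
```

Rebuilding costs time proportional to the total length of all patterns, which is why cheaper ways of augmenting an existing machine, as studied in this entry, are of interest.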
