共查询到20条相似文献,搜索用时 15 毫秒
1.
We present a filtering based algorithm for the k-mismatch pattern matching problem with don't cares. Given a text t of length n and a pattern p of length m with don't care symbols in either p or t (but not both), and a bound k, our algorithm finds all the places that the pattern matches the text with at most k mismatches. The algorithm is deterministic and runs in Θ(nm1/3k1/3log2/3m) time. 相似文献
2.
Kimmo Fredriksson 《Information Processing Letters》2003,87(4):201-204
Given a text T[1…n] and a pattern P[1…m] over some alphabet Σ of size σ, we want to find all the (exact) occurrences of P in T. The well-known shift-or algorithm solves this problem in time O(n⌈m/w⌉), where w is the number of bits in machine word, using bit-parallelism. We show how to extend the bit-parallelism in another direction, using super-alphabets. This gives a speed-up by a factor s, where s is the number of characters processed simultaneously. The algorithm is implemented, and we show that it works well in practice too. The result is the fastest known algorithm for exact string matching for short patterns and small alphabets. 相似文献
3.
The paper shows how to compute exactly expectations, standard deviations, and cumulative probabilities of the searching times of string-matching algorithms based on the use of automata. This is derived from a methodology based on viewing the underlying Markov chains as exponential families and applying recent results on them. 相似文献
4.
Ioannis Koutis 《Information Processing Letters》2005,94(1):7-9
We present an efficient parameterized algorithm for the (k,t)-set packing problem, in which we are looking for a collection of k disjoint sets whose union consists of t elements. The complexity of the algorithm is O(2O(t)nNlogN). For the special case of sets of bounded size, this improves the O(k(ck)n) algorithm of Jia et al. [J. Algorithms 50 (1) (2004) 106]. 相似文献
6.
Efficient string matching with wildcards and length constraints 总被引:1,自引:2,他引:1
Gong Chen Xindong Wu Xingquan Zhu Abdullah N. Arslan Yu He 《Knowledge and Information Systems》2006,10(4):399-419
This paper defines a challenging problem of pattern matching between a pattern P and a text T, with wildcards and length constraints, and designs an efficient algorithm to return each pattern occurrence in an online manner. In this pattern matching problem, the user can specify the constraints on the number of wildcards between each two consecutive letters of P and the constraints on the length of each matching substring in T. We design a complete algorithm, SAIL that returns each matching substring of P in T as soon as it appears in T in an O(n+klmg) time with an O(lm) space overhead, where n is the length of T, k is the frequency of P's last letter occurring in T, l is the user-specified maximum length for each matching substring, m is the length of P, and g is the maximum difference between the user-specified maximum and minimum numbers of wildcards allowed between two consecutive letters in P.SAIL stands for string matching with wildcards and length constraints.
Gong Chen received the B.Eng. degree from the Beijing University of Technology, China, and the M.Sc. degree from the University of Vermont, USA, both in computer science. He is currently a graduate student in the Department of Statistics at the University of California, Los Angeles, USA. His research interests include data mining, statistical learning, machine learning, algorithm analysis and design, and database management.
Xindong Wu is a professor and the chair of the Department of Computer Science at the University of Vermont. He holds a Ph.D. in Artificial Intelligence from the University of Edinburgh, Britain. His research interests include data mining, knowledge-based systems, and Web information exploration. He has published extensively in these areas in various journals and conferences, including IEEE TKDE, TPAMI, ACM TOIS, IJCAI, AAAI, ICML, KDD, ICDM and WWW, as well as 12 books and conference proceedings. Dr. Wu is the Editor-in-Chief of the IEEE Transactions on Knowledge and Data Engineering (by the IEEE Computer Society), the founder and current Steering Committee Chair of the IEEE International Conference on Data Mining (ICDM),an Honorary Editor-in-Chief of Knowledge and Information Systems (by Springer), and a Series Editor of the Springer Book Series on Advanced Information and Knowledge Processing (AI&KP). He is the 2004 ACM SIGKDD Service Award winner.
Xingquan Zhu received his Ph.D degree in Computer Science from Fudan University, Shanghai, China, in 2001. He spent 4 months with Microsoft Research Asia, Beijing, China, where he was working on content-based image retrieval with relevance feedback. From 2001 to 2002, he was a postdoctoral associate in the Department of Computer Science at Purdue University, West Lafayette, IN. He is currently a research assistant professor in the Department of Computer Science, the University of Vermont, Burlington, VT. His research interests include data mining, machine learning, data quality, multimedia computing, and information retrieval. Since 2000, Dr. Zhu has published extensively, including over 50 refereed papers in various journals and conference proceedings.
Abdullah N. Arslan got his Ph.D. degree in Computer Science in 2002 from the University of California at Santa Barbara. Upon his graduation he joined the Department of Computer Science at the University of Vermont as an assistant professor. He has been with the computer science faculty there since then. Dr. Arslan's main research interests are on algorithms on strings, computational biology and bioinformatics. Dr. Arslan earned his Master's degree in Computer Science in 1996 from the University of North Texas, Denton, Texas and his Bachelor's degree in Computer Engineering in 1990 from the Middle East Technical University, Ankara, Turkey. He worked as a programmer for the Central Bank of Turkey between 1991 and 1994.
Yu He received her B.E. degree in Information Engineering from Zhejiang University, China, in 2001. She is currently a graduate student in the Department of Computer Science at the University of Vermont. Her research interests include data mining, bioinformatics and pattern recognition. 相似文献
7.
J. R. Cowie 《Software》1987,17(10):719-728
A technique is presented, based on the method of tries, which allows direct access to variable-length records in a sequential file. No reorganization of the original file is involved. 相似文献
8.
The problem of pattern matching with wildcards is to find all the occurrences of a pattern of length m in a text of length n over a finite alphabet Σ (both the text and the pattern are allowed to contain wildcards). Based on the prime number encoding scheme (Chaim Linhart, Ron Shamir, Faster pattern matching with character classes using prime number encoding, J. Comput. Syst. Sci. 75 (3) (2009) 155-162), we present a new integer encoding and an efficient fast Fourier transforms based algorithm for this problem. The algorithm takes time to search the pattern in the text by computing one convolution. For matching with wildcards, our encoding uses fewer prime numbers and has shorter code words comparing with the prime number encoding. We use at most 2lg|Σ| prime numbers to encode the symbols while in the prime number encoding |Σ| prime numbers are required. This number reduces to 1.5lg|Σ| when |Σ|>40. The code word used in the algorithm is at most 2⌊lg|Σ|⌋⌈lg(5m)⌉ bits while in the prime encoding it is at least bits. We also show that the length of words can be further reduced by increasing the number of convolutions computed. 相似文献
9.
An aggressive algorithm for multiple string matching 总被引:1,自引:0,他引:1
Liuling Dai 《Information Processing Letters》2009,109(11):553-559
A new algorithm based on the Wu-Manber algorithm for multiple string matching is presented in this paper. The algorithm eliminates the functional overlap of the table HASH and SHIFT, and computes the shift distances in an aggressive manner. After each test, the algorithm examines the character next to the scan window to maximize the shift distance. This idea is consistent with that of the quick-search (QS) algorithm. Experimental results on four alphabets show that the new algorithm is more efficient than Wu-Manber and other recent algorithms, particularly on short pattern sets and large alphabet. 相似文献
10.
针对目前已有的算法在计算带有可变长度通配符的模式在文本中的出现次数问题时,需要的时间是多项式级别,而且受文本长度、模式长度和通配符间距的影响比较大。提出了一种基于Aho-Corasick自动机的AAI(pAttern mAtching with wIldcards) 算法,计算中采用了动态规划思想和有效的修剪技术。AAI算法的时间复杂度和空间复杂度分别为[O(n+m+α)]和[O(m+B)],其中[n]和[m]分别表示文本和模式的长度,[α]是所有子模式在文本中出现的数目,[B]是模式中通配符间距下限的总和。通过真实数据和人工数据的实验结果表明,AAI算法与同类算法相比具备显著的优势。 相似文献
11.
Mikhail J. Atallah 《Algorithmica》1993,9(2):156-167
We give an improved parallel algorithm for the problem of computing the tube minima of a totally monotonen ×n ×n matrix, an important matrix searching problem that was formalized by Aggarwal and Park and has many applications. Our algorithm runs inO(log logn) time withO(n2/log logn) processors in theCRCW-PRAM model, whereas the previous best ran inO((log logn)2) time withO(n2/(log logn)2 processors, also in theCRCW-PRAM model. Thus we improve the speed without any deterioration in thetime ×processors product. Our improved bound immediately translates into improvedCRCW-PRAM bounds for the numerous applications of this problem, including string editing, construction of Huffmann codes and other coding trees, and many other combinatorial and geometric problems.This research was supported by the Office of Naval Research under Grants N00014-84-K-0502 and N00014-86-K-0689, the Air Force Office of Scientific Research under Grant AFOSR-90-0107, the National Science Foundation under Grant DCR-8451393, and the National Library of Medicine under Grant R01-LM05118. Part of the research was done while the author was at Princeton University, visiting the DIMACS center. 相似文献
12.
This paper presents simple and deterministic algorithms for partial point set pattern matching in 2D. Given a set P of n points, called sample set, and a query set Q of k points (n?k), the problem is to find a matching of Q with a subset of P under rigid motion. The match may be of two types: exact and approximate. If an exact matching exists, then each point in Q coincides with the corresponding point in P under some translation and/or rotation. For an approximate match, some translation and/or rotation may be allowed such that each point in Q lies in a predefined ε-neighborhood region around some point in P. The proposed algorithm for the exact matching needs O(n2) space and preprocessing time. The existence of a match for a given query set Q can be checked in time in the worst-case, where α is the maximum number of equidistant pairs of point in P. For a set of n points, α may be O(n4/3) in the worst-case. Some applications of the partial point set pattern matching are then illustrated. Experimental results on random point sets and some fingerprint databases show that, in practice, the computation time is much smaller than the worst-case requirement. The algorithm is then extended for checking the exact match of a set of k line segments in the query set with a k-subset of n line segments in the sample set under rigid motion in time. Next, a simple version of the approximate matching problem is studied where one point of Q exactly matches with a point of P, and each of the other points of Q lie in the ε-neighborhood of some point of P. The worst-case time and space complexities of the proposed algorithm are and O(n), respectively. The proposed algorithms will find many applications to fingerprint matching, image registration, and object recognition. 相似文献
13.
提出了一种适用于大规模特征集的快速匹配算法——SRS算法,该算法性能优异,在特征集达到100 000条时,匹配速度比经典算法快10倍以上。该算法适用于内容过滤、防病毒、反垃圾邮件、短信过滤、网络入侵检测和防御等众多领域。 相似文献
14.
15.
G. De V. Smit 《Software》1982,12(1):57-66
Three string matching algorithms—straightforward, Knuth-Morris-Pratt and Boyer-Moor—re examined and their time complexities discussed. A comparison of their actual average behaviour is made, based on empirical data presented. It is shown that the Boyel-Moore algorithm is extremely efficient in most cases and that, contrary to the impression one might get from the analytical results, the Knuth-Morris-Pratt algorithm is not significantly better on the average than the straightforward algorithm. 相似文献
16.
针对目前基于免疫的IDS中匹配算法存在的问题, 提出了一种r可变匹配算法。该算法通过动态调整匹配r值,控制匹配速度,提高了检测性能。同时,给出了检测性能的形式化定义,定义了自体非自体、抗体以及r匹配算法的动态变化方程;最后给出了具体的实现过程。理论分析和实验结果表明,该算法具有较高的效率,提高了入侵检测系统的检测性能。 相似文献
17.
The automatic recognition of dialogue act is a task of crucial importance for the processing of natural language dialogue at discourse level. It is also one of the most challenging problems as most often the dialogue act is not expressed directly in speaker’s utterance. In this paper, a new cue-based model for dialogue act recognition is presented. The model is, essentially, a dynamic Bayesian network induced from manually annotated dialogue corpus via dynamic Bayesian machine learning algorithms. Furthermore, the dynamic Bayesian network’s random variables are constituted from sets of lexical cues selected automatically by means of a variable length genetic algorithm, developed specifically for this purpose. To evaluate the proposed approaches of design, three stages of experiments have been conducted. In the initial stage, the dynamic Bayesian network model is constructed using sets of lexical cues selected manually from the dialogue corpus. The model is evaluated against two previously proposed models and the results confirm the potentiality of dynamic Bayesian networks for dialogue act recognition. In the second stage, the developed variable length genetic algorithm is used to select different sets of lexical cues to constitute the dynamic Bayesian networks’ random variables. The developed approach is evaluated against some of the previously used ranking approaches and the results provide experimental evidences on its ability to avoid the drawbacks of the ranking approaches. In the third stage, the dynamic Bayesian networks model is constructed using random variables constituted from the sets of lexical cues generated in the second stage and the results confirm the effectiveness of the proposed approaches for designing dialogue act recognition model. 相似文献
18.
19.
In this paper we propose a new approach in genetic algorithm called distributed hierarchical genetic algorithm (DHGA) for optimization and pattern matching. It is eventually a hybrid technique combining the advantages of both distributed and hierarchical processes in exploring the search space. The search is initially distributed over the space and then in each subspace the algorithm works in a hierarchical way. The entire space is essentially partitioned into a number of subspaces depending on the dimensionality of the space. This is done in order to spread the search process more evenly over the whole space. In each subspace the genetic algorithm is employed for searching and the search process advances from one hypercube to a neighboring hypercube hierarchically depending on the convergence status of the population and the solution obtained so far. The dimension of the hypercube and the resolution of the search space are altered with iterations. Thus the search process passes through variable resolution (coarse-to-fine) search space. Both analytical and empirical studies have been carried out to evaluate the performance between DHGA and distributed conventional GA (DCGA) for different function optimization problems. Further, the performance of the algorithms is demonstrated on problems like pattern matching and object matching with edge map. 相似文献
20.
针对近似模式匹配算法在处理带有灵活通配符和长度约束近似模式匹配(APMWL)问题时只能解决替换操作, 提出一种基于动态规划的编辑距离矩阵(EDM)构造方法,设计了基于EDM的近似模式匹配算法APM, 可以处理近似匹配中的三种编辑操作,即插入、替换和删除操作。此外,根据文本中字符是否允许被重复使用的约束条件,设计APM-OF算法。实验结果表明,APM和APM-OF与同类算法相比具备显著的优势:与Sail_Approx匹配算法实验对比, 获取解的平均增长率分别达到8.34%和12.37%; 将APM-OF算法应用至模式挖掘中, 挖掘出的频繁近似模式个数为OneoffMining算法的2.07倍。 相似文献