首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 0 毫秒
1.
In parameterized string matching the pattern P matches a substring t of the text T if there exist a bijective mapping from the symbols of P to the symbols of t. We give simple and practical algorithms for finding all such pattern occurrences in sublinear time on average. The algorithms work for a single and multiple patterns.  相似文献   

2.
The problem of pattern matching with wildcards is to find all the occurrences of a pattern of length m in a text of length n over a finite alphabet Σ (both the text and the pattern are allowed to contain wildcards). Based on the prime number encoding scheme (Chaim Linhart, Ron Shamir, Faster pattern matching with character classes using prime number encoding, J. Comput. Syst. Sci. 75 (3) (2009) 155-162), we present a new integer encoding and an efficient fast Fourier transforms based algorithm for this problem. The algorithm takes time to search the pattern in the text by computing one convolution. For matching with wildcards, our encoding uses fewer prime numbers and has shorter code words comparing with the prime number encoding. We use at most 2lg|Σ| prime numbers to encode the symbols while in the prime number encoding |Σ| prime numbers are required. This number reduces to 1.5lg|Σ| when |Σ|>40. The code word used in the algorithm is at most 2⌊lg|Σ|⌋⌈lg(5m)⌉ bits while in the prime encoding it is at least bits. We also show that the length of words can be further reduced by increasing the number of convolutions computed.  相似文献   

3.
In this paper, we revisit the Property Matching problem studied by Amir et al. [Property Matching and Weighted Matching, CPM 2006] and present a better indexing scheme for the problem. In particular, the data structure by Amir et al., namely PST, requires O(nlog|Σ|+nloglogn) construction time and O(mlog|Σ|+K) query time, where n and m are the length of, respectively, the text and the pattern, Σ is the alphabet and K is the output size. On the other hand, the construction time of our data structure, namely IDS_PIP, is dominated by the suffix tree construction time and hence is O(n) time for alphabets that are natural numbers from 1 to a polynomial in n and O(nlogσ) time otherwise, where σ=min(n,|Σ|). The query time is same as that of PST. Also, IDS_PIP has the advantage that it can be built on either a suffix tree or a suffix array and additionally, it retains the capability of answering normal pattern matching queries.  相似文献   

4.
It has long been known that pattern matching in the Hamming distance metric can be done in time, where n is the length of the text, m is the length of the pattern, and Σ is the alphabet. The classic algorithm for this is due to Abrahamson and Kosaraju. This paper considers the following generalization, motivated by the situation where the entries in the text and pattern are analog, or distorted by additive noise, or imprecisely given for some other reason: in any alignment of the pattern with the text, two aligned symbols a and b contribute +1 to the similarity score if they differ by no more than a given threshold θ, otherwise they contribute zero. We give an time algorithm for this more general version of the problem; the classic Hamming distance matching problem is the special case of θ=0.  相似文献   

5.
We present a new and simple algorithm to reconstruct suffix links in suffix trees and suffix arrays. The algorithm is based on observations regarding suffix tree construction algorithms. With our algorithm we bring suffix arrays even closer to the ease of use and implementation of suffix trees.  相似文献   

6.
7.
Simple deterministic wildcard matching   总被引:2,自引:0,他引:2  
We present a simple and fast deterministic solution to the string matching with don't cares problem. The task is to determine all positions in a text where a pattern occurs, allowing both pattern and text to contain single character wildcards. Our algorithm takes O(nlogm) time for a text of length n and a pattern of length m and in our view the algorithm is conceptually simpler than previous approaches.  相似文献   

8.
Given a text T[1…n] and a pattern P[1…m] over some alphabet Σ of size σ, we want to find all the (exact) occurrences of P in T. The well-known shift-or algorithm solves this problem in time O(nm/w⌉), where w is the number of bits in machine word, using bit-parallelism. We show how to extend the bit-parallelism in another direction, using super-alphabets. This gives a speed-up by a factor s, where s is the number of characters processed simultaneously. The algorithm is implemented, and we show that it works well in practice too. The result is the fastest known algorithm for exact string matching for short patterns and small alphabets.  相似文献   

9.
In this paper the Interpolator-based Kronecker product graph matching (IBKPGM) algorithm for performing attributed graph matching is presented. The IBKPGM algorithm is based on the Kronecker product graph matching (KPGM) formulation. This new formulation incorporates a general approach to a wide class of graph matching problems based on attributed graphs, allowing the structure of the graphs to be based on multiple sets of attributes. Salient features of the IBKPGM algorithm are that no assumption is made about the adjacency structure of the graphs to be matched, and that the explicit calculation of compatibility values between all vertices of the reference and input graphs as well as between all edges of the reference and input graphs are avoided.  相似文献   

10.
Given q+1 strings (a text t of length n and q patterns m1,…,mq) and a natural number w, the multiple serial episode matching problem consists in finding the number of size w windows of text t which contain patterns m1,…,mq as subsequences, i.e., for each mi, if mi=p1,…,pk, the letters p1,…,pk occur in the window, in the same order as in mi, but not necessarily consecutively (they may be interleaved with other letters). Our main contribution here is an algorithm solving this problem on-line in time O(nq) with an MP-RAM model (which is a RAM model equipped with extra operations).  相似文献   

11.
We study the parameterized complexity of four variants of pursuit-evasion on graphs: Seeded Pursuit Evasion, Short Seeded Pursuit Evasion, Directed Pursuit Evasion and Short Directed Pursuit Evasion. Both Seeded Pursuit Evasion and Short Seeded Pursuit Evasion are played on undirected graphs with given starting positions for both the cops and the robber. Directed Pursuit Evasion and its short variant are played on directed graphs, with the players free to choose their starting positions. We show for Seeded Pursuit Evasion and Directed Pursuit Evasion that finding a winning strategy for the cops is AW[*]-hard when we parameterize by the number of cops. Further, we show that the short (k-move) variants of these problems (Short Seeded Pursuit Evasion and Short Directed Pursuit Evasion) are AW[*]-complete when we parameterize by both the number of cops and turns.  相似文献   

12.
Thierry Lecroq 《Software》1998,28(5):561-568
Various string matching algorithms have been designed and some experimental work on string matching over bounded alphabets has been performed, but string matching over unbounded alphabets has been little investigated. We present here experimental results where symbols are taken among potentially infinite sets such as integers, reals or composed structures. These results show that, in most cases, it is better to decompose each symbol into a sequence of bytes and use algorithms which assume that the alphabet is bounded, and use heuristics on symbols. © 1998 John Wiley & Sons, Ltd.  相似文献   

13.
Approximate pattern matching algorithms have become an important tool in computer assisted music analysis and information retrieval. The number of different problem formulations has greatly expanded in recent years, not least because of the subjective nature of measuring musical similarity. From an algorithmic perspective, the complexity of each problem depends crucially on the exact definition of the difference between two strings. We present an overview of advances in approximate string matching in this field focusing on new measures of approximation.  相似文献   

14.
Ian Sommerville 《Software》1982,12(6):517-530
This paper describes a pattern matching system which has been implemented as a set of library procedures. The system provides a concise and consistent method of pattern definition and facilities for defining context sensitive pattern matching, defining repetitive patterns and defining alternatives. The operations available to the user allow him to identify if a substring matches a pattern, to extract that substring, to replace that substring and to associate a name with that substring. The system has applications in information retrieval, text manipulation and language processing.  相似文献   

15.
We present improved variations of the BNDM algorithm for exact string matching. At each alignment our bit-parallel algorithms process a q-gram before testing the state variable. In addition we apply reading a 2-gram in one instruction. Our point of view is practical efficiency of algorithms. Our experiments show that the new variations are faster than earlier algorithms in many cases.  相似文献   

16.
The goal of scaled permuted string matching is to find all occurrences of a pattern in a text, in all possible scales and permutations. Given a text of length n and a pattern of length m we present an O(n) algorithm.  相似文献   

17.
18.
We address the problem of building an index for a set D of n strings, where each string location is a subset of some finite integer alphabet of size σ, so that we can answer efficiently if a given simple query string (where each string location is a single symbol) p occurs in the set. That is, we need to efficiently find a string dD such that p[i]∈d[i] for every i. We show how to build such index in O(nlogσ/Δ(σ)log(n)) average time, where Δ is the average size of the subsets. Our methods have applications e.g. in computational biology (haplotype inference) and music information retrieval.  相似文献   

19.
In this paper, we study a restricted version of the position restricted pattern matching problem introduced and studied by Mäkinen and Navarro [V. Mäkinen, G. Navarro, Position-restricted substring searching, in: J.R. Correa, A. Hevia, M.A. Kiwi (Eds.), LATIN, in: Lecture Notes in Computer Science, vol. 3887, Springer, 2006, pp. 703-714]. In the problem handled in this paper, we are interested in those occurrences of the pattern that lies in a suffix or in a prefix of the given text. We achieve optimal query time for our problem against a data structure which is an extension of the classic suffix tree data structure. The time and space complexity of the data structure is dominated by that of the suffix tree. Notably, the (best) algorithm by Mäkinen and Navarro, if applied to our problem, gives sub-optimal query time and the corresponding data structure also requires more time and space.  相似文献   

20.
G. De V. Smit 《Software》1982,12(1):57-66
Three string matching algorithms—straightforward, Knuth-Morris-Pratt and Boyer-Moor—re examined and their time complexities discussed. A comparison of their actual average behaviour is made, based on empirical data presented. It is shown that the Boyel-Moore algorithm is extremely efficient in most cases and that, contrary to the impression one might get from the analytical results, the Knuth-Morris-Pratt algorithm is not significantly better on the average than the straightforward algorithm.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号