Similar literature
 Found 20 similar documents (search time: 46 ms)
1.
We propose a new variant of the bit-parallel NFA of Baeza-Yates and Navarro (BPD) for approximate string matching [R. Baeza-Yates, G. Navarro, Faster approximate string matching, Algorithmica 23 (1999) 127-158]. BPD is one of the most practical approximate string matching algorithms under moderate pattern lengths and error levels [G. Myers, A fast bit-vector algorithm for approximate string matching based on dynamic programming, J. ACM 46 (3) (1999) 395-415; G. Navarro, M. Raffinot, Flexible Pattern Matching in Strings: Practical On-line Search Algorithms for Texts and Biological Sequences, Cambridge University Press, Cambridge, UK, 2002]. Given a length-m pattern and an error threshold k, the original BPD requires (m−k)(k+2) bits of space to represent an NFA with (m−k)(k+1) states. In this paper we remove redundancy from the original NFA representation. Our variant requires (m−k)(k+1) bits of space, which is optimal in the sense that exactly one bit per state is used. The space efficiency is achieved by using an alternative, but equally or even more efficient, simulation algorithm for the bit-parallel NFA. We also present experimental results to compare our modified NFA against the original BPD and its main competitors. Our new variant is more efficient than the original BPD, and hence it takes over and extends the role of the original BPD as one of the most practical approximate string matching algorithms under moderate values of k and m.

2.
The approximate string matching problem is, given a text string, a pattern string, and an integer k, to find in the text all approximate occurrences of the pattern. An approximate occurrence means a substring of the text with edit distance at most k from the pattern. We give a new O(kn) algorithm for this problem, where n is the length of the text. The algorithm is based on the suffix automaton with failure transitions and on the diagonalwise monotonicity of the edit distance table. Some experiments showing that the algorithm has a small overhead are reported.

3.
This paper revisits the problem of indexing a text for approximate string matching. Specifically, given a text T of length n and a positive integer k, we want to construct an index of T such that for any input pattern P, we can find all its k-error matches in T efficiently. This problem is well-studied in the internal-memory setting. Here, we extend some of these recent results to external-memory solutions, which are also cache-oblivious. Our first index occupies $O((n \log^k n)/B)$ disk pages and finds all k-error matches with $O((|P| + occ)/B + \log^k n \log\log_B n)$ I/Os, where B denotes the number of words in a disk page. To the best of our knowledge, this index is the first external-memory data structure that does not require $\Omega(|P| + occ + \mathrm{poly}(\log n))$ I/Os. The second index reduces the space to $O((n \log n)/B)$ disk pages, and the I/O complexity is $O((|P| + occ)/B + \log^{k(k+1)} n \log\log n)$.

4.
Matching with don't-cares and a small number of mismatches   (Total citations: 1; self-citations: 0; citations by others: 1)
In matching with don't-cares and k mismatches we are given a pattern of length m and a text of length n, both of which may contain don't-cares (a symbol that matches all symbols), and the goal is to find all locations in the text that match the pattern with at most k mismatches, where k is a parameter. We present new algorithms that solve this problem using a combination of convolutions and a dynamic programming procedure. We give randomized and deterministic solutions that run in time $O(nk^2 \log m)$ and $O(nk^3 \log m)$, respectively, and are faster than the most efficient extant methods for small values of k. Our deterministic algorithm is the first to obtain an O(polylog(k)⋅n log m) running time.
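To make the problem statement in this abstract concrete, here is a minimal brute-force sketch in Python. It is not the convolution-based method the abstract describes; it is only the naive O(nm) baseline. The function name `match_with_dont_cares` and the choice of `?` as the don't-care symbol are assumptions made for illustration.

```python
def match_with_dont_cares(pattern: str, text: str, k: int, wildcard: str = "?") -> list[int]:
    """Return every 0-based start position where pattern matches text
    with at most k mismatches. The wildcard symbol (an assumed convention)
    may appear in either string and matches any symbol."""
    m, n = len(pattern), len(text)
    hits = []
    for start in range(n - m + 1):
        mismatches = 0
        for p, t in zip(pattern, text[start:start + m]):
            # A position counts as a mismatch only if neither side is a don't-care.
            if p != wildcard and t != wildcard and p != t:
                mismatches += 1
                if mismatches > k:
                    break  # this alignment already exceeds the budget
        if mismatches <= k:
            hits.append(start)
    return hits
```

For example, `match_with_dont_cares("a?c", "abcaxcadd", 0)` reports the alignments at positions 0 and 3, and raising k to 1 additionally admits position 6.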

5.
Alden H. Wright, Software, 1994, 24(4): 337-362
Given a text string, a pattern string, and an integer k, the problem of approximate string matching with k differences is to find all substrings of the text string whose edit distance from the pattern string is less than k. The edit distance between two strings is defined as the minimum number of differences, where a difference can be a substitution, insertion, or deletion of a single character. An implementation of the dynamic programming algorithm for this problem is given that packs several characters and mod-4 integers into a computer word. Thus, it is a parallelization of the conventional implementation that runs on ordinary processors. Since a small alphabet means that characters have short binary codes, the degree of parallelism is greatest for small alphabets and for processors with long words. For an alphabet of size 8 or smaller and a 64-bit processor, a 21-fold parallelism over the conventional algorithm can be obtained. Empirical comparisons to the basic dynamic programming algorithm, to a version of Ukkonen's algorithm, to the algorithm of Galil and Park, and to a limited implementation of the Wu-Manber algorithm are given.
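The conventional dynamic programming algorithm that this abstract takes as its baseline can be sketched as follows. This is the plain column-by-column recurrence, not the word-packed implementation of the paper. The function name `k_differences` is an assumption, and the threshold test is written as "at most k" (a common convention), whereas the abstract phrases it as "less than k".

```python
def k_differences(pattern: str, text: str, k: int) -> list[int]:
    """Return the 1-based end positions j in text such that some substring
    ending at j has edit distance at most k from pattern."""
    m = len(pattern)
    # col[i] = minimum edit distance between pattern[:i] and any substring
    # of the text ending at the current position. col[0] stays 0 because
    # an occurrence may start anywhere in the text.
    col = list(range(m + 1))
    hits = []
    for j, c in enumerate(text, start=1):
        prev_diag = col[0]  # value of the cell diagonally up-left
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == c else 1
            new = min(prev_diag + cost,  # match or substitution
                      col[i] + 1,        # text character is an extra (insertion)
                      col[i - 1] + 1)    # pattern character is missing (deletion)
            prev_diag = col[i]
            col[i] = new
        if col[m] <= k:
            hits.append(j)
    return hits
```

The sketch uses O(m) space and O(mn) time; the paper's contribution is to pack several cells of `col` into one machine word so that they are updated together.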

6.
The goal of scaled permuted string matching is to find all occurrences of a pattern in a text, in all possible scales and permutations. Given a text of length n and a pattern of length m we present an O(n) algorithm.

7.
This paper considers the problem of gossiping with packets of limited size in a network with a cost function. We show that the problem of determining the minimum cost necessary to perform gossiping among a given set of participants with packets of limited size is NP-hard. We also give an approximate (with respect to the cost) gossiping algorithm. The ratio between the cost of an optimal algorithm and the approximate one is less than 1 + 2(k − 1)/n, where n is the number of nodes participating in the gossiping process and k ≤ n − 1 is the upper bound on the number of individual blocks of information that a packet can carry. We also analyze the communication time and communication complexity, i.e., the product of the communication cost and time, of gossiping algorithms.

8.
Vidyasankar introduced a combined problem of k-exclusion and group mutual exclusion, called the group k-exclusion problem, which occurs in a situation where philosophers with the same interest can attend a forum in a meeting room, and up to k meeting rooms are available. We propose an improvement to Vidyasankar's algorithm. Waiting times in the trying region in the original algorithm and in our algorithm are bounded by n(nk)c+O(n3(nk))l and (nk)c+O(n(nk)2)l, respectively, where n is the number of processes, l is an upper bound on the time between two successive atomic steps, and c is an upper bound on the time that any philosopher spends in a forum.

9.
This paper presents simple and deterministic algorithms for partial point set pattern matching in 2D. Given a set P of n points, called the sample set, and a query set Q of k points (n ≥ k), the problem is to find a matching of Q with a subset of P under rigid motion. The match may be of two types: exact and approximate. If an exact matching exists, then each point in Q coincides with the corresponding point in P under some translation and/or rotation. For an approximate match, some translation and/or rotation may be allowed such that each point in Q lies in a predefined ε-neighborhood region around some point in P. The proposed algorithm for the exact matching needs $O(n^2)$ space and preprocessing time. The existence of a match for a given query set Q can be checked in time in the worst-case, where α is the maximum number of equidistant pairs of points in P. For a set of n points, α may be $O(n^{4/3})$ in the worst-case. Some applications of the partial point set pattern matching are then illustrated. Experimental results on random point sets and some fingerprint databases show that, in practice, the computation time is much smaller than the worst-case requirement. The algorithm is then extended for checking the exact match of a set of k line segments in the query set with a k-subset of n line segments in the sample set under rigid motion in time. Next, a simple version of the approximate matching problem is studied where one point of Q exactly matches with a point of P, and each of the other points of Q lies in the ε-neighborhood of some point of P. The worst-case time and space complexities of the proposed algorithm are and O(n), respectively. The proposed algorithms will find many applications to fingerprint matching, image registration, and object recognition.

10.
We present an $O(3^k n^2 + n^3)$ time FPT algorithm for the feedback vertex set problem in a bipartite tournament on n vertices with integral weights. This improves the previously best known $O(3.12^k n^4)$ time FPT algorithm for the problem.

11.
Memorization is a technique that speeds up exponential recursive algorithms at the cost of an exponential space complexity. This technique already leads to the currently fastest algorithm for fixed-parameter vertex cover, whose time complexity is $O(1.2832^k k^{1.5} + kn)$, where n is the number of nodes and k is the size of the vertex cover. Via a refined use of memorization, we obtain an $O(1.2759^k k^{1.5} + kn)$ algorithm for the same problem. We moreover show how to further reduce the complexity to $O(1.2745^k k^4 + kn)$.

12.
We present an algorithm to approximate edit distance between two ordered and rooted trees of bounded degree. In this algorithm, each input tree is transformed into a string by computing the Euler string, where labels of some edges in the input trees are modified so that structures of small subtrees are reflected in the labels. We show that the edit distance between trees is at least 1/6 and at most $O(n^{3/4})$ of the edit distance between the transformed strings, where n is the maximum size of two input trees and we assume unit cost edit operations for both trees and strings. The algorithm works in $O(n^2)$ time, since computation of edit distance and reconstruction of tree mapping from string alignment take $O(n^2)$ time, though the transformation itself can be done in O(n) time.

13.
Finding the maximum independent set in the intersection graph of n axis-parallel rectangles is NP-hard. We re-examine two known approximation results for this problem. For the case of rectangles of unit height, Agarwal, van Kreveld and Suri [Comput. Geom. Theory Appl. 11 (1998) 209-218] gave a (1+1/k)-factor algorithm with an $O(n \log n + n^{2k-1})$ time bound for any integer constant k ≥ 1; we describe a similar algorithm running in only $O(n \log n + n\Delta^{k-1})$ time, where Δ ≤ n denotes the maximum number of rectangles a point can be in. For the general case, Berman, DasGupta, Muthukrishnan and Ramaswami [J. Algorithms 41 (2001) 443-470] gave a $\lceil \log_k n \rceil$-factor algorithm with an $O(n^{k+1})$ time bound for any integer constant k ≥ 2; we describe similar algorithms running in $O(n \log n + n\Delta^{k-2})$ and $n^{O(k/\log k)}$ time.

14.
Given a string of length n, this short paper first presents an O(1)-time parallel algorithm for finding all initial palindromes and periods of the string on an n×n reconfigurable mesh (RM). Then, under the same cost (= time × the number of processors = $O(n^2)$), we provide a partitionable strategy when the RM does not offer sufficient processors; this overcomes the hardware limitation and is very suitable for VLSI implementation. Prof. Chung was supported in part by the National Science Council of R. O. C. under contracts NSC87-2213-E011-001 and NSC87-2213-E011-003.

15.
We address the problem of building an index for a set D of n strings, where each string location is a subset of some finite integer alphabet of size σ, so that we can answer efficiently if a given simple query string (where each string location is a single symbol) p occurs in the set. That is, we need to efficiently find a string d ∈ D such that p[i] ∈ d[i] for every i. We show how to build such an index in $O(n^{\log_{\sigma/\Delta}(\sigma)} \log(n))$ average time, where Δ is the average size of the subsets. Our methods have applications e.g. in computational biology (haplotype inference) and music information retrieval.
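The query condition in this abstract (p[i] ∈ d[i] for every i) can be illustrated with a short Python sketch. It is a linear scan over D, not the index the paper builds. The names `subset_string_matches` and `find_match`, and the assumption that p and d must have equal length, are illustrative choices not taken from the source.

```python
from typing import Optional, Sequence, Set

def subset_string_matches(d: Sequence[Set[int]], p: Sequence[int]) -> bool:
    """True if every symbol p[i] belongs to the subset d[i].
    Assumes (for illustration) that p and d have the same length."""
    return len(d) == len(p) and all(sym in subset for sym, subset in zip(p, d))

def find_match(D: Sequence[Sequence[Set[int]]], p: Sequence[int]) -> Optional[int]:
    """Return the index of the first string in D that p matches, or None.
    This is a brute-force scan that checks each string of D in turn."""
    for idx, d in enumerate(D):
        if subset_string_matches(d, p):
            return idx
    return None
```

With `D = [[{0, 1}, {2}], [{1}, {0, 2}]]`, the query `[1, 0]` matches only the second string, so `find_match` returns index 1.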

16.
Recently a new variation of approximate Boyer-Moore string matching was presented for the k-mismatch problem. The variation, called FAAST, is specifically tuned for small alphabets. We further improve this algorithm, gaining speedups in both preprocessing and searching. We also present three variations of the algorithm for the k-difference problem. We show that the searching time of the algorithms is average-optimal and that the preprocessing also has a lower time complexity than that of FAAST. Our experiments show that our algorithm for the k-mismatch problem is about 30% faster than FAAST and that the new algorithms compare well against other state-of-the-art algorithms for approximate string matching.

17.
We present an efficient algorithm for finding all approximate occurrences of a given pattern p of length m in a text t of length n allowing for translocations of equal length adjacent factors and inversions of factors. The algorithm is based on an efficient filtering method and has an O(nm max(α,β)) time complexity in the worst case and O(max(α,β,σ)) space complexity, where α and β are respectively the maximum length of the factors involved in any translocation and inversion, and σ is the alphabet size. Moreover, we show that our algorithm has an O(n) average time complexity, whenever , for ε>0. Experiments show that the proposed algorithm achieves very good results in practical cases.

18.
A de Bruijn sequence over a finite alphabet of span n is a cyclic string such that all words of length n appear exactly once as factors of this sequence. We extend this definition to a subset of words of length n, characterizing the subsets for which a de Bruijn sequence exists. We also study some symbolic dynamical properties of these subsets, extending the definition to a language defined by forbidden factors. For these kinds of languages we present an algorithm to produce a de Bruijn sequence. In this work we use graph-theoretic and combinatorial concepts to prove these results.
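The definition in this abstract, including its extension to a subset of words, can be checked directly. The Python sketch below only verifies a candidate cyclic string; it does not implement the paper's construction algorithm. The function names `cyclic_factors` and `is_de_bruijn` are assumptions.

```python
from itertools import product

def cyclic_factors(seq: str, n: int) -> list[str]:
    """All length-n factors of seq read cyclically, one per start position."""
    length = len(seq)
    return ["".join(seq[(i + j) % length] for j in range(n)) for i in range(length)]

def is_de_bruijn(seq: str, words: set[str], n: int) -> bool:
    """True if the cyclic string seq contains every word in `words`
    exactly once as a length-n factor, and no other length-n factor."""
    factors = cyclic_factors(seq, n)
    return len(factors) == len(set(factors)) and set(factors) == set(words)

# The classical case uses all words of length n over the alphabet.
all_words = {"".join(w) for w in product("01", repeat=3)}
print(is_de_bruijn("00010111", all_words, 3))
```

Passing a proper subset of the words of length n as `words` gives the extended notion studied in the paper; for instance, the cyclic string `"01"` satisfies it for the subset {"01", "10"} with n = 2.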

19.
Given a text string of length n and a pattern string of length m over a b-letter alphabet, the k differences approximate string matching problem asks for all locations in the text where the pattern occurs with at most k differences (substitutions, insertions, deletions). We treat k not as a constant but as a fraction of m (not necessarily constant-fraction). Previous algorithms require at least O(kn) time (or exponential space). We give an algorithm that is sublinear time $O((n/m)\, k \log_b m)$ when the text is random and k is bounded by the threshold $m/(\log_b m + O(1))$. In particular, when $k = o(m/\log_b m)$ the expected running time is o(n). In the worst case our algorithm is O(kn), but is still an improvement in that it is practical and uses O(m) space compared with O(n) or $O(m^2)$. We define three problems motivated by molecular biology and describe efficient algorithms based on our techniques: (1) approximate substring matching, (2) approximate-overlap detection, and (3) approximate codon matching. Respectively, applications to biology are local similarity search, sequence assembly, and DNA-protein matching. This work was supported in part by NSF Grants CCR-87-04184 and FD-89-02813; by the Human Genome Center, Lawrence Berkeley Laboratory, supported by the Director, Office of Health and Environmental Research, of the U.S. Department of Energy under Contract DE-AC03-76SF00098; and by Department of Energy Grants DE-FG03-90ER60999 and DE-FG02-91ER61190. Earlier versions of this paper appeared as [8] and part of [5].

20.
Local search is widely used for solving the propositional satisfiability problem. Papadimitriou (1991) showed that randomized local search solves 2-SAT in polynomial time. Recently, Schöning (1999) proved that a closely related algorithm for k-SAT takes time $(2-2/k)^n$ up to a polynomial factor. This is the best known worst-case upper bound for randomized 3-SAT algorithms (cf. also recent preprint by Schuler et al.). We describe a deterministic local search algorithm for k-SAT running in time $(2-2/(k+1))^n$ up to a polynomial factor. The key point of our algorithm is the use of covering codes instead of random choice of initial assignments. Compared to other “weakly exponential” algorithms, our algorithm is technically quite simple. We also describe an improved version of local search. For 3-SAT the improved algorithm runs in time $1.481^n$ up to a polynomial factor. Our bounds are better than all previous bounds for deterministic k-SAT algorithms.
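The randomized local search that this abstract builds on can be sketched in Python as follows. This is a sketch of the random-walk scheme with random initial assignments, not the deterministic covering-code algorithm of the paper. The function names, the DIMACS-style encoding of literals as nonzero integers, the walk length of 3n flips, and the `tries` and `seed` parameters are all assumptions made for illustration.

```python
import random
from typing import Optional, Sequence

def satisfies(clauses: Sequence[Sequence[int]], assignment: dict) -> bool:
    """True if every clause has at least one true literal.
    Literal v means variable v is True, and -v means it is False."""
    return all(any(assignment[abs(l)] == (l > 0) for l in c) for c in clauses)

def random_walk_sat(clauses: Sequence[Sequence[int]], n_vars: int,
                    tries: int = 1000, seed: int = 0) -> Optional[dict]:
    """Restarting random walk: returns a satisfying assignment, or None
    if none was found within the given number of restarts."""
    rng = random.Random(seed)
    for _ in range(tries):
        # Start from a uniformly random assignment.
        assignment = {v: rng.random() < 0.5 for v in range(1, n_vars + 1)}
        for _ in range(3 * n_vars):
            unsat = [c for c in clauses
                     if not any(assignment[abs(l)] == (l > 0) for l in c)]
            if not unsat:
                return assignment
            # Pick an unsatisfied clause and flip one of its variables at random.
            lit = rng.choice(rng.choice(unsat))
            assignment[abs(lit)] = not assignment[abs(lit)]
        if satisfies(clauses, assignment):
            return assignment
    return None
```

A return value of None does not prove unsatisfiability; it only means that no satisfying assignment was found in the allotted restarts. The deterministic algorithm in the abstract replaces the random starting points with the codewords of a covering code.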
