Similar literature
 20 similar articles found (search time: 78 ms)
1.
The approximate string matching problem is, given a text string, a pattern string, and an integer k, to find in the text all approximate occurrences of the pattern. An approximate occurrence means a substring of the text with edit distance at most k from the pattern. We give a new O(kn) algorithm for this problem, where n is the length of the text. The algorithm is based on the suffix automaton with failure transitions and on the diagonalwise monotonicity of the edit distance table. Some experiments showing that the algorithm has a small overhead are reported.
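
To make the problem statement concrete, here is a minimal Python sketch of the textbook column-by-column dynamic program for approximate occurrences. It is not the O(kn) suffix-automaton algorithm of the abstract; it runs in O(mn) time, and the function name `approximate_occurrences` is a hypothetical choice made for illustration.

```python
def approximate_occurrences(text: str, pattern: str, k: int) -> list[int]:
    """Return 1-based end positions j such that some substring of text
    ending at j has edit distance at most k from pattern.

    Baseline O(m*n) dynamic program, not the O(kn) algorithm of the abstract.
    """
    m = len(pattern)
    # col[i] = min edit distance between pattern[:i] and any substring
    # of the text ending at the current text position.
    col = list(range(m + 1))
    ends = []
    for j, c in enumerate(text, start=1):
        prev_diag = col[0]  # D[i-1][j-1]; row 0 is always 0 (free start)
        for i in range(1, m + 1):
            old = col[i]  # D[i][j-1]
            cost = 0 if pattern[i - 1] == c else 1
            col[i] = min(prev_diag + cost,  # match or substitution
                         old + 1,           # extra text character
                         col[i - 1] + 1)    # missing pattern character
            prev_diag = old
        if col[m] <= k:
            ends.append(j)
    return ends
```

With k = 0 the sketch reports exact occurrences only; larger k admits substitutions, insertions, and deletions.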

2.
This paper considers the problem of computing a constrained edit distance between unordered labeled trees. The problem of approximate unordered tree matching is also considered. We present dynamic programming algorithms solving these problems in sequential time O(|T_1| × |T_2| × (deg(T_1) + deg(T_2)) × log_2(deg(T_1) + deg(T_2))). Our previous result shows that computing the edit distance between unordered labeled trees is NP-complete. This research was supported by the Natural Sciences and Engineering Research Council of Canada under Grant No. OGP0046373.

3.
We present a randomized algorithm for finding maximum matchings in planar graphs in time O(n^{ω/2}), where ω is the exponent of the best known matrix multiplication algorithm. Since ω < 2.38, this algorithm breaks through the O(n^{1.5}) barrier for the matching problem. This is the first result of this kind for general planar graphs. We also present an algorithm for generating perfect matchings in planar graphs uniformly at random using O(n^{ω/2}) arithmetic operations. Our algorithms are based on the Gaussian elimination approach to maximum matchings introduced in [16]. This research was supported by KBN Grant 4T11C04425.

4.
We study approximation algorithms and hardness of approximation for several versions of the problem of packing Steiner trees. For packing edge-disjoint Steiner trees of undirected graphs, we show APX-hardness for four terminals. For packing Steiner-node-disjoint Steiner trees of undirected graphs, we show a logarithmic hardness result, and give an approximation guarantee of O(√n log n), where n denotes the number of nodes. For the directed setting (packing edge-disjoint Steiner trees of directed graphs), we show a hardness result of Θ(m^{1/3−ε}) and give an approximation guarantee of O(m^{1/2+ε}), where m denotes the number of edges. We have similar results for packing Steiner-node-disjoint priority Steiner trees of undirected graphs. Supported by NSERC Grant No. OGP0138432. Supported by an NSERC postdoctoral fellowship, Department of Combinatorics and Optimization at University of Waterloo, and a University start-up fund at University of Alberta.

5.
A string w is primitive if it is not a power of another string (i.e., writing w = v^k implies k = 1). Conversely, w is a square if w = vv, with v a primitive string. A string x is square-free if it has no nonempty substring of the form ww. It is shown that the square-freedom of a string of n symbols over an arbitrary alphabet can be tested by a CRCW PRAM with n processors in O(log n) time and linear auxiliary space. If the cardinality of the input alphabet is bounded by a constant independent of the input size, then the number of processors can be reduced to n/log n without affecting the time complexity of this strategy. The fastest sequential algorithms solve this problem in O(n log n) or O(n) time, depending on whether the cardinality of the input alphabet is unbounded or bounded, and either performance is known to be optimal within its class. More elaborate constructions lead to a CRCW PRAM algorithm for detecting, within the same n-processor bounds, all positioned squares in x in time O(log n) and using linear auxiliary space. The fastest sequential algorithms solve this problem in O(n log n) time, and such a performance is known to be optimal. This research was supported, through the Leonardo Fibonacci Institute, by the Istituto Trentino di Cultura, Trento, Italy. Additional support was provided by the French and Italian Ministries of Education, by the National Research Council of Italy, by the British Research Council Grant SERC-E76797, by NSF Grant CCR-89-00305, by NIH Library of Medicine Grant ROI LM05118, by AFOSR Grant 90-0107, and by NATO Grant CRG900293.
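
The definitions of primitive and square-free strings can be checked directly with a short Python sketch. This is a brute-force sequential illustration only (cubic time in the worst case for `is_square_free`), not the PRAM algorithm nor the optimal sequential algorithms mentioned in the abstract; both function names are hypothetical.

```python
def is_primitive(w: str) -> bool:
    """w is primitive iff it is not v^k for any k > 1.

    Uses the classical fact that w is primitive exactly when the first
    occurrence of w inside w+w, starting the search at index 1, is at len(w).
    """
    return len(w) > 0 and (w + w).find(w, 1) == len(w)


def is_square_free(x: str) -> bool:
    """Brute-force test: x is square-free iff no nonempty substring is ww."""
    n = len(x)
    for half in range(1, n // 2 + 1):          # length of w
        for start in range(n - 2 * half + 1):  # start of candidate ww
            if x[start:start + half] == x[start + half:start + 2 * half]:
                return False
    return True
```

For instance, "abab" is not primitive because it equals (ab)^2, and "abcbc" is not square-free because it contains "bcbc".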

6.
We introduce a novel definition of approximate palindromes in strings, and provide an algorithm to find all maximal approximate palindromes in a string with up to k errors. Our definition is based on the usual edit operations of approximate pattern matching, and the algorithm we give, for a string of size n on a fixed alphabet, runs in O(k^2 n) time. We also discuss two implementation-related improvements to the algorithm, and demonstrate their efficacy in practice by means of both experiments and an average-case analysis.

7.
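Design and Analysis of Parallel String Matching Algorithms for Distributed Memory   Total citations: 7, self-citations: 0, citations by others: 7
Chen Guoliang  Lin Jie  Gu Naijie 《Journal of Software》2000,11(6):771-778
Research on parallel string matching algorithms has mostly concentrated on the PRAM (parallel random access machine) model, while work on parallel string matching for other, more practical models is comparatively weak. Using the technique of parallelizing an optimal sequential algorithm, and exploiting the periodicity of the pattern string, this paper parallelizes an improved KMP (Knuth-Morris-Pratt) algorithm and proposes a simple, efficient, and highly scalable distributed string matching algorithm whose computational complexity is O(n/p+m) and whose communication complexity is O(ulogp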
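
For reference, the sequential building block can be sketched as follows. This is the standard single-processor KMP search in Python, not the improved or distributed variant proposed in the paper; the function names are hypothetical. The failure table also exposes the pattern's smallest period, m - fail[m-1], which is the kind of periodicity information the abstract refers to.

```python
def failure_function(pattern: str) -> list[int]:
    """fail[i] = length of the longest proper border of pattern[:i+1]."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]  # fall back to the next shorter border
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_search(text: str, pattern: str) -> list[int]:
    """Return 0-based start positions of all occurrences of pattern in text."""
    if not pattern:
        return []
    fail = failure_function(pattern)
    matches = []
    k = 0  # number of pattern characters currently matched
    for i, c in enumerate(text):
        while k > 0 and c != pattern[k]:
            k = fail[k - 1]
        if c == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = fail[k - 1]  # continue, allowing overlapping matches
    return matches
```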

8.
We study the classical approximate string matching problem, that is, given strings P and Q and an error threshold k, find all ending positions of substrings of Q whose edit distance to P is at most k. Let P and Q have lengths m and n, respectively. On a standard unit-cost word RAM with word size w ≥ log n we present an algorithm using time
$O\biggl(nk \cdot \min\biggl(\frac{\log^2 m}{\log n},\frac{\log^2 m\log w}{w}\biggr) + n\biggr)$

9.
The problem of outputting all parse trees of a string accepted by a context-free grammar is considered. A systolic algorithm is presented that operates in O(m·n) time, where m is the number of distinct parse trees and n is the length of the input. The systolic array uses n^2 processors, each of which requires at most O(log n) bits of storage. This is much more space-efficient than a previously reported systolic algorithm for the same problem, which required O(n log n) space per processor. The algorithm also extends previous algorithms that only output a single parse tree of the input. Research supported in part by NSF Grant DCR-8420935 and DCR-8604603.

10.
R. Wenger 《Algorithmica》1997,17(3):322-329
This paper contains a simple, randomized algorithm for constructing the convex hull of a set of n points in the plane with expected running time O(n log h), where h is the number of points on the convex hull. Supported in part by NSA Grant MDA904-93-H-3026 and by the NSF Regional Geometry Institute (Smith College, July 1993) Grant DMS-90 13220.
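
As a point of comparison for the output-sensitive bound, a convex hull can be computed with the well-known monotone chain method in O(n log n) time. The Python sketch below is that deterministic baseline, swapped in for illustration; it is not the randomized O(n log h) algorithm of the paper, and the function name is hypothetical.

```python
def convex_hull(points: list[tuple[float, float]]) -> list[tuple[float, float]]:
    """Monotone chain convex hull, counter-clockwise, collinear points dropped.

    Deterministic O(n log n) baseline, not the paper's randomized algorithm.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when o -> a -> b makes a counter-clockwise turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half_hull(seq):
        chain = []
        for p in seq:
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain

    lower = half_hull(pts)
    upper = half_hull(reversed(pts))
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]
```

In the notation of the abstract, h is the length of the returned list.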

11.
The classical pattern matching paradigm is that of seeking occurrences of one string in another, where both strings are drawn from an alphabet set Σ. In the parameterized pattern matching model, a consistent renaming of symbols from Σ is allowed in a match. The parameterized matching paradigm has proven useful in problems in software engineering, computer vision, and other applications. In classical pattern matching, both the text and pattern are strings. Applications such as searching in XML or searching in hypertext require searching strings in non-linear structures such as trees or graphs. There has been work in the literature on exact and approximate parameterized matching, as well as work on exact and approximate string matching on non-linear structures. In this paper we explore parameterized matching in non-linear structures. We prove that exact parameterized matching on trees can be computed in linear time for alphabets in an O(n)-size integer range, and in time O(n log m) in general, where n is the tree size and m the pattern length. These bounds are optimal in the comparison model. We also show that exact parameterized matching on directed acyclic graphs (DAGs) is NP-complete.
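
The notion of a match under consistent renaming can be illustrated on plain strings with the classical "previous occurrence" encoding: two equal-length strings match up to a one-to-one renaming exactly when their encodings are equal. The Python sketch below makes two simplifying assumptions that are not from the abstract: every symbol is treated as renamable, and the text is a linear string rather than a tree or DAG. It re-encodes each window, so it runs in O(nm) time; the function names are hypothetical.

```python
def prev_encode(s: str) -> list[int]:
    """Replace each symbol by the distance to its previous occurrence (0 if none)."""
    last_seen: dict[str, int] = {}
    encoded = []
    for i, ch in enumerate(s):
        encoded.append(i - last_seen[ch] if ch in last_seen else 0)
        last_seen[ch] = i
    return encoded


def parameterized_occurrences(text: str, pattern: str) -> list[int]:
    """0-based starts of windows that equal pattern up to a bijective renaming.

    Naive O(n*m) sketch on strings only; assumes all symbols are parameters.
    """
    m = len(pattern)
    target = prev_encode(pattern)
    return [i for i in range(len(text) - m + 1)
            if prev_encode(text[i:i + m]) == target]
```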

12.
We present an algorithm to approximate edit distance between two ordered and rooted trees of bounded degree. In this algorithm, each input tree is transformed into a string by computing the Euler string, where labels of some edges in the input trees are modified so that structures of small subtrees are reflected to the labels. We show that the edit distance between trees is at least 1/6 and at most O(n^{3/4}) of the edit distance between the transformed strings, where n is the maximum size of two input trees and we assume unit cost edit operations for both trees and strings. The algorithm works in O(n^2) time since computation of edit distance and reconstruction of tree mapping from string alignment takes O(n^2) time though transformation itself can be done in O(n) time.

13.
Given a text string of length n and a pattern string of length m over a b-letter alphabet, the k differences approximate string matching problem asks for all locations in the text where the pattern occurs with at most k differences (substitutions, insertions, deletions). We treat k not as a constant but as a fraction of m (not necessarily constant-fraction). Previous algorithms require at least O(kn) time (or exponential space). We give an algorithm that is sublinear time O((n/m) k log_b m) when the text is random and k is bounded by the threshold m/(log_b m + O(1)). In particular, when k = o(m/log_b m) the expected running time is o(n). In the worst case our algorithm is O(kn), but is still an improvement in that it is practical and uses O(m) space compared with O(n) or O(m^2). We define three problems motivated by molecular biology and describe efficient algorithms based on our techniques: (1) approximate substring matching, (2) approximate-overlap detection, and (3) approximate codon matching. Respectively, applications to biology are local similarity search, sequence assembly, and DNA-protein matching. This work was supported in part by NSF Grants CCR-87-04184 and FD-89-02813; by the Human Genome Center, Lawrence Berkeley Laboratory, supported by the Director, Office of Health and Environmental Research, of the U.S. Department of Energy under Contract DE-AC03-76SF00098; and by Department of Energy Grants DE-FG03-90ER60999 and DE-FG02-91ER61190. Earlier versions of this paper appeared as [8] and part of [5].

14.
There is no known algorithm that solves the general case of the approximate edit distance problem, where the edit operations are insertion, deletion, mismatch, and swap, in time o(nm), where n is the length of the text and m is the length of the pattern. In the effort to study this problem, the edit operations have been analyzed independently. Karloff [10] showed an algorithm that approximates the edit distance problem with only the mismatch operation in time . Amir et al. [4] showed that if the only edit operations allowed are swap and mismatch, then the exact edit distance problem can be solved in time . In this paper, we discuss the problem of approximate edit distance with swap and mismatch. We show a randomized time algorithm for the problem. The algorithm guarantees an approximation factor of (1+ε) with probability of at least .

15.
We give an improved parallel algorithm for the problem of computing the tube minima of a totally monotone n × n × n matrix, an important matrix searching problem that was formalized by Aggarwal and Park and has many applications. Our algorithm runs in O(log log n) time with O(n^2/log log n) processors in the CRCW-PRAM model, whereas the previous best ran in O((log log n)^2) time with O(n^2/(log log n)^2) processors, also in the CRCW-PRAM model. Thus we improve the speed without any deterioration in the time × processors product. Our improved bound immediately translates into improved CRCW-PRAM bounds for the numerous applications of this problem, including string editing, construction of Huffman codes and other coding trees, and many other combinatorial and geometric problems. This research was supported by the Office of Naval Research under Grants N00014-84-K-0502 and N00014-86-K-0689, the Air Force Office of Scientific Research under Grant AFOSR-90-0107, the National Science Foundation under Grant DCR-8451393, and the National Library of Medicine under Grant R01-LM05118. Part of the research was done while the author was at Princeton University, visiting the DIMACS center.

16.
Given an alphabet Σ = {1,2,…,|Σ|}, a text string T ∈ Σ^n, and a pattern string P ∈ Σ^m, for each i = 1,2,…,n−m+1 define L_p(i) as the p-norm distance when the pattern is aligned below the text and starts at position i of the text. The problem of pattern matching with L_p distance is to compute L_p(i) for every i = 1,2,…,n−m+1. We discuss the problem for p = 1,2,∞. First, in the case of L_1 matching (pattern matching with an L_1 distance) we show a reduction of the string matching with mismatches problem to the L_1 matching problem and we present an algorithm that approximates the L_1 matching up to a factor of 1+ε, which has an $O(\frac{1}{\varepsilon^{2}}n\log m\log|\Sigma|)$ run time. Then, the L_2 matching problem (pattern matching with an L_2 distance) is solved with a simple O(n log m) time algorithm. Finally, we provide an algorithm that approximates the L_∞ matching up to a factor of 1+ε with a run time of $O(\frac{1}{\varepsilon}n\log m\log|\Sigma|)$. We also generalize the problem of string matching with mismatches to have weighted mismatches and present an O(n log^4 m) algorithm that approximates the results of this problem up to a factor of O(log m) in the case that the weight function is a metric.
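
The quantity L_p(i) can be pinned down with a direct Python sketch that evaluates the definition at every alignment. It is an exact O(nm) baseline, not any of the approximation or O(n log m) algorithms of the abstract. Two labeled assumptions: positions are 0-based here (the abstract counts from 1), and the pattern is non-empty. The function name is hypothetical.

```python
import math


def lp_distances(text: list[int], pattern: list[int], p: float) -> list[float]:
    """Return L_p(i) for every alignment i = 0 .. n-m (0-based).

    Direct O(n*m) evaluation of the definition; p may be math.inf.
    Assumes a non-empty pattern.
    """
    n, m = len(text), len(pattern)
    result = []
    for i in range(n - m + 1):
        diffs = [abs(text[i + j] - pattern[j]) for j in range(m)]
        if p == math.inf:
            result.append(float(max(diffs)))  # L_infinity: largest difference
        else:
            result.append(sum(d ** p for d in diffs) ** (1.0 / p))
    return result
```

For text (1, 2, 3, 4) and pattern (2, 3), the L_1 distances at the three alignments are 2, 0, and 2.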

17.
Sun Wu  Udi Manber 《Algorithmica》1992,8(1):89-101
The notion of matching in graphs is generalized in this paper to a set of paths rather than to a set of edges. The generalized problem, which we call the path-matching problem, is to pair the vertices of an undirected weighted graph such that the paths connecting each pair are subject to certain objectives and/or constraints. This paper concentrates on the case where the paths are required to be edge-disjoint and the objective is to minimize the maximal cost of a path in the matching (i.e., the bottleneck version). Other variations of the problem are also mentioned. Two algorithms are presented to find the best matching under the constraints listed above for trees. Their worst-case running times are O(n log d log w), where d is the maximal degree of a vertex, w is the maximal cost of an edge, and n is the size of the tree, and O(n^2), respectively. The problem is shown to be NP-complete for general graphs. Applications of these problems are also discussed. Udi Manber was supported in part by an NSF Presidential Young Investigator Award (Grant DCR-8451397), with matching funds from AT&T.

18.
The problem of partitioning a rectilinear figure into rectangles with minimum length is NP-hard and has bounded heuristics. In this paper we study a related problem, the Elimination Problem (EP), in which a rectilinear figure is partitioned into a set of rectilinear figures containing no concave vertices of a fixed direction with minimum length. We show that a heuristic for EP within a factor of 4 from optimal can be computed in time O(n^2), where n is the number of vertices of the input figure, and a variant of this heuristic, within a factor of 6 from optimal, can be computed in time O(n log n). As an application, we give a bounded heuristic for the problem of partitioning a rectilinear figure into histograms of a fixed direction with minimum length. An auxiliary result is that an optimal rectangular partition of a monotonic histogram can be computed in time O(n^2), using a known speed-up technique in dynamic programming. Part of this work was done when the first author was a postdoctoral fellow in MSRI, Berkeley, and supported in part by NSF Grant No. 8120790. The second author was supported in part by NSF Grant No. DCR-8411945.

19.
A recent trend in stringology explores the possibility of utilizing text compression to speed up similarity computation between strings. In this line of investigation, run-length encoding is one of the earliest studied compression schemes. Despite its simple coding nature, the only positive result before this work is the computation of the in-del distance (dual of longest common subsequence), which requires O(mn log mn) time, where m and n denote the number of runs of the input strings. The worst-case time complexity of computing the edit distance between two run-length encoded strings still depends on the uncompressed string lengths. In this paper, we break the foundational gap by providing its first “fully compressed” algorithm whose running time depends solely on the compressed string lengths. Specifically, given two strings, compressed into m and n runs, m ≤ n, we present an O(mn^2)-time algorithm for computing the edit distance of the strings. Our approach also yields the first fully compressed solution to approximate matching of a pattern of m runs in a text of n runs in O(mn^2) time.
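
To show what "runs" and "uncompressed string lengths" mean here, the Python sketch below run-length encodes a string and computes the edit distance with the textbook dynamic program on the uncompressed strings. The baseline's running time depends on the uncompressed lengths, which is exactly the dependence the paper removes; the fully compressed O(mn^2) algorithm itself is not reproduced. Both function names are hypothetical.

```python
from itertools import groupby


def run_length_encode(s: str) -> list[tuple[str, int]]:
    """Encode s as (symbol, run length) pairs, e.g. 'aaab' -> [('a', 3), ('b', 1)]."""
    return [(ch, len(list(group))) for ch, group in groupby(s)]


def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance on the uncompressed strings.

    O(len(a) * len(b)) baseline, not the fully compressed algorithm.
    """
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j - 1] + (ca != cb),  # match or substitution
                            prev[j] + 1,               # deletion from a
                            curr[j - 1] + 1))          # insertion into a
        prev = curr
    return prev[-1]
```

A string such as "aaabccdd" has length 8 but only 4 runs, so a bound in terms of runs can be much smaller than one in terms of length.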

20.
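On the EREW PRAM (exclusive-read and exclusive-write parallel random access machine) parallel computation model, an optimal, fast parallel algorithm is given for the maximal edge matching problem on a very broad class of undirected graphs; it has time complexity O(log n) and uses O((n+m)/log n) processors.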
