首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
The length of the longest common subsequence (LCS) between two strings of M and N characters can be computed by an O(M × N) dynamic programming algorithm, that can be executed in O(M + N) steps by a linear systolic array. It has been recently shown that the LCS between run-length-encoded (RLE) strings of m and n runs can be computed by an O(nM + Nm − nm) algorithm that could be executed in O(m + n) steps by a parallel hardware. However, the algorithm cannot be directly mapped on a linear systolic array because of its irregular structure.In this paper, we propose a modified algorithm that exhibits a more regular structure at the cost of a marginal reduction of the efficiency of RLE. We outline the algorithm and we discuss its mapping on a linear systolic array.  相似文献   

2.
We propose a new algorithm for computing the edit distance of an uncompressed string against a run-length-encoded string. For an uncompressed string of length n and a compressed string with M runs, the algorithm computes their edit distance in time O(Mn). This result directly implies an O(min{mN,Mn}) time algorithm for strings of lengths m and n with M and N runs, respectively. It improves the previous best known time bound O(mN+Mn).  相似文献   

3.
New efficient algorithms for the LCS and constrained LCS problems   总被引:1,自引:0,他引:1  
In this paper, we study the classic and well-studied longest common subsequence (LCS) problem and a recent variant of it, namely the constrained LCS (CLCS) problem. In the CLCS problem, the computed LCS must also be a supersequence of a third given string. In this paper, we first present an efficient algorithm for the traditional LCS problem that runs in O(Rloglogn+n) time, where R is the total number of ordered pairs of positions at which the two strings match and n is the length of the two given strings. Then, using this algorithm, we devise an algorithm for the CLCS problem having time complexity O(pRloglogn+n) in the worst case, where p is the length of the third string.  相似文献   

4.
We address the problem of determining all extreme supported solutions of the biobjective shortest path problem. A novel Dijkstra-like method generalizing Dijkstra׳s algorithm to this biobjective case is proposed. The algorithm runs in O(N(m+n log n)) time to solve one-to-one and one-to-all biobjective shortest path problems determining all extreme supported non-dominated points in the outcome space and one supported efficient path associated with each one of them. Here n is the number of nodes, m is the number of arcs and N is the number of extreme supported points in outcome space for the one-to-all biobjective shortest path problem. The memory space required by the algorithm is O(n+m) for the one-to-one problem and O(N+m) for the one-to-all problem. A computational experiment comparing the performance of the proposed methods and state-of-the-art methods is included.  相似文献   

5.
Summary Efficient algorithms for computing the longest common subsequence (LCS for short) are discussed. O(pn) algorithm and O(p(m-p) log n) algorithm [Hirschberg 1977] seem to be best among previously known algorithms, where p is the length of an LCS and m and n are the lengths of given two strings (mn). There are many applications where the expected length of an LCS is close to m.In this paper, O(n(m-p)) algorithm is presented. When p is close to m (in other words, two given strings are similar), the algorithm presented here runs much faster than previously known algorithms.  相似文献   

6.
A classical measure of similarity between strings is the length of the longest common subsequence (LCS) between the two given strings. The search for efficient algorithms for finding the LCS has been going on for more than three decades. To date, all known algorithms may take quadratic time (shaved by logarithmic factors) to find large LCS. In this paper, the problem of approximating LCS is studied, while focusing on the hard inputs for this problem, namely, approximating LCS of near-linear size in strings over a relatively large alphabet (of size at least n? for some constant ?>0, where n is the length of the string). We show that, any given string over a relatively large alphabet can be embedded into a locally non-repetitive string. This embedding has a negligible additive distortion for strings that are not too dissimilar in terms of the edit distance. We also show that LCS can be efficiently approximated in locally-non-repetitive strings. Our new method (the embedding together with the approximation algorithm) gives a strictly sub-quadratic time algorithm (i.e., of complexity O(n2-?) for some constant ?) which can find common subsequences of linear (and near linear) size that cannot be detected efficiently by the existing tools.  相似文献   

7.
Finding the longest common subsequence (LCS) of two given sequences A=a0a1am−1 and B=b0b1bn−1 is an important and well studied problem. We consider its generalization, transposition-invariant LCS (LCTS), which has recently arisen in the field of music information retrieval. In LCTS, we look for the LCS between the sequences A+t=(a0+t)(a1+t)…(am−1+t) and B where t is any integer. We introduce a family of algorithms (motivated by the Hunt-Szymanski scheme for LCS), improving the currently best known complexity from O(mnloglogσ) to O(Dloglogσ+mn), where σ is the alphabet size and D?mn is the total number of dominant matches for all transpositions. Then, we demonstrate experimentally that some of our algorithms outperform the best ones from literature.  相似文献   

8.
Given two strings A and B of lengths na and nb, respectively, the All-substrings Longest Common Subsequence (ALCS) problem obtains, for any substring B' of B, the length of the longest string that is a subsequence of both A and B'. The sequential algorithm for this problem takes O(na nb) time and O(nb) space. We present a parallel algorithm for the ALCS problem on the Coarse-Grained Multicomputer (BSP/CGM) model with p < √na processors, that takes O(na nb/p) time, O(log p) communication rounds and O(nb √na) space per processor. The proposed algorithm also solves the basic Longest Common Subsequence (LCS) problem that finds the longest string (and not only its length) that is a subsequence of both A and B. To our knowledge, this is the best BSP/CGM algorithm in the literature for the LCS and ALCS problems.  相似文献   

9.
This paper re-examines, in a unified framework, two classic approaches to the problem of finding a longest common subsequence (LCS) of two strings, and proposes faster implementations for both. Letl be the length of an LCS between two strings of lengthm andnm, respectively, and let s be the alphabet size. The first revised strategy follows the paradigm of a previousO(ln) time algorithm by Hirschberg. The new version can be implemented in timeO(lm · min logs, logm, log(2n/m)), which is profitable when the input strings differ considerably in size (a looser bound for both versions isO(mn)). The second strategy improves on the Hunt-Szymanski algorithm. This latter takes timeO((r +n) logn), wherermn is the total number of matches between the two input strings. Such a performance is quite good (O(n logn)) whenrn, but it degrades to Θ(mn logn) in the worst case. On the other hand the variation presented here is never worse than linear-time in the productmn. The exact time bound derived for this second algorithm isO(m logn +d log(2mn/d)), wheredr is the number ofdominant matches (elsewhere referred to asminimal candidates) between the two strings. Both algorithms require anO(n logs) preprocessing that is nearly standard for the LCS problem, and they make use of simple and handy auxiliary data structures.  相似文献   

10.
A New Efficient Algorithm for Computing the Longest Common Subsequence   总被引:1,自引:0,他引:1  
The Longest Common Subsequence (LCS) problem is a classic and well-studied problem in computer science. The LCS problem is a common task in DNA sequence analysis with many applications to genetics and molecular biology. In this paper, we present a new and efficient algorithm for solving the LCS problem for two strings. Our algorithm runs in O(ℛlog log n+n) time, where ℛ is the total number of ordered pairs of positions at which the two strings match. Preliminary version appeared in [24]. C.S. Iliopoulos is supported by EPSRC and Royal Society grants. M.S. Rahman is supported by the Commonwealth Scholarship Commission in the UK under the Commonwealth Scholarship and Fellowship Plan (CSFP) and is on Leave from Department of CSE, BUET, Dhaka-1000, Bangladesh.  相似文献   

11.
M?kinen  Ukkonen  Navarro 《Algorithmica》2008,35(4):347-369
Abstract. We focus on the problem of approximate matching of strings that have been compressed using run-length encoding. Previous studies have concentrated on the problem of computing the longest common subsequence (LCS) between two strings of length m and n , compressed to m' and n' runs. We extend an existing algorithm for the LCS to the Levenshtein distance achieving O(m'n+n'm) complexity. Furthermore, we extend this algorithm to a weighted edit distance model, where the weights of the three basic edit operations can be chosen arbitrarily. This approach also gives an algorithm for approximate searching of a pattern of m letters (m' runs) in a text of n letters (n' runs) in O(mm'n') time. Then we propose improvements for a greedy algorithm for the LCS, and conjecture that the improved algorithm has O(m'n') expected case complexity. Experimental results are provided to support the conjecture.  相似文献   

12.
We address the problem of determining a complete set of extreme supported efficient solutions of biobjective minimum cost flow (BMCF) problems. A novel method improving the classical parametric method for this biobjective problem is proposed. The algorithm runs in O(Nn(m + nlogn)) time determining all extreme supported non-dominated points in the outcome space and one extreme supported efficient solution associated with each one of them. Here n is the number of nodes, m is the number of arcs and N is the number of extreme supported non-dominated points in outcome space for the BMCF problem. The memory space required by the algorithm is O(n + m) when the extreme supported efficient solutions are not required to be stored in RAM. Otherwise, the algorithm requires O(N + m) space. Extensive computational experiments comparing the performance of the proposed method and a standard parametric network simplex method are presented.  相似文献   

13.
The longest common subsequence and sequence alignment problems have been studied extensively and they can be regarded as the relationship measurement between sequences. However, most of them treat sequences evenly or consider only two sequences. Recently, with the rise of whole-genome duplication research, the doubly conserved synteny relationship among three sequences should be considered. It is a brand new model to find a merging way for understanding the interleaving relationship of sequences. Here, we define the merged LCS problem for measuring the interleaving relationship among three sequences. An O(n3) algorithm is first proposed for solving the problem, where n is the sequence length. We further discuss the variant version of this problem with the block information. For the blocked merged LCS problem, we propose an algorithm with time complexity O(n2m), where m is the number of blocks. An improved O(n2+nm2) algorithm is further proposed for the same blocked problem.  相似文献   

14.
The “Common Substring Alignment” problem is defined as follows. The input consists of a set of strings S1,S2…,Sc, with a common substring appearing at least once in each of them, and a target string T. The goal is to compute similarity of all strings Si with T, without computing the part of the common substring over and over again.In this paper we consider the Common Substring Alignment problem for the LCS (Longest Common Subsequence) similarity metric. Our algorithm gains its efficiency by exploiting the sparsity inherent to the LCS problem. Let Y be the common substring, n be the size of the compared sequences, Ly be the length of the LCS of T and Y, denoted |LCS[T,Y]|, and L be max{|LCS[T,Si]|}. Our algorithm consists of an O(nLy) time encoding stage that is executed once per common substring, and an O(L) time alignment stage that is executed once for each appearance of the common substring in each source string. The additional running time depends only on the length of the parts of the strings that are not in any common substring.  相似文献   

15.
The connected vertex cover problem is a variant of the vertex cover problem, in which a vertex cover is additional required to induce a connected subgraph in a given connected graph. The problem is known to be NP-hard and to be at least as hard to approximate as the vertex cover problem is. While several 2-approximation NC algorithms are known for vertex cover, whether unweighted or weighted, no parallel algorithm with guaranteed approximation is known for connected vertex cover. Moreover, converting the existing sequential 2-approximation algorithms for connected vertex cover to parallel ones results in RNC algorithms of rather high complexity at best.In this paper we present a 2-approximation NC (and RNC) algorithm for connected vertex cover (and tree cover). The NC algorithm runs in O(log2n) time using O(Δ2(m+n)/logn) processors on an EREW-PRAM, while the RNC algorithm runs in O(logn) expected time using O(m+n) processors on a CRCW-PRAM, when a given graph has n vertices and m edges with maximum vertex degree of Δ.  相似文献   

16.
A recent trend in stringology explores the possibility of utilizing text compression to speed up similarity computation between strings. In this line of investigation, run-length encoding is one of the earliest studied compression schemes. Despite its simple coding nature, the only positive result before this work is the computation of the in-del distance (dual of longest common subsequence), which requires O(mnlogmn) time, where m and n denote the number of runs of the input strings. The worst-case time complexity of computing the edit distance between two run-length encoded strings still depends on the uncompressed string lengths. In this paper, we break the foundational gap by providing its first “fully compressed” algorithm whose running time depends solely on the compressed string lengths. Specifically, given two strings, compressed into m and n runs, mn, we present an O(mn 2)-time algorithm for computing the edit distance of the strings. Our approach also yields the first fully compressed solution to approximate matching of a pattern of m runs in a text of n runs in O(mn 2) time.  相似文献   

17.
The longest common subsequence problem (LCS) and the closest substring problem (CSP) are two models for finding common patterns in strings, and have been studied extensively. Though both LCS and CSP are NP-Hard, they exhibit very different behavior with respect to polynomial time approximation algorithms. While LCS is hard to approximate within n δ for some δ>0, CSP admits a polynomial time approximation scheme. In this paper, we study the longest common rigid subsequence problem (LCRS). This problem shares similarity with both LCS and CSP and has an important application in motif finding in biological sequences. We show that it is NP-hard to approximate LCRS within ratio n δ , for some constant δ>0, where n is the maximum string length. We also show that it is NP-Hard to approximate LCRS within ratio Ω(m), where m is the number of strings.  相似文献   

18.
We consider the problem of designing an efficient index for approximate pattern matching. Despite ongoing efforts, this topic is still a challenge in combinatorial pattern matching. We present a new data structure that allows to report all matches in worst-case time O(|Σ|m+occ), which is linear in the pattern length m and the number of occurrences occ for alphabets of constant size |Σ|. Our index uses O(n|Σ|logn) extra space on average and with high probability, where n is the length of the text (indexing) or the number of strings to index (dictionary lookup).  相似文献   

19.
The longest common subsequence problem revisited   总被引:2,自引:0,他引:2  
This paper re-examines, in a unified framework, two classic approaches to the problem of finding a longest common subsequence (LCS) of two strings, and proposes faster implementations for both. Letl be the length of an LCS between two strings of lengthm andn m, respectively, and let s be the alphabet size. The first revised strategy follows the paradigm of a previousO(ln) time algorithm by Hirschberg. The new version can be implemented in timeO(lm · min logs, logm, log(2n/m)), which is profitable when the input strings differ considerably in size (a looser bound for both versions isO(mn)). The second strategy improves on the Hunt-Szymanski algorithm. This latter takes timeO((r +n) logn), wherermn is the total number of matches between the two input strings. Such a performance is quite good (O(n logn)) whenrn, but it degrades to (mn logn) in the worst case. On the other hand the variation presented here is never worse than linear-time in the productmn. The exact time bound derived for this second algorithm isO(m logn +d log(2mn/d)), whered r is the number ofdominant matches (elsewhere referred to asminimal candidates) between the two strings. Both algorithms require anO(n logs) preprocessing that is nearly standard for the LCS problem, and they make use of simple and handy auxiliary data structures.  相似文献   

20.
Summary The LCS problem is to determine the longest common subsequence (LCS) of two strings. A new linear-space algorithm to solve the LCS problem is presented. The only other algorithm with linear-space complexity is by Hirschberg and has runtime complexity O(mn). Our algorithm, based on the divide and conquer technique, has runtime complexity O(n(m-p)), where p is the length of the LCS.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号