首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
A longest common subsequence (LCS) of two strings is a common subsequence of two strings of maximal length. The LCS problem is to find an LCS of two given strings and the length of the LCS (LLCS). In this paper, we present a new linear processor array for solving the LCS problem. The array is based on parallelization of a recent LCS algorithm which consists of two phases, i.e. preprocessing and computation. The computation phase is based on bit-level dynamic programming approach. Implementations of the preprocessing and computation phases are discussed on the same processor array architecture for the LCS problem. Further, we propose a block processor array architecture which reduces the overall communication and time requirements. Finally, we develop a performance model for estimating the performance of the processor array architecture on Pentium processors.  相似文献   

2.
A subsequence of a given string is any string obtained by deleting none or some symbols from the given string. A longest common subsequence (LCS) of two strings is a common subsequence of both that is as long as any other common subsequences. The problem is to find the LCS of two given strings. The bound on the complexity of this problem under the decision tree model is known to be mn if the number of distinct symbols that can appear in strings is infinite, where m and n are the lengths of the two strings, respectively, and m⩽n. In this paper, we propose two parallel algorithms far this problem on the CREW-PRAM model. One takes O(log2 m + log n) time with mn/log m processors, which is faster than all the existing algorithms on the same model. The other takes O(log2 m log log m) time with mn/(log2 m log log m) processors when log2 m log log m > log n, or otherwise O(log n) time with mn/log n processors, which is optimal in the sense that the time×processors bound matches the complexity bound of the problem. Both algorithms exploit nice properties of the LCS problem that are discovered in this paper  相似文献   

3.
Abstract

We design linear time systolic-based parallel algorithms that run on two-dimensional arrays for both computing the length and recovering a longest common subsequence of three given sequences that are appropriate for very large-scale integration (VLSI) implementation. These problems have been qualified to be difficult to be solved in linear time in [14], and our approach, which generalizes the methods used for determining a longest common subsequence of two strings [28,38] to the case of three strings, enables to solve both problems in linear time. Given the three sequences A, B and C of length n, m and l (n ≤ m ≤ l;), we provide an algorithm that computes the length p of their longest common subsequence on a modular I/O bounded one-way two-dimensional array of nm processors in n + 2m + l? 1 time-steps. To compute a longest common subsequence of the three given strings, we show that each processor of the above array requires an O(min{n,p\) local storage to solve the problem in In + 1m + 1 + p — 2 time-steps.  相似文献   

4.
Mäkinen  Ukkonen  Navarro 《Algorithmica》2003,35(4):347-369
We focus on the problem of approximate matching of strings that have been compressed using run-length encoding. Previous studies have concentrated on the problem of computing the longest common subsequence (LCS) between two strings of length m and n , compressed to m' and n' runs. We extend an existing algorithm for the LCS to the Levenshtein distance achieving O(m'n+n'm) complexity. Furthermore, we extend this algorithm to a weighted edit distance model, where the weights of the three basic edit operations can be chosen arbitrarily. This approach also gives an algorithm for approximate searching of a pattern of m letters (m' runs) in a text of n letters (n' runs) in O(mm'n') time. Then we propose improvements for a greedy algorithm for the LCS, and conjecture that the improved algorithm has O(m'n') expected case complexity. Experimental results are provided to support the conjecture.  相似文献   

5.
Summary The LCS problem is to determine the longest common subsequence (LCS) of two strings. A new linear-space algorithm to solve the LCS problem is presented. The only other algorithm with linear-space complexity is by Hirschberg and has runtime complexity O(mn). Our algorithm, based on the divide and conquer technique, has runtime complexity O(n(m-p)), where p is the length of the LCS.  相似文献   

6.
We introduce a linear systolic array for the Longest Common Subsequence (LCS, for short) problem. We first present an array of m identical cells which computes the length of an LCS of two strings of length m and n, respectively, in linear time (i.e., in time proportional to m + n). Then we show that, by extending any cell with the systolic stack introduced by Guibas and Liang (1982), a new array can be designed to recover an LCS in linear time.  相似文献   

7.
New efficient algorithms for the LCS and constrained LCS problems   总被引:1,自引:0,他引:1  
In this paper, we study the classic and well-studied longest common subsequence (LCS) problem and a recent variant of it, namely the constrained LCS (CLCS) problem. In the CLCS problem, the computed LCS must also be a supersequence of a third given string. In this paper, we first present an efficient algorithm for the traditional LCS problem that runs in O(Rloglogn+n) time, where R is the total number of ordered pairs of positions at which the two strings match and n is the length of the two given strings. Then, using this algorithm, we devise an algorithm for the CLCS problem having time complexity O(pRloglogn+n) in the worst case, where p is the length of the third string.  相似文献   

8.
Summary Efficient algorithms for computing the longest common subsequence (LCS for short) are discussed. O(pn) algorithm and O(p(m-p) log n) algorithm [Hirschberg 1977] seem to be best among previously known algorithms, where p is the length of an LCS and m and n are the lengths of given two strings (mn). There are many applications where the expected length of an LCS is close to m.In this paper, O(n(m-p)) algorithm is presented. When p is close to m (in other words, two given strings are similar), the algorithm presented here runs much faster than previously known algorithms.  相似文献   

9.
The length of the longest common subsequence (LCS) between two strings of M and N characters can be computed by an O(M × N) dynamic programming algorithm, that can be executed in O(M + N) steps by a linear systolic array. It has been recently shown that the LCS between run-length-encoded (RLE) strings of m and n runs can be computed by an O(nM + Nm − nm) algorithm that could be executed in O(m + n) steps by a parallel hardware. However, the algorithm cannot be directly mapped on a linear systolic array because of its irregular structure.In this paper, we propose a modified algorithm that exhibits a more regular structure at the cost of a marginal reduction of the efficiency of RLE. We outline the algorithm and we discuss its mapping on a linear systolic array.  相似文献   

10.
A classical measure of similarity between strings is the length of the longest common subsequence (LCS) between the two given strings. The search for efficient algorithms for finding the LCS has been going on for more than three decades. To date, all known algorithms may take quadratic time (shaved by logarithmic factors) to find large LCS. In this paper, the problem of approximating LCS is studied, while focusing on the hard inputs for this problem, namely, approximating LCS of near-linear size in strings over a relatively large alphabet (of size at least n? for some constant ?>0, where n is the length of the string). We show that, any given string over a relatively large alphabet can be embedded into a locally non-repetitive string. This embedding has a negligible additive distortion for strings that are not too dissimilar in terms of the edit distance. We also show that LCS can be efficiently approximated in locally-non-repetitive strings. Our new method (the embedding together with the approximation algorithm) gives a strictly sub-quadratic time algorithm (i.e., of complexity O(n2-?) for some constant ?) which can find common subsequences of linear (and near linear) size that cannot be detected efficiently by the existing tools.  相似文献   

11.
提出2种针对3条源序列的近似LCS算法,近似因子均为1/|?|。其中,线性近似LCS算法的时空复杂度均为 , 为最长源序列的长度,适于解决大规模问题。递归近似LCS算法时空复杂度均为O(nlogn),适于要求高精度问题。同时,这2种算法都能用于解决多序列的LCS和CLCS问题。实验验证了这2种算法的有效性。  相似文献   

12.
Longest common subsequence is a widely used measure to compare strings, in particular in computational biology. Recently, several variants of the longest common subsequence have been introduced to tackle the comparison of genomes. In particular, the Repetition Free Longest Common Subsequence (RFLCS) problem is a variant of the LCS problem that asks for a longest common subsequence of two input strings with no repetition of symbols. In this paper, we investigate the parameterized complexity of RFLCS. First, we show that the problem does not admit a polynomial kernel. Then, we present a randomized FPT algorithm for the RFLCS problem, improving the time complexity of the existent FPT algorithm.  相似文献   

13.
Data compression can be used to simultaneously reduce memory, communication and computation requirements of string comparison. In this paper we address the problem of computing the length of the longest common subsequence (LCS) between run-length-encoded (RLE) strings. We exploit RLE both to reduce the complexity of LCS computation from O(M×N) to O(mN+Mnmn), where M and N are the lengths of the original strings and m and n the number of runs in their RLE representation, and to improve the inherent parallelism of the proposed algorithm, so that it may execute in O(m+n) steps on a systolic array of M+N units.We also discuss the application of the proposed algorithm to the related problem of edit distance (ED) computation.  相似文献   

14.
The longest common subsequence problem (LCS) and the closest substring problem (CSP) are two models for finding common patterns in strings, and have been studied extensively. Though both LCS and CSP are NP-Hard, they exhibit very different behavior with respect to polynomial time approximation algorithms. While LCS is hard to approximate within n δ for some δ>0, CSP admits a polynomial time approximation scheme. In this paper, we study the longest common rigid subsequence problem (LCRS). This problem shares similarity with both LCS and CSP and has an important application in motif finding in biological sequences. We show that it is NP-hard to approximate LCRS within ratio n δ , for some constant δ>0, where n is the maximum string length. We also show that it is NP-Hard to approximate LCRS within ratio Ω(m), where m is the number of strings.  相似文献   

15.
M?kinen  Ukkonen  Navarro 《Algorithmica》2008,35(4):347-369
Abstract. We focus on the problem of approximate matching of strings that have been compressed using run-length encoding. Previous studies have concentrated on the problem of computing the longest common subsequence (LCS) between two strings of length m and n , compressed to m' and n' runs. We extend an existing algorithm for the LCS to the Levenshtein distance achieving O(m'n+n'm) complexity. Furthermore, we extend this algorithm to a weighted edit distance model, where the weights of the three basic edit operations can be chosen arbitrarily. This approach also gives an algorithm for approximate searching of a pattern of m letters (m' runs) in a text of n letters (n' runs) in O(mm'n') time. Then we propose improvements for a greedy algorithm for the LCS, and conjecture that the improved algorithm has O(m'n') expected case complexity. Experimental results are provided to support the conjecture.  相似文献   

16.
This paper presents a new practical bit-vector algorithm for solving the well-known Longest Common Subsequence (LCS) problem. Given two strings of length m and n, nm, we present an algorithm which determines the length p of an LCS in O(nm/w) time and O(m/w) space, where w is the number of bits in a machine word. This algorithm can be thought of as column-wise “parallelization” of the classical dynamic programming approach. Our algorithm is very efficient in practice, where computing the length of an LCS of two strings can be done in linear time and constant (additional/working) space by assuming that mw.  相似文献   

17.
We show that the fixed alphabet shortest common supersequence (SCS) and the fixed alphabet longest common subsequence (LCS) problems parameterized in the number of strings are W[1]-hard. Unless W[1]=FPT, this rules out the existence of algorithms with time complexity of O(f(k)nα) for those problems. Here n is the size of the problem instance, α is constant, k is the number of strings and f is any function of k. The fixed alphabet version of the LCS problem is of particular interest considering the importance of sequence comparison (e.g. multiple sequence alignment) in the fixed length alphabet world of DNA and protein sequences.  相似文献   

18.
The well-known problem of the longest common subsequence (LCS), of two strings of lengths nn and mm respectively, is O(nm)O(nm)-time solvable and is a classical distance measure for strings. Another well-studied string comparison measure is that of parameterized matching, where two equal-length strings are a parameterized match if there exists a bijection on the alphabets such that one string matches the other under the bijection. All works associated with parameterized pattern matching present polynomial time algorithms.  相似文献   

19.
In this paper, we consider a generalized longest common subsequence problem, the string-excluding constrained LCS problem. For the two input sequences X and Y of lengths n and m, and a constraint string P of length r, the problem is to find a common subsequence Z of X and Y excluding P as a substring and the length of Z is maximized. The problem and its solution were first proposed by Chen and Chao (2011) [1], but we found that their algorithm cannot solve the problem correctly. A new dynamic programming solution for the STR-EC-LCS problem is then presented in this paper. The correctness of the new algorithm is proved. The time complexity of the new algorithm is O(nmr).  相似文献   

20.
The longest common subsequence problem revisited   总被引:2,自引:0,他引:2  
This paper re-examines, in a unified framework, two classic approaches to the problem of finding a longest common subsequence (LCS) of two strings, and proposes faster implementations for both. Letl be the length of an LCS between two strings of lengthm andn m, respectively, and let s be the alphabet size. The first revised strategy follows the paradigm of a previousO(ln) time algorithm by Hirschberg. The new version can be implemented in timeO(lm · min logs, logm, log(2n/m)), which is profitable when the input strings differ considerably in size (a looser bound for both versions isO(mn)). The second strategy improves on the Hunt-Szymanski algorithm. This latter takes timeO((r +n) logn), wherermn is the total number of matches between the two input strings. Such a performance is quite good (O(n logn)) whenrn, but it degrades to (mn logn) in the worst case. On the other hand the variation presented here is never worse than linear-time in the productmn. The exact time bound derived for this second algorithm isO(m logn +d log(2mn/d)), whered r is the number ofdominant matches (elsewhere referred to asminimal candidates) between the two strings. Both algorithms require anO(n logs) preprocessing that is nearly standard for the LCS problem, and they make use of simple and handy auxiliary data structures.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号