首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 0 毫秒
1.
Let X and Y be two strings of lengths n and m, respectively, and k and l, respectively, be the numbers of runs in their corresponding run-length encoded forms. We propose a simple algorithm for computing the longest common subsequence of two given strings X and Y in O(kl+min{p1,p2}) time, where p1 and p2 denote the numbers of elements in the bottom and right boundaries of the matched blocks, respectively. It improves the previously known time bound O(min{nl,km}) and outperforms the time bounds O(kllogkl) or O((k+l+q)log(k+l+q)) for some cases, where q denotes the number of matched blocks.  相似文献   

2.
Let A=〈a1,a2,…,am〉 and B=〈b1,b2,…,bn〉 be two sequences, where each pair of elements in the sequences is comparable. A common increasing subsequence of A and B is a subsequence 〈ai1=bj1,ai2=bj2,…,ail=bjl〉, where i1<i2<?<il and j1<j2<?<jl, such that for all 1?k<l, we have aik<aik+1. A longest common increasing subsequence of A and B is a common increasing subsequence of the maximum length. This paper presents an algorithm for delivering a longest common increasing subsequence in O(mn) time and O(mn) space.  相似文献   

3.
The Longest Common Subsequence problem seeks a longest subsequence of every member of a given set of strings. It has applications, among others, in data compression, FPGA circuit minimization, and bioinformatics. The problem is NP-hard for more than two input strings, and the existing exact solutions are impractical for large input sizes. Therefore, several approximation and (meta) heuristic algorithms have been proposed which aim at finding good, but not necessarily optimal, solutions to the problem. In this paper, we propose a new algorithm based on the constructive beam search method. We have devised a novel heuristic, inspired by the probability theory, intended for domains where the input strings are assumed to be independent. Special data structures and dynamic programming methods are developed to reduce the time complexity of the algorithm. The proposed algorithm is compared with the state-of-the-art over several standard benchmarks including random and real biological sequences. Extensive experimental results show that the proposed algorithm outperforms the state-of-the-art by giving higher quality solutions with less computation time for most of the experimental cases.  相似文献   

4.
刘维  陈崚 《计算机应用》2006,26(6):1422-1424
求生物序列的最长公共子串是生物信息学中最重要的问题之一,提出了该问题的一个快速算法,可对所有初始同字符对并行地寻找其后继同字符对,并记录下相应层次值。最后通过最大层次值回溯得到比对结果。此外,该算法采用了剪枝技术,对于明显不能得出最优比对的同字符将中止其后继的搜索。实验结果证明,本文算法比其他算法速度快、精确度高。  相似文献   

5.
We introduce a linear systolic array for the Longest Common Subsequence (LCS, for short) problem. We first present an array of m identical cells which computes the length of an LCS of two strings of length m and n, respectively, in linear time (i.e., in time proportional to m + n). Then we show that, by extending any cell with the systolic stack introduced by Guibas and Liang (1982), a new array can be designed to recover an LCS in linear time.  相似文献   

6.
The longest common subsequence problem revisited   总被引:2,自引:0,他引:2  
This paper re-examines, in a unified framework, two classic approaches to the problem of finding a longest common subsequence (LCS) of two strings, and proposes faster implementations for both. Letl be the length of an LCS between two strings of lengthm andn m, respectively, and let s be the alphabet size. The first revised strategy follows the paradigm of a previousO(ln) time algorithm by Hirschberg. The new version can be implemented in timeO(lm · min logs, logm, log(2n/m)), which is profitable when the input strings differ considerably in size (a looser bound for both versions isO(mn)). The second strategy improves on the Hunt-Szymanski algorithm. This latter takes timeO((r +n) logn), wherermn is the total number of matches between the two input strings. Such a performance is quite good (O(n logn)) whenrn, but it degrades to (mn logn) in the worst case. On the other hand the variation presented here is never worse than linear-time in the productmn. The exact time bound derived for this second algorithm isO(m logn +d log(2mn/d)), whered r is the number ofdominant matches (elsewhere referred to asminimal candidates) between the two strings. Both algorithms require anO(n logs) preprocessing that is nearly standard for the LCS problem, and they make use of simple and handy auxiliary data structures.  相似文献   

7.
This paper re-examines, in a unified framework, two classic approaches to the problem of finding a longest common subsequence (LCS) of two strings, and proposes faster implementations for both. Letl be the length of an LCS between two strings of lengthm andnm, respectively, and let s be the alphabet size. The first revised strategy follows the paradigm of a previousO(ln) time algorithm by Hirschberg. The new version can be implemented in timeO(lm · min logs, logm, log(2n/m)), which is profitable when the input strings differ considerably in size (a looser bound for both versions isO(mn)). The second strategy improves on the Hunt-Szymanski algorithm. This latter takes timeO((r +n) logn), wherermn is the total number of matches between the two input strings. Such a performance is quite good (O(n logn)) whenrn, but it degrades to Θ(mn logn) in the worst case. On the other hand the variation presented here is never worse than linear-time in the productmn. The exact time bound derived for this second algorithm isO(m logn +d log(2mn/d)), wheredr is the number ofdominant matches (elsewhere referred to asminimal candidates) between the two strings. Both algorithms require anO(n logs) preprocessing that is nearly standard for the LCS problem, and they make use of simple and handy auxiliary data structures.  相似文献   

8.
The well-known problem of the longest common subsequence (LCS), of two strings of lengths nn and mm respectively, is O(nm)O(nm)-time solvable and is a classical distance measure for strings. Another well-studied string comparison measure is that of parameterized matching, where two equal-length strings are a parameterized match if there exists a bijection on the alphabets such that one string matches the other under the bijection. All works associated with parameterized pattern matching present polynomial time algorithms.  相似文献   

9.
The constrained longest common subsequence problem   总被引:1,自引:0,他引:1  
This paper considers a constrained version of longest common subsequence problem for two strings. Given strings S1, S2 and P, the constrained longest common subsequence problem for S1 and S2 with respect to P is to find a longest common subsequence lcs of S1 and S2 such that P is a subsequence of this lcs. An O(rn2m2) time algorithm based upon the dynamic programming technique is proposed for this new problem, where n, m and r are lengths of S1, S2 and P, respectively.  相似文献   

10.
We consider a variant of the classical Longest Common Subsequence problem called Doubly-Constrained Longest Common Subsequence (DC-LCS). Given two strings s1 and s2 over an alphabet Σ, a set Cs of strings, and a function Co:ΣN, the DC-LCS problem consists of finding the longest subsequence s of s1 and s2 such that s is a supersequence of all the strings in Cs and such that the number of occurrences in s of each symbol σΣ is upper bounded by Co(σ). The DC-LCS problem provides a clear mathematical formulation of a sequence comparison problem in Computational Biology and generalizes two other constrained variants of the LCS problem that have been introduced previously in the literature: the Constrained LCS and the Repetition-Free LCS. We present two results for the DC-LCS problem. First, we illustrate a fixed-parameter algorithm where the parameter is the length of the solution which is also applicable to the more specialized problems. Second, we prove a parameterized hardness result for the Constrained LCS problem when the parameter is the number of the constraint strings (|Cs|) and the size of the alphabet Σ. This hardness result also implies the parameterized hardness of the DC-LCS problem (with the same parameters) and its NP-hardness when the size of the alphabet is constant.  相似文献   

11.
Among the algorithms set up to date for finding the longest common subsequence of two strings, the one by Hunt and Szymanski exhibits the best known performance in favorable cases, but can be worse than any straightforward algorithm for a large variety of inputs. The new algorithm presented here pursues a schedule of primitive operations quite close to the one inherent to the Hunt-Szymanski strategy, but with substantially enhanced efficiency. In fact, the new algorithm improves on the former in two important respects. First, its worst case is never worse than linear in the product nm of the lengths of the two input strings. Second, its time bound does not always grow with the cardinality r of the set R of all pairs of matching positions of the input strings. Rather, it depends on the cardinality d of a specific subset of R, whose elements are called here dominant matches, and are elsewhere referred to as minimal candidates. This second improvement also appears of significance, since it seems that whenever r gets too close to mn, this forces d to be linear in m. The new algorithm requires standard preprocessing, and makes use of finger-trees. In a forthcoming paper, it will be shown among other things that the same performance can be achieved with simpler and handier auxiliary data structures.  相似文献   

12.
A subsequence of a given string is any string obtained by deleting none or some symbols from the given string. A longest common subsequence (LCS) of two strings is a common subsequence of both that is as long as any other common subsequences. The problem is to find the LCS of two given strings. The bound on the complexity of this problem under the decision tree model is known to be mn if the number of distinct symbols that can appear in strings is infinite, where m and n are the lengths of the two strings, respectively, and m⩽n. In this paper, we propose two parallel algorithms far this problem on the CREW-PRAM model. One takes O(log2 m + log n) time with mn/log m processors, which is faster than all the existing algorithms on the same model. The other takes O(log2 m log log m) time with mn/(log2 m log log m) processors when log2 m log log m > log n, or otherwise O(log n) time with mn/log n processors, which is optimal in the sense that the time×processors bound matches the complexity bound of the problem. Both algorithms exploit nice properties of the LCS problem that are discovered in this paper  相似文献   

13.
The longest common subsequence problem is a classical string problem that concerns finding the common part of a set of strings. It has several important applications, for example, pattern recognition or computational biology. Most research efforts up to now have focused on solving this problem optimally. In comparison, only few works exist dealing with heuristic approaches. In this work we present a deterministic beam search algorithm. The results show that our algorithm outperforms the current state-of-the-art approaches not only in solution quality but often also in computation time.  相似文献   

14.
There are two general approaches to the longest common subsequence problem. The dynamic programming approach takes quadratic time but linear space, while the nondynamic-programming approach takes less time but more space. We propose a new implementation of the latter approach which seems to get the best for both time and space for the DNA application.  相似文献   

15.
16.
Given a finite set of strings X, the Longest Common Subsequence problem (LCS) consists in finding a subsequence common to all strings in X that is of maximal length. LCS is a central problem in stringology and finds broad applications in text compression, conception of error-detecting codes, or biological sequence comparison. However, in numerous contexts, words represent cyclic or unoriented sequences of symbols and LCS must be generalized to consider both orientations and/or all cyclic shifts of the strings involved. This occurs especially in computational biology when genetic material is sequenced from circular DNA or RNA molecules.In this work, we define three variants of LCS when the input words are unoriented and/or cyclic. We show that these problems are -hard, and -hard if parameterized in the number of input strings. These results still hold even if the three LCS variants are restricted to input languages over a binary alphabet. We also settle the parameterized complexity of our problems for most relevant parameters. Moreover, we study the approximability of these problems: we discuss the existence of approximation bounds depending on the cardinality of the alphabet, on the length of the shortest sequence, and on the number of input sequences. For this we prove that Maximum Independent Set in r-uniform hypergraphs is -hard if parameterized in the cardinality of the sought independent set and at least as hard to approximate as Maximum Independent Set in graphs.  相似文献   

17.
18.
郑子君 《计算机应用研究》2020,37(11):3334-3337,3358
最长循环公共子序列(LCCS)是两个字符串在所有可能的循环移位操作下能得到的最长公共子序列(LCS)。针对穷举移位量求解LCCS效率过低的问题,设法对候选移位量进行筛选。通过证明循环移位操作对两字符串间LCS长度增量影响的上下限,得到最优移位量的必要条件,从而减小了求解LCCS的枚举量;在此基础上,建立了求解LCCS的迭代方法,只经过少数几次迭代便可消除绝大部分无效候选移位量;此外,还提出一个可在◢O(mn)◣时间复杂度下快速估算LCCS长度的近似算法。大量随机模拟表明,当两字符串间的相似度明显高于随机字符串的相似度时,提出的两种算法表现良好。  相似文献   

19.
Data compression can be used to simultaneously reduce memory, communication and computation requirements of string comparison. In this paper we address the problem of computing the length of the longest common subsequence (LCS) between run-length-encoded (RLE) strings. We exploit RLE both to reduce the complexity of LCS computation from O(M×N) to O(mN+Mnmn), where M and N are the lengths of the original strings and m and n the number of runs in their RLE representation, and to improve the inherent parallelism of the proposed algorithm, so that it may execute in O(m+n) steps on a systolic array of M+N units.We also discuss the application of the proposed algorithm to the related problem of edit distance (ED) computation.  相似文献   

20.
Finding the longest common subsequence in k-length substrings (LCSk) is a recently proposed problem motivated by computational biology. This is a generalization of the well-known LCS problem in which matching symbols from two sequences A and B are replaced with matching non-overlapping substrings of length k from A and B. We propose several algorithms for LCSk, being non-trivial incarnations of the major concepts known from LCS research (dynamic programming, sparse dynamic programming, tabulation). Our algorithms make use of a linear-time and linear-space preprocessing finding the occurrences of all the substrings of length k from one sequence in the other sequence.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号