首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
In this paper, we consider a generalized longest common subsequence problem, the string-excluding constrained LCS problem. For the two input sequences X and Y of lengths n and m, and a constraint string P of length r, the problem is to find a common subsequence Z of X and Y excluding P as a substring and the length of Z is maximized. The problem and its solution were first proposed by Chen and Chao (2011) [1], but we found that their algorithm cannot solve the problem correctly. A new dynamic programming solution for the STR-EC-LCS problem is then presented in this paper. The correctness of the new algorithm is proved. The time complexity of the new algorithm is O(nmr).  相似文献   

2.
Let X and Y be two strings of lengths n and m, respectively, and k and l, respectively, be the numbers of runs in their corresponding run-length encoded forms. We propose a simple algorithm for computing the longest common subsequence of two given strings X and Y in O(kl+min{p1,p2}) time, where p1 and p2 denote the numbers of elements in the bottom and right boundaries of the matched blocks, respectively. It improves the previously known time bound O(min{nl,km}) and outperforms the time bounds O(kllogkl) or O((k+l+q)log(k+l+q)) for some cases, where q denotes the number of matched blocks.  相似文献   

3.
A New Efficient Algorithm for Computing the Longest Common Subsequence   总被引:1,自引:0,他引:1  
The Longest Common Subsequence (LCS) problem is a classic and well-studied problem in computer science. The LCS problem is a common task in DNA sequence analysis with many applications to genetics and molecular biology. In this paper, we present a new and efficient algorithm for solving the LCS problem for two strings. Our algorithm runs in O(ℛlog log n+n) time, where ℛ is the total number of ordered pairs of positions at which the two strings match. Preliminary version appeared in [24]. C.S. Iliopoulos is supported by EPSRC and Royal Society grants. M.S. Rahman is supported by the Commonwealth Scholarship Commission in the UK under the Commonwealth Scholarship and Fellowship Plan (CSFP) and is on Leave from Department of CSE, BUET, Dhaka-1000, Bangladesh.  相似文献   

4.
New efficient algorithms for the LCS and constrained LCS problems   总被引:1,自引:0,他引:1  
In this paper, we study the classic and well-studied longest common subsequence (LCS) problem and a recent variant of it, namely the constrained LCS (CLCS) problem. In the CLCS problem, the computed LCS must also be a supersequence of a third given string. In this paper, we first present an efficient algorithm for the traditional LCS problem that runs in O(Rloglogn+n) time, where R is the total number of ordered pairs of positions at which the two strings match and n is the length of the two given strings. Then, using this algorithm, we devise an algorithm for the CLCS problem having time complexity O(pRloglogn+n) in the worst case, where p is the length of the third string.  相似文献   

5.
The longest common subsequence problem revisited   总被引:2,自引:0,他引:2  
This paper re-examines, in a unified framework, two classic approaches to the problem of finding a longest common subsequence (LCS) of two strings, and proposes faster implementations for both. Letl be the length of an LCS between two strings of lengthm andn m, respectively, and let s be the alphabet size. The first revised strategy follows the paradigm of a previousO(ln) time algorithm by Hirschberg. The new version can be implemented in timeO(lm · min logs, logm, log(2n/m)), which is profitable when the input strings differ considerably in size (a looser bound for both versions isO(mn)). The second strategy improves on the Hunt-Szymanski algorithm. This latter takes timeO((r +n) logn), wherermn is the total number of matches between the two input strings. Such a performance is quite good (O(n logn)) whenrn, but it degrades to (mn logn) in the worst case. On the other hand the variation presented here is never worse than linear-time in the productmn. The exact time bound derived for this second algorithm isO(m logn +d log(2mn/d)), whered r is the number ofdominant matches (elsewhere referred to asminimal candidates) between the two strings. Both algorithms require anO(n logs) preprocessing that is nearly standard for the LCS problem, and they make use of simple and handy auxiliary data structures.  相似文献   

6.
7.
F. Chin  C. K. Poon 《Algorithmica》1994,12(4-5):293-311
Although theLongest Common Subsequence (LCS)Problem has been studied by many researchers for years, heuristic methods have not been investigated before. In this paper we present a simple heuristic which guarantees to return a common subsequence of length at least 1/s that of the longest wheres is the number of different symbols in the input strings. Furthermore, we generalize the idea to several classes of heuristic algorithms. Surprisingly, we find that no other heuristic in these classes outperforms this simple algorithm. In other words, we show that any heuristic which uses only global information, such as number of symbol occurrences, might return a common subsequence as short as 1/s of the length of the longest. Analysis of the average performance of the simple heuristic fors=2 is also presented.This research was supported in part by ONR Grant N00014-87-K-0833.  相似文献   

8.
Abstract

We design linear time systolic-based parallel algorithms that run on two-dimensional arrays for both computing the length and recovering a longest common subsequence of three given sequences that are appropriate for very large-scale integration (VLSI) implementation. These problems have been qualified to be difficult to be solved in linear time in [14], and our approach, which generalizes the methods used for determining a longest common subsequence of two strings [28,38] to the case of three strings, enables to solve both problems in linear time. Given the three sequences A, B and C of length n, m and l (n ≤ m ≤ l;), we provide an algorithm that computes the length p of their longest common subsequence on a modular I/O bounded one-way two-dimensional array of nm processors in n + 2m + l? 1 time-steps. To compute a longest common subsequence of the three given strings, we show that each processor of the above array requires an O(min{n,p\) local storage to solve the problem in In + 1m + 1 + p — 2 time-steps.  相似文献   

9.
A longest common subsequence (LCS) of two strings is a common subsequence of the two strings of maximal length. The LCS problem is to find an LCS of two given strings and the length of the LCS (LLCS). In this paper, a fast linear systolic algorithm that improves on previous systolic algorithms for solving the LCS problem is presented. For two given strings of length m and n, where m n, the LLCS and an LCS can be found in m + 2n – 1 time steps. This algorithm achieves the tight lower bound of the time complexity under the situation where symbols are input sequentially to a linear array of n processors. The systolic algorithm can be modified to take only m + n steps on multicomputers by using the scatter operation.  相似文献   

10.
Imposing constraints is an effective means to incorporate biological knowledge into alignment procedures. As in the PROSITE database, functional sites of proteins can be effectively described as regular expressions. In an alignment of protein sequences it is natural to expect that functional motifs should be aligned together. Due to this motivation, Arslan introduced the regular expression constrained sequence alignment problem and proposed an algorithm which, if implemented naïvely, can take time and space up to O(2|Σ|4|V|n2) and O(2|Σ|4|V|n), respectively, where Σ is the alphabet, n is the sequence length, and V is the set of states in an automaton equivalent to the input regular expression. In this paper we propose a more efficient algorithm solving this problem which takes O(3|V|n2) time and O(2|V|n) space in the worst case. If |V|=O(logn) we propose another algorithm with time complexity O(2|V|log|V|n2). The time complexity of our algorithms is independent of Σ, which is desirable in protein applications where the formulation of this problem originates; a factor of 2|Σ|=400 in the time complexity of the previously proposed algorithm would significantly affect the efficiency in practice.  相似文献   

11.
The Longest Common Subsequence problem seeks a longest subsequence of every member of a given set of strings. It has applications, among others, in data compression, FPGA circuit minimization, and bioinformatics. The problem is NP-hard for more than two input strings, and the existing exact solutions are impractical for large input sizes. Therefore, several approximation and (meta) heuristic algorithms have been proposed which aim at finding good, but not necessarily optimal, solutions to the problem. In this paper, we propose a new algorithm based on the constructive beam search method. We have devised a novel heuristic, inspired by the probability theory, intended for domains where the input strings are assumed to be independent. Special data structures and dynamic programming methods are developed to reduce the time complexity of the algorithm. The proposed algorithm is compared with the state-of-the-art over several standard benchmarks including random and real biological sequences. Extensive experimental results show that the proposed algorithm outperforms the state-of-the-art by giving higher quality solutions with less computation time for most of the experimental cases.  相似文献   

12.
The Arc-Preserving Subsequence (APS) problem appears in the comparison of RNA structures in computational biology. Given two arc-annotated sequences of length n and m<n, APS asks if the shorter sequence can be obtained from the longer one by deleting certain elements along with their incident arcs. It is known that APS with pairwise nested arcs can be solved in O(mn) time. We give an algorithm running in O(m2+n) time in the worst case, actually it is even faster in particular if the shorter sequence has many arcs. The result may serve as a building block for improved APS algorithms in the general case.  相似文献   

13.
Johan Jeuring 《Algorithmica》1994,11(2):146-184
A theory for the derivation of on-line algorithms is presented. The algorithms are derived in the Bird-Meertens calculus for program transformations. This calculus provides a concise functional notation for algorithms, and a few powerful theorems for proving equalities of functions. The theory for the derivation of on-line algorithms is illustrated with the derivation of an algorithm for finding palindromes.An on-line linear-time random access machine (RAM) algorithm for finding the longest palindromic substring in a string is derived. For the purpose of finding the longest palindromic substring, all maximal palindromic substrings are computed. The list of maximal palindromes obtained in the computation of the longest palindrome can be used for other purposes such as finding the largest palindromic rectangle in a matrix and finding the shortest partition of a string into palindromes.This research was supported by the Dutch organization for scientific research NWO, under NFI project STOP, project number NF 62/63-518.  相似文献   

14.
We present parallel algorithms for computing all pair shortest paths in directed graphs. Our algorithm has time complexityO(f(n)/p+I(n)logn) on the PRAM usingp processors, whereI(n) is logn on the EREW PRAM, log logn on the CCRW PRAM,f(n) iso(n 3). On the randomized CRCW PRAM we are able to achieve time complexityO(n 3/p+logn) usingp processors. A preliminary version of this paper was presented at the 4th Annual ACM Symposium on Parallel Algorithms and Architectures, June 1992. Support by NSF Grant CCR 90-20690 and PSC CUNY Awards #661340 and #662478.  相似文献   

15.
This paper introduces a simple Sum-over-Paths (SoP) formulation of string edit distances accounting for all possible alignments between two sequences, and extends related previous work from bioinformatics to the case of graphs with cycles. Each alignment ℘, with a total cost C(℘), is assigned a probability of occurrence P(℘)=exp[−θC(℘)]/Z where Z is a normalization factor. Therefore, good alignments (having a low cost) are favored over bad alignments (having a high cost). The expected cost ∑℘∈PC(℘)exp[−θC(℘)]/Z computed over all possible alignments ℘∈P defines the SoP edit distance. When θ→∞, only the best alignments matter and the measure reduces to the standard edit distance. The rationale behind this definition is the following: for some applications, two sequences sharing many good alignments should be considered as more similar than two sequences having only one single good, optimal, alignment in common. In other words, sub-optimal alignments could also be taken into account. Forward/backward recurrences allowing to efficiently compute the expected cost are developed. Virtually any Viterbi-like sequence comparison algorithm computed on a lattice can be generalized in the same way; for instance, a SoP longest common subsequence is also developed. Pattern classification tasks performed on five data sets show that the new measures usually outperform the standard ones and, in any case, never perform significantly worse, at the expense of tuning the parameter θ.  相似文献   

16.
We derive fast algorithms for the following problem: given a set of n points on the real line and two parameters s and p, find s disjoint intervals of maximum total length that contain at most p of the given points. Our main contribution consists of algorithms whose time bounds improve upon a straightforward dynamic programming algorithm, in the relevant case that input size n is much bigger than parameters s and p. These results are achieved by selecting a few candidate intervals that are provably sufficient for building an optimal solution via dynamic programming. As a byproduct of this idea we improve an algorithm for a similar subsequence problem of Chen et al. [Disjoint segments with maximum density, in: International Workshop on Bioinformatics Research and Applications IWBRA 2005, (within ICCS 2005), Lecture Notes in Computer Science, vol. 3515, Springer, Berlin, pp. 845–850]. The problems are motivated by the search for significant patterns in biological data. Finally, we propose several heuristics that further reduce the time complexity in typical instances. One of them leads to an apparently open subsequence sum problem of independent interest.  相似文献   

17.
ObjectiveThis paper presents an algorithm for the solution of the motif discovery problem (MDP).Methods and materialsMotif discovery problem can be considered in two cases: motifs with insertions/deletions, and motifs without insertions/deletions. The first group motifs can be found by stochastic and approximated methods. The second group can be found by using stochastic and approximated methods, but also deterministic method. We proved that the second group motifs can be found with a deterministic algorithm, and so, it can be said that the second motifs finding is a P-type problem as proved in this paper.Results and conclusionsAn algorithm was proposed in this paper for motif discovery problem. The proposed algorithm finds all motifs which are occurred in the sequence at least two times, and it also finds motifs of various sizes. Due to this case, this algorithm is regarded as Automatic Exact Motif Discovery Algorithm. All motifs of different sizes can be found with this algorithm, and this case was proven in this paper. It shown that automatic exact motif discovery is a P-type problem in this paper. The application of the proposed algorithm has been shown that this algorithm is superior to MEME, MEME3, Motif Sampler, WEEDER, CONSENSUS, AlignACE.  相似文献   

18.
This paper addresses the total completion time minimization in a two-stage differentiation flowshop where the sequences of jobs per type are predetermined. The two-stage differentiation flowshop consists of a stage-1 common machine and m stage-2 parallel dedicated machines. The goal is to determine an optimal interleaved processing sequence of all jobs at the first stage. We propose an dynamic programming algorithm, where nk is the number of type-k jobs. The running time is polynomial when m is constant.  相似文献   

19.
String inclusion and non-inclusion problems have been vigorously studied in such diverse fields as molecular biology, data compression, and computer security. Among the well-known string inclusion or non-inclusion notions, we are interested in the longest common nonsuperstring. Given a set of strings, the longest common nonsuperstring problem is finding the longest string that is not a superstring of any string in the given set. It is known that the longest common nonsuperstring problem is solvable in polynomial time.In this paper, we propose an efficient algorithm for the longest common nonsuperstring problem. The running time of our algorithm is linear with respect to the sum of the lengths of the strings in the given set, using generalized suffix trees.  相似文献   

20.
We give two optimal linear-time algorithms for computing the Longest Previous Factor (LPF) array corresponding to a string w. For any position i in w, LPF[i] gives the length of the longest factor of w starting at position i that occurs previously in w. Several properties and applications of LPF are investigated. They include computing the Lempel-Ziv factorization of a string and detecting all repetitions (runs) in a string in linear time independently of the integer alphabet size.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号