期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

A dynamic programming solution to a generalized LCS problem

Lei Wang Xiaodong Wang Yingjie Wu Daxin Zhu 《Information Processing Letters》2013,113(19-21):723-728

In this paper, we consider a generalized longest common subsequence problem, the string-excluding constrained LCS problem. For the two input sequences X and Y of lengths n and m, and a constraint string P of length r, the problem is to find a common subsequence Z of X and Y excluding P as a substring and the length of Z is maximized. The problem and its solution were first proposed by Chen and Chao (2011) [1], but we found that their algorithm cannot solve the problem correctly. A new dynamic programming solution for the STR-EC-LCS problem is then presented in this paper. The correctness of the new algorithm is proved. The time complexity of the new algorithm is

O (n m r)

. 相似文献

2.

A fast and simple algorithm for computing the longest common subsequence of run-length encoded strings

Hsing-Yen Ann Chiou-Ting Tseng Chiou-Yi Hor 《Information Processing Letters》2008,108(6):360-364

Let X and Y be two strings of lengths n and m, respectively, and k and l, respectively, be the numbers of runs in their corresponding run-length encoded forms. We propose a simple algorithm for computing the longest common subsequence of two given strings X and Y in O(kl+min{p₁,p₂}) time, where p₁ and p₂ denote the numbers of elements in the bottom and right boundaries of the matched blocks, respectively. It improves the previously known time bound O(min{nl,km}) and outperforms the time bounds O(kllogkl) or O((k+l+q)log(k+l+q)) for some cases, where q denotes the number of matched blocks. 相似文献

3.

A New Efficient Algorithm for Computing the Longest Common Subsequence 总被引：1，自引：0，他引：1

Costas S. Iliopoulos M. Sohel Rahman 《Theory of Computing Systems》2009,45(2):355-371

The Longest Common Subsequence (LCS) problem is a classic and well-studied problem in computer science. The LCS problem is a common task in DNA sequence analysis with many applications to genetics and molecular biology. In this paper, we present a new and efficient algorithm for solving the LCS problem for two strings. Our algorithm runs in O(ℛlog log n+n) time, where ℛ is the total number of ordered pairs of positions at which the two strings match. Preliminary version appeared in [24]. C.S. Iliopoulos is supported by EPSRC and Royal Society grants. M.S. Rahman is supported by the Commonwealth Scholarship Commission in the UK under the Commonwealth Scholarship and Fellowship Plan (CSFP) and is on Leave from Department of CSE, BUET, Dhaka-1000, Bangladesh. 相似文献

4.

New efficient algorithms for the LCS and constrained LCS problems 总被引：1，自引：0，他引：1

Costas S. Iliopoulos 《Information Processing Letters》2008,106(1):13-18

In this paper, we study the classic and well-studied longest common subsequence (LCS) problem and a recent variant of it, namely the constrained LCS (CLCS) problem. In the CLCS problem, the computed LCS must also be a supersequence of a third given string. In this paper, we first present an efficient algorithm for the traditional LCS problem that runs in O(Rloglogn+n) time, where R is the total number of ordered pairs of positions at which the two strings match and n is the length of the two given strings. Then, using this algorithm, we devise an algorithm for the CLCS problem having time complexity O(pRloglogn+n) in the worst case, where p is the length of the third string. 相似文献

5.

The longest common subsequence problem revisited 总被引：2，自引：0，他引：2

A. Apostolico C. Guerra 《Algorithmica》1987,2(1):315-336

This paper re-examines, in a unified framework, two classic approaches to the problem of finding a longest common subsequence (LCS) of two strings, and proposes faster implementations for both. Letl be the length of an LCS between two strings of lengthm andn m, respectively, and let s be the alphabet size. The first revised strategy follows the paradigm of a previousO(ln) time algorithm by Hirschberg. The new version can be implemented in timeO(lm · min logs, logm, log(2n/m)), which is profitable when the input strings differ considerably in size (a looser bound for both versions isO(mn)). The second strategy improves on the Hunt-Szymanski algorithm. This latter takes timeO((r +n) logn), wherermn is the total number of matches between the two input strings. Such a performance is quite good (O(n logn)) whenrn, but it degrades to (mn logn) in the worst case. On the other hand the variation presented here is never worse than linear-time in the productmn. The exact time bound derived for this second algorithm isO(m logn +d log(2mn/d)), whered r is the number ofdominant matches (elsewhere referred to asminimal candidates) between the two strings. Both algorithms require anO(n logs) preprocessing that is nearly standard for the LCS problem, and they make use of simple and handy auxiliary data structures. 相似文献

6.

Finding the longest common subsequence for multiple biological sequences by ant colony optimization

Shyong Jian Shyu Chun-Yuan Tsai 《Computers & Operations Research》2009

相似文献

7.

APPLICATION-SPECIFIC ARRAY PROCESSORS FOR THE LONGEST COMMON SUBSEQUENCE PROBLEM OF THREE SEQUENCES ∗ †

《International Journal of Parallel, Emergent and Distributed Systems》2012,27(1):27-52

Abstract

We design linear time systolic-based parallel algorithms that run on two-dimensional arrays for both computing the length and recovering a longest common subsequence of three given sequences that are appropriate for very large-scale integration (VLSI) implementation. These problems have been qualified to be difficult to be solved in linear time in [14], and our approach, which generalizes the methods used for determining a longest common subsequence of two strings [28,38] to the case of three strings, enables to solve both problems in linear time. Given the three sequences A, B and C of length n, m and l (n ≤ m ≤ l;), we provide an algorithm that computes the length p of their longest common subsequence on a modular I/O bounded one-way two-dimensional array of nm processors in n + 2m + l? 1 time-steps. To compute a longest common subsequence of the three given strings, we show that each processor of the above array requires an O(min{n,p\) local storage to solve the problem in In + 1m + 1 + p — 2 time-steps. 相似文献

8.

An Efficient Systolic Algorithm for the Longest Common Subsequence Problem

Lin Yen-Chun Chen Jyh-Chian 《The Journal of supercomputing》1998,12(4):373-385

A longest common subsequence (LCS) of two strings is a common subsequence of the two strings of maximal length. The LCS problem is to find an LCS of two given strings and the length of the LCS (LLCS). In this paper, a fast linear systolic algorithm that improves on previous systolic algorithms for solving the LCS problem is presented. For two given strings of length m and n, where m n, the LLCS and an LCS can be found in m + 2n – 1 time steps. This algorithm achieves the tight lower bound of the time complexity under the situation where symbols are input sequentially to a linear array of n processors. The systolic algorithm can be modified to take only m + n steps on multicomputers by using the scatter operation. 相似文献

9.

Efficient algorithms for regular expression constrained sequence alignment

Yun-Sheng Chung 《Information Processing Letters》2007,103(6):240-246

Imposing constraints is an effective means to incorporate biological knowledge into alignment procedures. As in the PROSITE database, functional sites of proteins can be effectively described as regular expressions. In an alignment of protein sequences it is natural to expect that functional motifs should be aligned together. Due to this motivation, Arslan introduced the regular expression constrained sequence alignment problem and proposed an algorithm which, if implemented naïvely, can take time and space up to O(²|Σ|⁴|V|n²) and O(²|Σ|⁴|V|n), respectively, where Σ is the alphabet, n is the sequence length, and V is the set of states in an automaton equivalent to the input regular expression. In this paper we propose a more efficient algorithm solving this problem which takes O(³|V|n²) time and O(²|V|n) space in the worst case. If |V|=O(logn) we propose another algorithm with time complexity O(²|V|log|V|n²). The time complexity of our algorithms is independent of Σ, which is desirable in protein applications where the formulation of this problem originates; a factor of ²|Σ|=400 in the time complexity of the previously proposed algorithm would significantly affect the efficiency in practice. 相似文献

10.

An improved algorithm for the longest common subsequence problem

Sayyed Rasoul Mousavi Farzaneh Tabataba 《Computers & Operations Research》2012,39(3):512-520

The Longest Common Subsequence problem seeks a longest subsequence of every member of a given set of strings. It has applications, among others, in data compression, FPGA circuit minimization, and bioinformatics. The problem is NP-hard for more than two input strings, and the existing exact solutions are impractical for large input sizes. Therefore, several approximation and (meta) heuristic algorithms have been proposed which aim at finding good, but not necessarily optimal, solutions to the problem. In this paper, we propose a new algorithm based on the constructive beam search method. We have devised a novel heuristic, inspired by the probability theory, intended for domains where the input strings are assumed to be independent. Special data structures and dynamic programming methods are developed to reduce the time complexity of the algorithm. The proposed algorithm is compared with the state-of-the-art over several standard benchmarks including random and real biological sequences. Extensive experimental results show that the proposed algorithm outperforms the state-of-the-art by giving higher quality solutions with less computation time for most of the experimental cases. 相似文献

11.

A remark on the subsequence problem for arc-annotated sequences with pairwise nested arcs

Peter Damaschke 《Information Processing Letters》2006,100(2):64-68

The Arc-Preserving Subsequence (APS) problem appears in the comparison of RNA structures in computational biology. Given two arc-annotated sequences of length n and m<n, APS asks if the shorter sequence can be obtained from the longer one by deleting certain elements along with their incident arcs. It is known that APS with pairwise nested arcs can be solved in O(mn) time. We give an algorithm running in O(m²+n) time in the worst case, actually it is even faster in particular if the shorter sequence has many arcs. The result may serve as a building block for improved APS algorithms in the general case. 相似文献

12.

The derivation of on-line algorithms,with an application to finding palindromes

Johan Jeuring 《Algorithmica》1994,11(2):146-184

A theory for the derivation of on-line algorithms is presented. The algorithms are derived in the Bird-Meertens calculus for program transformations. This calculus provides a concise functional notation for algorithms, and a few powerful theorems for proving equalities of functions. The theory for the derivation of on-line algorithms is illustrated with the derivation of an algorithm for finding palindromes.An on-line linear-time random access machine (RAM) algorithm for finding the longest palindromic substring in a string is derived. For the purpose of finding the longest palindromic substring, all maximal palindromic substrings are computed. The list of maximal palindromes obtained in the computation of the longest palindrome can be used for other purposes such as finding the largest palindromic rectangle in a matrix and finding the shortest partition of a string into palindromes.This research was supported by the Dutch organization for scientific research NWO, under NFI project STOP, project number NF 62/63-518. 相似文献

13.

Efficient parallel algorithms for computing all pair shortest paths in directed graphs

Yijie Han V. Y. Pan J. H. Reif 《Algorithmica》1997,17(4):399-415

We present parallel algorithms for computing all pair shortest paths in directed graphs. Our algorithm has time complexityO(f(n)/p+I(n)logn) on the PRAM usingp processors, whereI(n) is logn on the EREW PRAM, log logn on the CCRW PRAM,f(n) iso(n ³). On the randomized CRCW PRAM we are able to achieve time complexityO(n ³/p+logn) usingp processors. A preliminary version of this paper was presented at the 4th Annual ACM Symposium on Parallel Algorithms and Architectures, June 1992. Support by NSF Grant CCR 90-20690 and PSC CUNY Awards #661340 and #662478. 相似文献

14.

A sum-over-paths extension of edit distances accounting for all sequence alignments

Silvia García-Díez Author Vitae François Fouss^{Author Vitae} 《Pattern recognition》2011,44(6):1172-1182

This paper introduces a simple Sum-over-Paths (SoP) formulation of string edit distances accounting for all possible alignments between two sequences, and extends related previous work from bioinformatics to the case of graphs with cycles. Each alignment ℘, with a total cost C(℘), is assigned a probability of occurrence P(℘)=exp[−θC(℘)]/Z where Z is a normalization factor. Therefore, good alignments (having a low cost) are favored over bad alignments (having a high cost). The expected cost ∑_℘∈PC(℘)exp[−θC(℘)]/Z computed over all possible alignments ℘∈P defines the SoP edit distance. When θ→∞, only the best alignments matter and the measure reduces to the standard edit distance. The rationale behind this definition is the following: for some applications, two sequences sharing many good alignments should be considered as more similar than two sequences having only one single good, optimal, alignment in common. In other words, sub-optimal alignments could also be taken into account. Forward/backward recurrences allowing to efficiently compute the expected cost are developed. Virtually any Viterbi-like sequence comparison algorithm computed on a lattice can be generalized in the same way; for instance, a SoP longest common subsequence is also developed. Pattern classification tasks performed on five data sets show that the new measures usually outperform the standard ones and, in any case, never perform significantly worse, at the expense of tuning the parameter θ. 相似文献

15.

Fast algorithms for finding disjoint subsequences with extremal densities

Anders Bergkvist Peter Damaschke 《Pattern recognition》2006,39(12):2281-2292

We derive fast algorithms for the following problem: given a set of n points on the real line and two parameters s and p, find s disjoint intervals of maximum total length that contain at most p of the given points. Our main contribution consists of algorithms whose time bounds improve upon a straightforward dynamic programming algorithm, in the relevant case that input size n is much bigger than parameters s and p. These results are achieved by selecting a few candidate intervals that are provably sufficient for building an optimal solution via dynamic programming. As a byproduct of this idea we improve an algorithm for a similar subsequence problem of Chen et al. [Disjoint segments with maximum density, in: International Workshop on Bioinformatics Research and Applications IWBRA 2005, (within ICCS 2005), Lecture Notes in Computer Science, vol. 3515, Springer, Berlin, pp. 845–850]. The problems are motivated by the search for significant patterns in biological data. Finally, we propose several heuristics that further reduce the time complexity in typical instances. One of them leads to an apparently open subsequence sum problem of independent interest. 相似文献

16.

Efficient automatic exact motif discovery algorithms for biological sequences

Ali Karci 《Expert systems with applications》2009,36(4):7952-7963

ObjectiveThis paper presents an algorithm for the solution of the motif discovery problem (MDP).Methods and materialsMotif discovery problem can be considered in two cases: motifs with insertions/deletions, and motifs without insertions/deletions. The first group motifs can be found by stochastic and approximated methods. The second group can be found by using stochastic and approximated methods, but also deterministic method. We proved that the second group motifs can be found with a deterministic algorithm, and so, it can be said that the second motifs finding is a P-type problem as proved in this paper.Results and conclusionsAn algorithm was proposed in this paper for motif discovery problem. The proposed algorithm finds all motifs which are occurred in the sequence at least two times, and it also finds motifs of various sizes. Due to this case, this algorithm is regarded as Automatic Exact Motif Discovery Algorithm. All motifs of different sizes can be found with this algorithm, and this case was proven in this paper. It shown that automatic exact motif discovery is a P-type problem in this paper. The application of the proposed algorithm has been shown that this algorithm is superior to MEME, MEME3, Motif Sampler, WEEDER, CONSENSUS, AlignACE. 相似文献

17.

Total completion time minimization in a 2-stage differentiation flowshop with fixed sequences per job type

Bertrand M.T. Lin F.J. Hwang 《Information Processing Letters》2011,111(5):208-212

This paper addresses the total completion time minimization in a two-stage differentiation flowshop where the sequences of jobs per type are predetermined. The two-stage differentiation flowshop consists of a stage-1 common machine and m stage-2 parallel dedicated machines. The goal is to determine an optimal interleaved processing sequence of all jobs at the first stage. We propose an dynamic programming algorithm, where nk is the number of type-k jobs. The running time is polynomial when m is constant. 相似文献

18.

Finding the longest common nonsuperstring in linear time

Joong Chae Na Dong Kyue Kim Jeong Seop Sim 《Information Processing Letters》2009,109(18):1066-1070

String inclusion and non-inclusion problems have been vigorously studied in such diverse fields as molecular biology, data compression, and computer security. Among the well-known string inclusion or non-inclusion notions, we are interested in the longest common nonsuperstring. Given a set of strings, the longest common nonsuperstring problem is finding the longest string that is not a superstring of any string in the given set. It is known that the longest common nonsuperstring problem is solvable in polynomial time.In this paper, we propose an efficient algorithm for the longest common nonsuperstring problem. The running time of our algorithm is linear with respect to the sum of the lengths of the strings in the given set, using generalized suffix trees. 相似文献

19.

Computing Longest Previous Factor in linear time and applications

Maxime Crochemore Lucian Ilie 《Information Processing Letters》2008,106(2):75-80

We give two optimal linear-time algorithms for computing the Longest Previous Factor (LPF) array corresponding to a string w. For any position i in w, LPF[i] gives the length of the longest factor of w starting at position i that occurs previously in w. Several properties and applications of LPF are investigated. They include computing the Lempel-Ziv factorization of a string and detecting all repetitions (runs) in a string in linear time independently of the integer alphabet size. 相似文献

20.

Ant algorithms and simulated annealing for multicriteria dynamic programming

Sebastian Sitarz 《Computers & Operations Research》2009

The paper presents a comparison of ant algorithms and simulated annealing as well as their applications in multicriteria discrete dynamic programming. The considered dynamic process consists of finite states and decision variables. In order to describe the effectiveness of multicriteria algorithms, four measures of the quality of the nondominated set approximations are used. 相似文献