Similar literature
1.
Mäkinen  Ukkonen  Navarro 《Algorithmica》2003,35(4):347-369
Abstract. We focus on the problem of approximate matching of strings that have been compressed using run-length encoding. Previous studies have concentrated on the problem of computing the longest common subsequence (LCS) between two strings of length m and n, compressed to m' and n' runs. We extend an existing algorithm for the LCS to the Levenshtein distance, achieving O(m'n+n'm) complexity. Furthermore, we extend this algorithm to a weighted edit distance model, where the weights of the three basic edit operations can be chosen arbitrarily. This approach also gives an algorithm for approximate searching of a pattern of m letters (m' runs) in a text of n letters (n' runs) in O(mm'n') time. Then we propose improvements for a greedy algorithm for the LCS, and conjecture that the improved algorithm has O(m'n') expected case complexity. Experimental results are provided to support the conjecture.

2.
Mäkinen  Ukkonen  Navarro 《Algorithmica》2003,35(4):347-369
We focus on the problem of approximate matching of strings that have been compressed using run-length encoding. Previous studies have concentrated on the problem of computing the longest common subsequence (LCS) between two strings of length m and n, compressed to m' and n' runs. We extend an existing algorithm for the LCS to the Levenshtein distance, achieving O(m'n+n'm) complexity. Furthermore, we extend this algorithm to a weighted edit distance model, where the weights of the three basic edit operations can be chosen arbitrarily. This approach also gives an algorithm for approximate searching of a pattern of m letters (m' runs) in a text of n letters (n' runs) in O(mm'n') time. Then we propose improvements for a greedy algorithm for the LCS, and conjecture that the improved algorithm has O(m'n') expected case complexity. Experimental results are provided to support the conjecture.

3.
Finding the longest common subsequence in k-length substrings (LCSk) is a recently proposed problem motivated by computational biology. This is a generalization of the well-known LCS problem in which matching symbols from two sequences A and B are replaced with matching non-overlapping substrings of length k from A and B. We propose several algorithms for LCSk, being non-trivial incarnations of the major concepts known from LCS research (dynamic programming, sparse dynamic programming, tabulation). Our algorithms make use of a linear-time and linear-space preprocessing finding the occurrences of all the substrings of length k from one sequence in the other sequence.
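
The LCSk definition above can be illustrated with a naive dynamic program. This is only a sketch under stated assumptions, not one of the algorithms proposed in the paper: the function name `lcsk_length` is hypothetical, substrings are compared directly (so the running time is O(mnk)), and no preprocessing of k-length occurrences is used.

```python
def lcsk_length(a: str, b: str, k: int) -> int:
    """Return the maximum number of ordered, non-overlapping length-k
    substring matches between a and b (naive O(m*n*k) sketch)."""
    if k <= 0:
        raise ValueError("k must be positive")
    m, n = len(a), len(b)
    # dp[i][j] = best count achievable with prefixes a[:i] and b[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            # Skip the last symbol of a or of b.
            best = max(dp[i - 1][j], dp[i][j - 1])
            # Or end a matching length-k block at positions i and j.
            if i >= k and j >= k and a[i - k:i] == b[j - k:j]:
                best = max(best, dp[i - k][j - k] + 1)
            dp[i][j] = best
    return dp[m][n]
```

With k = 1 the recurrence reduces to the classic LCS length, which is a convenient sanity check.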

4.
Data compression can be used to simultaneously reduce memory, communication and computation requirements of string comparison. In this paper we address the problem of computing the length of the longest common subsequence (LCS) between run-length-encoded (RLE) strings. We exploit RLE both to reduce the complexity of LCS computation from O(M×N) to O(mN+Mn−mn), where M and N are the lengths of the original strings and m and n the number of runs in their RLE representation, and to improve the inherent parallelism of the proposed algorithm, so that it may execute in O(m+n) steps on a systolic array of M+N units. We also discuss the application of the proposed algorithm to the related problem of edit distance (ED) computation.
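
To make the distinction between string length (M, N) and number of runs (m, n) concrete, here is a minimal run-length encoding sketch. The helper names `rle_encode` and `rle_decode` are assumptions for illustration; the code shows only the RLE representation itself, not the LCS algorithm described in the abstract.

```python
from itertools import groupby


def rle_encode(s: str) -> list[tuple[str, int]]:
    """Compress s into a list of (symbol, run_length) pairs."""
    return [(ch, sum(1 for _ in group)) for ch, group in groupby(s)]


def rle_decode(runs: list[tuple[str, int]]) -> str:
    """Expand (symbol, run_length) pairs back into the original string."""
    return "".join(ch * count for ch, count in runs)


runs = rle_encode("aaabbbbcc")
# The string has M = 9 symbols but only m = 3 runs.
print(len("aaabbbbcc"), len(runs))
```

The savings claimed for RLE-based algorithms depend on the number of runs being much smaller than the string length; for strings with few repeated symbols the two are nearly equal.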

5.
New efficient algorithms for the LCS and constrained LCS problems   Cited by: 1 (self-citations: 0, citations by others: 1)
In this paper, we study the classic and well-studied longest common subsequence (LCS) problem and a recent variant of it, namely the constrained LCS (CLCS) problem. In the CLCS problem, the computed LCS must also be a supersequence of a third given string. In this paper, we first present an efficient algorithm for the traditional LCS problem that runs in O(R log log n + n) time, where R is the total number of ordered pairs of positions at which the two strings match and n is the length of the two given strings. Then, using this algorithm, we devise an algorithm for the CLCS problem having time complexity O(pR log log n + n) in the worst case, where p is the length of the third string.

6.
We observe a repeated-update problem in the process of updating walkabout strengths in the DeltaBlue algorithm, which is known as a fast constraint solving method based on local propagation. According to the basic references on the DeltaBlue algorithm, the time complexity of the planning phase is O(MN) for a constraint problem in which the number of constraints is N and the maximum number of methods a constraint has is M. Using a very simple example, however, we show that the time complexity is not O(MN): in this example it is exponential in N. We then present a corrected DeltaBlue algorithm whose time complexity is O(EN^2) for a constraint problem in which the number of constraints is N and the maximum number of variables a constraint has is E. Finally, we measure the performance of the corrected DeltaBlue algorithm using two benchmarks.

7.
Summary. Efficient algorithms for computing the longest common subsequence (LCS for short) are discussed. The O(pn) algorithm and the O(p(m-p) log n) algorithm [Hirschberg 1977] seem to be the best among previously known algorithms, where p is the length of an LCS and m and n are the lengths of the two given strings (m ≤ n). There are many applications where the expected length of an LCS is close to m. In this paper, an O(n(m-p)) algorithm is presented. When p is close to m (in other words, the two given strings are similar), the algorithm presented here runs much faster than previously known algorithms.

8.
A New Efficient Algorithm for Computing the Longest Common Subsequence   Cited by: 1 (self-citations: 0, citations by others: 1)
The Longest Common Subsequence (LCS) problem is a classic and well-studied problem in computer science. The LCS problem is a common task in DNA sequence analysis with many applications to genetics and molecular biology. In this paper, we present a new and efficient algorithm for solving the LCS problem for two strings. Our algorithm runs in O(ℛ log log n + n) time, where ℛ is the total number of ordered pairs of positions at which the two strings match. Preliminary version appeared in [24]. C.S. Iliopoulos is supported by EPSRC and Royal Society grants. M.S. Rahman is supported by the Commonwealth Scholarship Commission in the UK under the Commonwealth Scholarship and Fellowship Plan (CSFP) and is on leave from the Department of CSE, BUET, Dhaka-1000, Bangladesh.

9.
A classical measure of similarity between strings is the length of the longest common subsequence (LCS) between the two given strings. The search for efficient algorithms for finding the LCS has been going on for more than three decades. To date, all known algorithms may take quadratic time (shaved by logarithmic factors) to find large LCS. In this paper, the problem of approximating LCS is studied, while focusing on the hard inputs for this problem, namely, approximating LCS of near-linear size in strings over a relatively large alphabet (of size at least n^ε for some constant ε>0, where n is the length of the string). We show that any given string over a relatively large alphabet can be embedded into a locally non-repetitive string. This embedding has a negligible additive distortion for strings that are not too dissimilar in terms of the edit distance. We also show that LCS can be efficiently approximated in locally non-repetitive strings. Our new method (the embedding together with the approximation algorithm) gives a strictly sub-quadratic time algorithm (i.e., of complexity O(n^(2-ε)) for some constant ε) which can find common subsequences of linear (and near linear) size that cannot be detected efficiently by the existing tools.

10.
A longest common subsequence (LCS) of two strings is a common subsequence of the two strings of maximal length. The LCS problem is to find an LCS of two given strings and the length of the LCS (LLCS). In this paper, a fast linear systolic algorithm that improves on previous systolic algorithms for solving the LCS problem is presented. For two given strings of length m and n, where m ≥ n, the LLCS and an LCS can be found in m + 2n − 1 time steps. This algorithm achieves the tight lower bound of the time complexity under the situation where symbols are input sequentially to a linear array of n processors. The systolic algorithm can be modified to take only m + n steps on multicomputers by using the scatter operation.
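
The definitions of LCS and LLCS above can be made concrete with the textbook sequential dynamic program. This is a baseline sketch only, not the systolic algorithm of the abstract; the function name `lcs` is an assumption, and the code runs in O(mn) time and space on a single processor.

```python
def lcs(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b (textbook DP)."""
    m, n = len(a), len(b)
    # dp[i][j] = LLCS of the prefixes a[:i] and b[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    # Trace back from dp[m][n] to recover one LCS.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))
```

The LLCS is simply `len(lcs(a, b))`. When several LCSs exist, the tie-breaking rule in the traceback decides which one is returned.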

11.
A sequential algorithm with complexity O(M^2+n) for the integer knapsack problem is presented. M is the capacity of the knapsack, and n the number of objects. The algorithm admits an efficient parallelization on a p-processor ring machine. The corresponding parallel algorithm is O(M^2/p+n). The parallel algorithm is compared with a version of the well-known Lee algorithm adapted to the integer knapsack problem. Computational results on both a local area network and a transputer are reported.
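
For orientation, the integer knapsack problem (each object may be taken any non-negative integer number of times) can be solved with the textbook O(Mn) dynamic program sketched below. This is not the O(M^2+n) algorithm of the abstract, nor its ring parallelization; the function name and parameter names are assumptions for illustration.

```python
def integer_knapsack(capacity: int, weights: list[int], profits: list[int]) -> int:
    """Maximum profit for the integer (unbounded) knapsack problem,
    using the textbook O(capacity * n) dynamic program."""
    if len(weights) != len(profits):
        raise ValueError("weights and profits must have the same length")
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be positive integers")
    # best[c] = maximum profit with total weight at most c
    best = [0] * (capacity + 1)
    for c in range(1, capacity + 1):
        for w, p in zip(weights, profits):
            if w <= c:
                best[c] = max(best[c], best[c - w] + p)
    return best[capacity]
```

For example, with capacity 10, weights [3, 4] and profits [4, 5], the best choice is two objects of weight 3 and one of weight 4, for a profit of 13.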

12.
A variety of applications have motivated interest in the hidden-line and hidden-surface problem. This has resulted in a number of fundamentally different solutions. However, no algorithm has been shown to be optimal. A common trait among algorithms for hidden-line elimination is a worst case complexity of O(n^2). The intent here is to introduce an algorithm that exhibits a linear worst case complexity. A restricted class of input has been employed to achieve an asymptotic improvement in complexity as well as to simplify the problem enough to permit theoretical analysis of the algorithm. The class of input is still general enough to conform to the requirements of a number of applications.

13.
T. Matsui 《Algorithmica》1997,18(4):530-543
In this paper we propose an algorithm for generating all the spanning trees in undirected graphs. The algorithm requires O(n + m + τn) time where the given graph has n vertices, m edges, and τ spanning trees. For outputting all the spanning trees explicitly, this time complexity is optimal. Our algorithm follows a special rooted tree structure on the skeleton graph of the spanning tree polytope. The rule by which the rooted tree structure is traversed is irrelevant to the time complexity. In this sense, our algorithm is flexible. If we employ the depth-first search rule, we can reduce the memory requirement to O(n + m). A breadth-first implementation requires as much as O(m + τn) space, but when a parallel computer is available, this might have an advantage. When a given graph is weighted, the best-first search rule provides a ranking algorithm for the minimum spanning tree problem. The ranking algorithm requires O(n + m + τn) time and O(m + τn) space when we have a minimum spanning tree. Received January 21, 1995; revised February 19, 1996.

14.
This paper describes a distributed algorithm for computing the biconnected components of a dynamically changing graph. Our algorithm has a worst-case communication complexity of O(b+c) messages for an edge insertion and O(b'+c) messages for an edge removal, and a worst-case time complexity of O(c) for both operations, where c is the maximum number of biconnected components in any of the connected components during the operation, b is the number of nodes in the biconnected component containing the new edge, and b' is the number of nodes in the biconnected component just before the deletion. The algorithm is presented in two stages. First, a serial algorithm is presented in which topology updates occur one at a time. Then, building on the serial algorithm, an algorithm is presented in which concurrent update requests are serialized within each connected component. The problem is motivated by the need to implement causal ordering of messages efficiently in a dynamically changing communication structure. Received January 1995; revised February 1997.

15.
We show that the fixed alphabet shortest common supersequence (SCS) and the fixed alphabet longest common subsequence (LCS) problems parameterized in the number of strings are W[1]-hard. Unless W[1]=FPT, this rules out the existence of algorithms with time complexity of O(f(k)n^α) for those problems. Here n is the size of the problem instance, α is constant, k is the number of strings and f is any function of k. The fixed alphabet version of the LCS problem is of particular interest considering the importance of sequence comparison (e.g. multiple sequence alignment) in the fixed length alphabet world of DNA and protein sequences.

16.
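Two approximate LCS algorithms for three source sequences are proposed, both with an approximation factor of 1/|Σ|. The linear approximate LCS algorithm has time and space complexity both O(n), where n is the length of the longest source sequence, and is suited to large-scale problems. The recursive approximate LCS algorithm has time and space complexity both O(n log n) and is suited to problems that require high accuracy. Both algorithms can also be used to solve the LCS and CLCS problems for multiple sequences. Experiments verify the effectiveness of the two algorithms.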

17.
The longest common subsequence problem (LCS) and the closest substring problem (CSP) are two models for finding common patterns in strings, and have been studied extensively. Though both LCS and CSP are NP-hard, they exhibit very different behavior with respect to polynomial time approximation algorithms. While LCS is hard to approximate within n^δ for some δ>0, CSP admits a polynomial time approximation scheme. In this paper, we study the longest common rigid subsequence problem (LCRS). This problem shares similarity with both LCS and CSP and has an important application in motif finding in biological sequences. We show that it is NP-hard to approximate LCRS within ratio n^δ, for some constant δ>0, where n is the maximum string length. We also show that it is NP-hard to approximate LCRS within ratio Ω(m), where m is the number of strings.

18.
We consider the problem of merging two sorted sequences on constant degree networks performing compare-exchange operations only. The classical solution to this problem is given by the networks based on Batcher's Odd-Even Merge and Bitonic Merge running in log(2n) time. Due to the obvious log n lower bound for the runtime, this is time-optimal. We present a new family of merging networks working in a completely different way than the previously known algorithms. They have a novel property of being periodic: this means that for some (small) constant k, each processing unit of the network performs the same operations at steps t and t+k (as long as t+k does not exceed the runtime). The only operations executed are compare-exchange operations, just like in the case of Batcher's networks. The architecture of the networks is very simple and easy to lay out. We show that even for period 3 there is a network in our family merging two n-element sorted sequences in time O(log n). Since each network of period 2 requires steps to merge such sequences, 3 is the smallest period for which we may achieve a fast runtime. In order to improve the constants in front of log n, we increment the period and tune the construction using additional techniques. We achieve the runtime 9 · log_3 n ≈ 5.7 · log n for a network of period 4, and 2.25 · ((k+3)/(k-1+log 3)) · log n ≈ 2.25 · log n for a network of period k+3, for . Due to the periodic design, our networks have small area complexity. For instance, if each processing unit requires O(1) area and a comparator uses a single wire of width O(1) connecting the processing elements, then our networks require area. This compares well with Batcher's networks that require area. Received February 1997, and in revised form September 1997, and in final form February 1998.

19.
Given two strings A and B of lengths na and nb, respectively, the All-substrings Longest Common Subsequence (ALCS) problem obtains, for any substring B' of B, the length of the longest string that is a subsequence of both A and B'. The sequential algorithm for this problem takes O(na nb) time and O(nb) space. We present a parallel algorithm for the ALCS problem on the Coarse-Grained Multicomputer (BSP/CGM) model with p < √na processors, that takes O(na nb/p) time, O(log p) communication rounds and O(nb √na) space per processor. The proposed algorithm also solves the basic Longest Common Subsequence (LCS) problem that finds the longest string (and not only its length) that is a subsequence of both A and B. To our knowledge, this is the best BSP/CGM algorithm in the literature for the LCS and ALCS problems.

20.
Coalition structure generation is a central problem in characteristic function games. Most algorithmic work to date can be classified into one of three broad categories: anytime algorithms, design-to-time algorithms and heuristic algorithms [5]. This paper focuses on the former two approaches. Both design-to-time and anytime algorithms have pros and cons. While design-to-time algorithms guarantee finding an optimal solution, they must be run to completion in order to generate any solution. Anytime algorithms, however, permit premature termination while providing solutions of ever increasing quality along with quality guarantees. Design-to-time algorithms have a better worst case runtime (O(3^n) for n agents) compared to the current anytime algorithms (O(n^n) for n agents), but do not provide the flexibility of anytime algorithms. In this paper we present the first design-to-time constant factor approximation algorithms for coalition structure generation that guarantee high quality solutions quickly. We show how our approach can be used as an anytime algorithm, which combines both the worst case runtime of the design-to-time algorithms and the flexibility of the anytime algorithms. This results in the first anytime algorithm for coalition structure generation which has the same worst case time complexity as the current best design-to-time algorithms.
