首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
Approximation algorithms for tree alignment with a given phylogeny   总被引:3,自引:0,他引:3  
We study the following fundamental problem in computational molecular biology: Given a set of DNA sequences representing some species and a phylogenetic tree depicting the ancestral relationship among these species, compute an optimal alignment of the sequences by the means of constructing a minimum-cost evolutionary tree. The problem is an important variant of multiple sequence alignment, and is widely known astree alignment. We design an efficient approximation algorithm with performance ratio 2 for tree alignment. The algorithm is then extended to a polynomial-time approximation scheme. The construction actually works for Steiner trees in any metric space, and thus implies a polynomial-time approximation scheme for planar Steiner trees under a given topology (with any constant degree). To our knowledge, this is the first polynomial-time approximation scheme in the fields of computational biology and Steiner trees. The approximation algorithms may be useful in evolutionary genetics practice as they can provide a good initial alignment for the iterative method in [23].Supported in part by NSERC Operating Grant OGP0046613.Supported in part by NSERC Operating Grant OGP0046613 and a Canadian Genome Analysis and Technology Research Grant.Supported in part by US Department of Energy Grant DE-FG03-90ER6099.  相似文献   

2.
We prove that the multiple sequence alignment problem with weighted sum-of-pairs score is APX-hard for arbitrary metric scoring functions over the binary alphabet. This holds even when the weights are restricted to zero and one.  相似文献   

3.
We study the Euclidean bottleneck Steiner tree problem: given a set P of n points in the Euclidean plane and a positive integer k, find a Steiner tree with at most k Steiner points such that the length of the longest edge in the tree is minimized. This problem is known to be NP-hard even to approximate within ratio and there was no known exact algorithm even for k=1 prior to this work. In this paper, we focus on finding exact solutions to the problem for a small constant k. Based on geometric properties of optimal location of Steiner points, we present an optimal -time exact algorithm for k=1 and an O(n2)-time algorithm for k=2. Also, we present an optimal -time exact algorithm for any constant k for a special case where there is no edge between Steiner points.  相似文献   

4.
Genome resequencing with short reads generated from pyrosequencing generally relies on mapping the short reads against a single reference genome. However, mapping of reads from multiple reference genomes is not possible using a pairwise mapping algorithm. In order to align the reads w.r.t each other and the reference genomes, existing multiple sequence alignment(MSA) methods cannot be used because they do not take into account the position of these short reads with respect to the genome, and are highly inefficient for a large number of sequences. In this paper, we develop a highly scalable parallel algorithm based on domain decomposition, referred to as P-Pyro-Align, to align such a large number of reads from single or multiple reference genomes. The proposed alignment algorithm accurately aligns the erroneous reads, and has been implemented on a cluster of workstations using MPI library. Experimental results for different problem sizes are analyzed in terms of execution time, quality of the alignments, and the ability of the algorithm to handle reads from multiple haplotypes. We report high quality multiple alignment of up to 0.5 million reads. The algorithm is shown to be highly scalable and exhibits super-linear speedups with increasing number of processors.  相似文献   

5.
Multiple sequence alignment (MSA) and phylogenetic tree reconstruction are one of the most important problems in the computational biology. While both these problems are of great practical significance, in most cases they are very computationally demanding. In this paper we propose a new approach to the MSA problem which simultaneously infers an underlying phylogenetic tree. To process large data sets we provide parallel implementation of our method, which is based on the distributed caching of intermediate results. Finally, we show a parallel server designed for grid environments, and we report results of experiments performed with actual biological data, e.g. 1000 ribosomal RNA sequences.  相似文献   

6.
On the partial terminal Steiner tree problem   总被引:1,自引:0,他引:1  
We investigate a practical variant of the well-known graph Steiner tree problem. For a complete graph G = ( V,E ) with length function l:E R + and two vertex subsets R V and R R, a partial terminal Steiner tree is a Steiner tree which contains all vertices in R such that all vertices in R R belong to the leaves of this Steiner tree. The partial terminal Steiner tree problem is to find a partial terminal Steiner tree T whose total lengths (u,v) T l ( u,v ) is minimum. In this paper, we show that the problem is both NP-complete and MAX SNP-hard when the lengths of edges are restricted to either 1 or 2. We also provide an approximation algorithm for the problem.
Sun-Yuan HsiehEmail:
  相似文献   

7.
On approximation algorithms for the terminal Steiner tree problem   总被引:1,自引:0,他引:1  
The terminal Steiner tree problem is a special version of the Steiner tree problem, where a Steiner minimum tree has to be found in which all terminals are leaves. We prove that no polynomial time approximation algorithm for the terminal Steiner tree problem can achieve an approximation ratio less than (1−o(1))lnn unless NP has slightly superpolynomial time algorithms. Moreover, we present a polynomial time approximation algorithm for the metric version of this problem with a performance ratio of 2ρ, where ρ denotes the best known approximation ratio for the Steiner tree problem. This improves the previously best known approximation ratio for the metric terminal Steiner tree problem of ρ+2.  相似文献   

8.
We consider two aesthetic criteria for the visualization of rooted trees: inclusion and tip-over. Finding the minimum area layout according to either of these two standards is an NP-hard task, even when we restrict ourselves to binary trees.We provide a fully polynomial time approximation scheme for this problem. This result applies to any tree for tip-over layouts and to bounded degree trees in the case of the inclusion convention. We also prove that such restriction is necessary since, for unbounded degree trees, the inclusion problem is strongly NP-hard. Hence, neither a fully polynomial time approximation scheme nor a pseudopolynomial time algorithm exists, unless P=NP. Our technique, combined with the parallel algorithm by Metaxas et al. [Comput. Geom. 9 (1998) 145-158], also yields an NC fully parallel approximation scheme. This latter result holds for inclusion of binary trees and for the slicing floorplanning problem. Although this problem is in P, it is unknown whether it belongs to NC or not. All the above results also apply to other size functions of the drawing (e.g., the perimeter).  相似文献   

9.
We investigate a practical variant of the well-known graph Steiner tree problem. In this variant, every target vertex is required to be a leaf vertex in the solution Steiner tree. We present hardness results for this variant as well as a polynomial time approximation algorithm with performance ratio ρ+2, where ρ is the best-known approximation ratio for the graph Steiner tree problem.  相似文献   

10.
The Degree- Δ Closest Phylogenetic k th Root Problem (ΔCPR k ) is the problem of finding a (phylogenetic) tree T from a given graph G=(V,E) such that (1) the degree of each internal node in T is at least 3 and at most Δ, (2) the external nodes (i.e. leaves) of T are exactly the elements of V, and (3) the number of disagreements, i.e., |E {{u,v} : u,v are leaves of T and d T (u,v)≤k}|, is minimized, where d T (u,v) denotes the distance between u and v in tree T. This problem arises from theoretical studies in evolutionary biology and generalizes several important combinatorial optimization problems such as the maximum matching problem. Unfortunately, it is known to be NP-hard for all fixed constants Δ,k such that either both Δ≥3 and k≥3, or Δ>3 and k=2. This paper presents a polynomial-time 8-approximation algorithm for Δ CPR 2 for any fixed Δ>3, a quadratic-time 12-approximation algorithm for 3CPR 3, and a polynomial-time approximation scheme for the maximization version of Δ CPR k for any fixed Δ and k.  相似文献   

11.
In this paper, we study the protein threading problem, which was proposed for predicting a folded 3D protein structure from an amino acid sequence. Since this problem was already proved to be NP-hard, we study polynomial time approximation algorithms. We show several hardness results for the approximation, which includes a MAX SNP-hardness result. We also show approximation algorithms for a special case and a general case, where a graph representing interactions between amino acid residues is restricted to be planar in a special case. For this special case, we obtain a constant approximation ratio.  相似文献   

12.
We consider the problem of computing a watchman route in a polygon with holes. We show that the problem of finding a minimum-link watchman route is NP-complete, even if the holes are all convex. The proof is based on showing that the related problem of finding a minimum-link tour on a set of points in the plane is NP-complete. We provide a provably good approximation algorithm that achieves an approximation factor of O(logn).  相似文献   

13.
Max-Satisfy is the problem of finding an assignment that satisfies the maximum number of equations in a system of linear equations over Q. We prove that unless NP⊂BPP Max-Satisfy cannot be efficiently approximated within an approximation ratio of 1/n1−?, if we consider systems of n linear equations with at most n variables and ?>0 is an arbitrarily small constant. Previously, it was known that the problem is NP-hard to approximate within a ratio of 1/nα, but 0<α<1 was some specific constant that could not be taken to be arbitrarily close to 1.  相似文献   

14.
This paper deals with approximation algorithms for the prize collecting generalized Steiner forest problem, defined as follows. The input is an undirected graph G=(V,E), a collection T={T1,…,Tk}, each a subset of V of size at least 2, a weight function , and a penalty function . The goal is to find a forest F that minimizes the cost of the edges of F plus the penalties paid for subsets Ti whose vertices are not all connected by F. Our main result is a -approximation for the prize collecting generalized Steiner forest problem, where n2 is the number of vertices in the graph. This obviously implies the same approximation for the special case called the prize collecting Steiner forest problem (all subsets Ti are of size 2). The approximation algorithm is obtained by applying the local ratio method, and is much simpler than the best known combinatorial algorithm for this problem.Our approach gives a -approximation for the prize collecting Steiner tree problem (all subsets Ti are of size 2 and there is some root vertex r that belongs to all of them). This latter algorithm is in fact the local ratio version of the primal-dual algorithm of Goemans and Williamson [M.X. Goemans, D.P. Williamson, A general approximation technique for constrained forest problems, SIAM Journal on Computing 24 (2) (April 1995) 296–317]. Another special case of our main algorithm is Bar-Yehuda's local ratio -approximation for the generalized Steiner forest problem (all the penalties are infinity) [R. Bar-Yehuda, One for the price of two: a unified approach for approximating covering problems, Algorithmica 27 (2) (June 2000) 131–144]. Thus, an important contribution of this paper is in providing a natural generalization of the framework presented by Goemans and Williamson, and later by Bar-Yehuda.  相似文献   

15.
We present a polynomial time 1.5h-approximation algorithm for the problem of finding the largest common subtree between two rooted, labeled, and unordered trees of height at most h, where a tree S is called a subtree of a tree T if S is obtained from T by deletion of some nodes in T. This result improves the previous 2h-approximation algorithm.  相似文献   

16.
In 2002, Lin and Xue [Inform. Process. Lett. 84 (2002) 103-107] introduced a variant of the graph Steiner tree problem, in which each terminal vertex is required to be a leaf in the solution Steiner tree. They presented a ρ+2 approximation algorithm, where ρ is the approximation ratio of the best known efficient algorithm for the regular graph Steiner tree problem. In this note, we derive a simplified algorithm with an improved approximation ratio of 2ρ (currently ρ≈1.550, see [SODA 2000, 2000, pp. 770-790]).  相似文献   

17.
Motivated by the problem in computational biology of reconstructing the series of chromosome inversions by which one organism evolved from another, we consider the problem of computing the shortest series of reversals that transform one permutation to another. The permutations describe the order of genes on corresponding chromosomes, and areversal takes an arbitrary substring of elements, and reverses their order.For this problem, we develop two algorithms: a greedy approximation algorithm, that finds a solution provably close to optimal inO(n 2) time and0(n) space forn-element permutations, and a branch- and-bound exact algorithm, that finds an optimal solution in0(mL(n, n)) time and0(n 2) space, wherem is the size of the branch- and-bound search tree, andL(n, n) is the time to solve a linear program ofn variables andn constraints. The greedy algorithm is the first to come within a constant factor of the optimum; it guarantees a solution that uses no more than twice the minimum number of reversals. The lower and upper bounds of the branch- and-bound algorithm are a novel application of maximum-weight matchings, shortest paths, and linear programming.In a series of experiments, we study the performance of an implementation on random permutations, and permutations generated by random reversals. For permutations differing byk random reversals, we find that the average upper bound on reversal distance estimatesk to within one reversal fork<1/2n andn<100. For the difficult case of random permutations, we find that the average difference between the upper and lower bounds is less than three reversals forn<50. Due to the tightness of these bounds, we can solve, to optimality, problems on 30 elements in a few minutes of computer time. This approaches the scale of mitochondrial genomes.This research was supported by a postdoctoral fellowship from the Program in Mathematics and Molecular Biology of the University of California at Berkeley under National Science Foundation Grant DMS-8720208, and by a fellowship from the Centre de recherches mathématiques of the Université de Montréal.This research was supported by grants from the Natural Sciences and Engineering Research Council of Canada, and the Fonds pour la formation de chercheurs et l'aide à la recherche (Québec). The author is a fellow of the Canadian Institute for Advanced Research.  相似文献   

18.
We give a tight analysis of the greedy algorithm introduced by Krumke and Wirth for the minimum label spanning tree problem. The algorithm is shown to be a (ln(n−1)+1)-approximation for any graph with n nodes (n>1), which improves the known performance guarantee 2lnn+1.  相似文献   

19.
多序列联配(MAS)是现代生物信息学中的重要工具之一,MAS问题是NP-难的,因此需要一些启发式方法在合理的时间内联配大的数据集。本文提出了一个基于最小生成树的多序列联配算法,并使用BALiBASE标准数据集合,对我们的算法进行了性能评价,结果表明算法较之ClustalX类的算法其精确度更高。  相似文献   

20.
Given a text string of lengthn and a pattern string of lengthm over ab-letter alphabet, thek differences approximate string matching problem asks for all locations in the text where the pattern occurs with at mostk differences (substitutions, insertions, deletions). We treatk not as a constant but as a fraction ofm (not necessarily constant-fraction). Previous algorithms require at leastO(kn) time (or exponential space). We give an algorithm that is sublinear time0((n/m)k log b m) when the text is random andk is bounded by the threshold m/(logb m + O(1)). In particular, whenk=o(m/logb m) the expected running time iso(n). In the worst case our algorithm is O(kn), but is still an improvement in that it is practical and uses0(m) space compared with0(n) or0(m 2). We define three problems motivated by molecular biology and describe efficient algorithms based on our techniques: (1) approximate substring matching, (2) approximate-overlap detection, and (3) approximate codon matching. Respectively, applications to biology are local similarity search, sequence assembly, and DNA-protein matching.This work was supported in part by NSF Grants CCR-87-04184 and FD-89-02813; by the Human Genome Center, Lawrence Berkeley Laboratory, supported by the Director, Office of Health and Environmental Research, of the U.S. Department of Energy under Contract DE-AC03-76SF00098; and by Department of Energy Grants DE-FG03-90ER60999 and DE-FG02-91ER61190. Earlier versions of this paper appeared as [8] and part of [5].  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号