首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
随着生物序列数据库中序列数据的激增, 开发兼有高度生物敏感性和高效率的算法显得极为迫切. 通过对生物序列比对问题中Needleman-Wunsch算法和Smith-Waterman算法深入分析, 提出了Smith-Waterman算法的改进算法, 并通过实验验证该算法, 对改进前后的运行性能进行比较分析. 实验证明, 改进后的算法实现了双序列局部最优解个数的优化, 有效降低了生物序列比对算法时间与空间的复杂性, 提高序列比对的得分率和准确率.  相似文献   

2.
GPU加速的生物序列比对   总被引:1,自引:1,他引:0  
为了精确高效地进行生物序列比对,提出一种GPU加速的Smith-Waterman算法.该算法使用菱形数据布局以更充分地利用GPU的并行处理能力;使用查询串分批处理技术来支持上百兆规模的序列比对;同时引入树形算法,以优化最大匹配值的计算.将该算法在一块NVIDIA GeForce GTX285显卡上实现,并使用多组不同规模的生物序列进行了比对实验.实验结果表明,与CPU上的串行算法相比,采用文中算法最高可获得120倍以上的性能提升.  相似文献   

3.
The local alignment problem for two sequences requires determining similar regions, one from each sequence, and aligning those regions. The Smith-Waterman algorithm for local sequence alignment is one of the most well-known algorithm in computational molecular biology. This ingenious dynamic programming approach is designed to reveal the highly conserved fragments by discarding poorly conserved initial and terminal segments. However, the local alignment sometimes produces a mosaic of well conserved fragments artificially connected by poorly conserved or even unrelated fragments. This may lead to problems in comparison of long genomic sequences and comparative gene prediction. In this paper we propose a new strategy of dynamic penalty strategy to {ix this problem. In the process of computing similarity matrix, if similarity value is larger than the pre-specified threshold X then starting our strategy, when related character mismatches, then penalizing more than others until similarity value is 0 or the process ends. Test results show that this algorithm has better performance by comparison to the standard Smith-Waterman while dose not increase signally the computational complexity both in time and space.  相似文献   

4.
We present the first space and time optimal parallel algorithm for the pairwise sequence alignment problem, a fundamental problem in computational biology. This problem can be solved sequentially in O(mn) time and O(m+n) space, where m and n are the lengths of the sequences to be aligned. The fastest known parallel space-optimal algorithm for pairwise sequence alignment takes optimal O(m+n/p) space, but suboptimal O((m+n)/sup 2//p) time, where p is the number of processors. On the other hand, the most space economical time-optimal parallel algorithm takes O(mn/p) time, but O(m+n/p) space. We close this gap by presenting an algorithm that achieves both time and space optimality, i.e. requires only O((m+n)/p) space and O(mn/p) time. We also present an experimental evaluation of the proposed algorithm on an IBM xSeries cluster. Although presented in the context of full sequence alignments, our algorithm is applicable to other alignment problems in computational biology including local alignments and syntenic alignments. It is also a useful addition to the range of techniques available for parallel dynamic programming.  相似文献   

5.
导向定位测序(GPS)是一种全基因组DNA甲基化检测的新测序技术,产生的测序数据具有成本低、没有序列偏好等优势.目前,甲基化分析中最重要的一步是将其测序产生的序列比对到参考基因组上.但是,现有导向定位测序的方法使用Smith-Waterman进行局部序列比对,时间消耗过大且容易对序列比对位置产生误判.因此,提出一种导向定位测序数据的改进比对算法,该算法利用其双端测序的优势,先用甲基化序列端数据进行序列比对,对多位置匹配的序列再利用常规数据端数据进行比对位置确定.实验结果表明:本文方法和现有方法的准确率相当,而具有更高的唯一比对比率,时间性能有3倍以上的提升.  相似文献   

6.
序列比对是生物信息学中基本的信息处理方法,对于发现生物序列中的功能、结构和进化信息具有重要的意义。该文对典型的双序列比对算法Smith-Waterman、FASTA、BLAST以及多序列比对算法CLUSTAL进行了描述和评价;针对目前序列比对算法普遍存在的不足,简单介绍了应用KDD技术进行序列相似性发现的定义及其处理过程。  相似文献   

7.
The multiple alignment of the sequences of DNA and proteins is applicable to various important fields in molecular biology. Although the approach based on Dynamic Programming is well-known for this problem, it requires enormous time and space to obtain the optimal alignment. On the other hand, this problem corresponds to the shortest path problem and the A* algorithm, which can efficiently find the shortest path with an estimator, is usable.

First, this paper directly applies the A* algorithm to multiple sequence alignment problem with more powerful estimator in more than two-dimensional case and discusses the extensions of this approach utilizing an upper bound of the shortest path length and of modification of network structure. The algorithm to provide the upper bound is also proposed in this paper. The basic part of these results was originally shown in Ikeda and Imai [11]. This part is similar to the branch-and-bound techniques implemented in MSA program in Gupta et al. [6]. Our framework is based on the edge length transformation to reduce the problem to the shortest path problem, which is more suitable to generalizations to enumerating suboptimal alignments and parametric analysis as done in Shibuya and Imai [15–17]. By this enhanced A* algorithm, optimal multiple alignments of several long sequences can be computed in practice, which is shown by computational results.

Second, this paper proposes a k-group alignment algorithm for multiple alignment as a practical method for much larger-size problem of, say multiple alignments of 50–100 sequences. A basic part of these results were originally presented in Imai and Ikeda [13]. In existing iterative improvement methods for multiple alignment, the so-called group-to-group two-dimensional dynamic programming has been used, and in this respect our proposal is to extend the ordinary two-group dynamic programming to a k-group alignment programming. This extension is conceptually straightforward, and here our contribution is to demonstrate that the k-group alignment can be implemented so as to run in a reasonable time and space under standard computing environments. This is established by generalizing the above A* search approach. The k-group alignment method can be directly incorporated in existing methods such as iterative improvement algorithms [2, 5] and tree-based (iterative) algorithms [9]. This paper performs computational experiments by applying the k-group method to iterative improvement algorithms, and shows that our approach can find better alignments in reasonable time. For example, through larger-scale computational experiments here, 34 protein sequences with very high homology can be optimally 10-group aligned, and 64 sequences with high homology can be optimally 5-group aligned.  相似文献   


8.
为提高生物序列比对算法的性能和效率,提出一种异构处理平台下可移植的大规模生物序列比对算法及其优化方法.通过改变原有Smith-Waterman算法的计算流程和数据依赖关系,增加序列比对的并行性;通过改变存储器布局后使用向量数据类型,提高全局存储器的带宽利用率;通过增加偏移量改变存储器模块的映射方式,避免模块访问冲突,提高局部存储器的使用效率.实验结果表明,优化后的生物序列比对性能提升了近100倍.  相似文献   

9.
生物信息学是以计算机为工具对生物信息进行储存、检索和分析的科学。序列比对是生物信息学中的一个基本问题,设计快速而有效的序列比对算法是生物信息学研究的一个重要内容,通过序列比较可以发现生物序列中的功能、结构和进化的信息,序列比较的基本操作是比对。本文介绍了序列比对算法的发展现状,描述了常用的各类序列比对算法,并分析了它们的优劣。  相似文献   

10.
11.
刘维  陈崚 《计算机应用》2006,26(6):1422-1424
求生物序列的最长公共子串是生物信息学中最重要的问题之一,提出了该问题的一个快速算法,可对所有初始同字符对并行地寻找其后继同字符对,并记录下相应层次值。最后通过最大层次值回溯得到比对结果。此外,该算法采用了剪枝技术,对于明显不能得出最优比对的同字符将中止其后继的搜索。实验结果证明,本文算法比其他算法速度快、精确度高。  相似文献   

12.
A platform for biological sequence comparison on parallel computers   总被引:1,自引:0,他引:1  
We have written two programs for searching biological sequence databases that run on Intel hypercube computers. PSCANLIB compares a single sequence against a sequence library, and PCOMPLIB compares all the entries in one sequence library against a second library. The programs provide a general framework for similarity searching; they include functions for reading in query sequences, search parameters and library entries, and reporting the results of a search. We have isolated the code for the specific function that calculates the similarity score between the query and library sequence; alternative searching algorithms can be implemented by editing two files. We have implemented the rapid FASTA sequence comparison algorithm and the more rigorous Smith-Waterman algorithm within this framework. The PSCANLIB program on a 16 node iPSC/2 80386-based hypercube can compare a 229 amino acid protein sequence with a 3.4 million residue sequence library in approximately 16 s with the FASTA algorithm. Using the Smith-Waterman algorithm, the same search takes 35 min. The PCOMPLIB program can compare a 0.8 million amino acid protein sequence library with itself in 5.3 min with FASTA on a third-generation 32 node Intel iPSC/860 hypercube.  相似文献   

13.
生物信息学双序列比对算法加速器设计与实现   总被引:2,自引:0,他引:2       下载免费PDF全文
双序列比对算法是进行生物信息学研究的基础算法。在FPGA上实现大规模脉动式阵列对双序列比对算法进行加速能够大幅度提高比对的效率。然而现有的设计方法在比对序列长度较短的情况下,处理单元利用率很低;在序列的长度较大时,需要占用大量的片内存储资源。通过将两条序列同时送入阵列进行比对减少比对时间。将比对数据送入外部存储器,优化比对过程中的数据存储调度,有效降低了对片内存储器的需求。以Smith-Waterman算法为例进行了实现验证,结果表明本设计在性能上优于传统设计。与Pentium42.60GHz通用微处理器计算机相比,使用加速器对长度为65536的序列进行比对可获得1555倍的加速比。  相似文献   

14.
To fully utilize all available information in protein structure prediction, including both backbone and side-chain structures, we present a novel algorithm for solving a generalized threading problem. In this problem, we consider simultaneously backbone threading and side-chain packing during the process of a protein structure prediction. For a given query protein sequence and a template structure, our goal is to find a threading alignment between the query sequence and the template structure, along with a rotamer assignment for each side-chain of the query protein, which optimizes an energy function that combines a backbone threading energy and a side-chain packing energy. This highly computationally challenging problem is solved through first formulating this problem as a graph-based optimization problem. Various graph-theoretic techniques are employed to achieve the computational efficiency to make our algorithm practically useful, which takes advantage of a number of special properties of the graph representing this generalized threading problem. The overall framework of our algorithm is a dynamic programming algorithm implemented on an optimal tree decomposition of the graph representation of our problem. By using various additional heuristic techniques such as the dead-end elimination, we have demonstrated that our algorithm can solve a generalized threading problem within practically acceptable amount of time and space, the first of its kind.  相似文献   

15.
Sequence alignment is a fundamental operation for homology search in bioinformatics. For two DNA or protein sequences of length m and n, full-matrix (FM), dynamic programming alignment algorithms such as Needleman-Wunsch and Smith-Waterman take O(m × n) time and use a possibly prohibitive O(m × n) space. Hirschberg's algorithm reduces the space requirements to O(min(m, n)), but requires approximately twice the number of operations required by the FM algorithms. The Fast Linear-Space Alignment (FastLSA) algorithm adapts to the amount of space available by trading space for operations. FastLSA can effectively adapt to use either linear or quadratic space, depending on the specific machine. Our experiments show that, in practice, due to memory caching effects, FastLSA is always as fast or faster than the Hirschberg and FM algorithms. To improve the performance of FastLSA further, we have parallelized it using a simple but effective form of wavefront parallelism. Our experimental results show that Parallel FastLSA exhibits good speedups, almost linear for eight processors or less, and also that the efficiency of Parallel FastLSA increases with the size of the sequences that are aligned. Consequently, parallel and sequential FastLSA can be flexibly and effectively used with high performance in situations where space and the number of parallel processors can vary greatly.  相似文献   

16.
MegaBlast is one of the most important programs in NCBI BLAST (Basic Local Alignment Search Tool) toolkits, tIowever, MegaBlast is computation and I/O intensive. It consumes a great deal of memory which is proportional to the size of the query sequences set and subject (database) sequences set of product. This paper proposes a new strategy for optimizing MegaBlast. The new strategy exchanges the query and subject sequences sets, and builds a hash table based on new subject sequences. It overlaps I/O with computation, shortens the overall time and reduces the cost of memory, since the memory here is only proportional to the size of subject sequences set. The optimized algorithm is suitable to be parallelized in cluster systems. The parallel algorithm uses query segmentation method. As our experiments shown, the parallel program which is implemented with MPI has fine scalability.  相似文献   

17.
提出一种基于Dijkstra算法的序列比对方法,该算法主要用于求最短路径,而序列比对可以转化为在有向无环图中寻找最短路径问题。对于少量序列比对,使用该算法可以求出最优解。对于多序列比对,可将在N维空间求解最短路径问题转化为在二维空间求解最短路径。该算法可以简化问题复杂度,能求得相对最优解。  相似文献   

18.
In this paper, we have developed a HiTi (Hierarchical MulTi) graph model for structuring large topographical road maps to speed up the minimum cost route computation. The HiTi graph model provides a novel approach to abstracting and structuring a topographical road map in a hierarchical fashion. We propose a new shortest path algorithm named SPAH, which utilizes HiTi graph model of a topographical road map for its computation. We give the proof for the optimality of SPAH. Our performance analysis of SPAH on grid graphs showed that it significantly reduces the search space over existing methods. We also present an in-depth experimental analysis of HiTi graph method by comparing it with other similar works on grid graphs. Within the HiTi graph framework, we also propose a parallel shortest path algorithm named ISPAH. Experimental results show that inter query shortest path problem provides more opportunity for scalable parallelism than the intra query shortest path problem.  相似文献   

19.
基于GPGPU的生物序列快速比对   总被引:1,自引:0,他引:1       下载免费PDF全文
在CPU-GPU异构平台下,提出一种高效的生物序列比对方案。该方案利用GPU的并行处理能力,通过对读延迟、写延迟、重组函数及数据传输进行优化,在OpenCL框架下重构Smith-Waterman算法,加快生物序列比对速度。实验结果证明,与CPU上传统的串行算法相比,该算法最高可获得约100倍的性能提升。  相似文献   

20.
生物数据库中的查询是在生物序列数据集中查找与输入查询序列相似的目标,目前的一些流行工具如BLAST等,是利用启发式算法来提高查询的速度。然而,这些启发式算法无法找到所有的满足要求的结果,而一些精确算法,如动态规划算法,却需要非常高昂的代价。最近,一种新的技术,QASIS,提出了在后缀树的遍历中使用动态规划的精确查找算法,其性能与BLAST相当。但是它的主要缺点就是后缀树这种索引结构需要巨大的空间开销。本文采用基于无损压缩的块排序结构来索引超常的生物序列,减小索引的存储空间开销,有效地减少动态规划算法的计算代价。实验结果表明基于块排序索引的算法在性能方面优于OASIS算法。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号