首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 187 毫秒
1.
Over the past several decades, biologists have conducted numerous studies examining both general and specific functions of proteins. Generally, if similarities in either the structure or sequence of amino acids exist for two proteins, then a common biological function is expected. Protein function is determined primarily based on the structure rather than the sequence of amino acids. The algorithm for protein structure alignment is an essential tool for the research. The quality of the algorithm depends on the quality of the similarity measure that is used, and the similarity measure is an objective function used to determine the best alignment. However, none of existing similarity measures became golden standard because of their individual strength and weakness. They require excessive filtering to find a single alignment. In this paper, we introduce a new strategy that finds not a single alignment, but multiple alignments with di?erent lengths. This method has obvious benefits of high quality alignment. However, this novel method leads to a new problem that the running time for this method is considerably longer than that for methods that find only a single alignment. To address this problem, we propose algorithms that can locate a common region (CORE) of multiple alignment candidates, and can then extend the CORE into multiple alignments. Because the CORE can be defined from a final alignment, we introduce CORE* that is similar to CORE and propose an algorithm to identify the CORE*. By adopting CORE* and dynamic programming, our proposed method produces multiple alignments of various lengths with higher accuracy than previous methods. In the experiments, the alignments identified by our algorithm are longer than those obtained by TM-align by 17% and 15.48%, on average, when the comparison is conducted at the level of super-family and fold, respectively.  相似文献   

2.
Protein folding problem is one of the most prominent problems of bioinformatics. In this paper, we study a three-dimensional off-lattice protein AB model with two species of monomers, hydrophobic and hydrophilic, and present a heuristic quasi-physical algorithm. By elaborately simulating the movement of the smooth elastic balls in the physical world, the algorithm finds low-energy configurations for a given monomer chain. A subsequent "off-trap" strategy is proposed to trigger a jump for a stuck situation in order to get out of local minima. The methods have been tested in the off-lattice AB model. The computational results show promising performance. For all sequences with 13 to 55 monomers, the algorithm finds states with lower energy than previously proposed putative ground states. Furthermore, for the sequences with 21, 34 and 55 monomers, new putative ground states are found, which are different from those given in present literature.  相似文献   

3.
4.
求解圆形packing问题的拟人退火算法   总被引:2,自引:2,他引:2  
张德富  李新 《自动化学报》2005,31(4):590-595
Circles packing problem is an NP-hard problem and is difficult to solve. In this paper, a hybrid search strategy for circles packing problem is discussed. A way of generating new configuration is presented by simulating the moving of elastic objects, which can avoid the blindness of simulated annealing search and make iteration process converge fast. Inspired by the life experiences of people, an effective personified strategy to jump out of local minima is given. Based on the simulated annealing idea and personification strategy, an effective personified annealing algorithm for circles packing problem is developed. Numerical experiments on benchmark problem instances show that the proposed algorithm outperforms the best algorithm in the literature.  相似文献   

5.
In this paper,we propose a new arc consistency algorithm,AC-8,which requires less computation time and space than AC-6 and AC-7.The main idea of the optimization is the divide-and-conquer strategy,thereby decomposing an arc consistency problem into a series of smaller ones and trying to solve them in sequence. In this way,not only the space complexity but also the time complexity can be reduced.The reason for this is that due to the ahead of time performed inconsistency propagation(in the semse that some of them are executed before the entire inconsistency checking has been finished),each constraint subnetwork will be searched with a gradually shrunk domain.In addition,the technique of AC-6 can be integrated into our algorithm,leading to a further decrease in computational complexity.  相似文献   

6.
Efficient task scheduling is critical to achieving high performance on grid computing environment. The task scheduling on grid is studied as optimization problem in this paper. A heuristic task scheduling algorithm satisfying resources load balancing on grid environment is presented. The algorithm schedules tasks by employing mean load based on task predictive execution time as heuristic information to obtain an initial scheduling strategy. Then an optimal scheduling strategy is achieved by selecting two machines satisfying condition to change their loads via reassigning their tasks under the heuristic of their mean load. Methods of selecting machines and tasks are given in this paper to increase the throughput of the system and reduce the total waiting time. The efficiency of the algorithm is analyzed and the performance of the proposed algorithm is evaluated via extensive simulation experiments. Experimental results show that the heuristic algorithm performs significantly to ensure high load balancing and achieve an optimal scheduling strategy almost all the time. Furthermore, results show that our algorithm is high efficient in terms of time complexity.  相似文献   

7.
Sequence comparison leads to a combinatorial optimization problem of sorting permutations by reversals and transpositions.Namely,given any two permutations,find the shortest distance between them.This problem is related with genome rearrangement,genes are oriented in DNA sequences.The transpositions which have been studied in the liteature can be viewed as operations working on two consecutive segments of the genome.In this paper,a new kind of transposition which can work on two arbitrary segments of the genome is proposed,and the sorting of signed permutations by reversals and this new kind of transpostitions are studied.After establishing a lower bound on the number of operations needed,a 2-approximation algorithm is presented for this problem and an example is given to show that the performance ratio of the algorithm cannot be improved.  相似文献   

8.
The intersection of N halfplanes is a basic problem in computational geometry and computer graphics,The optimal offline algorithm for this problem runs in time O(N log N).In this paper,an optimal online algorithm which runs also in time O(N log N) for this problem is presented.The main idea of the algorithm is to give a new definition for the left side of a given line,to assign the order for the opoints of a convex polygon.and then to use binary search method in an ordered vertex set.The data structure used in the algorithm is no more complex than array.  相似文献   

9.
王冰  席裕庚 《自动化学报》2006,32(5):667-673
This paper discusses the single-machine rescheduling problem with efficiency and stability as criteria, where more than one disruption arises in large-scale dynamic circumstances. Partial rescheduling (PR) strategy is adopted after each disruption and a rolling mechanism is driven by events in response to disruptions. Two kinds of objective functions are designed respectively for PR sub-problem involving in the interim and the terminal of unfinished jobs. The analytical result demonstrates that each local objective is consistent with the global one. Extensive computational experiment was performed and the computational results show that the rolling PR strategy with dual objectives can greatly improve schedule stability with little sacrifice in efficiency and provide a reasonable trade-off between solution quality and computational efforts.  相似文献   

10.
This paper discusses the single-machine rescheduling problem with efficiency and stability as criteria, where more than one disruption arises in large-scale dynamic circumstances. Partial rescheduling (PR) strategy is adopted after each disruption and a rolling mechanism is driven by events in response to disruptions. Two kinds of objective functions are designed respectively for PR sub-problem involving in the interim and the terminal of unfinished jobs. The analytical result demonstrates that each local objective is consistent with the global one. Extensive computational experiment was performed and the computational results show that the rolling PR strategy with dual objectives can greatly improve schedule stability with little sacrifice in efficiency and provide a reasonable trade-off between solution quality and computational efforts.  相似文献   

11.
Biological sequence (e.g. DNA sequence) can be treated as strings over some fixed alphabet of characters (a, c, t and g)[1]. Sequence alignment is a procedure of comparing two or more sequences by searching for a series of individual characters that are in the same order in the sequences. Two-sequence alignment, pair-wise alignment, is a way of stacking one sequence above the other and matching characters from the two sequences thaat lie in the same column: identical characters are placed in …  相似文献   

12.
This paper presents a method for classifying a large and mixed set of uncharacterized sequences provided by genome projects. As the measure of sequence similarity, we use similarity score computed by a method based on the dynamic programming (DP), such as the Smith-Waterman local alignment algorithm. Although comparison by DP based method is very sensitive, when given sequences include a family of sequences that are much diverged in evolutionary process, similarity among some of them may be hidden behind spurious similarity of some unrelated sequences. Also the distance derived from the similarity score may not be metric (i.e., triangle inequality may not hold) when some sequences have multi-domain structure. To cope with these problems, we introduce a new graph structure called p-quasi complete graph for describing a family of sequences with a confidence measure. We prove that a restricted version of the p-quasi complete graph problem (given a positive integer k, whether a graph contains a 0.5-quasi complete subgraph of which size k or not) is NP-complete. Thus we present an approximation algorithm for classifying a set of sequences using p-quasi complete subgraphs. The effectiveness of our method is demonstrated by the result of classifying over 4000 protein sequences on the Escherichia coli genome that was completely determined recently.  相似文献   

13.
In this paper we address the problem of recognising embedded activities within continuous spatial sequences obtained from an online video tracking system. Traditionally, continuous data streams such as video tracking data are buffered with a sliding window applied to the buffered data stream for activity detection. We introduce an algorithm based on Smith-Waterman (SW) local alignment from the field of bioinformatics that can locate and accurately quantify embedded activities within a windowed sequence. The modified SW approach utilises dynamic programming with two dimensional spatial data to quantify sequence similarity and is capable of recognising sequences containing gaps and significant amounts of noise. A more efficient SW formulation for online recognition, called Online SW (OSW), is also developed. Through experimentation we show that the OSW algorithm can accurately and robustly recognise manually segmented activity sequences as well as embedded sequences from an online tracking system. To benchmark the classification performance of OSW we compare the approach to dynamic time warping (DTW) and the discrete hidden Markov model (HMM). Results demonstrate that OSW produces higher precision and recall than both DTW and the HMM in an online recognition context. With accurately segmented sequences the SW approach produces results comparable to DTW and superior to the HMM. Finally, we confirm the robust property of the SW approach by evaluating it with sequences containing artificially incorporated noise.  相似文献   

14.
Parallelization of a local similarity algorithm.   总被引:1,自引:0,他引:1  
The local similarity problem is to determine the similar regions within two given sequences. We recently developed a dynamic programming algorithm for the local similarity problem that requires only space proportional to the sum of the two sequence lengths, whereas earlier methods use space proportional to the product of the lengths. In this paper, we describe how to parallelize the new algorithm and present results of experimental studies on an Intel hypercube. The parallel method provides rapid, high-resolution alignments for users of our software toolkit for pairwise sequence comparison, as illustrated here by a comparison of the chloroplast genomes of tobacco and liverwort.  相似文献   

15.
导向定位测序(GPS)是一种全基因组DNA甲基化检测的新测序技术,产生的测序数据具有成本低、没有序列偏好等优势.目前,甲基化分析中最重要的一步是将其测序产生的序列比对到参考基因组上.但是,现有导向定位测序的方法使用Smith-Waterman进行局部序列比对,时间消耗过大且容易对序列比对位置产生误判.因此,提出一种导向定位测序数据的改进比对算法,该算法利用其双端测序的优势,先用甲基化序列端数据进行序列比对,对多位置匹配的序列再利用常规数据端数据进行比对位置确定.实验结果表明:本文方法和现有方法的准确率相当,而具有更高的唯一比对比率,时间性能有3倍以上的提升.  相似文献   

16.
随着生物序列数据库中序列数据的激增, 开发兼有高度生物敏感性和高效率的算法显得极为迫切. 通过对生物序列比对问题中Needleman-Wunsch算法和Smith-Waterman算法深入分析, 提出了Smith-Waterman算法的改进算法, 并通过实验验证该算法, 对改进前后的运行性能进行比较分析. 实验证明, 改进后的算法实现了双序列局部最优解个数的优化, 有效降低了生物序列比对算法时间与空间的复杂性, 提高序列比对的得分率和准确率.  相似文献   

17.
Smith-Waterman算法OpenMP并行化   总被引:1,自引:0,他引:1  
基因比对可以实现对诲量生物信息的分析和处理,其中Smith—Waterman算法实现的比对信息精确度较高,但是处理速度慢。本文利用共享存储编程的工业标准OpenMPX;ySmith-Waterman算法进行了并行化实现。在一个拥有四个双核CPU的SMP节点上的测试表明,共享并行化使得该局部比对算法的速度提高了40%。  相似文献   

18.
序列比对是生物信息学中基本的信息处理方法,对于发现生物序列中的功能、结构和进化信息具有重要的意义。该文对典型的双序列比对算法Smith-Waterman、FASTA、BLAST以及多序列比对算法CLUSTAL进行了描述和评价;针对目前序列比对算法普遍存在的不足,简单介绍了应用KDD技术进行序列相似性发现的定义及其处理过程。  相似文献   

19.
在介绍生物信息学中多序列比对定义和原理的基础上,给出了序列结构信息集的表示形式和基于序列结构信息的度量函数,该函数只与参加比对序列自身信息有关,不受主观因素的影响,能更客观、有效地反映生物序列之间的进化距离.通过利用该函数计算序列间的进化距离,在渐进比对的基础上,采用迭代策略,不断修正指导树,进而提高比对的准确性,避免了局部最优问题.最后,通过实验模拟,本算法在保证不提高计算时间复杂度的基础上,提高了序列比对的准确性,同时也很好地反映了生物学意义.  相似文献   

20.
有效分析蛋白质家族是生物信息学的一项重要挑战,聚类成为解决这一问题的主要途径之一.基于传统序列比对方法定义蛋白质序列间相似关系时,假设了同源片断问的邻接保守性,与遗传重组相冲突.为更好地识别蛋白质家族,提出了一种蛋白质序列家族挖掘算法ProFaM.ProFaM首先采用前缀投影策略挖掘表征蛋白质序列的模式,然后基于模式及其权重信息构造相似度度量函数,并采用共享最近邻方法,实现了蛋白质序列家族聚类.解决了以往方法在蛋白质模式挖掘及相似度设计中的不足.在蛋白质家族数据库Pfam上的实验结果证实了ProFaM算法在蛋白质家族分析上有良好的结果.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号