Similar articles
 Found 20 similar articles (search time: 78 ms)
1.
In the bioinformatics community, it is important to find an accurate simultaneous alignment of diverse biological sequences that are assumed to share an evolutionary relationship. From the alignment, sequence homology is inferred and the shared evolutionary origins of the sequences are extracted by phylogenetic analysis. This problem is known as the multiple sequence alignment (MSA) problem. In the literature, several approaches have been proposed to solve the MSA problem, such as progressive alignment methods, consistency-based algorithms, and genetic algorithms (GAs). In this work, we propose a hybrid multiobjective evolutionary algorithm based on the behaviour of honey bees for solving the MSA problem: the hybrid multiobjective artificial bee colony (HMOABC) algorithm. HMOABC considers two objective functions with the aim of preserving the quality and consistency of the alignment: the weighted sum-of-pairs function with affine gap penalties (WSP) and the number of totally conserved (TC) columns. In order to assess the accuracy of HMOABC, we have used the BAliBASE benchmark (version 3.0), which according to its developers presents more challenging test cases representing the real problems encountered when aligning large sets of complex sequences. Our multiobjective approach has been compared with 13 well-known methods in the bioinformatics field and with 6 other evolutionary algorithms published in the literature.
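The two objectives named in this abstract can be illustrated with a minimal Python sketch. It is not the HMOABC implementation: the function names, the match/mismatch scores, and the gap penalties are assumptions chosen for illustration, and the pair weights default to 1 because the abstract does not say how they are derived.

```python
from itertools import combinations

def pair_score(a, b, match=1, mismatch=-1, gap_open=-4, gap_extend=-1):
    """Score two aligned rows with affine gap penalties (illustrative values)."""
    score = 0
    in_gap_a = in_gap_b = False  # is a gap run currently open in row a / row b?
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue  # gap against gap contributes nothing for this pair
        if x == "-":
            score += gap_extend if in_gap_a else gap_open
            in_gap_a, in_gap_b = True, False
        elif y == "-":
            score += gap_extend if in_gap_b else gap_open
            in_gap_a, in_gap_b = False, True
        else:
            score += match if x == y else mismatch
            in_gap_a = in_gap_b = False
    return score

def weighted_sum_of_pairs(alignment, weights=None):
    """Sum the pair scores over all row pairs; weights[(i, j)] defaults to 1."""
    total = 0
    for i, j in combinations(range(len(alignment)), 2):
        w = 1 if weights is None else weights[(i, j)]
        total += w * pair_score(alignment[i], alignment[j])
    return total

def totally_conserved_columns(alignment):
    """Count columns in which every row holds the same non-gap symbol."""
    return sum(
        1 for column in zip(*alignment)
        if column[0] != "-" and all(c == column[0] for c in column)
    )

if __name__ == "__main__":
    msa = ["AC-GT", "ACAGT", "AC-GA"]
    print(weighted_sum_of_pairs(msa), totally_conserved_columns(msa))
```

A multiobjective optimizer would treat the two returned values as separate objectives rather than collapsing them into a single number.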

2.
Over the past several decades, biologists have conducted numerous studies examining both general and specific functions of proteins. Generally, if two proteins are similar in either structure or amino acid sequence, a common biological function is expected. Protein function is determined primarily by structure rather than by amino acid sequence, so algorithms for protein structure alignment are an essential research tool. The quality of such an algorithm depends on the quality of the similarity measure it uses, which serves as the objective function for determining the best alignment. However, none of the existing similarity measures has become a gold standard, because each has its own strengths and weaknesses, and they require excessive filtering to find a single alignment. In this paper, we introduce a new strategy that finds not a single alignment but multiple alignments with different lengths. This method has the obvious benefit of high-quality alignments. However, it also leads to a new problem: its running time is considerably longer than that of methods that find only a single alignment. To address this problem, we propose algorithms that locate a common region (CORE) of multiple alignment candidates and then extend the CORE into multiple alignments. Because the CORE can only be defined from a final alignment, we introduce CORE*, which is similar to CORE, and propose an algorithm to identify it. By adopting CORE* and dynamic programming, our proposed method produces multiple alignments of various lengths with higher accuracy than previous methods. In the experiments, the alignments identified by our algorithm are longer than those obtained by TM-align by 17% and 15.48%, on average, when the comparison is conducted at the super-family and fold levels, respectively.

3.
Multiple sequence alignment is an important tool in molecular sequence analysis. This paper presents genetic algorithms for solving multiple sequence alignment problems. Several data sets are tested and the experimental results are compared with those of other methods. We find that our approach performs well on data sets with high similarity and long sequences. The software can be found at http://rsdb.csie.ncu.edu.tw/tools/msa.htm.

4.
To have efficient data mining systems, we need powerful algorithms to extract and mine the data. In a genome data mining system, the algorithms search for genomes or proteins that share similar properties. Proteins that have a significant biological relationship to one another often share only isolated regions of sequence similarity. When identifying relationships of this nature, the ability to find local regions of optimal similarity is advantageous over global alignments that optimize the overall alignment of two entire sequences. The paper describes a new method for genome sequence comparison that can be used in a genome data mining system. It provides a good theoretical improvement in accuracy with a modest sacrifice in speed compared with the most commonly used alternatives. The method is based on the popular progressive approach, the dot plot method, but avoids the most serious pitfalls caused by the greedy nature of this technique. The new approach pre-processes a data set of all pair-wise alignments between the sequences. This provides a library of alignment information that can be used to guide the comparison. The algorithm is based on the similar segment method, i.e. having n identities in a window of size L. The paper presents some results on the termination and correctness of the algorithm and on how to include it in other comparison algorithms. The paper also introduces a mechanism for creating random sequences, which serve as the main benchmarks for comparing the algorithms.
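The similar segment criterion mentioned in this abstract (n identities in a window of size L) can be sketched as a brute-force scan. This is only an illustration of the criterion, not the paper's algorithm; the function name `similar_segments` and its parameter names are assumptions.

```python
def similar_segments(a, b, window, min_identities):
    """Return (i, j, identities) for every ungapped window pair that passes.

    i and j are the start positions of the window in sequences a and b.
    """
    hits = []
    for i in range(len(a) - window + 1):
        for j in range(len(b) - window + 1):
            # Count positions where the two windows carry the same symbol.
            identities = sum(1 for k in range(window) if a[i + k] == b[j + k])
            if identities >= min_identities:
                hits.append((i, j, identities))
    return hits

if __name__ == "__main__":
    for hit in similar_segments("ACGTAC", "TTACGA", window=4, min_identities=3):
        print(hit)
```

Each hit corresponds to a short diagonal run in a dot plot, which is why the criterion pairs naturally with the dot plot method the abstract refers to.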

5.
The multiple alignment of DNA and protein sequences is applicable to various important fields in molecular biology. Although the approach based on dynamic programming is well known for this problem, it requires enormous time and space to obtain the optimal alignment. On the other hand, this problem corresponds to the shortest path problem, so the A* algorithm, which can efficiently find the shortest path with an estimator, is applicable.

First, this paper directly applies the A* algorithm to the multiple sequence alignment problem, with a more powerful estimator for the case of more than two dimensions, and discusses extensions of this approach that utilize an upper bound on the shortest path length and modifications of the network structure. An algorithm that provides the upper bound is also proposed in this paper. The basic part of these results was originally shown in Ikeda and Imai [11]. This part is similar to the branch-and-bound techniques implemented in the MSA program of Gupta et al. [6]. Our framework is based on an edge length transformation that reduces the problem to the shortest path problem, which is more suitable for generalizations to enumerating suboptimal alignments and to parametric analysis, as done in Shibuya and Imai [15–17]. With this enhanced A* algorithm, optimal multiple alignments of several long sequences can be computed in practice, as shown by computational results.

Second, this paper proposes a k-group alignment algorithm as a practical method for much larger problems, say multiple alignments of 50–100 sequences. A basic part of these results was originally presented in Imai and Ikeda [13]. Existing iterative improvement methods for multiple alignment use the so-called group-to-group two-dimensional dynamic programming; in this respect, our proposal is to extend the ordinary two-group dynamic programming to k-group alignment. This extension is conceptually straightforward, and our contribution is to demonstrate that k-group alignment can be implemented so as to run in reasonable time and space under standard computing environments. This is established by generalizing the above A* search approach. The k-group alignment method can be directly incorporated into existing methods such as iterative improvement algorithms [2, 5] and tree-based (iterative) algorithms [9]. This paper performs computational experiments by applying the k-group method to iterative improvement algorithms, and shows that our approach can find better alignments in reasonable time. For example, in the larger-scale computational experiments reported here, 34 protein sequences with very high homology can be optimally 10-group aligned, and 64 sequences with high homology can be optimally 5-group aligned.
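The reduction of alignment to a shortest path problem can be illustrated for the simplest case of two sequences. The sketch below is an assumption-laden simplification, not the paper's method: it uses unit mismatch and gap costs, and its estimator is the plain length-difference bound rather than the more powerful estimator the abstract describes. The function name `astar_alignment_cost` is hypothetical.

```python
import heapq

def astar_alignment_cost(a, b, mismatch=1, gap=1):
    """Minimum alignment cost of a and b, found by A* on the edit graph.

    Node (i, j) means a[:i] and b[:j] are already aligned. Costs must be
    non-negative so that the estimator never overestimates.
    """
    n, m = len(a), len(b)

    def estimate(i, j):
        # At least |remaining length difference| gaps are still unavoidable.
        return gap * abs((n - i) - (m - j))

    best = {(0, 0): 0}
    heap = [(estimate(0, 0), 0, 0, 0)]  # entries are (f, g, i, j)
    while heap:
        _, g, i, j = heapq.heappop(heap)
        if (i, j) == (n, m):
            return g
        if g > best.get((i, j), float("inf")):
            continue  # stale queue entry; a cheaper path was found already
        moves = []
        if i < n and j < m:
            moves.append((i + 1, j + 1, 0 if a[i] == b[j] else mismatch))
        if i < n:
            moves.append((i + 1, j, gap))  # a[i] aligned to a gap
        if j < m:
            moves.append((i, j + 1, gap))  # b[j] aligned to a gap
        for ni, nj, cost in moves:
            ng = g + cost
            if ng < best.get((ni, nj), float("inf")):
                best[(ni, nj)] = ng
                heapq.heappush(heap, (ng + estimate(ni, nj), ng, ni, nj))
    raise RuntimeError("goal node not reached")  # not expected for valid input

if __name__ == "__main__":
    print(astar_alignment_cost("kitten", "sitting"))
```

With more than two sequences the graph becomes a higher-dimensional lattice, which is where a stronger estimator matters for keeping the number of expanded nodes manageable.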


6.
This paper proposes a novel approach that uses a multi-objective evolutionary algorithm based on decomposition to address the ontology alignment optimization problem. Compared with the approach based on a Genetic Algorithm (GA), our method can simultaneously optimize three goals (maximizing the alignment recall, the alignment precision, and the f-measure). The experimental results show that our approach is able to provide, in one execution, various alignments that are less biased towards any one measure of alignment quality than those of the GA approach. The quality of these alignments is therefore better than or equal to that of the alignments given by the GA-based approach, which considers precision, recall, and f-measure only, and by other multi-objective evolutionary approaches such as NSGA-II. In addition, our approach outperforms the NSGA-II approach with an average improvement of 32.79%. By comparing the quality of the alignments obtained by our approach with those obtained by state-of-the-art ontology matching systems, we conclude that our approach is more effective and efficient.

7.
This paper describes three weighting schemes for improving the accuracy of progressive multiple sequence alignment methods: (1) global profile pre-processing, to capture for each sequence information about other sequences in a profile before the actual multiple alignment takes place; (2) local pre-processing, which incorporates a new protocol to use only non-overlapping local sequence regions to construct the pre-processed profiles; and (3) local-global alignment, a weighting scheme based on the double dynamic programming (DDP) technique to softly bias global alignment towards local sequence motifs. The first two schemes allow the compilation of residue-specific multiple alignment reliability indices, which can be used in an iterative fashion. The schemes have been implemented with associated iterative modes in the PRALINE multiple sequence alignment method, and have been evaluated using the BAliBASE benchmark alignment database. These tests indicate that PRALINE is a toolbox able to build alignments of very high quality. We found that local profile pre-processing raises the alignment quality by 5.5% compared to PRALINE alignments generated under default conditions. Iteration enhances the quality by a further percentage point. The implications of multiple alignment scoring functions and iteration in relation to alignment quality and benchmarking are discussed.

8.
A method of multiple sequence alignment is described based on the double dynamic programming (DDP) algorithm previously used for treating structural constraints encountered in structure comparison and threading. Following these applications, the inconsistencies that emerge when trying to combine pair-wise alignments into a multiple alignment are reconciled by summing all the (possibly inconsistent) paths (low-level alignments) into a matrix, which is then used to provide a final (high-level) alignment. This process is applied to all sequence pairs and the pair-wise results are combined in a simple multiple sequence alignment program. From this alignment, further constraints are selected to bias the low-level alignments in the DDP algorithm and the process is iterated. The results, however, showed that this overall iteration was not needed and that a single pass gave results at least as good as the 'standard' progressive method of multiple sequence alignment. Further applications of the method are discussed.

9.
Video indexing requires the efficient segmentation of video into scenes. The video is first segmented into shots and a set of key-frames is extracted for each shot. Typical scene detection algorithms incorporate time distance in a shot similarity metric. In the method we propose, to overcome the difficulty of having prior knowledge of the scene duration, the shots are clustered into groups based only on their visual similarity and a label is assigned to each shot according to the group that it belongs to. Then, a sequence alignment algorithm is applied to detect when the pattern of shot labels changes, providing the final scene segmentation result. In this way shot similarity is computed based only on visual features, while ordering of shots is taken into account during sequence alignment. To cluster the shots into groups we propose an improved spectral clustering method that both estimates the number of clusters and employs the fast global k-means algorithm in the clustering stage after the eigenvector computation of the similarity matrix. The same spectral clustering method is applied to extract the key-frames of each shot and numerical experiments indicate that the content of each shot is efficiently summarized using the method we propose herein. Experiments on TV-series and movies also indicate that the proposed scene detection method accurately detects most of the scene boundaries while preserving a good tradeoff between recall and precision.

10.
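Sequence alignment is a fundamental information-processing method in bioinformatics and is of great significance for discovering functional, structural, and evolutionary information in biological sequences. This paper describes and evaluates the typical pairwise alignment algorithms Smith-Waterman, FASTA, and BLAST, as well as the multiple sequence alignment algorithm CLUSTAL. In view of the shortcomings common to current sequence alignment algorithms, it briefly introduces the definition and processing workflow of applying KDD techniques to sequence similarity discovery.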
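Of the algorithms surveyed in this abstract, Smith-Waterman is compact enough to sketch. The following Python version computes only the best local alignment score, uses a linear gap penalty, and keeps just two rows of the dynamic programming table. The scoring values and the function name `smith_waterman_score` are illustrative assumptions, not taken from the paper.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b under a linear gap penalty."""
    prev = [0] * (len(b) + 1)  # previous row of the DP table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0]  # column 0 is always 0 for local alignment
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The 0 option lets a local alignment restart at any cell.
            cell = max(0, diagonal, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best

if __name__ == "__main__":
    print(smith_waterman_score("TTACGTT", "GGACGGG"))
```

Recovering the aligned segments themselves would require keeping the full table, or traceback pointers, instead of two rows.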

11.
Applied Soft Computing, 2007, 7(3): 1121-1130
We describe a new method for pairwise nucleic acid sequence alignment that can also be used for pattern searching and tandem repeat searching within a nucleic acid sequence. The method is broadly a hybrid algorithm employing ant colony optimization (ACO) and the simple genetic algorithm. The method first employs ACO to obtain a set of alignments, which are then further processed by an elitist genetic algorithm, which employs primitive selection and a novel multipoint crossover-mutation operator to generate accurate alignments. The resulting alignments show a fair amount of accuracy for smaller and medium size sequences. Furthermore, this algorithm can be used rather quickly and efficiently for aligning shorter sequences and also for pattern searching in both nucleic acid and amino acid sequences. It can also be used as an effective local alignment method or as a global alignment tool. On improvement of accuracy, this method can be extended for use towards multiple sequence alignment.

12.
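A DNA multiple sequence alignment method based on a maximum weighted path algorithm   Total citations: 1 (self-citations: 0, citations by others: 1)
Huo Hongwei (霍红卫), Xiao Zhiwei (肖智伟), Journal of Software (《软件学报》), 2007, 18(2): 185-195
For the multiple sequence alignment problem in biological sequence analysis, many heuristic algorithms have been proposed to improve computation speed and alignment quality when the input data set is large. This paper presents a method for global multiple alignment of DNA sequences: MWPAlign (maximum weighted path alignment). The algorithm represents the sequence information as a de Bruijn graph and records the information of the input sequences on the edges of the graph. In this way, the problem of finding a consensus sequence is transformed into the problem of finding a maximum weighted path in the graph, which reduces the time complexity of multiple sequence alignment to almost linear. Experimental results show that MWPAlign is a feasible multiple sequence alignment algorithm; in particular, for large sets of sequences with a mutation rate below 5.2%, it achieves better alignment results and running performance than CLUSTALW (cluster alignments weight), T-Coffee, and HMMT (hidden Markov model training).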
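The de Bruijn graph representation described in this abstract can be sketched as follows. This shows only the graph construction, with each edge weighted by how often its k-mer occurs across the input sequences. That weighting, the choice of k, and the function name `de_bruijn_edges` are assumptions for illustration; the abstract does not specify what MWPAlign records on the edges or how it extracts the maximum weighted path.

```python
from collections import Counter

def de_bruijn_edges(sequences, k):
    """Map each edge (prefix, suffix) of a k-mer to its occurrence count.

    Nodes are (k-1)-mers; every k-mer in the input contributes one edge
    from its first k-1 symbols to its last k-1 symbols.
    """
    edges = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            edges[(kmer[:-1], kmer[1:])] += 1
    return edges

if __name__ == "__main__":
    graph = de_bruijn_edges(["ACGT", "ACGA"], k=3)
    for (src, dst), weight in sorted(graph.items()):
        print(src, "->", dst, weight)
```

Edges shared by many sequences accumulate high weights, so a heavy path through the graph tends to follow regions that the inputs have in common.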

13.
Genomic alignments, as a means to uncover evolutionary relationships among organisms, are a fundamental tool in computational biology. There is considerable recent interest in using the Cell Broadband Engine, a heterogeneous multicore chip that provides high performance, for biological applications. However, work in genomic alignments so far has been limited to computing optimal alignment scores using quadratic space for the basic global/local alignment problem. In this paper, we present a comprehensive study of developing alignment algorithms on the Cell, exploiting its thread and data level parallelism features. First, we develop a parallel implementation on the Cell that computes optimal alignments and adopts Hirschberg's linear space technique. The former is essential, as merely computing optimal alignment scores is not useful, while the latter is needed to permit alignments of longer sequences. We then present Cell implementations of two advanced alignment techniques: spliced alignments and syntenic alignments. Spliced alignments are useful in aligning mRNA sequences with corresponding genomic sequences to uncover the gene structure. Syntenic alignments are used to discover conserved exons and other sequences between long genomic sequences from different organisms. We present experimental results for these three types of alignments on 16 Synergistic Processing Elements of the IBM QS20 dual-Cell blade system.

14.
Approximation algorithms for tree alignment with a given phylogeny   Total citations: 3 (self-citations: 0, citations by others: 3)
We study the following fundamental problem in computational molecular biology: Given a set of DNA sequences representing some species and a phylogenetic tree depicting the ancestral relationship among these species, compute an optimal alignment of the sequences by means of constructing a minimum-cost evolutionary tree. The problem is an important variant of multiple sequence alignment, and is widely known as tree alignment. We design an efficient approximation algorithm with performance ratio 2 for tree alignment. The algorithm is then extended to a polynomial-time approximation scheme. The construction actually works for Steiner trees in any metric space, and thus implies a polynomial-time approximation scheme for planar Steiner trees under a given topology (with any constant degree). To our knowledge, this is the first polynomial-time approximation scheme in the fields of computational biology and Steiner trees. The approximation algorithms may be useful in evolutionary genetics practice as they can provide a good initial alignment for the iterative method in [23]. Supported in part by NSERC Operating Grant OGP0046613. Supported in part by NSERC Operating Grant OGP0046613 and a Canadian Genome Analysis and Technology Research Grant. Supported in part by US Department of Energy Grant DE-FG03-90ER6099.

15.
Multiple sequence alignment, known to be an NP-complete problem, is among the most important and challenging tasks in computational biology. Problems of this type are difficult to solve directly, and exact approaches always result in exponential complexity. In this paper, we present a novel algorithm that combines a genetic algorithm with ant colony optimization for multiple sequence alignment. The proposed GA-ACO algorithm enhances the performance of the genetic algorithm (GA) by incorporating a local search, ant colony optimization (ACO). In the proposed GA-ACO algorithm, the genetic algorithm is run to provide diversity among the alignments. Thereafter, ant colony optimization is performed to move out of local optima. Simulation results show that the proposed GA-ACO algorithm has superior performance compared to other existing algorithms.

16.
The computation of the optimal phonetic alignment and the phonetic similarity between words is an important step in many applications in computational phonology, including dialectometry. After discussing several related algorithms, I present a novel approach to the problem that employs a scoring scheme for computing phonetic similarity between phonetic segments on the basis of multivalued articulatory phonetic features. The scheme incorporates the key concept of feature salience, which is necessary to properly balance the importance of various features. The new algorithm combines several techniques developed for sequence comparison: an extended set of edit operations, local and semiglobal modes of alignment, and the capability of retrieving a set of near-optimal alignments. On a set of 82 cognate pairs, it performs better than comparable algorithms reported in the literature.

17.
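An adaptive genetic algorithm with multiple search strategies for multiple biological sequence alignment   Total citations: 1 (self-citations: 0, citations by others: 1)
Multiple biological sequence alignment is an important tool for computing the similarity between biological sequences. Building on the use of entropy to measure population diversity, this paper proposes an adaptive genetic algorithm with multiple search strategies, in which the crossover and mutation probabilities are adjusted automatically as the entropy changes, and in which dynamic programming is also used in designing the genetic operators. Experimental results show that the algorithm has strong global and local search capabilities and effectively overcomes premature convergence.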

18.
This paper addresses the problem of automatic matching between two image sets with similar intrinsic structures and different appearances, especially when there is no prior correspondence. An unsupervised manifold alignment framework is proposed to establish correspondence between data sets by a mapping function in the mutual embedding space. We introduce a local similarity metric based on parameterized distance curves to represent the connection of one point with the rest of the manifold. A small set of valid feature pairs can be found without manual interaction by matching the distance curve of one manifold with the curve cluster of the other manifold. To avoid potential confusion in image matching, we propose an extended affine transformation to solve the nonrigid alignment in the embedding space. Comparatively tight alignments and structure preservation can be obtained simultaneously. The point pairs with the minimum distance after alignment are taken as the matches. We apply manifold alignment to image set matching problems. The correspondence between image sets of different poses, illuminations, and identities can be established effectively by our approach.

19.
Complex physical phenomena can be usually split into several interacting physical computational models and can be numerically simulated by coupling parallel codes individually designed for these models. Besides rational splitting and efficient numerical methods for different models, we must design scalable parallel algorithms to concatenate these parallel codes. Meanwhile, three objectives should be well balanced. The first is how to efficiently transfer data among multiple physical models, the second is how to inherit original scalability of parallel codes and then ensure good scalability of full simulation, and the third is how to ensure independent or simultaneous developments of codes by different research groups. This paper presents two concatenation algorithms for parallel numerical simulation of radiation hydrodynamics coupled with neutron transport on unstructured grid. The first, Full Loose Concatenation Algorithm, focuses on independent development and inheritance of original scalability, and the second, Two Level Compact Concatenation Algorithm, focuses on optimal tradeoff among above three objectives. Theoretical analysis for communicational complexity and parallel numerical experiments using hundreds of processors on two parallel machines have shown that these two algorithms are efficient and can be generalized to other parallel numerical simulations for hydrodynamics coupled with radiation or neutron transport. In particular, the second algorithm is linearly scalable and has achieved theoretical optimal performance.

20.
3D model alignment is a fundamental step in many shape analysis processes, and various algorithms have been proposed to solve this problem. However, to the best of our knowledge, they are effective only on specific categories of models. Therefore, we present a novel framework that can align general categories by combining different features together. In order to align given groups of models, multiple features are evaluated first in this framework, according to three types of quantified characteristics, i.e., the intensity, the uniqueness and the consistency. Then, the quantified characteristics are combined into scores and a data-driven model is learned to predict the alignment errors according to the scores. Finally, the features with the minimum predicted alignment errors are selected to align the given groups. Experimental results show that our framework can generate consistent alignments on general categories, which are much better than those generated using single features.
