Similar literature
1.
Indexing and retrieval for genomic databases
Genomic sequence databases are widely used by molecular biologists for homology searching. Amino acid and nucleotide databases are increasing in size exponentially, and mean sequence lengths are also increasing. In searching such databases, it is desirable to use heuristics to perform computationally intensive local alignments on selected sequences and to reduce the cost of the alignments that are attempted. We present an index-based approach both for selecting sequences that display broad similarity to a query and for fast local alignment. We show experimentally that the indexed approach yields significant savings in computationally intensive local alignments and that index-based searching is as accurate as existing exhaustive search schemes.

2.
Multiple sequence alignment is of central importance to bioinformatics and computational biology. Although a large number of algorithms for computing a multiple sequence alignment have been designed, the efficient computation of highly accurate and statistically significant multiple alignments is still a challenge. In this paper, we propose an efficient method that uses a multi-objective genetic algorithm (MSAGMOGA) to discover optimal alignments with affine gaps in multiple sequence data. The main advantage of our approach is that a large number of tradeoff (i.e., non-dominated) alignments can be obtained in a single run with respect to conflicting objectives: affine gap penalty minimization, and similarity and support maximization. To the best of our knowledge, this is the first effort with three objectives in this direction. The proposed method can be applied to any data set with a sequential character. Furthermore, it allows any choice of similarity measure for finding alignments. By analyzing the obtained optimal alignments, the decision maker can understand the tradeoff between the objectives. We compared our method with three well-known multiple sequence alignment methods: MUSCLE, SAGA and MSA-GA. The first of these is a progressive method, and the other two are based on evolutionary algorithms. Experiments on the BAliBASE 2.0 database were conducted, and the results confirm that MSAGMOGA obtains results with better accuracy and statistical significance than the three well-known methods when aligning multiple sequences with affine gaps. The proposed method also finds solutions faster than the other evolutionary approaches mentioned above.

3.
AliMac is an implementation of a sensitive sequence alignment algorithm on a parallel computer. The method achieves reliable alignments for very distantly related sequences through the combined use of amino acid exchange weights and physicochemical characteristics. The algorithm is computationally intensive, and its use on conventional computers is limited to a relatively small number of sequences. The parallel implementation uses a Macintosh IIcx host computer and 21 transputers and achieves 22 times the speed of a VAX 8650 at a fraction of the cost. This paper describes the AliMac hardware and software and discusses problems and peculiarities of parallel implementations, especially with transputers. Finally, several popular sequence alignment algorithms are compared in their ability to detect distantly related sequences when searching large databases.

4.
Multiple sequence alignment, known to be an NP-complete problem, is among the most important and challenging tasks in computational biology. Solving it directly is difficult and always results in exponential complexity. In this paper, we present a novel algorithm that combines a genetic algorithm with ant colony optimization for multiple sequence alignment. The proposed GA-ACO algorithm enhances the performance of the genetic algorithm (GA) by incorporating a local search, ant colony optimization (ACO). In the GA-ACO algorithm, the genetic algorithm is run first to provide diversity among the alignments. Thereafter, ant colony optimization is performed to move out of local optima. Simulation results show that the proposed GA-ACO algorithm performs better than other existing algorithms.

5.
Efficient data mining systems need powerful algorithms to extract and mine the data. In a genome data mining system, the algorithms search for genomes or proteins that share similar properties. Proteins that have a significant biological relationship to one another often share only isolated regions of sequence similarity. When identifying relationships of this nature, the ability to find local regions of optimal similarity is advantageous over global alignments that optimize the overall alignment of two entire sequences. The paper describes a new method for genome sequence comparison that can be used in a genome data mining system. It provides a good theoretical improvement in accuracy with a modest sacrifice in speed compared with the most commonly used alternatives. The method is based on the popular progressive approach, the dot plot method, but avoids the most serious pitfalls caused by the greedy nature of this technique. The new approach pre-processes a data set of all pair-wise alignments between the sequences. This provides a library of alignment information that can be used to guide the comparison. The algorithm is based on the similar segment method, i.e. requiring n identities in a window of size L. The paper presents some results on the termination and correctness of the algorithm and shows how to include it in other comparison algorithms. The paper also introduces a mechanism for creating random sequences, which serve as the main benchmarks for comparing the algorithms.
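
The similar segment criterion mentioned in this abstract (at least n identities in a window of size L) can be illustrated with a brute-force dot plot scan. This is only a minimal sketch: the function name `similar_segments` and its parameters are hypothetical, and the paper's pre-processing library and termination analysis are not reproduced.

```python
def similar_segments(a, b, window, min_identities):
    """Return (i, j) start positions where the windows a[i:i+window] and
    b[j:j+window] share at least min_identities identical positions.

    Brute-force illustration of the "n identities in a window of size L"
    criterion; it is not the algorithm described in the paper.
    """
    hits = []
    for i in range(len(a) - window + 1):
        for j in range(len(b) - window + 1):
            # Count positions inside the two windows that carry the same symbol.
            identities = sum(1 for k in range(window) if a[i + k] == b[j + k])
            if identities >= min_identities:
                hits.append((i, j))
    return hits


if __name__ == "__main__":
    # Each reported pair marks one diagonal segment of a dot plot.
    print(similar_segments("ACGTACGT", "TTACGTTT", window=4, min_identities=4))
```

Lowering `min_identities` below `window` tolerates mismatches inside a window, at the price of more spurious hits.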

6.
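Face comparison based on a dynamic programming algorithm
Dynamic programming can be used effectively for sequence alignment and yields the optimal alignment between sequences. This paper applies it to the comparison of several key features in face recognition, gives a measure of the degree of similarity between faces, and presents the concrete algorithm, which can be applied effectively to face comparison and to further applications in face recognition.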

7.
Over the past several decades, biologists have conducted numerous studies examining both general and specific functions of proteins. Generally, if two proteins are similar in either structure or amino acid sequence, a common biological function is expected. Protein function is determined primarily by structure rather than by amino acid sequence. An algorithm for protein structure alignment is therefore an essential research tool. The quality of the algorithm depends on the quality of the similarity measure that is used, and the similarity measure is an objective function used to determine the best alignment. However, none of the existing similarity measures has become a gold standard, because each has its own strengths and weaknesses. They require excessive filtering to find a single alignment. In this paper, we introduce a new strategy that finds not a single alignment but multiple alignments with different lengths. This method has the obvious benefit of high-quality alignments. However, it leads to a new problem: its running time is considerably longer than that of methods that find only a single alignment. To address this problem, we propose algorithms that locate a common region (CORE) of multiple alignment candidates and then extend the CORE into multiple alignments. Because the CORE can be defined from a final alignment, we introduce CORE*, which is similar to CORE, and propose an algorithm to identify the CORE*. By adopting CORE* and dynamic programming, our proposed method produces multiple alignments of various lengths with higher accuracy than previous methods. In the experiments, the alignments identified by our algorithm are longer than those obtained by TM-align by 17% and 15.48%, on average, when the comparison is conducted at the level of super-family and fold, respectively.

8.
Existing methods for finding the locally best-matched alignments between a pair of biological sequences require O(N^2) computational steps and O(N^2) storage, where N is the average sequence length. An improved method is presented with which the storage requirement is greatly reduced, while the computational steps remain O(N^2). Only a small number of additional steps are required to display any common sub-sequences with similarity scores greater than a given threshold. The alignments found by the algorithm are optimal in the sense that their scores are locally maximal, where each score is a sum of weights given to the individual matches/replacements, insertions and deletions involved in the alignment. The algorithm was implemented in the C programming language on a personal computer. A data area of 64 kbytes of random access memory and a few hundred kbytes on disk is sufficient for comparing two protein or nucleic acid sequences of 2500 residues. The programs are particularly valuable when used in combination with fast sequence search programs.
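
To make the storage claim concrete, the sketch below computes the best local alignment score while keeping only two rows of the dynamic programming matrix, so the work stays quadratic but the memory is linear in the length of the second sequence. It is a standard Smith-Waterman style recurrence with a linear gap penalty, given as an assumption-laden illustration rather than the method of the abstract; the function name and the scoring values are hypothetical, and recovering the alignments themselves would need extra bookkeeping.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of a and b using O(len(b)) memory.

    Illustrative scoring values; only the score is returned, not the
    aligned sub-sequences.
    """
    best = 0
    prev = [0] * (len(b) + 1)  # row i-1 of the DP matrix
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)  # row i, rebuilt for every i
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                   # start a new local alignment here
                prev[j - 1] + pair,  # match or replacement
                prev[j] + gap,       # deletion (gap in b)
                curr[j - 1] + gap,   # insertion (gap in a)
            )
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(local_alignment_score("XXACGTXX", "YYACGTYY"))
```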

9.
DNA Translator and Aligner are molecular phylogenetics HyperCard stacks for Macintosh computers. They manipulate sequence data to provide graphical gene mapping, conversions, translations and manual multiple-sequence alignment editing. DNA Translator is able to convert documented GenBank or EMBL sequences into linearized, rescalable gene maps whose gene sequences are extractable by clicking on the corresponding map button or by selection from a scrolling list. The gene maps provided, complete with extractable sequences, consist of nine metazoan, one yeast, and one ciliate mitochondrial DNAs and three green plant chloroplast DNAs. Single or multiple sequences can be manipulated to aid in phylogenetic analysis. Sequences can be translated between nucleic acids and proteins in either direction with flexible support of alternate genetic codes and ambiguous nucleotide symbols. Multiple aligned sequence output from diverse sources can be converted to Nexus, Hennig86 or PHYLIP format for subsequent phylogenetic analysis. Input or output alignments can be examined with Aligner, a convenient accessory stack included in the DNA Translator package. Aligner is an editor for the manual alignment of up to 100 sequences that toggles between display of matched characters and normal unmatched sequences. DNA Translator also generates graphic displays of amino acid coding and codon usage frequency relative to all other, or only synonymous, codons for approximately 70 select organism-organelle combinations. Codon usage data is compatible with spreadsheet or UWGCG formats for incorporation of additional molecules of interest. The complete package is available via anonymous ftp and is free for non-commercial uses.

10.
Computer programs that can be used for the design of synthetic genes and that are run on an Apple Macintosh computer are described. These programs determine nucleic acid sequences encoding amino acid sequences. They select DNA sequences based on codon usage as specified by the user, and determine the placement of base changes that can be used to create restriction enzyme sites without altering the amino acid sequence. A new algorithm for finding restriction sites by translating the restriction endonuclease target sequence in all three reading frames and then searching the given peptide or protein amino acid sequence with these short restriction enzyme peptide sequences is described. Examples are given for the creation of synthetic DNA sequences for the bovine prethrombin-2 and ribonuclease A genes.

11.
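A DNA multiple sequence alignment method based on a maximum weight path algorithm
Huo Hongwei, Xiao Zhiwei. Journal of Software (软件学报), 2007, 18(2): 185-195
For the multiple sequence alignment problem in biological sequence analysis, many heuristic algorithms have been proposed to improve computation speed and alignment quality when the input is large. This paper proposes a method for global DNA multiple sequence alignment: MWPAlign (maximum weighted path alignment). The algorithm represents the sequence information as a de Bruijn graph and records the information of the input sequences on the edges of the graph. The problem of finding the consensus sequence is thereby transformed into the problem of finding a maximum weight path in the graph, which reduces the time complexity of multiple sequence alignment to almost linear. Experimental results show that MWPAlign is a feasible multiple sequence alignment algorithm; in particular, for large amounts of sequence data with a mutation rate below 5.2%, it gives better alignment results and running performance than CLUSTALW (cluster alignments weight), T-Coffee and HMMT (hidden Markov model training).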
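
As a rough illustration of the graph representation described in this abstract, the sketch below builds a de Bruijn graph over k-mers and stores, on each edge, how many times the input sequences traverse it. The function name, the choice of k and the use of a plain occurrence count as the edge weight are assumptions made here for illustration; the abstract does not state what MWPAlign records on its edges, and the maximum weight path search itself is not shown.

```python
from collections import Counter


def weighted_de_bruijn_edges(sequences, k):
    """Map each edge (k-mer -> next overlapping k-mer) to the number of
    times the input sequences traverse it.

    Hypothetical helper: occurrence counts stand in for the per-edge
    sequence information mentioned in the abstract.
    """
    edges = Counter()
    for seq in sequences:
        # Consecutive k-mers overlap in k-1 symbols and define one edge.
        for i in range(len(seq) - k):
            source = seq[i:i + k]
            target = seq[i + 1:i + k + 1]
            edges[(source, target)] += 1
    return edges


if __name__ == "__main__":
    graph = weighted_de_bruijn_edges(["ACGT", "ACGA"], k=2)
    for (source, target), weight in sorted(graph.items()):
        print(source, "->", target, "weight", weight)
```

Edges shared by many sequences receive high weights, which is why a heavy path through such a graph is a natural candidate for a consensus.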

12.
This paper describes three weighting schemes for improving the accuracy of progressive multiple sequence alignment methods: (1) global profile pre-processing, to capture for each sequence information about other sequences in a profile before the actual multiple alignment takes place; (2) local pre-processing, which incorporates a new protocol to use only non-overlapping local sequence regions to construct the pre-processed profiles; and (3) local-global alignment, a weighting scheme based on the double dynamic programming (DDP) technique to softly bias global alignment to local sequence motifs. The first two schemes allow the compilation of residue-specific multiple alignment reliability indices, which can be used in an iterative fashion. The schemes have been implemented with associated iterative modes in the PRALINE multiple sequence alignment method, and have been evaluated using the BAliBASE benchmark alignment database. These tests indicate that PRALINE is a toolbox able to build alignments of very high quality. We found that local profile pre-processing raises the alignment quality by 5.5% compared to PRALINE alignments generated under default conditions. Iteration enhances the quality by a further percentage point. The implications of multiple alignment scoring functions and iteration in relation to alignment quality and benchmarking are discussed.

13.
A method of multiple sequence alignment is described based on the double dynamic programming (DDP) algorithm previously used for treating structural constraints encountered in structure comparison and threading. Following these applications, the inconsistencies that emerge when trying to combine pair-wise alignments into a multiple alignment are reconciled by summing all the, possibly inconsistent, paths (low-level alignments) into a matrix which is then used to provide a final (high-level) alignment. This process is applied to all sequence pairs, and the pair-wise results are combined in a simple multiple sequence alignment program. From this alignment, further constraints are selected to bias the low-level alignments in the DDP algorithm, and the process is iterated. The results, however, showed that this overall iteration was not needed and that a single pass gave results at least as good as the 'standard' progressive method of multiple sequence alignment. Further applications of the method are discussed.

14.
The multiple alignment of DNA and protein sequences is applicable to various important fields in molecular biology. Although the dynamic programming approach to this problem is well known, it requires enormous time and space to obtain the optimal alignment. On the other hand, this problem corresponds to the shortest path problem, so the A* algorithm, which can efficiently find the shortest path with an estimator, is usable.

First, this paper directly applies the A* algorithm to the multiple sequence alignment problem with a more powerful estimator for the case of more than two dimensions, and discusses extensions of this approach that utilize an upper bound on the shortest path length and a modification of the network structure. An algorithm that provides the upper bound is also proposed. The basic part of these results was originally shown in Ikeda and Imai [11]. This part is similar to the branch-and-bound techniques implemented in the MSA program of Gupta et al. [6]. Our framework is based on an edge length transformation that reduces the problem to the shortest path problem, which is more suitable for generalizations such as enumerating suboptimal alignments and parametric analysis, as done in Shibuya and Imai [15–17]. With this enhanced A* algorithm, optimal multiple alignments of several long sequences can be computed in practice, as shown by computational results.

Second, this paper proposes a k-group alignment algorithm for multiple alignment as a practical method for much larger problems, say multiple alignments of 50–100 sequences. A basic part of these results was originally presented in Imai and Ikeda [13]. In existing iterative improvement methods for multiple alignment, the so-called group-to-group two-dimensional dynamic programming has been used, and in this respect our proposal is to extend the ordinary two-group dynamic programming to k-group alignment programming. This extension is conceptually straightforward, and our contribution here is to demonstrate that the k-group alignment can be implemented so as to run in reasonable time and space under standard computing environments. This is established by generalizing the above A* search approach. The k-group alignment method can be directly incorporated in existing methods such as iterative improvement algorithms [2, 5] and tree-based (iterative) algorithms [9]. This paper performs computational experiments by applying the k-group method to iterative improvement algorithms, and shows that our approach can find better alignments in reasonable time. For example, in the larger-scale computational experiments reported here, 34 protein sequences with very high homology can be optimally 10-group aligned, and 64 sequences with high homology can be optimally 5-group aligned.
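
The correspondence between alignment and shortest paths that this abstract relies on can be shown in the simplest setting, two sequences. In the sketch below, nodes are prefix pairs (i, j), edges are match/replacement, insertion and deletion steps, and A* is guided by a lower bound on the remaining cost. Everything here is an illustrative assumption: the function name, the unit costs, and the estimator (the gaps forced by the length difference of the unaligned suffixes) are chosen for brevity and are not the more powerful multi-dimensional estimator of the paper.

```python
import heapq


def astar_alignment_cost(a, b, mismatch=1, gap=1):
    """Minimum alignment cost of a and b found by A* on the edit graph.

    Node (i, j) means a[:i] and b[:j] are already aligned. Costs are
    assumed non-negative; the unit defaults give the edit distance.
    """
    m, n = len(a), len(b)

    def estimate(i, j):
        # Lower bound: the suffix length difference must be filled with gaps.
        return gap * abs((m - i) - (n - j))

    best = {(0, 0): 0}
    frontier = [(estimate(0, 0), 0, 0, 0)]  # entries are (f, g, i, j)
    while frontier:
        _, g, i, j = heapq.heappop(frontier)
        if (i, j) == (m, n):
            return g
        if g > best.get((i, j), float("inf")):
            continue  # stale queue entry, a cheaper route was found later
        moves = []
        if i < m and j < n:
            moves.append((i + 1, j + 1, 0 if a[i] == b[j] else mismatch))
        if i < m:
            moves.append((i + 1, j, gap))  # a[i] aligned to a gap
        if j < n:
            moves.append((i, j + 1, gap))  # b[j] aligned to a gap
        for ni, nj, step in moves:
            ng = g + step
            if ng < best.get((ni, nj), float("inf")):
                best[(ni, nj)] = ng
                heapq.heappush(frontier, (ng + estimate(ni, nj), ng, ni, nj))
    return None


if __name__ == "__main__":
    print(astar_alignment_cost("kitten", "sitting"))
```

With more than two sequences the node becomes a tuple of k positions and the number of moves per node grows as 2^k - 1, which is why the quality of the estimator matters so much in the multi-dimensional case.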


15.
Optimizing railway alignments is a complex and time-consuming engineering problem. The huge continuous search space, complex constraints, implicit objective function and infinite potential alternatives of this problem pose many challenges. Especially in mountainous regions, finding a near-optimal alignment for extremely complex terrain and constraints is a most arduous task, which cannot be solved satisfactorily with most existing methods. In this study, a stepwise & hybrid particle swarm-genetic algorithm is developed for railway alignment optimization in mountainous regions. It is a continuous search method suitable for railway alignment design. A stepwise horizontal–vertical–integral approach, which defines the horizontal and vertical alignments as two kinds of particles, is proposed to solve the three-dimensional railway alignment optimization problem. To enhance the initial diversity and momentum, butterfly-shaped areas are preset on a path generated with a bidirectional distance transform for initializing horizontal particles. For the solution method, specific genetic operators, including roulette wheel selection, four crossovers and two mutations, are integrated into the stepwise particle swarm method to address parameter-dependent performance and avoid premature convergence. In addition, a cubic polynomial weight update strategy is employed for thoroughly searching the problem space. This synthesis method has been applied to a real-world case in a very mountainous region. The detailed data analyses demonstrate that it can offer more promising solutions than alternatives designed by experienced designers and those generated with a genetic algorithm or a non-stepwise particle swarm algorithm.

16.
An algorithm that allows rapid searching of nucleic acid sequences based on pregenerated index files is described. The programs and index files for searching the entire EMBL nucleotide sequence collection are being distributed on the EMBL Data Library's CD-ROM.

17.
We present the first space- and time-optimal parallel algorithm for the pairwise sequence alignment problem, a fundamental problem in computational biology. This problem can be solved sequentially in O(mn) time and O(m+n) space, where m and n are the lengths of the sequences to be aligned. The fastest known parallel space-optimal algorithm for pairwise sequence alignment takes optimal O((m+n)/p) space, but suboptimal O((m+n)^2/p) time, where p is the number of processors. On the other hand, the most space-economical time-optimal parallel algorithm takes O(mn/p) time, but O(m+n/p) space. We close this gap by presenting an algorithm that achieves both time and space optimality, i.e. requires only O((m+n)/p) space and O(mn/p) time. We also present an experimental evaluation of the proposed algorithm on an IBM xSeries cluster. Although presented in the context of full sequence alignments, our algorithm is applicable to other alignment problems in computational biology, including local alignments and syntenic alignments. It is also a useful addition to the range of techniques available for parallel dynamic programming.

18.
This paper presents a novel two-stage hybrid swarm intelligence optimization algorithm, called the GA–PSO–ACO algorithm, that combines the evolutionary ideas of genetic algorithms, particle swarm optimization and ant colony optimization, based on mutual compensation, for solving the traveling salesman problem. In the proposed hybrid algorithm, the whole process is divided into two stages. In the first stage, we make use of the randomness, rapidity and wholeness of the genetic algorithms and particle swarm optimization to obtain a series of sub-optimal solutions (rough searching), which are used to adjust the initial allocation of pheromone in the ACO. In the second stage, we make use of the parallelism, positive feedback and high solution accuracy of the ACO to solve the whole problem (detailed searching). To verify the effectiveness and efficiency of the proposed hybrid algorithm, benchmark problems of various scales from TSPLIB are tested. The simulation examples demonstrate that the GA–PSO–ACO algorithm can greatly improve the computing efficiency for solving the TSP and outperforms Tabu Search, genetic algorithms, particle swarm optimization, ant colony optimization, PS–ACO and other methods in solution quality. The experimental results also demonstrate that convergence is faster and better as the scale of the TSP increases.

19.
Computers & Chemistry, 1993, 17(2): 219-227
A neural network classification method has been developed as an alternative approach to the search/organization problem of large molecular databases. Two artificial neural systems have been implemented on a Cray for rapid protein/nucleic acid classification of unknown sequences. Each system employs an n-gram hashing function for sequence encoding and modular back-propagation networks for classification. The protein system, which classifies proteins into PIR (Protein Identification Resource) superfamilies, has achieved 82–100% sensitivity at a speed that is about an order of magnitude faster than other search methods. The pilot nucleic acid system showed a 91–97% classification accuracy. The software tool could be used as a filter program to reduce the database search time and help organize the molecular sequence databases. The tool is generally applicable to any databases that are organized according to family relationships.
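
The idea of encoding a sequence with an n-gram hashing function can be sketched as follows: every overlapping n-gram is hashed to one of a fixed number of buckets, and the bucket counts form a fixed-length input vector for a classifier. The function name, the vector size and the polynomial hash are assumptions made for this illustration; the abstract does not specify the hashing function that the published system uses.

```python
def ngram_hash_vector(sequence, n=2, size=64):
    """Encode a sequence as a fixed-length vector of hashed n-gram counts.

    Illustrative only: a simple deterministic polynomial hash is assumed,
    so different n-grams may collide in the same bucket.
    """
    vector = [0] * size
    for i in range(len(sequence) - n + 1):
        gram = sequence[i:i + n]
        index = 0
        for ch in gram:
            # Polynomial rolling hash, reduced modulo the vector size.
            index = (index * 31 + ord(ch)) % size
        vector[index] += 1
    return vector


if __name__ == "__main__":
    encoded = ngram_hash_vector("MKVLAAGIVGL", n=2, size=16)
    print(encoded, sum(encoded))
```

Because the vector length does not depend on the sequence length, sequences of any size can be fed to a network with a fixed number of input units.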

20.