首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
Given a strong match between regions of two sequences, how far can the match be meaningfully extended if gaps are allowed in the resulting alignment? The aim is to avoid searching beyond the point that a useful extension of the alignment is likely to be found. Without loss of generality, we can restrict attention to the suffixes of the sequences that follow the strong match, which leads to the following formal problem. Given two sequences and a fixed X > 0, align initial portions of the sequences subject to the constraint that no section of the alignment scores below -X. Our results indicate that computing an optimal alignment under this constraint is very expensive. However, less rigorous conditions on the alignment can be guaranteed by quite efficient algorithms. One of these variants has been implemented in a new release of the Blast suite of database search programs.  相似文献   

2.
MOTIVATION: Optimal sequence alignment based on the Smith-Waterman algorithm is usually too computationally demanding to be practical for searching large sequence databases. Heuristic programs like FASTA and BLAST have been developed which run much faster, but at the expense of sensitivity. RESULTS: In an effort to approximate the sensitivity of an optimal alignment algorithm, a new algorithm has been devised for the computation of a gapped alignment of two sequences. After scanning for high-scoring words and extensions of these to form fragments of similarity, the algorithm uses dynamic programming to build an accurate alignment based on the fragments initially identified. The algorithm has been implemented in a program called SALSA and the performance has been evaluated on a set of test sequences. The sensitivity was found to be close to the Smith-Waterman algorithm, while the speed was similar to FASTA (ktup = 2). AVAILABILITY: Searches can be performed from the SALSA homepage at http://dna.uio.no/salsa/ using a wide range of databases. Source code and precompiled executables are also available. CONTACT: torbjorn.rognes@labmed.uio.no  相似文献   

3.
MOTIVATION: We have previously reported an algorithm for discovering patterns conserved in sets of related unaligned protein sequences. The algorithm was implemented in a program called Pratt. Pratt allows the user to define a class of patterns (e.g. the degree of ambiguity allowed and the length and number of gaps), and is then guaranteed to find the conserved patterns in this class scoring highest according to a defined fitness measure. In many cases, this version of Pratt was very efficient, but in other cases it was too time consuming to be applied. Hence, a more efficient algorithm was needed. RESULTS: In this paper, we describe a new and improved searching strategy that has two main advantages over the old strategy. First, it allows for easier integration with programs for multiple sequence alignment and data base search. Secondly, it makes it possible to use branch-and-bound search, and heuristics, to speed up the search. The new search strategy has been implemented in a new version of the Pratt program.  相似文献   

4.
Computer simulations of simple exact lattice models are an aid in the study of protein folding process; they have sometimes resulted in predictions experimentally proved. The contact interactions (CI) method is here proposed as a new algorithm for the conformational search in the low-energy regions of protein chains modeled as copolymers of hydrophobic and polar monomers configured as self-avoiding walks on square or cubic lattices. It may be regarded as an extension of the standard Monte Carlo method improved by the concept of cooperativity deriving from nonlocal contact interactions. A major difference with respect to other algorithms is that criteria for the acceptance of new conformations generated during the simulations are not based on the energy of the entire molecule, but cooling factors associated with each residue define regions of the model protein with higher or lower mobility. Nine sequences of length ranging from 20 to 64 residues were used on the square lattice and 15 sequences of length ranging from 46 to 136 residues were used on the cubic lattice. The CI algorithm proved very efficient both in two and three dimensions, and allowed us to localize energy minima not localized by other searching algorithms described in the literature. Use of this algorithm is not limited to the conformational search, because it allows the exploration of thermodynamic and kinetic behavior of model protein chains.  相似文献   

5.
6.
Resources for construction activities are limited in the real construction world. To avoid the waste and shortage of resources on a construction jobsite, scheduling must include resource allocation. A multicriteria computational optimal scheduling model, which integrates the time∕cost trade-off model, resource-limited model, and resource leveling model, is proposed. A searching technique using genetic algorithms (GAs) is adopted in the model. Furthermore, the nondominated solutions are found by the multiple attribute decision-making method, technique for order preference by similarity to ideal solution. The model can effectively provide the optimal combination of construction durations, resource amounts, minimum direct project costs, and minimum project duration under the constraint of limited resources.  相似文献   

7.
MOTIVATION: In order to increase the accuracy of multiple sequence alignments, we designed a new strategy for optimizing multiple sequence alignments by genetic algorithm. We named it COFFEE (Consistency based Objective Function For alignmEnt Evaluation). The COFFEE score reflects the level of consistency between a multiple sequence alignment and a library containing pairwise alignments of the same sequences. RESULTS: We show that multiple sequence alignments can be optimized for their COFFEE score with the genetic algorithm package SAGA. The COFFEE function is tested on 11 test cases made of structural alignments extracted from 3D_ali. These alignments are compared to those produced using five alternative methods. Results indicate that COFFEE outperforms the other methods when the level of identity between the sequences is low. Accuracy is evaluated by comparison with the structural alignments used as references. We also show that the COFFEE score can be used as a reliability index on multiple sequence alignments. Finally, we show that given a library of structure-based pairwise sequence alignments extracted from FSSP, SAGA can produce high-quality multiple sequence alignments. The main advantage of COFFEE is its flexibility. With COFFEE, any method suitable for making pairwise alignments can be extended to making multiple alignments. AVAILABILITY: The package is available along with the test cases through the WWW: http://www. ebi.ac.uk/cedric CONTACT: cedric.notredame@ebi.ac.uk  相似文献   

8.
The multiple sequence alignment problem is applicable and important in various fields in molecular biology such as the prediction of three-dimensional structures of proteins and the inference of phylogenetic trees. However, the optimal alignment based on the scoring criterion is not always biologically the most significant alignment. We here propose two flexible and efficient approaches to solve this problem. One approach is to provide many suboptimal alignments as alternatives for the optimal one. It has been considered almost impossible to investigate such suboptimal alignments of more than two sequences because of the enormous size of the problem. We propose techniques for enumeration of suboptimal alignments using the Eppstein algorithm. We also discuss what kind of suboptimal alignment is unnecessary to enumerate and propose an efficient enumeration algorithm to enumerate only necessary alignments. The other approach is parametric analysis. The obtained optimal solution with fixed parameters such as gap penalties is not always the biologically best alignment. Thus, it is required to vary parameters and check how the optimal alignments change. The way to vary parameters has been studied well on the problem of two sequences, but not on the multiple alignment problem because of the difficulty of computing the optimal solution. We propose techniques for this parametric multiple alignment problem and examine the features of alignments obtained by various parametric analyses. For both approaches, this paper performs experiments on various groups of actual protein sequences and examines the efficiency of these algorithms and properties of sequence groups.  相似文献   

9.
This paper presents a new algorithm for the design of layout geometry of looped water distribution networks based on rectilinear grids. The algorithm is an evolution program based on genetic algorithms. It incorporates new methods for generating the initial population and performing the operations of crossover and mutation. The new methods overcome the problem of generating infeasible solutions that result when the commonly used genetic algorithm methods operate on solutions using the chosen coding scheme. This paper includes the results of tests that measure the effectiveness and computational effort of the new methods and a demonstration of the algorithm through application to an example problem.  相似文献   

10.
The development of new techniques in sequencing nuclei acids has produced a great amount of sequence data and has led to the discovery of new relationships. In this paper, we study a method for parallelizing the algorithm WORDUP, which detects the presence of statistically significant patterns in DNA sequences. WORDUP implements an efficient method to identify the presence of statistically significant oligomers in a non-homologous group of sequences. It is based on a modified version of the Boyer-Moore algorithm, which is one of the fastest algorithms for string matching available in the literature. The aim of the parallel version of WORDUP presented here is to speed up the computational time and allow the analysis of a greater set of longer nucleotide sequences, which is usually impractical with sequential algorithms.  相似文献   

11.
12.
Molecular biology is becoming a computationally intense realm of contemporary science and faces some of the current grand scientific challenges. In its context, tools that identify, store, compare and analyze effectively large and growing numbers of bio-sequences are found of increasingly crucial importance. Biosequences are routinely compared or aligned, in a variety of ways, to infer common ancestry, to detect functional equivalence, or simply while searching for similar entries in a database. A considerable body of knowledge has accumulated on sequence alignment during the past few decades. Without pretending to be exhaustive, this paper attempts a survey of some criteria of wide use in sequence alignment and comparison problems, and of the corresponding solutions. The paper is based on presentations and literature given at the Workshop on Sequence Alignment held at Princeton, N.J., in November 1994, as part of the DIMACS Special Year on Mathematical Support for Molecular Biology.  相似文献   

13.
The main objective of this paper is to investigate efficiency and correctness of different real-coded genetic algorithms and identification criteria in nonlinear system identification within the framework of non-classical identification techniques. Two conventional genetic algorithms have been used, standard genetic algorithm and microgenetic algorithm. Moreover, an advanced multispecies genetic algorithm has been proposed: it combines an adaptive rebirth operator, a migration strategy, and a search space reduction technique. Initially, a critical analysis has been conducted on these soft computing strategies to provide some guidelines for similar engineering and physical applications. Therefore, the hysteretic Bouc-Wen model has been numerically investigated to achieve three main results. First, the computational effectiveness and accuracy of the proposed strategy are checked to show that the proposed optimizer outperforms the aforementioned conventional genetic algorithms. Secondarily, a comparative study is performed to show that an improved performance can be obtained by using the Hilbert transform-based acceleration envelope as objective function in the optimization problem (instead of the pure acceleration response). Finally, system identification is conducted by making use of the proposed optimizer to verify its substantial noise-insensitive property also in the presence of high noise-to-signal ratio.  相似文献   

14.
This paper proposes a methodology for the optimal design of water distribution systems based on genetic algorithms. The objective of the optimization is to minimize the capital cost, subject to ensuring adequate pressures at all nodes during peak demands. The proposed method is novel in that it involves the use of a pipe index vector to control the genetic algorithm search. The pipe index vector is a measure of the relative importance of pipes in a network in terms of their impact on the hydraulic performance of the network. By using the pipe index vector it is possible to exclude regions of the search space where impractical and infeasible solutions exist. By reducing the search space it is possible to generate feasible solutions more quickly and hence process much healthier populations than would be the case in a standard genetic algorithm. This results in optimal solutions being found in a fewer number of generations resulting in a substantial saving in terms of computational time. The method has been tested on several networks, including networks used for benchmark testing least cost design algorithms, and has been shown to be efficient and robust.  相似文献   

15.
Inverse problems that are constrained by large-scale partial differential equation (PDE) systems demand very large computational resources. Solutions to these problems generally require the solution of a large number of complex PDE systems. Three-dimensional groundwater inverse problems fall under this category. In this paper, we describe the implementation of a parallel simulation-optimization framework for solving PDE-based inverse problems and demonstrate it for the solution of groundwater contaminant source release history reconstruction problems that are of practical importance. The optimization component employs several optimization algorithms, including genetic algorithms (GAs) and several local search (LS) approaches that can be used in a hybrid mode. This hybrid GA-LS optimizer is used to drive a parallel finite-element (FEM) groundwater forward transport simulator. Parallelism is exploited within the transport simulator (fine grained parallelism) as well as the optimizer (coarse grained parallelism) through the exclusive use of the Message Passing Interface (MPI) communication library. Algorithmic and parallel performance results are presented for an IBM SP3 supercomputer. Simulation and performance results presented in this paper illustrate that an effective combination of efficient optimization algorithms and parallel computing can enable solution to three-dimensional groundwater inverse problems of a size and complexity not attempted before.  相似文献   

16.
17.
A new computational method (chimeric alignment) has been developed to detect chimeric 16S rRNA artifacts generated during PCR amplification from mixed bacterial populations. In contrast to other nearest-neighbor methods (e.g., CHECK_CHIMERA) that define sequence similarity by k-tuple matching, the chimeric alignment method uses the score from dynamic programming alignments. Further, the chimeric alignments are displayed to the user to assist in sequence classification. The distribution of improvement scores for 500 authentic, nonchimeric sequences and 300 artificial chimeras (constructed from authentic sequences) was used to study the sensitivity and accuracy of both chimeric alignment and CHECK_CHIMERA. At a constant rate of authentic sequence misclassification (5%), chimeric alignment incorrectly classified 13% of the artificial chimeras versus 14% for CHECK_CHIMERA. Interestingly, only 1% of nonchimeras and 10% of chimeras were misclassified by both programs, suggesting that optimum performance is obtained by using the two methods to assign sequences to three classes: high-probability nonchimeras, high-probability chimeras, and sequences that need further study by other means. This study suggests that k-tuple-based matching methods are more sensitive than alignment-based methods when there is significant parental sequence similarity, while the opposite becomes true as the sequences become more distantly related. The software and a World Wide Web-based server are available at http://www-hto.usc.edu/software/mglobal CHI.  相似文献   

18.
Irrigation Scheduling with Genetic Algorithms   总被引:1,自引:0,他引:1  
A typical irrigation scheduling problem is one of preparing a schedule to service a group of outlets that may be serviced simultaneously. This problem has an analogy with the classical multimachine earliness/tardiness scheduling problem in operations research (OR). In previously published work, integer programming was used to solve irrigation scheduling problems; however, such scheduling problems belong to a class of combinatorial optimization problems known to be computationally demanding. This is widely reported in OR literature. Hence integer programs (IPs) can be used only to solve relatively small problems typically in a research environment where considerable computational resources and time can be allocated to solve a single schedule. For practical applications, metaheuristics such as genetic algorithms, simulated annealing, or tabu search methods need to be used. However, these need to be formulated carefully and tested thoroughly. The current research explores the potential of genetic algorithms to solve the simultaneous irrigation scheduling problem. For this purpose, two models are presented: the stream tube model and the time block model. These are formulated as genetic algorithms, which are then tested extensively, and the solution quality is compared with solutions from an IP. The suitability of these models for the simultaneous irrigation scheduling problem is reported.  相似文献   

19.
A method is described for searching protein sequence databases using tandem mass spectra of tryptic peptides. The approach uses a de novo sequencing algorithm to derive a short list of possible sequence candidates which serve as query sequences in a subsequent homology-based database search routine. The sequencing algorithm employs a graph theory approach similar to previously described sequencing programs. In addition, amino acid composition, peptide sequence tags and incomplete or ambiguous Edman sequence data can be used to aid in the sequence determinations. Although sequencing of peptides from tandem mass spectra is possible, one of the frequently encountered difficulties is that several alternative sequences can be deduced from one spectrum. Most of the alternative sequences, however, are sufficiently similar for a homology-based sequence database search to be possible. Unfortunately, the available protein sequence database search algorithms (e.g. Blast or FASTA) require a single unambiguous sequence as input. Here we describe how the publicly available FASTA computer program was modified in order to search protein databases more effectively in spite of the ambiguities intrinsic in de novo peptide sequencing algorithms.  相似文献   

20.
Genetic algorithms have been shown to be very effective optimization tools for a number of engineering problems. Since the genetic processes typically operate independent of the actual problem, a core genetic algorithm library consisting of all the genetic operators having an interface to a generic objective function can serve as a very useful tool for learning as well as for solving a number of practical optimization problems. This paper discusses the object-oriented design and implementation of such a core library. Object-oriented design, apart from giving a more natural representation of information, also facilitates better memory management and code reusability. Next, it is shown how classes derived from the implemented libraries can be used for the practical size optimization of large space trusses, where several constructibility aspects have been incorporated to simulate real-world design constraints. Strategies are discussed to model the chromosome and to code genetic operators to handle such constraints. Strategies are also suggested for member grouping for reducing the problem size and implementing move-limit concepts for reducing the search space adaptively in a phased manner. The implemented libraries are tested on a number of large previously fabricated space trusses, and the results are compared with previously reported values. It is concluded that genetic algorithms implemented using efficient and flexible data structures can serve as a very useful tool in engineering design and optimization.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号