Similar literature (20 articles)
1.
Natural proteins quickly fold into a complicated three-dimensional structure. Evolutionary algorithms have been used to predict the native structure, that is, the lowest-energy conformation of the primary sequence of a given protein. Successful structure prediction requires a free energy function sufficiently close to the true potential for the native state, as well as a method for exploring the conformational space. Protein structure prediction is a challenging problem because current potential functions have limited accuracy and the conformational space is vast. In this work, we show an innovative approach to the protein folding (PF) problem based on a hybrid Immune Algorithm (IMMALG) and a quasi-Newton method, starting from a population of promising protein conformations created by the global optimizer DIRECT. The new method has been tested on the Met-enkephalin peptide, which is a paradigmatic example of a multiple-minima problem, 1POLY, 1ROP and the three-helix protein 1BDC. DIRECT produces an initial population of promising candidate solutions within a potentially optimal rectangle for the funnel landscape of the PF problem, and IMMALG starts from this population. The experimental results show that such a multistage approach is a competitive and effective search method in the conformational search space of real proteins, in terms of solution quality and computational cost, compared with current state-of-the-art algorithms.

2.
The protein threading problem is the problem of determining the three-dimensional structure of a given but arbitrary protein sequence from a set of known structures of other proteins. This problem is known to be NP-hard, and current computational approaches to threading are unrealistic for long proteins and/or large template data sets. In this paper, we propose an evolution strategy for the solution of the protein threading problem. We also propose three parallel methods for fast threading. Our experiments produced encouraging preliminary results in terms of threading energy, as well as a significant reduction in threading time.

3.
《Computers & chemistry》1998,21(5):369-375
Six protein pairs, all with known 3D structures, were used to evaluate different protein structure prediction tools. First, alignments between a target sequence and a template sequence or structure were obtained by sequence alignment with QUANTA or by threading with THREADER, 123D and PHD Topits. Second, protein structure models were generated using MODELLER. The two protein structure assessment tools used were the root mean square deviation (RMSD) compared with the experimental target structure and the total 3D profile score. The accuracy of the active sites of models built in the absence and presence of ligands was also investigated. Our study confirms that threading methods are able to yield more accurate models than comparative modelling in cases of low sequence identity (<30%). However, a gap of 2 Å (RMSD) exists between the theoretically best model and the models obtained by threading methods. For high sequence identities (>30%), comparative modelling using MODELLER resulted in accurate models. Furthermore, the total 3D profile score was not always able to distinguish correct from incorrect folds when different alignment methods were used. Finally, we found it to be important to include possible ligands in the model-building process in order to prevent unrealistic filling of active site areas.
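
The RMSD measure mentioned in this abstract can be illustrated with a minimal sketch. It assumes the model and the experimental structure have already been superimposed and that their atoms are listed in the same order; the function name `rmsd` and the toy coordinates are illustrative and are not taken from the paper.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]  # one atom position in angstroms


def rmsd(model: Sequence[Point], reference: Sequence[Point]) -> float:
    """Root mean square deviation between two pre-superimposed atom lists."""
    if len(model) != len(reference) or not model:
        raise ValueError("need two non-empty coordinate lists of equal length")
    # Sum of squared distances between corresponding atoms.
    squared = sum(
        (a - b) ** 2
        for p, q in zip(model, reference)
        for a, b in zip(p, q)
    )
    return math.sqrt(squared / len(model))


# Toy example (hypothetical coordinates): every atom is displaced by 2 Å along z.
model_atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference_atoms = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
print(rmsd(model_atoms, reference_atoms))
```

In practice an optimal superposition step is applied before computing the deviation; that step is omitted here.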

4.
The protein structure code: what is its present status? (Cited by: 3 total; 0 self-citations; 3 by others)
Current methods of prediction of protein conformation are reviewed and the algorithms on which they rely are presented. For non-homologous proteins and after cross-validation, the reported methods exhibit a probability index, i.e. the percentage of correctly predicted residues per predicted residues, of 63-65% with a standard deviation of the order of 7% for three conformational states: helix, beta-strand and coil. This present limitation in the accuracy of predictions that use only the information of the local sequence can be related essentially to the effect of long-range interactions specific for each protein family. The methods based on sequence similarity can improve the accuracy of prediction by expressing explicitly the homology of the protein to be predicted with proteins in the database. In these circumstances the probability index can reach 87% with a standard deviation of 6.6%. This property can be used for modeling homologous proteins by aiding in amino acid sequence alignments. The prediction of the tertiary structure of a protein is still limited to the case of modeling a structure based on the known three-dimensional structure of a homologous protein.

5.
A method of multiple sequence alignment is described based on the double dynamic programming (DDP) algorithm previously used for treating structural constraints encountered in structure comparison and threading. Following these applications, the inconsistencies that emerge when trying to combine pair-wise alignments into a multiple alignment are reconciled by summing all the, possibly inconsistent, paths (low-level alignments) into a matrix which is then used to provide a final (high-level) alignment. This process is applied to all sequence pairs and the pair-wise results combined in a simple multiple sequence alignment program. From this alignment, further constraints are selected to bias the low-level alignments in the DDP algorithm and the process iterated. The results, however, showed that this overall iteration was not needed and one pass gave results at least as good as the 'standard' progressive method of multiple sequence alignment. Further applications of the method are discussed.

6.
7.
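Protein three-dimensional structure prediction based on an improved tabu search (Cited by: 4 total; 4 self-citations; 0 by others)
Tabu search is a global iterative optimization algorithm with strong local search ability, and it has been applied successfully to a variety of combinatorial optimization problems. Based on the AB off-lattice model, this paper applies an improved tabu search algorithm to the prediction of three-dimensional protein folding structures. Experimental results show that the lowest energy values of the three-dimensional minimum-energy conformations found by the improved tabu search are lower than those found by existing algorithms. The three-dimensional conformations also form a hydrophobic core surrounded by hydrophilic residues, which reflects a structural feature of real proteins. The algorithm is efficient and can be used effectively for three-dimensional protein folding prediction.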

8.
Over the past several decades, biologists have conducted numerous studies examining both general and specific functions of proteins. Generally, if similarities in either the structure or sequence of amino acids exist for two proteins, then a common biological function is expected. Protein function is determined primarily based on the structure rather than the sequence of amino acids. The algorithm for protein structure alignment is therefore an essential research tool. The quality of the algorithm depends on the quality of the similarity measure that is used, and the similarity measure is an objective function used to determine the best alignment. However, none of the existing similarity measures has become a gold standard, because each has its own strengths and weaknesses. They require excessive filtering to find a single alignment. In this paper, we introduce a new strategy that finds not a single alignment, but multiple alignments with different lengths. This method has the obvious benefit of high-quality alignment. However, it leads to a new problem: its running time is considerably longer than that of methods that find only a single alignment. To address this problem, we propose algorithms that can locate a common region (CORE) of multiple alignment candidates, and can then extend the CORE into multiple alignments. Because the CORE can be defined only from a final alignment, we introduce CORE*, which is similar to CORE, and propose an algorithm to identify the CORE*. By adopting CORE* and dynamic programming, our proposed method produces multiple alignments of various lengths with higher accuracy than previous methods. In the experiments, the alignments identified by our algorithm are longer than those obtained by TM-align by 17% and 15.48%, on average, when the comparison is conducted at the level of super-family and fold, respectively.

9.
In this paper we describe the application of a so-called Self-Generating Memetic Algorithm to the Maximum Contact Map Overlap problem (MAX-CMO). The maximum overlap of contact maps is emerging as a leading modeling technique for obtaining structural alignments among pairs of protein structures. Identifying structural alignments (and hence similarity among proteins) is essential to the correct assessment of the relation between protein structure and function. A robust methodology for structural comparison could have an impact on the process of rational drug design. The Self-Generating Memetic Algorithm we present in this work concurrently evolves both the solutions (i.e. protein alignments) and the local search move operators that it needs to solve the problem instance at hand. The concurrent generation of local search strategies and solutions allows the Memetic Algorithm to produce better results than those given by a Genetic Algorithm and a Memetic Algorithm with human-designed local searchers. The approach has been tried on four different data sets (one composed of randomly generated proteins and the other three of real-world proteins) with encouraging results.

10.
A protein folding potential function ideally has several properties: it favors the native conformations for a number of protein sequences over a variety of nonnative folds; it can guide the search over conformations for the native state; it reflects changes in stability of the native fold due to changes in sequence; and it is relatively insensitive to small changes in conformation. While these are not mutually incompatible goals, attaining one property does not ensure that the others are satisfied. Examples are given of simple potentials having one property but lacking others. A new functional form of a folding potential is described where interactions between all nonhydrogen atoms are used to estimate interresidue interactions and implicit solvation. Its parameters can be adjusted to satisfy the above properties at least for barnase and a few other proteins.

11.
The determination of a protein's structure from the knowledge of its linear chain is one of the important problems that remains a bottleneck in interpreting the rapidly increasing repository of genetic sequence data. One approach to this problem that has shown promise and given a measure of success is threading. In this approach, contact energies between different amino acids are first determined by statistical methods applied to known structures. These contact energies are then applied to a sequence whose structure is to be determined by threading it through various known structures and determining the total threading energy for each candidate structure. The structure that yields the lowest total energy is then considered the leading candidate among all the structures tested. Additional information is often needed in order to support the results of threading studies, as it is well known in the field that the contact potentials used are not sufficiently sensitive to allow definitive conclusions. Here, we investigate the hypothesis that the environment of an amino acid residue, realized as all those residues not local to it on the chain but sufficiently close spatially, can supply information predictive of the type of that residue that is not adequately reflected in the individual contact energies. We present evidence that confirms this hypothesis and suggests a high-order cooperativity between the residues that surround a given residue and how they interact with it. We suggest a possible application to threading.
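
The threading procedure summarized in this abstract (score a sequence against each known structure with pairwise contact energies, then keep the lowest total) can be sketched as follows. This is a simplified illustration under stated assumptions: the sequence is placed on each template without gaps, each template is reduced to a list of contacting position pairs, and the two-letter alphabet, the energy values, and the names `threading_energy`, `best_template`, `fold_a` and `fold_b` are all hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

ContactEnergies = Dict[FrozenSet[str], float]  # unordered residue pair -> energy
Contacts = List[Tuple[int, int]]               # contacting positions in a template


def threading_energy(sequence: str, contacts: Contacts,
                     energies: ContactEnergies) -> float:
    """Total contact energy of the sequence placed gap-free on one template."""
    return sum(energies[frozenset((sequence[i], sequence[j]))]
               for i, j in contacts)


def best_template(sequence: str, templates: Dict[str, Contacts],
                  energies: ContactEnergies) -> Tuple[str, Dict[str, float]]:
    """Score every template and return the lowest-energy one with all scores."""
    scores = {name: threading_energy(sequence, contacts, energies)
              for name, contacts in templates.items()}
    return min(scores, key=scores.get), scores


# Toy two-letter alphabet (H = hydrophobic, P = polar); values are made up.
toy_energies: ContactEnergies = {
    frozenset("H"): -1.0,   # H-H contact
    frozenset("HP"): 0.0,   # H-P contact
    frozenset("P"): 0.0,    # P-P contact
}
toy_templates = {
    "fold_a": [(0, 3)],          # brings the two H residues together
    "fold_b": [(0, 2), (1, 3)],  # only H-P contacts
}
print(best_template("HPPH", toy_templates, toy_energies))
```

Real threading methods also optimize the alignment between sequence and template, which is what makes the general problem hard; the sketch covers only the scoring and selection step.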

12.
To utilize fully all available information in protein structure prediction, including both backbone and side-chain structures, we present a novel algorithm for solving a generalized threading problem. In this problem we consider simultaneous backbone threading and side-chain packing during the process of a protein structure prediction. For a given query protein sequence and a template structure, our goal is to find a threading alignment between the query sequence and the template structure, along with a rotamer assignment for each side-chain of the query protein, which optimizes an energy function that combines a backbone threading energy and a side-chain packing energy. This highly computationally challenging problem is solved by first formulating it as a graph-based optimization problem. Various graph-theoretic techniques, which take advantage of a number of special properties of the graph representing this generalized threading problem, are employed to achieve the computational efficiency needed to make our algorithm practically useful. The overall framework of our algorithm is a dynamic programming algorithm implemented on an optimal tree decomposition of the graph representation of our problem. By using various additional heuristic techniques such as dead-end elimination, we have demonstrated that our algorithm can solve a generalized threading problem within a practically acceptable amount of time and space, the first of its kind.

13.
To fully utilize all available information in protein structure prediction, including both backbone and side-chain structures, we present a novel algorithm for solving a generalized threading problem. In this problem, we consider backbone threading and side-chain packing simultaneously during the process of a protein structure prediction. For a given query protein sequence and a template structure, our goal is to find a threading alignment between the query sequence and the template structure, along with a rotamer assignment for each side-chain of the query protein, which optimizes an energy function that combines a backbone threading energy and a side-chain packing energy. This highly computationally challenging problem is solved by first formulating it as a graph-based optimization problem. Various graph-theoretic techniques, which take advantage of a number of special properties of the graph representing this generalized threading problem, are employed to achieve the computational efficiency needed to make our algorithm practically useful. The overall framework of our algorithm is a dynamic programming algorithm implemented on an optimal tree decomposition of the graph representation of our problem. By using various additional heuristic techniques such as dead-end elimination, we have demonstrated that our algorithm can solve a generalized threading problem within a practically acceptable amount of time and space, the first of its kind.

14.
Although an ordered 3D structure is generally considered to be a necessary pre-condition for protein functionality, there are disordered counter examples found to have biological activity. The objectives of our data mining project are: (1) to generalize from the limited set of counter examples and then apply this knowledge to large databases of amino acid sequence in order to estimate commonness of disordered protein regions in nature, and (2) to determine whether there are different types of protein disorder. For general disorder estimation, a neural network based predictor was designed and tested on data built from several public domain data banks through a nontrivial search, statistical analysis and data dimensionality reduction. In addition, predictors for identification of family-specific disorder were developed by extracting knowledge from databases generated through multiple sequence alignments of a known disordered sequence to other highly related proteins. Family-specific predictors were also integrated to test quality of general protein disorder identification from such hybrid prediction systems. Out-of-sample cross validation performance of several predictors was computed first, followed by tests on an unrelated database of proteins with long disordered regions, and the application of a few selected predictors to two large protein data banks: Nrl_3D, currently containing more than 10,000 protein fragments of known 3D structure, and Swiss Protein, having almost 60,000 protein sequences. The obtained results provide evidence that long disordered regions are common in nature, with an estimate that 11% of all the residues in the Swiss Protein data bank belong to disordered regions of length 40 or greater. The hypothesis that different protein disorder types exist is supported by high specificity/low sensitivity results of two family-specific predictors, by hybrid systems outperforming general models on a two-family test, and by existence of significant gaps in Swiss Protein vs. Nrl_3D disorder frequency estimates for both families. These findings prompt the need for a revision in the current understanding of protein structure and function, as well as for the development of improved disorder predictors that should have important uses in biotechnology applications.

15.
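Predicting the three-dimensional structure of a protein from an amino acid sequence with low homology is called ab initio prediction, and it is one of the challenges in computational biology. Protein backbone prediction is a necessary preliminary step of ab initio prediction. This paper applies a parallel ant colony optimization algorithm based on shared pheromones. Guided by existing energy functions and exploiting the qualitative complementarity between different energy terms, it constructs the protein backbone structure with the lowest energy and then uses clustering to select the conformation with the lowest free energy from the candidate set. The method was applied to the ab initio modeling targets released for CASP8/9. Of the 13 CASP8 ab initio modeling targets, the model 1 predictions for 2 targets exceeded the best CASP8 results and 7 ranked in the top 10. Of the 29 CASP9 ab initio modeling targets, the best result in the candidate set ranked in the top 10 of the Server group for 20 targets, and model 1 ranked in the top 10 for 11. These results indicate that combining several different energy functions to guide a parallel search can better simulate the folding behavior of natural proteins. In addition, the algorithm serves as a vehicle for combining different kinds of search strategies in parallel, which is also a novel approach for solving similar optimization problems with non-deterministic algorithms.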

16.
Indexing and retrieval for genomic databases (Cited by: 2 total; 0 self-citations; 2 by others)
Genomic sequence databases are widely used by molecular biologists for homology searching. Amino acid and nucleotide databases are increasing in size exponentially, and mean sequence lengths are also increasing. In searching such databases, it is desirable to use heuristics to perform computationally intensive local alignments on selected sequences and to reduce the costs of the alignments that are attempted. We present an index-based approach both for selecting sequences that display broad similarity to a query and for fast local alignment. We show experimentally that the indexed approach results in significant savings in computationally intensive local alignments and that index-based searching is as accurate as existing exhaustive search schemes.

17.
《Computers & chemistry》1994,18(3):255-258
The identification and characterization of local residue patterns or conserved segments shared by a set of biopolymers has provided a number of insights in molecular biology. Biopolymer sequences are observations from macromolecules that share common structural or functional features. The approach taken here rests on the notion that information may be most efficiently extracted from these observations through the use of a model that faithfully represents macromolecular characteristics. Accordingly, our efforts are focused on statistical models which attempt to capture central features of protein structure, function, and change. Here the assumptions that underlie two new methods for the analysis of protein sequence data are explicitly delineated. (1) Threading of a sequence through structural motifs seeks to determine if a protein sequence fits a known protein structure. The assumptions delineated here also generally apply to other contact-based threading methods that have been recently described. (2) Multiple sequence alignment via the Gibbs sampling algorithm seeks to identify position-specific empirical free energy models for residue sites in common motifs and simultaneously to align the sequence observations from these motifs.

18.
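Abstract: In protein structure prediction research, an important problem is correctly predicting the connectivity of disulfide bonds. Accurate disulfide bond prediction reduces the search space of protein conformations and aids prediction of the protein's 3D structure. This paper treats the problem of predicting the disulfide bonds in a protein structure as equivalent to finding a maximum-weight matching in a graph. The vertices of the graph represent the cysteine residues in the sequence; edges connect the vertices, each edge representing one possible bond; and edge weights are assigned by a weight function. The EJ algorithm is used to find the maximum-weight matching, which corresponds to the correct disulfide connectivity. Applying this method to predict the disulfide bonds of protein structures gave good results.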
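
The graph formulation in this abstract (cysteines as vertices, candidate bonds as weighted edges, best bonding pattern as a maximum-weight matching) can be illustrated with a small sketch. It does not implement the EJ algorithm named in the abstract; it enumerates every perfect pairing recursively, which is practical only for a handful of cysteines. It also assumes an even number of cysteines that are all bonded, and the residue positions, weights, and the name `best_disulfide_pairing` are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Pair = Tuple[int, int]
Weights = Dict[FrozenSet[int], float]  # unordered cysteine pair -> edge weight


def best_disulfide_pairing(cysteines: List[int],
                           weight: Weights) -> Tuple[float, List[Pair]]:
    """Exhaustively find the perfect pairing with the largest total weight."""
    if len(cysteines) % 2:
        raise ValueError("this sketch assumes an even number of cysteines")
    if not cysteines:
        return 0.0, []
    first, rest = cysteines[0], cysteines[1:]
    best_score, best_pairs = float("-inf"), []
    # Pair the first cysteine with each possible partner, then recurse on the rest.
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        sub_score, sub_pairs = best_disulfide_pairing(remaining, weight)
        score = weight[frozenset((first, partner))] + sub_score
        if score > best_score:
            best_score, best_pairs = score, [(first, partner)] + sub_pairs
    return best_score, best_pairs


# Four cysteines at made-up sequence positions with made-up edge weights.
toy_weights: Weights = {
    frozenset((3, 14)): 0.2, frozenset((3, 22)): 0.9, frozenset((3, 40)): 0.1,
    frozenset((14, 22)): 0.3, frozenset((14, 40)): 0.8, frozenset((22, 40)): 0.4,
}
total, pairs = best_disulfide_pairing([3, 14, 22, 40], toy_weights)
print(pairs)  # [(3, 22), (14, 40)]
```

The number of perfect pairings grows as (n-1)(n-3)...1 for n cysteines, which is why a polynomial-time matching algorithm is preferable beyond small cases.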

19.
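Protein translation is a very complex and vitally important life process. The translation rate varies along the mRNA, thereby regulating co-translational protein folding, and it has an important influence on the final conformation of the protein. Taking the HisA proteins of two different species in the TIM beta-alpha barrel fold as the objects of study, this paper presents a preliminary analysis of the correlation between the symmetric protein structure and the local codon usage bias, local residue charge distribution, and local GC content distribution of their gene sequences, and discusses possible mechanisms by which various factors in translation regulate the formation of symmetric protein structure. The results show that, in the HisA proteins of both species, the symmetric structure is correlated to some degree with codon usage bias, residue charge, and GC content.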

20.
Predicting the fold, or approximate 3D structure, of a protein from its amino acid sequence is an important problem in biology. The homology modeling approach uses a protein database to identify fold-class relationships by sequence similarity. The main limitation of this method is that some proteins with similar structures appear to have very different sequences, which we call the hidden-homology problem. As in other real-world domains for machine learning, this difficulty may be caused by a low-level representation. Learning in such domains can be improved by using domain knowledge to search for representations that better match the inductive bias of a preferred algorithm. In this domain, knowledge of amino acid properties can be used to construct higher-level representations of protein sequences. In one experiment using a 179-protein data set, the accuracy of fold-class prediction was increased from 77.7% to 81.0%. The search results are analyzed to refine the grouping of small residues suggested by Dayhoff. Finally, an extension to the representation incorporates sequential context directly into the representation, which can express finer relationships among the amino acids. The methods developed in this domain are generalized into a framework that suggests several systematic roles for domain knowledge in machine learning. Knowledge may define both a space of alternative representations, as well as a strategy for searching this space. The search results may be summarized to extract feedback for revising the domain knowledge.
