Similar Documents
1.
We developed a novel Monte Carlo threading algorithm that allows gaps and insertions in both the template structure and the threaded sequence. The algorithm is able to find the optimal sequence-structure alignment and to sample suboptimal alignments. Using our algorithm, we performed sequence-structure alignments for a number of examples from three protein folds (ubiquitin, immunoglobulin and globin), using both an "ideal" set of potentials (optimized to provide the best Z-score for a given protein) and more realistic knowledge-based potentials. Two physically different scenarios emerged. If a template structure is similar to the native one (within 2 Å RMS), then (i) the optimal threading alignment is correct and robust with respect to deviations of the potential from the "ideal" one; (ii) suboptimal alignments are very similar to the optimal one; and (iii) as the Monte Carlo temperature decreases, a sharp cooperative transition to the optimal alignment is observed. In contrast, if the template structure is only moderately close to the native structure (RMS greater than 3.5 Å), then (i) the optimal alignment changes dramatically when the "ideal" potential is replaced by a realistic knowledge-based one; (ii) the structures of suboptimal alignments are very different from the optimal one, reducing the reliability of the alignment; and (iii) the transition to the apparently optimal alignment is non-cooperative. In intermediate cases, when the RMS between the template and native conformations lies between 2 Å and 3.5 Å, the success of threading alignment may depend on the quality of the potentials used. These results are rationalized in terms of a threading free energy landscape. Possible ways to overcome the fundamental limitations of threading are discussed briefly.
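The temperature-dependent sampling described above rests on Monte Carlo moves that are accepted or rejected at a given temperature. The sketch below shows only the standard Metropolis acceptance rule, applied to a hypothetical, precomputed list of alignment energies; it assumes k_B = 1 and uniform trial jumps between candidates, and it does not reproduce the authors' gap and insertion moves. The function names are illustrative.

```python
import math
import random


def metropolis_accept(delta_e: float, temperature: float, rng: random.Random) -> bool:
    """Metropolis rule (k_B = 1): accept downhill moves, uphill ones with prob exp(-dE/T)."""
    if delta_e <= 0.0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)


def sample_alignments(energies: list[float], temperature: float, n_steps: int, seed: int = 0) -> list[int]:
    """Metropolis walk over candidate alignments with the given (hypothetical) energies.

    Each trial move jumps to a uniformly chosen candidate, so the proposal is
    symmetric and the walk samples the Boltzmann distribution at `temperature`.
    Returns how many steps were spent in each candidate.
    """
    rng = random.Random(seed)
    current = rng.randrange(len(energies))
    visits = [0] * len(energies)
    for _ in range(n_steps):
        trial = rng.randrange(len(energies))
        if metropolis_accept(energies[trial] - energies[current], temperature, rng):
            current = trial
        visits[current] += 1
    return visits


# Hypothetical energies: one deep minimum and two near-degenerate alternatives.
energies = [0.0, 2.0, 2.5]
print(sample_alignments(energies, temperature=0.05, n_steps=2000))  # nearly all visits in candidate 0
print(sample_alignments(energies, temperature=50.0, n_steps=2000))  # visits spread over all candidates
```

Lowering the temperature concentrates the walk in the lowest-energy candidate; whether that concentration happens sharply or gradually is what the abstract calls a cooperative or non-cooperative transition.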

2.
With the advent of genome sequencing projects, the amino acid sequences of thousands of proteins are determined every year. Each of these protein sequences must be matched to its function and its 3-dimensional structure if we are to gain a full understanding of the molecular biology of organisms. To meet this challenge, new methods are being developed for fold recognition, the computational assignment of newly determined amino acid sequences to 3-dimensional protein structures. These methods start with a library of known 3-dimensional target protein structures. The new probe sequence is then aligned to each target protein structure in the library, and the compatibility of the sequence with that structure is scored. If a target structure is found to have a significantly high compatibility score, it is assumed that the probe sequence folds in much the same way as the target structure. The fundamental assumptions of this approach are that many different sequences fold in similar ways and that there is a relatively high probability that a new sequence possesses a previously observed fold. We review various approaches to fold recognition and break down the process into its main steps: creation of a library of target folds; representation of the folds; alignment of the probe sequence to a target fold using a sequence-to-structure compatibility scoring function; and assessment of the significance of compatibility. We emphasize that even though this new field of fold recognition has made rapid progress, technical problems remain to be solved in most of the steps. Standard benchmarks may help identify the problem steps and find solutions to the problems.
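The final step listed above, assessing the significance of a compatibility score, is often done with a Z-score against the scores of shuffled probe sequences. The sketch below assumes a higher-is-better score; `toy_compatibility` is a hypothetical stand-in that counts hydrophobic residues placed at buried template positions, not a scoring function from any of the reviewed methods, and the probe and burial pattern are made up.

```python
import random
import statistics


def toy_compatibility(sequence: str, burial: str) -> float:
    """Hypothetical score: +1 for each hydrophobic residue at a buried ('b') template position."""
    return float(sum(1 for aa, site in zip(sequence, burial) if aa in "AILMFVW" and site == "b"))


def shuffled_scores(sequence, burial, score_fn, n_shuffles=200, seed=0):
    """Background distribution: scores of composition-preserving shuffles of the probe."""
    rng = random.Random(seed)
    residues = list(sequence)
    scores = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)
        scores.append(score_fn("".join(residues), burial))
    return scores


def z_score(score: float, background: list[float]) -> float:
    """How many standard deviations `score` lies above the background mean."""
    sigma = statistics.stdev(background)
    if sigma == 0:
        raise ValueError("background scores have zero spread")
    return (score - statistics.mean(background)) / sigma


probe = "LKVLEKALKE"    # hypothetical probe sequence
burial = "bebbeebbee"   # hypothetical burial pattern of one library target
native = toy_compatibility(probe, burial)
print(z_score(native, shuffled_scores(probe, burial, toy_compatibility)))
```

Shuffling preserves amino acid composition, so a high Z-score reflects the order of residues along the chain rather than composition alone.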

3.
This paper evaluates the results of a protein structure prediction contest. The predictions were made using threading procedures, which employ techniques for aligning sequences with 3D structures to select the correct fold of a given sequence from a set of alternatives. Nine different teams submitted 86 predictions for a total of 21 target proteins with little or no sequence homology to proteins of known structure. The 3D structures of these proteins were newly determined by experimental methods, but not yet published or otherwise available to the predictors. The predictions, made from the amino acid sequence alone, thus represent a genuine test of the current performance of threading methods. Only a subset of the predictions is evaluated here: the 44 predictions submitted for the 11 target proteins seen to adopt known folds. The predictions for the remaining 10 proteins were not analyzed, although weak similarities with known folds may also exist in these proteins. We find that threading methods are capable of identifying the correct fold in many cases, but not yet reliably enough. Each team correctly predicts a different set of targets, with virtually all targets predicted correctly by at least one team. Also, common folds such as TIM barrels are recognized more readily than folds with only a few known examples. However, quite surprisingly, the quality of the sequence-structure alignments corresponding to correctly recognized folds is generally very poor, as judged by comparison with the corresponding 3D structure alignments. Thus, threading cannot at present be relied upon to derive a detailed 3D model from the amino acid sequence. This raises a very intriguing question: how is fold recognition achieved? Our analysis suggests that it may be achieved because threading procedures maximize hydrophobic interactions in the protein core and are reasonably good at recognizing local secondary structure.
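The poor alignment quality reported above is judged by comparing each threading alignment with the alignment implied by the 3D structures. A minimal sketch of one such comparison follows. It assumes alignments are given as (probe residue, template residue) index pairs and counts a reference pair as reproduced when the prediction aligns that probe residue within `shift` positions of the correct template residue; the contest's exact criterion may differ, and the function name is illustrative.

```python
def alignment_accuracy(predicted, reference, shift=0):
    """Fraction of reference pairs (i, j) reproduced by the prediction.

    Alignments are iterables of (probe residue, template residue) index pairs.
    A reference pair counts as reproduced if the prediction aligns probe
    residue i to a template residue within `shift` positions of j.
    """
    reference = list(reference)
    if not reference:
        raise ValueError("reference alignment is empty")
    predicted_map = dict(predicted)  # probe residue -> template residue
    hits = sum(1 for i, j in reference
               if i in predicted_map and abs(predicted_map[i] - j) <= shift)
    return hits / len(reference)


structural = [(0, 0), (1, 1), (2, 2), (3, 3)]   # hypothetical structure-based alignment
threading = [(0, 0), (1, 1), (2, 3)]            # hypothetical threading alignment
print(alignment_accuracy(threading, structural))           # 0.5
print(alignment_accuracy(threading, structural, shift=1))  # 0.75
```

Allowing a small shift separates alignments that are merely register-shifted from those that are wrong outright.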

4.
A protein fold recognition method was tested by the blind prediction of the structures of a set of proteins. The method evaluates the compatibility of an amino acid sequence with a three-dimensional structure using four evaluation functions: side-chain packing, solvation, hydrogen bonding, and local conformation. The structures of 14 proteins containing 19 sequences were predicted, and the predictions were compared with the experimental structures. The experimental structures showed that 9 of the 19 target sequences have known folds or portions of known folds. Among them, the folds of Klebsiella aerogenes urease beta subunit (KAUB) and pyruvate phosphate dikinase domain 4 (PPDK4) were successfully recognized: our method predicted that KAUB and PPDK4 would adopt the folds of macromomycin (Ig-fold) and phosphoribosylanthranilate isomerase:indoleglycerol-phosphate synthase (TIM barrel), respectively, and the experimental structures revealed that they actually adopt the predicted folds. The predictions for the other targets were not successful, but they often gave secondary structural patterns similar to those of the experimental structures.

5.
In order to calculate the tertiary structure of a protein from its amino acid sequence, the thermodynamic approach requires a potential function of sequence and conformation that has its global minimum at the native conformation for many different proteins. Here we study the behavior of such functions for the simplest model system that still has the essential features of the protein folding problem, namely two-dimensional square lattice chain configurations involving two residue types. First, we demonstrate a method for accurately recovering the given contact potential solely from knowledge of which sequences fold to which structures and what the non-native structures are. Second, we show how to derive from the same information more general potential functions that have much better positive correlations between potential function value and conformational deviation from the native. These functions consequently permit faster and more reliable searches for the native conformation, given the native sequence. Furthermore, the method for finding such potentials is easily applied to more realistic protein models.
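The model system above is a two-dimensional square-lattice chain with two residue types, whose energy is a sum of pairwise contact terms. A minimal sketch of evaluating such a contact potential for one conformation is shown below; the HP-style parameter values in the example are illustrative assumptions, not the potentials derived in the paper, and `lattice_energy` is a hypothetical name.

```python
def lattice_energy(sequence: str, coords: list[tuple[int, int]], potential: dict) -> float:
    """Contact energy of a self-avoiding chain on the 2D square lattice.

    A contact is a pair of residues that are not chain neighbors (|i - j| > 1)
    but occupy adjacent lattice sites; each contributes potential[(a, b)] with
    the residue types sorted so that (a, b) and (b, a) share one entry.
    """
    if len(sequence) != len(coords):
        raise ValueError("one lattice site per residue required")
    if len(set(coords)) != len(coords):
        raise ValueError("chain is not self-avoiding")
    for (x0, y0), (x1, y1) in zip(coords, coords[1:]):
        if abs(x0 - x1) + abs(y0 - y1) != 1:
            raise ValueError("consecutive residues must occupy adjacent sites")
    site_to_index = {site: i for i, site in enumerate(coords)}
    energy = 0.0
    for i, (x, y) in enumerate(coords):
        # Look only right and up so that each lattice edge is examined once.
        for neighbor in ((x + 1, y), (x, y + 1)):
            j = site_to_index.get(neighbor)
            if j is not None and abs(i - j) > 1:
                a, b = sorted((sequence[i], sequence[j]))
                energy += potential[(a, b)]
    return energy


# Illustrative HP-style potential: only hydrophobic-hydrophobic contacts are favorable.
hp = {("H", "H"): -1.0, ("H", "P"): 0.0, ("P", "P"): 0.0}
print(lattice_energy("HPPH", [(0, 0), (1, 0), (1, 1), (0, 1)], hp))  # -1.0
```

Recovering a potential "from which sequences fold to which structures" amounts to choosing the entries of `potential` so that each native conformation has a lower energy than its non-native alternatives.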

6.
7.
We apply a simple method for aligning protein sequences on the basis of a 3D structure, on a large scale, to the proteins in the scop classification of fold families. This allows us to assess, understand, and improve our automatic method against an objective, manually derived standard, a type of comprehensive evaluation that has not yet been possible for other structural alignment algorithms. Our basic approach directly matches the backbones of two structures, using repeated cycles of dynamic programming and least-squares fitting to determine an alignment minimizing coordinate difference. Because of its simplicity, our method can be readily modified to take into account additional features of protein structure, such as the orientation of side chains or the location-dependent cost of opening a gap. Our basic method, augmented by such modifications, can find reasonable alignments for all but 1.5% of the known structural similarities in scop, i.e., all but 32 of the 2,107 superfamily pairs. We discuss the specific protein structural features that make these 32 pairs so difficult to align and show how our procedure effectively partitions the relationships in scop into different categories, depending on what aspects of protein structure are involved (e.g., depending on whether or not consideration of side-chain orientation is necessary for proper alignment). We also show how our pairwise alignment procedure can be extended to generate a multiple alignment for a group of related structures. We have compared these alignments in detail with corresponding manual ones culled from the literature. We find good agreement (to within 95% for the core regions), and detailed comparison highlights how particular protein structural features (such as certain strands) are problematic to align, giving somewhat ambiguous results. With these improvements and systematic tests, our procedure should be useful for the development of scop and the future classification of protein folds.
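The method above alternates dynamic programming with least-squares fitting. The sketch below shows only the dynamic-programming half: given a residue-by-residue similarity matrix between two structures (for example, one derived from inter-atomic distances after a superposition, which is not shown), it finds the alignment that maximizes total similarity minus a linear gap cost. The matrix values and the gap cost are assumptions, and the paper's own scoring and gap treatment (including its location-dependent gap cost) may differ.

```python
def align_dp(score: list[list[float]], gap: float) -> tuple[float, list[tuple[int, int]]]:
    """Global alignment of residues 0..n-1 (structure A) with 0..m-1 (structure B).

    Maximizes the sum of score[i][j] over matched pairs minus `gap` for every
    residue left unmatched (linear gap cost). Returns (best total, matched pairs).
    """
    n, m = len(score), len(score[0])
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = F[i - 1][0] - gap
    for j in range(1, m + 1):
        F[0][j] = F[0][j - 1] - gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + score[i - 1][j - 1],  # match residue i-1 with j-1
                          F[i - 1][j] - gap,                        # residue i-1 of A unmatched
                          F[i][j - 1] - gap)                        # residue j-1 of B unmatched
    # Trace back from the bottom-right cell, preferring matches on ties.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if F[i][j] == F[i - 1][j - 1] + score[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j] - gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return F[n][m], pairs


similarity = [[1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]]
print(align_dp(similarity, gap=0.5))  # (1.5, [(0, 0), (1, 2)])
```

In an iterative scheme, the matched pairs would feed a least-squares superposition, the similarity matrix would be recomputed from the new coordinates, and the two steps would repeat until the alignment stops changing.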

8.
The multiple sequence alignment problem is applicable and important in various fields of molecular biology, such as the prediction of three-dimensional protein structures and the inference of phylogenetic trees. However, the optimal alignment under a scoring criterion is not always the most biologically significant alignment. Here we propose two flexible and efficient approaches to this problem. One approach is to provide many suboptimal alignments as alternatives to the optimal one. It has been considered almost impossible to investigate such suboptimal alignments of more than two sequences because of the enormous size of the problem. We propose techniques for enumerating suboptimal alignments using the Eppstein algorithm. We also discuss which suboptimal alignments need not be enumerated and propose an efficient algorithm that enumerates only the necessary alignments. The other approach is parametric analysis. The optimal solution obtained with fixed parameters such as gap penalties is not always the biologically best alignment, so it is necessary to vary the parameters and check how the optimal alignments change. How to vary parameters has been studied well for two sequences, but not for the multiple alignment problem, because of the difficulty of computing the optimal solution. We propose techniques for this parametric multiple alignment problem and examine the features of alignments obtained by various parametric analyses. For both approaches, we perform experiments on various groups of actual protein sequences and examine the efficiency of these algorithms and the properties of the sequence groups.
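The parametric analysis described above asks how the optimal alignment changes as parameters such as the gap penalty vary. The sketch below illustrates the idea only for two sequences, using a plain Needleman-Wunsch alignment with an assumed match score of +1, mismatch score of -1 and linear gap penalty; it does not implement the paper's multiple-alignment or Eppstein-based enumeration techniques, and the function names are hypothetical.

```python
def nw_align(a: str, b: str, gap: float, match: float = 1.0, mismatch: float = -1.0):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns (score, gapped a, gapped b). Ties are broken in the order
    diagonal, then gap in b, then gap in a.
    """
    def s(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    n, m = len(a), len(b)
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = F[i - 1][0] - gap
    for j in range(1, m + 1):
        F[0][j] = F[0][j - 1] - gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + s(i, j), F[i - 1][j] - gap, F[i][j - 1] - gap)
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s(i, j):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and (j == 0 or F[i][j] == F[i - 1][j] - gap):
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(top)), "".join(reversed(bottom))


def gap_sweep(a: str, b: str, gaps: list[float]):
    """Record (gap, gapped a, gapped b) at each gap value where the optimal alignment changes."""
    changes, previous = [], None
    for g in gaps:
        _, top, bottom = nw_align(a, b, g)
        if (top, bottom) != previous:
            changes.append((g, top, bottom))
            previous = (top, bottom)
    return changes


print(gap_sweep("AT", "TA", [0.5, 1.0, 2.0, 3.0]))
# [(0.5, '-AT', 'TA-'), (2.0, 'AT', 'TA')]
```

With cheap gaps the optimum shifts one sequence to gain a match; once gaps cost more than the mismatches they avoid, the ungapped alignment takes over. The multiple-sequence version of this sweep is what the abstract describes as difficult.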

9.
Searches of the major public protein databases with core and linker chicken and human histone sequences have resulted in the compilation of an annotated set of histone protein sequences. In addition, new database searches with two distinct motif search algorithms have identified several members of the histone fold family, including human DRAP1 and yeast CSE4. Database resources include information on conflicts between similar sequence entries in different source databases, multiple sequence alignments, links to the Entrez integrated information retrieval system, structures for histone and histone fold proteins, and the ability to visualize structural data through Cn3D. The database currently contains >1000 protein sequences, which are searchable by protein type, accession number, organism name, or any other free text appearing in the definition line of the entry. All sequences and alignments in this database are available through the World Wide Web at http://www.nhgri.nih.gov/DIR/GTB/HISTONES or http://www.ncbi.nlm.nih.gov/Baxevani/HISTONES

10.
Peptides have the potential for targeting vaccines against pre-specified epitopes on folded proteins. When polyclonal antibodies against native proteins are used to screen peptide libraries, most of the peptides isolated align to linear epitopes on the proteins. The mechanism of cross-reactivity is unclear; both structural mimicry by the peptide and induced fit of the epitope may occur. The most effective peptide mimics of protein epitopes are likely to be those that best mimic both the chemistry and the structure of epitopes. Our goal in this work has been to establish a strategy for characterizing epitopes on a folded protein that are candidates for structural mimicry by peptides. We investigated the chemical and structural bases of peptide-protein cross-reactivity using phage-displayed peptide libraries in combination with computational structural analysis. Polyclonal antibodies against the well-characterized antigens hen egg-white lysozyme and worm myohemerythrin were used to screen a panel of phage-displayed peptide libraries. Most of the selected peptide sequences aligned to linear epitopes on the corresponding protein; the critical binding sequence of each epitope was revealed from these alignments. The structures of the critical sequences as they occur in other non-homologous proteins were analyzed using the Sequery and Superpositional Structural Assignment computer programs. These allowed us to evaluate the extent of conformational preference inherent in each sequence independent of its protein context, and thus to predict the peptides most likely to have structural preferences that match their protein epitopes. Evidence for sequences having a clear structural bias emerged for several epitopes, and synthetic peptides representing three of these epitopes bound antibody with sub-micromolar affinities. The strong preference for a type II beta-turn predicted for one peptide was confirmed by NMR and circular dichroism analyses.
Our strategy for identifying conformationally biased epitope sequences provides a new approach to the design of epitope-targeted, peptide-based vaccines.

11.
As the structural database continues to expand, new methods are required to analyse and compare protein structures. Whereas the recognition, comparison, and classification of folds is now more or less a solved problem, tools for the study of constellations of small numbers of residues are few and far between. In this paper, two programs are described for the analysis of spatial motifs in protein structures. The first, SPASM, can be used to find the occurrence of a motif consisting of arbitrary main-chain and/or side-chains in a database of protein structures. The program also has a unique capability to carry out "fuzzy pattern matching" with relaxed requirements on the types of some or all of the matching residues. The second program, RIGOR, scans a single protein structure for the occurrence of any of a set of pre-defined motifs from a database. In one application, spatial motif recognition combined with profile analysis enabled the assignment of the structural and functional class of an uncharacterised hypothetical protein in the sequence database. In another application, the occurrence of short left-handed helical segments in protein structures was investigated, and such segments were found to be fairly common. Potential applications of the techniques presented here lie in the analysis of (newly determined) structures, in comparative structural analysis, in the design and engineering of novel functional sites, and in the prediction of structure and function of uncharacterised proteins.

12.
We present a fast method for finding optimal parameters for a low-resolution (threading) force field intended to distinguish correct from incorrect folds for a given protein sequence. In contrast to other methods, the parameterization uses information from more than 10⁷ misfolded structures as well as a set of native sequence-structure pairs. In addition to testing the resulting force field's performance on the protein sequence threading problem, results are shown that characterize the number of parameters necessary for effective structure recognition.
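The parameterization above asks for a low-resolution energy under which native structures score better than misfolded ones. One simple way to pose this, assuming the energy is linear in a feature vector φ (for example, hypothetical counts of residue-type contacts), is to seek weights w with w·φ(native) + margin ≤ w·φ(decoy) for every decoy. The perceptron-style sketch below is a swapped-in illustration of that inequality approach, not the fast method reported in the paper; names, margin and learning rate are assumptions.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def fit_weights(examples, margin=1.0, rate=0.1, max_epochs=1000):
    """Perceptron-style search for energy weights w (energy = w . features).

    `examples` is a list of (native_features, [decoy_features, ...]). Whenever a
    decoy is not at least `margin` higher in energy than its native structure,
    w is nudged along (decoy - native), which raises the decoy's energy
    relative to the native one. Stops after an epoch with no violations.
    """
    w = [0.0] * len(examples[0][0])
    for _ in range(max_epochs):
        violations = 0
        for native, decoys in examples:
            for decoy in decoys:
                diff = [d - n for d, n in zip(decoy, native)]
                if dot(w, diff) < margin:
                    w = [wi + rate * di for wi, di in zip(w, diff)]
                    violations += 1
        if violations == 0:
            break
    return w


# Hypothetical two-feature example: one native structure and two misfolded decoys.
native = [1.0, 0.0]
decoys = [[0.0, 1.0], [0.0, 2.0]]
w = fit_weights([(native, decoys)])
print([dot(w, d) - dot(w, native) for d in decoys])  # both gaps are at least the margin
```

When the inequalities have no common solution, the loop simply stops at `max_epochs`; a scheme like this then needs a soft-margin or optimization-based formulation instead.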

13.
BACKGROUND: Structural studies by nuclear magnetic resonance (NMR) of RNA and DNA aptamer complexes identified through in vitro selection and amplification have provided a wealth of information on RNA and DNA tertiary structure and molecular recognition in solution. The RNA and DNA aptamers that target ATP (and AMP) with micromolar affinity exhibit distinct binding site sequences and secondary structures. We report below on the tertiary structure of the AMP-DNA aptamer complex in solution and compare it with the previously reported tertiary structure of the AMP-RNA aptamer complex in solution. RESULTS: The solution structure of the AMP-DNA aptamer complex shows, surprisingly, that two AMP molecules are intercalated at adjacent sites within a rectangular widened minor groove. Complex formation involves adaptive binding where the asymmetric internal bubble of the free DNA aptamer zippers up through formation of a continuous six-base mismatch segment which includes a pair of adjacent three-base platforms. The AMP molecules pair through their Watson-Crick edges with the minor groove edges of guanine residues. These recognition G·A mismatches are flanked by sheared G·A and reversed Hoogsteen G·G mismatch pairs. CONCLUSIONS: The AMP-DNA aptamer and AMP-RNA aptamer complexes have distinct tertiary structures and binding stoichiometries. Nevertheless, both complexes have similar structural features and recognition alignments in their binding pockets. Specifically, AMP targets both DNA and RNA aptamers by intercalating between purine bases and through identical G·A mismatch formation. The recognition G·A mismatch stacks with a reversed Hoogsteen G·G mismatch in one direction and with an adenine base in the other direction in both complexes.
It is striking that DNA and RNA aptamers selected independently from libraries of 10¹⁴ molecules in each case utilize identical mismatch alignments for molecular recognition with micromolar affinity within binding-site pockets containing common structural elements.

14.
By incorporating predicted secondary and tertiary restraints derived from multiple sequence alignments into ab initio folding simulations, it has been possible to assemble native-like tertiary structures for a test set of 19 nonhomologous proteins ranging from 29 to 100 residues in length and representing all secondary structural classes. Secondary structural restraints are provided by the PHD secondary structure prediction algorithm that incorporates multiple sequence information. Multiple sequence alignments also provide predicted tertiary restraints via a two-step process: First, seed side chain contacts are selected from a correlated mutation analysis, and then an inverse folding algorithm expands these seed contacts. The predicted secondary and tertiary restraints are incorporated into a lattice-based, reduced protein model for structure assembly and refinement. The resulting native-like topologies exhibit a coordinate root-mean-square deviation from native for the whole chain between 3.1 and 6.7 Å, with values ranging from 2.6 to 4.1 Å over approximately 80% of the structure. Overall, this study suggests that the use of restraints derived from multiple sequence alignments combined with a fold assembly algorithm is a promising approach to the prediction of the global topology of small proteins.
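The accuracies above are quoted as coordinate root-mean-square deviations (RMSD) from the native structure, both for the whole chain and for a subset of residues. A minimal sketch of that quantity follows; it assumes the two coordinate sets are already optimally superposed (the superposition step itself is not shown) and that residues correspond index by index. The `indices` parameter is an illustrative way to restrict the calculation to part of the chain.

```python
import math


def rmsd(model, native, indices=None):
    """Root-mean-square deviation between corresponding (x, y, z) points.

    `indices` optionally restricts the calculation to a subset of residues,
    e.g. the best-predicted part of the chain. Coordinates are assumed to be
    superposed already.
    """
    if len(model) != len(native):
        raise ValueError("coordinate sets must have the same length")
    chosen = range(len(model)) if indices is None else indices
    squared = [sum((a - b) ** 2 for a, b in zip(model[i], native[i])) for i in chosen]
    if not squared:
        raise ValueError("no residues selected")
    return math.sqrt(sum(squared) / len(squared))


print(rmsd([(0.0, 0.0, 0.0)], [(3.0, 4.0, 0.0)]))  # 5.0
```

A subset RMSD over roughly 80% of the residues, as quoted above, is never larger than the worst per-residue deviation in that subset, which is why it comes out lower than the whole-chain value.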

15.
Recent studies in the field of de novo protein design have focused on the construction of native-like structures. Here we describe the design and characterization of an isoleucine zipper peptide intended to form a parallel triple-stranded coiled coil. To achieve native-like structural uniqueness, the hydrophobic interface of the peptide consists of beta-branched Ile residues for complementary side-chain packing. The peptide forms a stable triple-stranded coiled coil, as determined by circular dichroism and sedimentation equilibrium analyses. A fluorescence quenching assay after the incorporation of acridine revealed a parallel orientation of the peptides. The structural uniqueness of the coiled coil was confirmed by proton-deuterium amide hydrogen exchange and hydrophobic dye binding. The peptide contains amide protons with hydrogen exchange rates that are approximately an order of magnitude slower than those expected if exchange occurred via global unfolding. A hydrophobic dye does not bind to the peptide. These results strongly suggest that the peptide folds into a well-packed structure that is very similar to the native state of a natural protein. Thus, Ile residues in the hydrophobic interface can improve side-chain packing, which can impart native-like structural uniqueness to the designed coiled coil.

16.
We present two new sets of energy functions for protein structure recognition, given the primary sequence of amino acids along the polypeptide chain. The first set of potentials is based on the positions of alpha-carbon atoms, and the second on the positions of beta- and alpha-carbon atoms of amino acid residues. The potentials are derived using a theory of Boltzmann-like statistics of protein structure. The energy terms incorporate both long-range interactions between residues remote along the chain and short-range interactions between near neighbors. Distance dependence is approximated by a piecewise constant function defined on intervals of equal size. The size of the interval is optimized to preserve as much detail as possible without introducing excessive error due to limited statistics. A database of 214 non-homologous proteins was used both for the derivation of the potentials and for the 'threading' test originally suggested by Hendlich et al. (1990) J. Mol. Biol., 216, 167-180. Special care is taken to avoid systematic error in this test. For threading, we used 100 non-homologous protein chains of 60-205 residues. The energy of each native structure was compared with the energies of 43,000 to 19,000 alternative structures generated by threading. Of these 100 native structures, 92 have the lowest energy with the alpha-carbon-based potentials, and 98 have the lowest energy with the beta- and alpha-carbon-based potentials.
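The potentials above turn observed distance statistics into energies with a piecewise-constant distance dependence. A common way to do this, sketched below under stated assumptions, is the inverse-Boltzmann relation E_k = -kT ln(f_obs,k / f_ref,k) for each equal-width distance bin k. The reference counts, the pseudocount and the bin width are assumptions here, the function names are hypothetical, and the paper's own reference state and bin-size optimization are not reproduced.

```python
import math


def binned_potential(observed, reference, pseudocount=1.0, kT=1.0):
    """Inverse-Boltzmann energies for equal-width distance bins.

    observed[k]: count of residue pairs of one type whose distance falls in bin k.
    reference[k]: count expected in bin k without specific interactions (assumed given).
    Returns E_k = -kT * ln(f_obs,k / f_ref,k); the pseudocount avoids log(0).
    """
    if len(observed) != len(reference):
        raise ValueError("observed and reference need the same bins")
    obs_total = sum(observed) + pseudocount * len(observed)
    ref_total = sum(reference) + pseudocount * len(reference)
    energies = []
    for n_obs, n_ref in zip(observed, reference):
        f_obs = (n_obs + pseudocount) / obs_total
        f_ref = (n_ref + pseudocount) / ref_total
        energies.append(-kT * math.log(f_obs / f_ref))
    return energies


def energy_at(distance, energies, bin_width):
    """Piecewise-constant lookup: energy of the bin containing `distance`, 0 beyond the last bin."""
    k = int(distance // bin_width)
    return energies[k] if 0 <= k < len(energies) else 0.0


# Hypothetical counts for one residue-type pair in two 1-angstrom bins.
table = binned_potential([30, 10], [20, 20], pseudocount=0.0)
print(table)                                   # first bin attractive (< 0), second repulsive (> 0)
print(energy_at(1.4, table, bin_width=1.0))    # energy of the second bin
```

Narrow bins keep more detail but leave fewer counts per bin, which inflates the statistical error in each E_k; that is the trade-off the abstract describes when it optimizes the interval size.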

17.
18.
MOTIVATION: In order to increase the accuracy of multiple sequence alignments, we designed a new strategy for optimizing multiple sequence alignments by genetic algorithm. We named it COFFEE (Consistency based Objective Function For alignmEnt Evaluation). The COFFEE score reflects the level of consistency between a multiple sequence alignment and a library containing pairwise alignments of the same sequences. RESULTS: We show that multiple sequence alignments can be optimized for their COFFEE score with the genetic algorithm package SAGA. The COFFEE function is tested on 11 test cases made of structural alignments extracted from 3D_ali. These alignments are compared to those produced using five alternative methods. Results indicate that COFFEE outperforms the other methods when the level of identity between the sequences is low. Accuracy is evaluated by comparison with the structural alignments used as references. We also show that the COFFEE score can be used as a reliability index on multiple sequence alignments. Finally, we show that given a library of structure-based pairwise sequence alignments extracted from FSSP, SAGA can produce high-quality multiple sequence alignments. The main advantage of COFFEE is its flexibility. With COFFEE, any method suitable for making pairwise alignments can be extended to making multiple alignments. AVAILABILITY: The package is available along with the test cases through the WWW: http://www.ebi.ac.uk/cedric CONTACT: cedric.notredame@ebi.ac.uk
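The COFFEE score measures how consistent a multiple alignment is with a library of pairwise alignments of the same sequences. The sketch below computes a simplified, weighted version: for each sequence pair in the library, it counts the residue pairs that the multiple alignment places in the same column and that the library alignment also contains, then divides by the weighted number of aligned residue pairs. The library format and the weights are assumptions (the published score weights each pair by the identity of its library alignment, and its exact normalization may differ); this is not the SAGA/COFFEE package.

```python
from itertools import combinations


def induced_pairs(row_a: str, row_b: str) -> set[tuple[int, int]]:
    """Residue-index pairs that two gapped rows place in the same column ('-' = gap)."""
    pairs = set()
    ia = ib = 0
    for ca, cb in zip(row_a, row_b):
        if ca != "-" and cb != "-":
            pairs.add((ia, ib))
        if ca != "-":
            ia += 1
        if cb != "-":
            ib += 1
    return pairs


def consistency_score(msa: list[str], library: dict) -> float:
    """Weighted fraction of MSA-aligned residue pairs that also occur in the library.

    `library` maps a sequence-index pair (i, j), i < j, to (set of aligned residue
    pairs, weight); sequence pairs missing from the library are skipped.
    """
    numerator = denominator = 0.0
    for i, j in combinations(range(len(msa)), 2):
        if (i, j) not in library:
            continue
        library_pairs, weight = library[(i, j)]
        aligned = induced_pairs(msa[i], msa[j])
        numerator += weight * len(aligned & library_pairs)
        denominator += weight * len(aligned)
    return numerator / denominator if denominator else 0.0


msa = ["AC", "AC", "A-"]   # hypothetical three-sequence alignment
library = {(0, 1): ({(0, 0)}, 1.0), (0, 2): ({(0, 0)}, 1.0), (1, 2): ({(0, 0)}, 1.0)}
print(consistency_score(msa, library))  # 0.75
```

Because any pairwise method can fill the library, a score of this form is what lets a pairwise aligner drive multiple-alignment optimization, which is the flexibility the abstract emphasizes.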

19.
A structure-based scoring matrix, MDPRE, was derived from amino acid spatial preferences in protein structures. Sequence alignment and evolutionary studies using the MDPRE matrix gave results similar to those from ordinary sequence and structure alignments. Interestingly, a matrix derived solely from structural data gives comparable alignment results, strongly indicating the intimate connection between protein sequences and structures. The branch order and branch lengths from this approach were close to those obtained by a structure comparison method. Thus, by applying this structure-based matrix, the trees obtained should reflect evolutionary characteristics of protein structure. This approach has two advantages over direct structure comparison: (1) only a sequence and the MDPRE matrix are needed, making it simple and widely applicable (especially in the absence of 3-dimensional protein structure data); and (2) established algorithms for sequence alignment and tree building can be employed, providing opportunities for direct comparison between matrices from different methodologies. One of the most striking features of this method is its capability to detect protein structure homologies when sequence identities are low. This is well illustrated by the example alignment of dinucleotide-binding domains.
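A structure-derived matrix such as MDPRE can be used with ordinary alignment and tree-building software because it has the same form as a sequence substitution matrix. The sketch below shows one generic, assumed construction of such a matrix, a log-odds score s(a, b) = log2(q_ab / e_ab) from counts of residue pairs observed together (for example, hypothetical counts of spatial neighbors); it is not the published derivation of MDPRE, and `log_odds_matrix` is an illustrative name.

```python
import math
from collections import Counter


def log_odds_matrix(pair_counts: Counter) -> dict:
    """Log-odds scores from counts of unordered residue pairs keyed as (a, b) with a <= b.

    q_ab is the observed pair frequency; the expected frequency e_ab comes from
    the background residue frequencies p (p_a**2 if a == b, else 2 * p_a * p_b).
    Pairs with zero count are skipped rather than given -infinity.
    """
    total = sum(pair_counts.values())
    residue_counts = Counter()
    for (a, b), n in pair_counts.items():
        residue_counts[a] += n
        residue_counts[b] += n
    n_residues = 2 * total  # every pair contributes two residues
    scores = {}
    for (a, b), n in pair_counts.items():
        if n == 0:
            continue
        p_a = residue_counts[a] / n_residues
        p_b = residue_counts[b] / n_residues
        expected = p_a * p_b if a == b else 2 * p_a * p_b
        scores[(a, b)] = scores[(b, a)] = math.log2((n / total) / expected)
    return scores


# Hypothetical two-letter alphabet: like residues are observed together more often than chance.
counts = Counter({("A", "A"): 4, ("A", "B"): 2, ("B", "B"): 4})
print(log_odds_matrix(counts))
```

Positive entries mark pairs seen together more often than chance and negative entries pairs seen less often, so the matrix plugs directly into a dynamic-programming aligner.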

20.
A new symmetric-iterative method for multiple alignment of protein sequences is presented. The method can be described as a combination of motif finding and dynamic programming procedures. It uses each sequence as a standard to which all sequences are aligned based on the significant segment pair alignment (SSPA) protocol. Sequences are further matched using a reduced scoring threshold to provide fillers and extensions between highly significant segment pair matches. The method produces alignment blocks that accommodate indels and are separated by variable-length unaligned segments. Construction of consensus sequences is iterative, assigning greater weights to more distantly related sequences. A consensus sequence and various measures of conservation at each aligned position can be used for comparisons between protein families, for data base searches, and for analysis of functional and evolutionary features. The method is illustrated on the extended family of prokaryotic and eukaryotic RecA-like sequences. The RecA-like sequences reveal extended alignments among eubacterial RecA and separately among eukaryotic/archaebacterial Rad51/RadA. Eleven conserved blocks are common to both groups, two of them encompassing the ATP-binding A and B-sites. Among the most conserved positions are glycine residues. For example, they occur twice as doublets putatively serving as hinge connections that provide opportunity for alternative structural conformations. Also several charged/polar residues are highly conserved, probably consequent upon the extensive intermonomer interactions in RecA/Rad51 filament formation and possibly relevant protein-protein and protein-nucleic acid interactions.
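The alignment blocks above are assembled from high-scoring segment pairs. As a simplified illustration of that building block (not the SSPA protocol itself, which also assesses significance and uses a reduced threshold for fillers and extensions), the sketch below finds the single highest-scoring ungapped segment pair between two sequences by running Kadane's maximum-subarray scan along every diagonal of the comparison matrix. The +2/-1 match/mismatch scoring is a hypothetical choice, and `best_segment_pair` is an illustrative name.

```python
def best_segment_pair(a: str, b: str, match: float = 2.0, mismatch: float = -1.0):
    """Highest-scoring ungapped segment pair between sequences a and b.

    Runs Kadane's maximum-subarray scan along every diagonal (offset = j - i).
    Returns (score, start in a, start in b, length); (0.0, 0, 0, 0) if nothing
    scores above zero.
    """
    best = (0.0, 0, 0, 0)
    n, m = len(a), len(b)
    for offset in range(-(n - 1), m):
        run, start = 0.0, 0
        i = max(0, -offset)
        while i < n and i + offset < m:
            s = match if a[i] == b[i + offset] else mismatch
            if run <= 0.0:
                run, start = s, i   # restart the segment at residue i
            else:
                run += s            # extend the current segment
            if run > best[0]:
                best = (run, start, start + offset, i - start + 1)
            i += 1
    return best


print(best_segment_pair("XXACGTXX", "ACGT"))  # (8.0, 2, 0, 4)
```

Each diagonal corresponds to one fixed register between the two sequences, so the scan never introduces gaps; in a block-based method, indels would be accommodated by joining such segments with lower-scoring fillers.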
