Similar literature
1.
Recent approaches to the 3-D-1-D compatibility problem have tried to predict protein 3-D structure from sequence. One of the critical factors in this issue is the evaluation of fitness between a given 3-D structure and any sequence mounted on it. We have developed an evaluation function composed of four terms, side chain packing, hydration, hydrogen bonding and local conformation potentials, which were empirically derived from 101 proteins of known structure. The efficiency of the evaluation function was tested in two ways. In the first test, the sequence of protein A is mounted (without gaps) on the structure of protein B which is greater in size than A. For 81 proteins examined, the native structure was always detected. In the second test, a standard sequence homology search is performed against the entire database, followed by an assessment of the alignment with its proposed structure, using the empirical evaluation function. When this test was applied to the 101 proteins, our evaluation function successfully discriminated truly homologous sequence pairs from non-homologous proteins even when the sequence similarities were very weak. This approach was found to have clear advantages over conventional sequence search methods.
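The first test described above (mounting a shorter sequence, without gaps, on a larger structure) amounts to a sliding-window scan. The sketch below is only an illustration of that scan: the two-class residue typing, the "buried"/"exposed" environment labels and every number in `PREFERENCE` are hypothetical stand-ins, since the abstract does not specify the four-term empirical function.

```python
# Hypothetical toy scoring table: how well a residue class fits an environment.
# The real evaluation function has four empirically derived terms; these
# numbers are invented purely for illustration.
PREFERENCE = {
    ("hydrophobic", "buried"): 1.0,
    ("hydrophobic", "exposed"): -1.0,
    ("polar", "buried"): -1.0,
    ("polar", "exposed"): 1.0,
}

HYDROPHOBIC = set("AVLIMFWC")


def residue_class(residue):
    """Crude two-class residue typing (an assumption, not from the abstract)."""
    return "hydrophobic" if residue in HYDROPHOBIC else "polar"


def gapless_scores(sequence, environments):
    """Score every gapless placement of `sequence` on a longer structure.

    `environments` holds one label ("buried" or "exposed") per structural
    position. Returns a list of (offset, score) pairs.
    """
    if len(sequence) > len(environments):
        raise ValueError("structure must be at least as long as the sequence")
    results = []
    for offset in range(len(environments) - len(sequence) + 1):
        window = environments[offset:offset + len(sequence)]
        score = sum(
            PREFERENCE[(residue_class(res), env)]
            for res, env in zip(sequence, window)
        )
        results.append((offset, score))
    return results


def best_placement(sequence, environments):
    """Return the (offset, score) pair with the highest score."""
    return max(gapless_scores(sequence, environments), key=lambda pair: pair[1])
```

With a realistic potential in place of `PREFERENCE`, the same loop would be run against every structure in the library and the placements ranked by score.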

2.
An automatic algorithm for defining topological equivalences in protein structures is presented. The algorithm is based on a dynamic programming technique and self-consistent scoring method. We have used it to align pairs of similar protein structures of several protein families and to identify recurrent structural domains in aspartic proteinase 2APR. Its ability to find suboptimal paths permits a thorough comparison of proteins at each level in the hierarchy of the protein structure: secondary structure, super-secondary structure, domain and entire globular structure. The algorithm has been extended to the structure alignment of ribonucleic acid and can be extended to the structure alignment of any linear polymer.

3.
The machine learning program GOLEM was applied to discover topological rules in the packing of β-sheets in α/β-domain proteins. Rules (constraints) were determined for four features of β-sheet packing: (i) whether a β-strand is at an edge; (ii) whether two consecutive β-strands pack parallel or anti-parallel; (iii) whether two β-strands pack adjacently; and (iv) the winding direction of two consecutive β-strands. Rules were found with high predictive accuracy and coverage. The errors were generally associated with complications in domain folds, especially in one doubly wound domains. Investigation of the rules revealed interesting patterns, some of which were known previously, others that are novel. Novel features include (i) the relationship between pairs of sequential strands is in general one of decreasing size; (ii) more sequential pairs of strands wind in the direction out than in; and (iii) it takes a larger alteration in hydrophobicity to change a strand from winding in the direction out than in. These patterns in the data may be the result of folding pathways in the domains. The rules found are of predictive value and could be used in the combinatorial prediction of protein structure, or as a general test of model structures, e.g. those produced by threading. We conclude that machine learning has a useful role in the analysis of protein structures.

4.
Optimal sequence threading can be used to recognize members of a library of protein folds which are closely related in 3-D structure to the native fold of an input test sequence, even when the test sequence is not significantly homologous to the sequence of any member of the fold library. The methods provide an alignment between the residues of the test sequence and the residue positions in a template fold. This alignment optimizes a score function, and the predicted fold is the highest scoring member of the library of folds. Most score functions contain a pairwise interaction energy term. This, coupled with the need to introduce gaps into the alignment, means that the optimization problem is NP-hard. We report a comparison between two heuristic optimization algorithms used in the literature, double dynamic programming and an iterative algorithm based on the so-called frozen approximation. These are compared in terms of both the ranking of likely folds and the quality of the alignment produced.

5.
6.
In search of the ideal protein sequence (total citations: 1; self-citations: 0; citations by others: 1)
The inverse of a folding problem is to find the ideal sequence that folds into a particular protein structure. This problem has been addressed using the topology fingerprint-based threading algorithm, capable of calculating a score (energy) of an arbitrary sequence-structure pair. At first, the search is conducted by unconstrained minimization of the energy in sequence space. It is shown that using energy as the only design criterion leads to spurious solutions with incorrect amino acid composition. The problem lies in the general features of the protein energy surface as a function of both structure and sequence. The proposed solution is to design the sequence by maximizing the difference between its energy in the desired structure and in other known protein structures. Depending on the size of the database of structures ‘to avoid’, sequences bearing significant similarity to the native sequence of the target protein are obtained using this procedure.

7.
We introduce a completely automatic and objective procedure for the comparison of protein structures. A genetic algorithm is used to search for a near optimal solution of the rigid-body superposition of two whole protein structures. The specification of an initial set of equivalences is not required. Topological equivalences in the final structural alignment are defined by a conventional dynamic programming routine, which is commonly used to compare protein sequences. A least-squares fitting algorithm is then used to optimize the fit between the final set of equivalences. We have applied our method to the comparison of ribonucleic acid structures, as well as protein structures. The structural alignments are generally consistent with those previously published. In fact, on most occasions our method defines at least the same number of topological equivalences as other procedures, but always with a lower r.m.s. distance between them.

8.
Genetic algorithms are very efficient search mechanisms which mutate, recombine and select amongst tentative solutions to a problem until a near optimal one is achieved. We introduce them as a new tool to study proteins. The identification and motivation for different fitness functions is discussed. The evolution of the zinc finger sequence motif from a random start is modelled. User-specified changes of the repressor structure were simulated and critical sites and exchanges for mutagenesis identified. Vast conformational spaces are efficiently searched as illustrated by the ab initio folding of a model protein of a four β-strand bundle. The genetic algorithm simulation which mimicked important folding constraints as overall hydrophobic packaging and a propensity of the betaphilic residues for trans positions achieved a unique fold. Cooperativity in the β-strand regions and a length of 3–5 for the interconnecting loops was critical. Specific interaction sites were considerably less effective in driving the fold.
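The mutate, recombine and select loop described above can be sketched in a few lines. This is a minimal toy, not the authors' simulation: the match-counting fitness function, truncation selection with elitism, the parameter defaults and the simplified motif pattern used in the usage note are all assumptions made for illustration.

```python
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def fitness(candidate, motif):
    """Count positions that satisfy the motif; '.' in the motif matches anything."""
    return sum(1 for c, m in zip(candidate, motif) if m == "." or c == m)


def mutate(candidate, rate, rng):
    """Replace each residue by a randomly drawn one with probability `rate`."""
    return "".join(
        rng.choice(ALPHABET) if rng.random() < rate else c for c in candidate
    )


def recombine(parent_a, parent_b, rng):
    """Single-point crossover between two equal-length parents."""
    point = rng.randrange(1, len(parent_a))
    return parent_a[:point] + parent_b[point:]


def evolve(motif, population_size=50, generations=200, rate=0.05, seed=0):
    """Evolve random sequences towards `motif`; returns (best, history)."""
    rng = random.Random(seed)
    population = [
        "".join(rng.choice(ALPHABET) for _ in motif)
        for _ in range(population_size)
    ]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        population.sort(key=lambda s: fitness(s, motif), reverse=True)
        history.append(fitness(population[0], motif))
        parents = population[: population_size // 2]  # truncation selection
        children = [population[0]]  # elitism: keep the current best unchanged
        while len(children) < population_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(recombine(a, b, rng), rate, rng))
        population = children
    population.sort(key=lambda s: fitness(s, motif), reverse=True)
    return population[0], history
```

A call such as `evolve("C..C............H...H")` starts from random sequences and drives them towards a simplified Cys2His2-like pattern. Because the best individual is always carried over, the recorded best fitness never decreases from one generation to the next.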

9.
An automatic procedure is proposed to identify, from the protein sequence database, conserved amino acid patterns (or sequence motifs) that are exclusive to a group of functionally related proteins. This procedure is applied to the PIR database and a dictionary of sequence motifs that relate to specific superfamilies constructed. The motifs have a practical relevance in identifying the membership of specific superfamilies without the need to perform sequence database searches in 20% of newly determined sequences. The sequence motifs identified represent functionally important sites on protein molecules. When multiple blocks exist in a single motif they are often close together in the 3-D structure. Furthermore, occasionally these motif blocks were found to be split by introns when the correlation with exon structures was examined.

10.
In a systematic study of the periplasmic folding of antibody fragments in Escherichia coli, we have analysed the expression of an aggregation-prone and previously non-functional anti-phosphorylcholine antibody, T15, as a model system and converted it to a functional molecule. Introduction of heavy chain framework mutations previously found to improve the folding of a related antibody led to improved folding of T15 fragments and improved physiology of the host E. coli cells. Manipulation of the complementarity determining regions (CDR) of the framework-mutated forms of T15 further improved folding and bacterial host physiology, but no improvement was seen in the wild type, suggesting the existence of a hierarchy in sequence positions leading to aggregation. Rational mutagenesis of the T15 light chain led to the production of functional T15 fragments for the first time, with increased levels of functional protein produced from VH manipulated constructs. We propose that a hierarchical analysis of the primary amino acid sequence, as we have described, provides guidelines on how correctly folding, functional antibodies might be achieved and will allow further delineation of the decisive structural factors and pathways favouring protein aggregation.

11.
The ‘H5’ segment located between the putative fifth and sixth transmembrane helices is the most highly conserved region in voltage-gated potassium channels and it is believed to constitute a major part of the ion conduction path (pore). Here we present a two-step procedure, comprising secondary structure prediction and hydrophobic moment profiling, to predict the structure of this important region. Combined results from the application of the procedure to the H5 region of four voltage-gated and five other K+ channel sequences lead to the prediction of a β-strand-turn-β-strand structure for H5. The reasons for the application of these soluble protein methods to parts of membrane proteins are: (i) that pore-lining residues are accessible to water and (ii) that a large enough database of high-resolution membrane protein structures does not yet exist. The results are compared with experimental results, in particular spectroscopic studies of two peptides based on the H5 sequence of SHAKER potassium channel. The procedure developed here may be applicable to water-accessible regions of other membrane proteins.
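Hydrophobic moment profiling is commonly computed with Eisenberg's definition, in which the hydrophobicities in a window are summed as vectors rotated by a fixed angle per residue (about 100° for an α-helix, 160° to 180° for a β-strand). The sketch below assumes that definition; the abstract does not say which variant, hydrophobicity scale or window length was used, so the window default and the input values are placeholders supplied by the caller.

```python
import math


def hydrophobic_moment(hydrophobicities, angle_deg):
    """Hydrophobic moment of one window for a given per-residue turn angle."""
    delta = math.radians(angle_deg)
    sin_sum = sum(h * math.sin(n * delta) for n, h in enumerate(hydrophobicities))
    cos_sum = sum(h * math.cos(n * delta) for n, h in enumerate(hydrophobicities))
    return math.hypot(sin_sum, cos_sum)


def moment_profile(hydrophobicities, angle_deg, window=7):
    """Slide a window along the sequence and return one moment per window."""
    return [
        hydrophobic_moment(hydrophobicities[i:i + window], angle_deg)
        for i in range(len(hydrophobicities) - window + 1)
    ]
```

A strictly alternating hydrophobic/polar pattern gives a large moment at 180° and a small one at 100°, which is the kind of signal that would point to an amphipathic β-strand rather than a helix.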

12.
The Engrailed Homeodomain folds on the microsecond time scale via an intermediate that is experimentally well characterised using structural Engrailed-Homeodomain mimics. Here, we analysed directly the changes in distance between key residues during the kinetics of unfolding and at equilibrium using fluorescence resonance energy transfer (FRET). Trp was the donor and 5-(((acetylamino)ethyl)amino) naphthalene-1-sulphate, the acceptor, substituted in positions that caused little change in stability. Distances calculated for the native state were in good agreement with those derived from the NMR structure. The distances between the N- and C-termini of Helix I and of Helix III increased, then decreased and finally increased again with increasing GdmCl concentration on equilibrium denaturation. This behaviour implied that there was a folding intermediate on the folding pathway and that this intermediate was populated at low concentrations of GdmCl (approximately 1 M). We analysed the changes in distance during temperature-jump relaxation kinetics, using a qualitative and very conservative procedure that drew conclusions only when changes in fluorescence of mutants containing either the donor or the acceptor alone would not obscure the change in the FRET signal when both donor and acceptor were present. The distance changes obtained under equilibrium and kinetic measurements were self-consistent and also consistent with the known high-resolution structures of the mimics of the folding intermediates. We showed that for analysing distances in disordered ensembles, it is important to use FRET probes with a critical distance close to the average separation in the ensemble. Otherwise, average distances could be over- or underestimated.
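The closing point about the critical distance follows from the standard Förster relation, E = 1 / (1 + (r / R0)^6). Because E is strongly non-linear in r, the distance recovered from an ensemble-averaged efficiency matches the mean separation only when R0 is close to it. The sketch below illustrates this with made-up numbers; the three-member ensemble and the R0 values are hypothetical and are not taken from the study.

```python
def fret_efficiency(distance, r0):
    """Transfer efficiency for a donor-acceptor distance and critical distance R0."""
    return 1.0 / (1.0 + (distance / r0) ** 6)


def distance_from_efficiency(efficiency, r0):
    """Invert the Foerster relation; valid for 0 < efficiency < 1."""
    return r0 * (1.0 / efficiency - 1.0) ** (1.0 / 6.0)


def apparent_distance(distances, r0):
    """Distance inferred from the ensemble-averaged efficiency."""
    mean_efficiency = sum(fret_efficiency(d, r0) for d in distances) / len(distances)
    return distance_from_efficiency(mean_efficiency, r0)


# Hypothetical disordered ensemble with a true mean separation of 20 (same
# length unit as R0); the values are chosen only to show the effect.
ensemble = [10.0, 20.0, 30.0]
for r0 in (10.0, 20.0, 40.0):
    print(r0, round(apparent_distance(ensemble, r0), 2))
```

In this toy case the apparent distance is too short when R0 is well below the mean separation, too long when R0 is well above it, and close to the true mean when the two are matched.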

13.
We have studied the question of how much extra predictive power the correlated mutational behaviour of pairs of amino acid residues separated along a sequence has concerning the likelihood of those residues being in contact in the folded protein. The mutational behaviour is deduced from multiple sequence alignments. Our findings are that there is, indeed, some valuable information available from this source and that it is sufficient to make a significant improvement in our ability to predict contacts, when compared with earlier methods that do not take into account the correlations between the mutations. This improvement is approximately twice as large as can be obtained by the more economical method of simply averaging pair preferences over the same sequence alignment. Even when using a method based on pair preferences, a further significant improvement can be made by penalizing more variable regions (on the reasonable assumption that invariant residues are relatively more likely to be in contact), though we have found no way of improving the pair preference method to the extent that it matches the method based on correlated behaviour. Our new method is thought to be the best data-based method of contact prediction developed so far, achieving, on average, an improvement over a random (i.e. information-free) prediction of a factor of five when the number of contacts predicted is chosen to match the number that actually occur.
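One simple way to quantify correlated mutational behaviour between two alignment columns is their mutual information. The sketch below uses that measure as a stand-in; the abstract does not state which correlation statistic the authors used, and the `min_separation` default and the four-sequence alignment in the usage note are purely illustrative.

```python
import math
from collections import Counter


def column(alignment, index):
    """Residues found at one position across all aligned sequences."""
    return [seq[index] for seq in alignment]


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), joint in count_ab.items():
        p_ab = joint / n
        mi += p_ab * math.log2(p_ab / ((count_a[a] / n) * (count_b[b] / n)))
    return mi


def rank_pairs(alignment, min_separation=3):
    """Rank column pairs at least `min_separation` apart, highest score first."""
    length = len(alignment[0])
    scored = []
    for i in range(length):
        for j in range(i + min_separation, length):
            score = mutual_information(column(alignment, i), column(alignment, j))
            scored.append((score, i, j))
    scored.sort(reverse=True)
    return scored
```

For the toy alignment `["AKGDS", "AKGDT", "LKGES", "LKGET"]`, columns 0 and 3 change together and score 1 bit, while the invariant and independently varying columns score 0. A contact predictor along these lines would then take the top-ranked pairs, matching the number of predictions to the number of contacts expected.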

14.
Recently some heat-shock proteins have been linked to functions of ‘chaperoning’ protein folding in vivo. Here current experimental evidence is reviewed and possible requirements for such an activity are discussed. It is proposed that one mode of chaperone action is to actively unfold misfolded or badly aggregated proteins to a conformation from which they could refold spontaneously; that improperly folded proteins are recognized by excessive stretches of solvent-exposed backbone, rather than by exposed hydrophobic patches; and that the molecular mechanism for unfolding is either repeated binding and dissociation (‘plucking’) or translocation of the protein backbone through a binding cleft (‘threading’), allowing the threaded chain to refold spontaneously. The observed hydrolysis of ATP would provide the energy for active unfolding. These hypotheses can be applied to both monomeric folding and oligomeric assembly and are sufficiently detailed to be open to directed experimental verification.

15.
The catalytic residues of an enzyme are defined as the amino acids directly involved in chemical catalysis. They mainly act as a general acid–base, electrophilic or nucleophilic catalyst or they polarize and stabilize the transition state. An analysis of the structural features of 36 catalytic residues in 17 enzymes of known structure and with defined mechanism is reported. Residues that bind metal ions (Zn2+ and Cu2+) are considered separately. The features examined are: residue type, location in secondary structure, separation between the residues, accessibility to solvent, intra-protein electrostatic interactions, mobility as evaluated from crystallographic temperature factors, polarity of the environment and the sequence conservation between homologous enzymes of residues that were sequentially or spatially close to the catalytic residue. In general the environment of catalytic residues is similar to that of polar side chains that have low accessibility to solvent. Two algorithms have been developed to identify probable catalytic residues. Scanning an alignment of homologous enzyme sequences for peaks of sequence conservation identifies 13 out of the 16 catalytic residues with 50 residues overpredicted. When the conservation of the spatially close residues is used instead, a different set of 13 residues are identified with 47 residues overpredicted. A combination of the two algorithms identifies 11 residues with 36 residues overpredicted.

16.
The computational task of protein structure prediction is believed to require exponential time, but previous arguments as to its intractability have taken into account only the size of a protein's conformational space. Such arguments do not rule out the possible existence of an algorithm, more selective than exhaustive search, that is efficient and exact. (An efficient algorithm is one that is guaranteed, for all possible inputs, to run in time bounded by a function polynomial in the problem size. An intractable problem is one for which no efficient algorithm exists.) Questions regarding the possible intractability of problems are often best answered using the theory of NP-completeness. In this treatment we show the NP-hardness of two typical mathematical statements of empirical potential energy function minimization of macromolecules. Unless all NP-complete problems can be solved efficiently, these results imply that a function minimization algorithm can be efficient for protein structure prediction only if it exploits protein-specific properties that prohibit the simple geometric constructions that we use in our proofs. Analysis of further mathematical statements of molecular structure prediction could constitute a systematic methodology for identifying sources of complexity in protein folding, and for guiding development of predictive algorithms.

17.
Immunoglobulin (Ig)-like proteins have been shown to fold following formation of a nucleus comprising interactions between residues that are distant in the primary sequence. What role do the loops connecting these nucleus residues play? Here, the importance of loops connecting beta-strands in different sheets of the Ig fold is investigated, by insertion of five glycine residues into the B-C loop of an Ig domain from human titin, TI I27. The folding pathway of this elongated 'pseudo wild-type' TI I27 is probed using protein engineering and Phi-value analysis. The Phi-values calculated for mutants within the pseudo wild-type protein indicate that the folding nucleus in wild-type TI I27 is conserved, supporting the hypothesis that the inter-sheet loop is not critical to the formation of a long-range folding nucleus.
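In the usual formulation, a Phi-value is the ratio of a mutation's effect on the transition-state free energy (relative to the denatured state) to its effect on native-state stability, with the former obtained from folding rate constants as RT ln(kf_wt / kf_mut). The sketch below assumes that textbook definition; the function names, the cut-off for near-zero stability changes and the rate constants in the usage note are illustrative and are not taken from the study.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal mol^-1 K^-1


def ddg_transition_state(kf_wild_type, kf_mutant, temperature=298.15):
    """Change in transition-state free energy (kcal/mol) from folding rates."""
    return R_KCAL * temperature * math.log(kf_wild_type / kf_mutant)


def phi_value(kf_wild_type, kf_mutant, ddg_native, temperature=298.15):
    """Phi = ddG(transition state) / ddG(native state), both versus the denatured state."""
    if abs(ddg_native) < 1e-9:
        raise ValueError("Phi is undefined when the mutation does not change stability")
    return ddg_transition_state(kf_wild_type, kf_mutant, temperature) / ddg_native
```

For example, a mutant that folds ten times more slowly than wild type has a transition-state destabilisation of about 1.36 kcal/mol at 25 °C; if the native state is destabilised by 2.73 kcal/mol the Phi-value is about 0.5. Values near 1 indicate that the mutated site is as structured in the transition state as in the native state, and values near 0 that it is as unstructured as in the denatured state.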

18.
A methodology is proposed to solve a difficult modeling problem related to the recently sequenced P39 protein. This sequence shares no similarity with any known 3D structure, but a fold is proposed by several threading tools. The difficulty in aligning the target sequence on one of the proposed template structures is overcome by combining the results of several available prediction methods and by refining a rational consensus between them. In silico validation of the obtained model and a preliminary cross-check with experimental features allow us to state that this borderline prediction is at least reasonable. This model raises relevant hypotheses on the main structural features of the protein and allows the design of site-directed mutations. Knowing the genetic context of the P39 reading frame, we are now able to suggest a function for the P39 protein: it would act as a periplasmic substrate-binding protein.

19.
Predictions of protein secondary structure using current methods are often unrealistic, i.e. the predicted α-helices or β-strands are too short. To improve the realism, various heuristic ‘filtering’ or ‘smoothing’ methods are used. They are more or less intuitive and are based on ad hoc corrections. We present a regularization method to obtain a realistic secondary structure from predicted propensities. It is based on the known dynamic programming algorithm and is quite objective. It can be used with any prediction method which yields propensities. The regularized predictions conserve well the overall prediction accuracy and improve the ‘protein-likeness’ of the prediction.
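One plausible reading of such a regularization is: choose the state string that maximizes the summed propensities subject to a minimum segment length per state. The sketch below implements that reading with dynamic programming. The objective, the single-letter state names and the length limits are assumptions for illustration; the abstract does not give the authors' exact formulation.

```python
def regularize(propensities, min_length):
    """Best-scoring state string in which every segment meets its minimum length.

    `propensities` is a list of dicts mapping each state to a score, one dict
    per residue; `min_length` maps each single-letter state to its minimum
    segment length.
    """
    states = list(min_length)
    n = len(propensities)
    if n == 0:
        return ""
    neg_inf = float("-inf")
    # Key (state, run): the current segment is in `state` and has length `run`,
    # with `run` capped at the minimum length required for that state.
    best = {(s, 1): propensities[0][s] for s in states}
    back = [{}]
    for i in range(1, n):
        new, ptr = {}, {}
        for (s, k), score in best.items():
            # Option 1: extend the current segment by one residue.
            key = (s, min(k + 1, min_length[s]))
            cand = score + propensities[i][s]
            if cand > new.get(key, neg_inf):
                new[key], ptr[key] = cand, (s, k)
            # Option 2: start a new segment, allowed only once the current
            # segment has reached its minimum length.
            if k >= min_length[s]:
                for t in states:
                    if t == s:
                        continue
                    cand = score + propensities[i][t]
                    if cand > new.get((t, 1), neg_inf):
                        new[(t, 1)], ptr[(t, 1)] = cand, (s, k)
        best = new
        back.append(ptr)
    # The last segment must satisfy its minimum length as well.
    finals = {key: sc for key, sc in best.items() if key[1] >= min_length[key[0]]}
    if not finals:
        raise ValueError("no assignment satisfies the length constraints")
    key = max(finals, key=finals.get)
    path = []
    for i in range(n - 1, -1, -1):
        path.append(key[0])
        if i > 0:
            key = back[i][key]
    return "".join(reversed(path))
```

Given helix propensities that peak at one isolated residue and again over a three-residue stretch, a minimum helix length of 3 removes the isolated helix and keeps the longer one, whereas a per-residue maximum would keep both.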

20.
Using the well-characterized antibody McPC603 as a model, we had found that the Fv fragment can be isolated from Escherichia coli as a functional protein in good yields, whereas the amount of the correctly folded Fab fragment of the same antibody produced under identical conditions is significantly lower. In this paper, we analyse the reasons for this difference. We found that a variety of signal sequences function in the secretion of the isolated chains of the Fab fragment or in the co-secretion of both chains in E. coli. The low yield of functional Fab fragment is not caused by inefficient expression or secretion in E. coli, but by inefficient folding and/or assembly in the periplasm. We compared the folding yields for the Fv and the Fab fragment in the periplasm under various conditions. Several diagnostic framework variants were constructed and their folding yields measured. The results show that substitutions affecting cis-proline residues and those affecting various disulphide bonds in the protein are by themselves insufficient to dramatically change the partitioning of the folding pathway to the native structure, and the cause must lie in a facile aggregation of folding intermediates common to all structural variants. However, all structural variants could be obtained in native form, demonstrating the general utility of the secretory expression strategy.

