Similar articles
 20 similar articles found
1.
In recent protein structure prediction research there has been a great deal of interest in using amino acid interaction preferences (e.g. contact potentials or potentials of mean force) to align (‘thread’) a protein sequence to a known structural motif. An important open question is whether a polynomial time algorithm for finding the globally optimal threading is possible. We identify the two critical conditions governing this question: (i) variable-length gaps are admitted into the alignment, and (ii) interactions between amino acids from the sequence are admitted into the score function. We prove that if both these conditions are allowed then the protein threading decision problem (does there exist a threading with a score K?) is NP-complete (in the strong sense, i.e. is not merely a number problem) and the related problem of finding the globally optimal protein threading is NP-hard. Therefore, no polynomial time algorithm is possible (unless P = NP). This result augments existing proofs that the direct protein folding problem is NP-complete by providing the corresponding proof for the ‘inverse’ protein folding problem. It provides a theoretical basis for understanding algorithms currently in use and indicates that computational strategies from other NP-complete problems may be useful for predictive algorithms.

2.
Making an alignment of the amino acid sequences is an essential step in the prediction of an unknown protein structure by model building from the known structure of a protein of the same family. To improve the accuracy of the alignments, we introduced the concept of hydrophobic core scores, which restrains putting insertions/deletions in the hydrophobic core regions of the protein. Eight pairs of protein sequences were aligned by this method, and the quality of the alignments was assessed by reference to those obtained by the structural superposition. The introduction of the hydrophobic core scores derived from the knowledge of the tertiary structure of one of each pair resulted in an improvement of the accuracy of the alignments. The quality of the alignment was found to depend on the homology of the protein sequences.

3.
Genetic algorithms are very efficient search mechanisms which mutate, recombine and select amongst tentative solutions to a problem until a near optimal one is achieved. We introduce them as a new tool to study proteins. The identification and motivation for different fitness functions is discussed. The evolution of the zinc finger sequence motif from a random start is modelled. User specified changes of the repressor structure were simulated and critical sites and exchanges for mutagenesis identified. Vast conformational spaces are efficiently searched as illustrated by the ab initio folding of a model protein of a four β-strand bundle. The genetic algorithm simulation which mimicked important folding constraints as overall hydrophobic packaging and a propensity of the betaphilic residues for trans positions achieved a unique fold. Cooperativity in the β-strand regions and a length of 3–5 for the interconnecting loops was critical. Specific interaction sites were considerably less effective in driving the fold.

4.
A strategy has been developed for the construction of a validated, comprehensive composite protein sequence database. Entries are amalgamated from primary source data bases by a largely automated set of processes in which redundant and trivially different entries are eliminated. A modular approach has been adopted to allow scientific judgement to be used at each stage of database processing and amalgamation. Source databases are assigned a priority depending on the quality of sequence validation and commenting. Rejection of entries from the lower priority database, in each pairwise comparison of databases, is carried out according to optionally defined redundancy criteria based on sequence segment mismatches. Efficient algorithms for this methodology are embodied in the COMPO software system. COMPO has been applied for over 2 years in construction and regular updating of the OWL composite protein sequence database from the source databases NBRF-PIR, SWISS-PROT, a GenBank translation retrieved from the feature tables, NBRF-NEW, NEWAT86, PSD-KYOTO and the sequences contained in the Brookhaven protein structure databank. OWL is part of the ISIS integrated data resource of protein sequence and structure [Akrigg et al. (1988) Nature, 335, 745–746]. The modular nature of the integration process greatly facilitates the frequent updating of OWL following releases of the source databases. The extent of redundancy in these sources is revealed by the comparison process. The advantages of a robust composite database for sequence similarity searching and information retrieval are discussed.

5.
Probabilities of all possible correspondences of residues in aligning two proteins are evaluated by assuming that the statistical weight of each alignment is proportional to the exponent of its total similarity score. Based on such probabilities, a probability alignment that includes the most probable correspondences is proposed. In the cases of highly similar sequence pairs, the probability alignments agree with the maximum similarity alignments that correspond to the alignments with the maximum similarity score. Significant correspondences in the probability alignments are those whose probabilities are >0.5. The probability alignment method is applied to a few protein pairs, and results indicate that such highly probable correspondences in the probability alignments are probably correct correspondences that agree with the structural alignments and that incorrect correspondences in the maximum similarity alignments are usually insignificant correspondences in the probability alignments. The root mean square deviations in superimposition of corresponding residues tend to be smaller for significant correspondences in the probability alignments than for all correspondences in the maximum similarity alignments, indicating that incorrect correspondences in the maximum similarity alignments tend to be insignificant correspondences in probability alignments. This fact is also confirmed in 109 protein pairs that are similar to each other with sequence identities between 90 and 35%. In addition, the probability alignment method may better predict correct correspondences than the maximum similarity alignment method. Probability alignments do, of course, depend on a scoring scheme but are less sensitive to the value of parameters such as gap penalties. The present probability alignment method is useful for constructing reliable alignments based on the probabilities of correspondences and can be used with any scoring scheme.
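
The idea of weighting every alignment by the exponent of its score can be illustrated with a small forward-backward computation. The sketch below is not the authors' implementation: the match/mismatch scores, the linear gap penalty and the scale `t` are all hypothetical choices made for illustration. It returns the probability that residue i of one sequence corresponds to residue j of the other.

```python
import math


def match_probabilities(a, b, match=1.0, mismatch=-1.0, gap=-2.0, t=1.0):
    """Probability of each residue pairing under exponentially weighted alignments.

    The scoring values and the scale t are illustrative assumptions.
    """
    n, m = len(a), len(b)
    g = math.exp(gap / t)  # statistical weight of one gapped position

    def w(i, j):
        # statistical weight of pairing a[i] with b[j] (0-based indices)
        return math.exp((match if a[i] == b[j] else mismatch) / t)

    # fwd[i][j]: summed weight of all alignments of a[:i] with b[:j]
    fwd = [[0.0] * (m + 1) for _ in range(n + 1)]
    fwd[0][0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i and j:
                fwd[i][j] += fwd[i - 1][j - 1] * w(i - 1, j - 1)
            if i:
                fwd[i][j] += fwd[i - 1][j] * g
            if j:
                fwd[i][j] += fwd[i][j - 1] * g

    # bwd[i][j]: summed weight of all alignments of a[i:] with b[j:]
    bwd = [[0.0] * (m + 1) for _ in range(n + 1)]
    bwd[n][m] = 1.0
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            if i < n and j < m:
                bwd[i][j] += w(i, j) * bwd[i + 1][j + 1]
            if i < n:
                bwd[i][j] += g * bwd[i + 1][j]
            if j < m:
                bwd[i][j] += g * bwd[i][j + 1]

    z = fwd[n][m]  # total weight of all alignments
    return [[fwd[i][j] * w(i, j) * bwd[i + 1][j + 1] / z for j in range(m)]
            for i in range(n)]
```

Under these assumptions, a correspondence would count as significant in the abstract's sense when its entry in the returned matrix exceeds 0.5.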

6.
Rop is a four-helix bundle protein composed of two identical helix-loop-helix monomers. Protein folding monitored by stopped-flow fluorescence or CD exhibits biphasic kinetics when folding to low final denaturant concentrations. As the final concentration of denaturant is increased, the amplitude of the fast phase decreases, until at the highest concentrations the kinetics appear monophasic. We propose that the fast phase represents the formation of an intermediate. Here, we use real-time NMR to detect the formation of this intermediate and to characterize its structural features.

7.
A data bank merging related protein structures and sequences   Total citations: 1 (self: 0, others: 1)
A data collection which merges protein structural and sequence information is described. Structural superpositions amongst proteins with similar main-chain fold were performed or collected from the literature. Sequences taken from the protein primary structure databases were associated with the multiple structural alignments providing they were at least 50% homologous in residue identity to one of the structural sequences and at least 50% of the structural sequence residues were alignable. Such restrictions allow reasonable confidence that the primary sequences share the conformation of the tertiary structural templates, except in the less conserved loop regions. Multiple structural superpositions were collected for 38 familial groups containing a total of 209 tertiary structures; 45 structures had no superposable mates and were used individually. Other information is also provided as main-chain and side-chain conformational angles, secondary structural assignments and the like. Wedding the primary and tertiary structural data resulted in an 8-fold increase of data bank sequence entries over those associated with the known three-dimensional architectures alone.

8.
Evaluation and improvements in the automatic alignment of protein sequences   Total citations: 1 (self: 0, others: 1)
The accuracy of protein sequence alignment obtained by applying a commonly used global sequence comparison algorithm is assessed. Alignments based on the superposition of the three-dimensional structures are used as a standard for testing the automatic, sequence-based methods. Alignments obtained from the global comparison of five pairs of homologous protein sequences studied gave 54% agreement overall for residues in secondary structures. The inclusion of information about the secondary structure of one of the proteins, in order to limit the number of gaps inserted in regions of secondary structure, improved this figure to 68%. A similarity score of greater than six standard deviation units suggests that an alignment which is greater than 75% correct within secondary structural regions can be obtained automatically for the pair of sequences.

9.
As the amino acid sequence of a given protein changes along the phylogenetic tree, enough of the overall folding pattern must be conserved to ensure that the protein still fulfils its biological function. Eighteen published scales which tabulate various side chain properties are compared here by computing the variance of each scale when applied to each of several protein families. The conservation of each scale of side chain properties is examined for the 20 627 residues in 60 mammalian myoglobins, 31 mammalian ribonucleases, insulin A and B chains (29 sequences each), 29 vertebrate and 28 plant cytochrome c's. Those scales which are the most highly conserved through the evolution of each protein family may well be the best predictors of protein folding patterns. The mean-area-buried scale and the optimized matching hydrophobicities scale are more conserved than other scales. An additional result is the relatively poor conservation across evolution of the Chou-Fasman secondary structure predictors.

10.
A multiple sequence alignment algorithm is described that uses a dynamic programming-based pattern construction method to align a set of homologous sequences based on their common pattern of conserved sequence elements. This pattern-induced multi-sequence alignment (PUMA) algorithm can employ secondary-structure dependent gap penalties for use in comparative modelling of new sequences when the three-dimensional structure of one or more members of the same family is known. We show that the use of secondary structure information can significantly improve the accuracy of aligning structure boundaries in a set of homologous sequences even when the structure of only one member of the family is known.

11.
Variable gap penalty for protein sequence-structure alignment   Total citations: 1 (self: 0, others: 1)
The penalty for inserting gaps into an alignment between two protein sequences is a major determinant of the alignment accuracy. Here, we present an algorithm for finding a globally optimal alignment by dynamic programming that can use a variable gap penalty (VGP) function of any form. We also describe a specific function that depends on the structural context of an insertion or deletion. It penalizes gaps that are introduced within regions of regular secondary structure, buried regions, straight segments and also between two spatially distant residues. The parameters of the penalty function were optimized on a set of 240 sequence pairs of known structure, spanning the sequence identity range of 20-40%. We then tested the algorithm on another set of 238 sequence pairs of known structures. The use of the VGP function increases the number of correctly aligned residues from 81.0 to 84.5% in comparison with the optimized affine gap penalty function; this difference is statistically significant according to Student's t-test. We estimate that the new algorithm allows us to produce comparative models with an additional approximately 7 million accurately modeled residues in the approximately 1.1 million proteins that are detectably related to a known structure.
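
To make the notion of a context-dependent gap penalty concrete, the following sketch computes a global alignment score in which the cost of a gap depends on where in the template it falls. It is a deliberate simplification and not the VGP algorithm itself: gap costs here are linear and per-position rather than a function of any form, and the names `delete_cost`, `insert_cost` and the match/mismatch scores are hypothetical.

```python
def align_score(query, template, delete_cost, insert_cost,
                match=1.0, mismatch=-1.0):
    """Best global alignment score with position-dependent linear gap costs.

    delete_cost[j]: cost of leaving template residue j unaligned (len(template) values).
    insert_cost[j]: cost of inserting one query residue after j template
                    residues (len(template) + 1 values).
    All scores and costs are illustrative assumptions.
    """
    n, m = len(query), len(template)
    if len(delete_cost) != m or len(insert_cost) != m + 1:
        raise ValueError("gap cost lists do not match the template length")

    # s[i][j]: best score for aligning query[:i] with template[:j]
    s = [[0.0] * (m + 1) for _ in range(n + 1)]
    for j in range(1, m + 1):
        s[0][j] = s[0][j - 1] - delete_cost[j - 1]
    for i in range(1, n + 1):
        s[i][0] = s[i - 1][0] - insert_cost[0]
        for j in range(1, m + 1):
            pair = match if query[i - 1] == template[j - 1] else mismatch
            s[i][j] = max(
                s[i - 1][j - 1] + pair,               # align the two residues
                s[i - 1][j] - insert_cost[j],         # query residue is an insertion
                s[i][j - 1] - delete_cost[j - 1],     # template residue is deleted
            )
    return s[n][m]
```

Raising `delete_cost` at positions inside, say, a helix or a buried region makes the alignment avoid gaps there, which is the behaviour the structure-dependent penalty described above is designed to produce.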

12.
Stabilization of lysozyme by the introduction of Gly-Pro sequence   Total citations: 1 (self: 0, others: 1)
Three mutant lysozymes where the Asp101–Gly102 sequence of lysozyme was converted to Asp101–Pro102, Gly101–Pro102 and Pro101–Gly102 were prepared to investigate the effect of proline residues on the stabilization of proteins. The free energy changes of lysozymes for the unfolding in aqueous solution at pH 5.5 and 35°C were 10.0, 10.1, 11.0 and 7.7 kcal/mol for wild type, Asp101Pro102, Gly101Pro102 and Pro101Gly102 lysozyme respectively. When the energy level in the unfolded state of wild type lysozyme was fixed at a standard level, the energy levels in the folded state of Asp101Pro102 and Pro101Gly102 lysozymes were found to be higher than that of wild type lysozyme on the basis of GD(H2O) and entropy losses of their polypeptide chains in the unfolded state. The presence of some strain in the folded state of these lysozymes was supported by both the calculation of conformational energy for a trans-L-prolyl residue [Schimmel, P.R. and Flory, P.J. (1968) J. Mol. Biol., 34, 105–120] and the analysis of structures of energy-minimized mutant lysozymes. Therefore, it is concluded that the formation of the Gly-Pro sequence is effective in avoiding possible strain in the folded state of a protein caused by the introduction of proline residue(s).

13.
Compensating changes in protein multiple sequence alignments   Total citations: 2 (self: 0, others: 2)
A method was developed to identify compensating changes between residues at positions in a multiple sequence alignment. (For example, one position might always contain a positively charged residue when the other is negatively charged and vice versa.) A correlation-based method was used to measure the compensation found in the four residues at a pair of positions in any two sequences in a multiple alignment. All possible sequence pairings were measured at the pair of positions and the resulting matrix analysed to give a measure of cooperativity among the pairs. The basic method was sufficiently flexible to consider a number of amino acid relatedness models based both on scalar and vectorial properties. Pairs of compensating positions were selected by the method and their mean separation (in a protein of known structure) was compared to both the mean pair-wise separation over all residues and the pairwise separation over an equivalent sample of pairs of residues selected on the basis of their conservation alone. The latter is an important control that has been omitted from previous studies. The results indicated that, at best, there was a slight effect (of marginal significance) leading to the selection of closer pairs by the compensation measure when compared to the mean of all pairs. However, this was never as good as the simpler measure based on conservation alone, which always found a significant majority of proteins with a sample mean less than the overall mean.

14.
Local protein sequence similarity does not imply a structural relationship   Total citations: 1 (self: 0, others: 1)
A database search often will find a seemingly strong sequence similarity between two fragments of proteins that are not expected to have an evolutionary or functional relationship. It is tempting to suggest that the two fragments will adopt a similar conformation due to a common pattern of residues that dictate a particular substructure. To investigate the likelihood of such a structural similarity, local sequence similarities between proteins of known conformation were identified by a standard database search algorithm. Sequence similarity was considered significant when the chance probability of obtaining the relatedness score from a scan of the entire database was <1%. In this region both true homologies and false homologies are detected. A total of 69 false homologies was located of length between 20 and 262 aligned positions. Many of these alignments had 25% sequence identity and a further 25% of conservative changes. However, the results show that in general these aligned fragments did not have a significant similarity in secondary or tertiary structure. Thus local sequence similarity does not indicate a structural similarity when there is neither an evolutionary nor functional explanation to support this. Accordingly, structure predictions based on finding a local sequence similarity with an evolutionarily unrelated protein of known conformation are unlikely to be valid.

15.
The residue pair preference profile (R3P) method is an inverse folding method that combines environmental profiles and pair preference profiles. The method uses statistical preferences for residue pairs which score the likelihood of finding a profiled residue to be paired with a residue within its local environment. All pairs are characterized by their dihedral angles, secondary structure and number of neighboring residues as a function of residue type. Each residue pair preference is expressed for all 20 amino acids of the profiled residue and is weighted by the compatibility of the environment residue with its own local environment. The R3P method produces an initial profile-sequence alignment which is then refined by converting the initial profile into a profile of a target sequence threaded into the structure of the initial profile. We have tested this method by evaluating alignments of sequences with known 3-D structures using structural superposition alignments as reference. R3P-sequence alignments are 50% correct on average for sequences whose 3-D structure pairs superimpose with an r.m.s. deviation of 1.97 Å. The average improvement in correctness during this iterative refinement is 14%. The R3P-sequence alignments are compared with sequence-sequence and 3-D profile-sequence alignments. When all three methods are combined, on average 50% of the alignments are correct for pairs of 3-D structures that superimpose within 2.12 Å. A 3-D model of HisA is predicted with the combined method.

16.
We describe an algorithm to predict tertiary structures of small proteins. In contrast to most current folding algorithms, it uses very few energy parameters. Given the secondary structural elements in the sequence (α-helices and β-strands), the algorithm searches the remaining conformational space of a simplified real-space representation of chains to find a minimum energy of an exceedingly simple potential function. The potential is based only on a single type of favorable interaction between hydrophobic residues, an unfavorable excluded volume term of spatial overlaps and, for sheet proteins, an interstrand hydrogen bond interaction. Where appropriate, the known disulfide bonds are constrained by a square-law potential. Conformations are searched by a genetic algorithm. The model predicts reasonably well the known tertiary folds of seven out of the 10 small proteins we consider. We draw two conclusions. First, for the proteins we tested, this exceedingly simple potential function is no worse than others having hundreds of energy parameters in finding the right general tertiary structures. Second, despite its simplicity, the potential function is not the weak link in this algorithm. Differences between our predicted structures and the correct targets can be ascribed to shortcomings in our search strategy. This potential function may be useful for testing other conformational search strategies.

17.
We investigated the correlation between the Shannon information entropy, 'sequence entropy', with respect to the local flexibility of native globular proteins as described by inverse packing density. These are determined at each residue position for a total set of 130 query proteins, where sequence entropies are calculated from each set of aligned residues. For the accompanying aggregate set of 130 alignments, a strong linear correlation is observed between the calculated sequence entropy and the corresponding inverse packing density determined at an associated residue position. This region of linearity spans the range of C(alpha) packing densities from 12 to 25 amino acids within a sphere of 9 angstrom radius. Three different hydrophobicity scales all mimic the behavior of the sequence entropies. This confirms the idea that the ability to accommodate mutations is strongly dependent on the available space and on the propensity for each amino acid type to be buried. Future applications of these types of methods may prove useful in identifying both core and flexible residues within a protein.
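
As a rough illustration of how a per-position sequence entropy can be computed from a set of aligned residues, the sketch below applies the Shannon formula to each alignment column. The abstract does not state the logarithm base or how gaps are treated, so base 2 and the skipping of `-` characters are assumptions, as are the function names.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy in bits of the residues in one alignment column.

    Gap characters ('-') are ignored; this is an assumed convention.
    """
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def alignment_entropies(alignment):
    """Entropy of every column of an alignment given as equal-length strings."""
    return [column_entropy(col) for col in zip(*alignment)]
```

A fully conserved column gives an entropy of zero, while a column split evenly between two residue types gives one bit.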

18.
The automatic identification of motifs associated with a given function is an important challenge for molecular sequence analysis. A method is presented for the extraction of such patterns from large sets of unaligned sequences with related but general function, for example, a set of heat shock proteins. In such a set of proteins there can often be several subfamilies each characterized by one or more distinct motifs. The aim is to develop computational tools to identify these motifs. The algorithm presented locates high frequency words of length k with a given number of positions, r, fixed. Statistics for a binomial distribution are used to assess the significance of the words. The high-frequency words are clustered and highly populated clusters retained. The composition of the clusters is displayed graphically. A set of motifs associated with the sequence family can automatically be extracted. The method is benchmarked on a set of 106 heat shock sequences and a set of 257 toxin sequences. It is shown to recover previously identified motifs.
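
The binomial significance test mentioned above can be sketched as follows. This is a simplified illustration and not the published method: it treats every position of the word as fixed (r = k), counts each sequence at most once, and takes the per-sequence chance probability `p` as a value supplied by the caller. The function names are hypothetical.

```python
from math import comb


def count_sequences_with_word(word, sequences):
    """Number of sequences that contain the exact word at least once."""
    return sum(1 for seq in sequences if word in seq)


def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p).

    Here n is the number of sequences, k the number observed to contain
    the word, and p the assumed chance that one sequence contains it.
    """
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))
```

A word seen in many more sequences than expected by chance yields a small tail probability and would be a candidate for the clustering step.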

19.
The energetics of alkane dissolution and partition between water and organic solvent are described in terms of the energy of cavity formation and solute-solvent interaction using scaled particle theory. Thermodynamic arguments are proposed that allow comparison of experimental measurements of the surface area with values calculated from an all-atom representation of the solute. While the surface tension relating to the accessible surface is shape dependent, it is found that for the molecular surface it is not. This model rationalizes the change in surface tension between the microscopic (20–30 cal/mol/Å²) and macroscopic (70–75 cal/mol/Å²) regimes without the need to invoke Flory-Huggins theory or to apply other corrections. The difference in the values arises (i) to a small extent as a result of the curvature dependence of surface tension and (ii) to a large extent due to the difference in the molecular surface derived from the experiment and that calculated from an extended all-atom model. The model suggests that the primary driving force for alkane association in water is due to the tendency of water to reduce the solute cavity surface. It is argued that to model the energetics of alkane association, the surface tension should be related to the molecular surface (rather than the accessible surface) with a surface tension near the macroscopic limit for water. This model is compared with results from theoretical simulations of the hydrophobic effect for two well-studied systems. The implications for antibody–antigen interactions and the effect of hydrophobic amino acid deletion on protein stability are discussed. The approach can be used to model the solute cavity formation energy in solution as a first step in the continuum modelling of biomolecular interactions.

20.
The Engrailed Homeodomain folds on the microsecond time scale via an intermediate that is experimentally well characterised using structural Engrailed-Homeodomain mimics. Here, we analysed directly the changes in distance between key residues during the kinetics of unfolding and at equilibrium using fluorescence resonance energy transfer (FRET). Trp was the donor and 5-(((acetylamino)ethyl)amino) naphthalene-1-sulphate, the acceptor, substituted in positions that caused little change in stability. Distances calculated for the native state were in good agreement with those derived from the NMR structure. The distances between the N- and C-termini of Helix I and of Helix III increased, then decreased and finally increased again with increasing GdmCl concentration on equilibrium denaturation. This behaviour implied that there was a folding intermediate on the folding pathway and that this intermediate was populated at low concentrations of GdmCl (approximately 1 M). We analysed the changes in distance during temperature-jump relaxation kinetics, using a qualitative and very conservative procedure that drew conclusions only when changes in fluorescence of mutants containing either the donor or the acceptor alone would not obscure the change in the FRET signal when both donor and acceptor were present. The distance changes obtained under equilibrium and kinetic measurements were self-consistent and also consistent with the known high-resolution structures of the mimics of the folding intermediates. We showed that for analysing distances in disordered ensembles, it is important to use FRET probes with a critical distance close to the average separation in the ensemble. Otherwise, average distances could be over or underestimated.
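
The closing remark about the critical distance follows from the standard Förster relation between transfer efficiency and donor-acceptor separation, E = 1 / (1 + (r / R0)^6). The sketch below encodes that relation and its inverse; it is a generic illustration and not the analysis used in the study, and any numerical value passed as `r0` is an assumption of the caller.

```python
def fret_efficiency(r, r0):
    """Transfer efficiency at donor-acceptor distance r for critical distance r0."""
    return 1.0 / (1.0 + (r / r0) ** 6)


def fret_distance(efficiency, r0):
    """Distance implied by a measured efficiency; same length unit as r0."""
    if not 0.0 < efficiency < 1.0:
        raise ValueError("efficiency must lie strictly between 0 and 1")
    return r0 * (1.0 / efficiency - 1.0) ** (1.0 / 6.0)
```

Because of the sixth-power dependence, the efficiency changes steeply only for separations near `r0`. Far from it the signal is almost flat, so small measurement errors translate into large distance errors.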
