Similar documents
1.
Although the conformational states of protein side chains can be described using a library of rotamers, the determination of the global minimum energy conformation (GMEC) of a large collection of side chains, given fixed backbone coordinates, represents a challenging combinatorial problem with important applications in the field of homology modelling. Recently, we have developed a theoretical framework, called the dead-end elimination method, which allows us to efficiently identify rotamers that cannot be members of the GMEC. Such dead-ending rotamers can be iteratively removed from the system under study, thereby cutting down the size of the combinatorial problem. Here we present new developments to the dead-end elimination method that allow us to handle larger proteins and more extensive rotamer libraries. These developments encompass (i) a procedure to determine weight factors in the generalized dead-end elimination theorem, thereby enhancing the elimination of dead-ending rotamers, and (ii) a novel strategy, mainly based on logical arguments derived from the logic pairs theorem, to use dead-ending rotamer pairs in the efficient elimination of single rotamers. These developments are illustrated for proteins of various sizes, and the flow of the current method is discussed in detail. The effectiveness of dead-end elimination is increased by two orders of magnitude compared with previous work. In addition, it now becomes feasible to use extremely detailed libraries. We also provide an appendix in which the validity of the generalized dead-end criterion is shown. Finally, perspectives for further applications that may now come within reach are discussed.
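The abstract builds on the basic dead-end elimination criterion: rotamer r at position i can be discarded if some other rotamer t at the same position is lower in energy even when r meets its most favourable partners and t its least favourable ones. The sketch below implements only that basic criterion, iterated until nothing more can be removed. It is a minimal illustration assuming precomputed self and pair energy tables (the names `self_e` and `pair_e` are hypothetical); it does not implement the weight factors of the generalized theorem or the logic-pairs strategy described above.

```python
def dead_end_eliminate(self_e, pair_e):
    """Basic dead-end elimination (not the generalized, weighted criterion).

    self_e[i][r]: self energy of rotamer r at position i (side chain with backbone).
    pair_e[(i, j)][r][s]: pair energy of rotamer r at i with rotamer s at j, for i < j.
    Returns {position: set of rotamer indices that survive}.
    """
    n = len(self_e)
    alive = {i: set(range(len(self_e[i]))) for i in range(n)}

    def pair(i, r, j, s):
        # Look up the pair table in either order.
        return pair_e[(i, j)][r][s] if i < j else pair_e[(j, i)][s][r]

    def bound(i, r, pick):
        # Self energy plus the best (pick=min) or worst (pick=max) case
        # interaction with the surviving rotamers at every other position.
        return self_e[i][r] + sum(
            pick(pair(i, r, j, s) for s in alive[j]) for j in range(n) if j != i)

    changed = True
    while changed:  # iterate: each elimination can tighten the bounds elsewhere
        changed = False
        for i in range(n):
            for r in sorted(alive[i]):
                lowest_r = bound(i, r, min)
                if any(lowest_r > bound(i, t, max) for t in alive[i] if t != r):
                    alive[i].discard(r)  # r cannot be part of the GMEC
                    changed = True
    return alive
```

For example, with two positions of two rotamers each, `dead_end_eliminate([[0.0, 10.0], [0.0, 0.0]], {(0, 1): [[0.0, 1.0], [0.0, 1.0]]})` leaves a single rotamer at each position, so the GMEC is found without any combinatorial search.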

2.
A 3-D model of a protein can be constructed from its amino acid sequence and the 3-D structures of one or more homologues by annealing three sets of fragments: the structurally conserved regions, structurally variable regions and the side chains. The method encoded in the computer program COMPOSER was assessed by generating 3-D models of eight proteins whose crystal structures are already known and for which 3-D structures of homologues are available. In the structurally conserved regions, differences between modelled and X-ray structures are smaller than the differences between the X-ray structures of the modelled protein and the homologues used to build the model. When several homologues are used, the contributions of the known structures are weighted, preferably by the square of sequence similarity; this is especially important when the similarities of the homologues to the modelled structure differ greatly. The ‘collar’ extension approach, in which a similar region of different length in a homologue is used to extend the framework, can result in a more accurate model. If the known homologues comprise more than one related group of proteins and they are both distantly related to the unknown, then aligning the sequence to be modelled with each group of homologues facilitates identification of the structurally conserved regions of the unknown and leads to an improved model. For seven of the eight models, the root mean square differences (r.m.s.d.s) from the structures defined by X-ray analysis are between 0.73 and 1.56 Å for all Cα atoms. For the model of Mucor pepsin, where the closest homologue has 33% sequence identity and 20% of the residues are in structurally variable regions, the r.m.s.d. for the framework region is 1.71 Å and the r.m.s.d. for all Cα atoms is 3.47 Å.
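The weighting step can be sketched as a weighted average of equivalent atom positions, with each homologue weighted by the square of its sequence similarity to the target. This is a minimal sketch assuming the homologues are already superposed and their equivalent atoms listed in the same order; the function name is hypothetical and this is not COMPOSER's actual code.

```python
def weighted_framework(coords, similarities):
    """Average equivalent atom positions from several superposed homologues,
    weighting each homologue by the square of its sequence similarity to the target.

    coords[h][a] is the (x, y, z) of equivalent atom a in homologue h.
    """
    weights = [s ** 2 for s in similarities]
    total = sum(weights)
    return [
        tuple(sum(w * c[a][k] for w, c in zip(weights, coords)) / total for k in range(3))
        for a in range(len(coords[0]))
    ]
```

Squaring the similarity makes a close homologue dominate: with similarities 0.8 and 0.4 the weights are 0.64 and 0.16, a 4:1 ratio rather than 2:1, which is why the weighting matters most when the homologues differ greatly in similarity.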

3.
An object-oriented database system has been developed which is being used to store protein structure data. The database can be queried using the logic programming language Prolog or the query language Daplex. Queries retrieve information by navigating through a network of objects which represent the primary, secondary and tertiary structures of proteins. Routines written in both Prolog and Daplex can integrate complex calculations with the retrieval of data from the database, and can also be stored in the database for sharing among users. Thus object-oriented databases are better suited to prototyping applications and answering complex queries about protein structure than relational databases. This system has been used to find loops of varying length and anchor positions when modelling homologous protein structures.

4.
A general solution to the problem of directly incorporating data from multiple sequence alignments into the construction of molecular models was approached through the calculation of an estimated pairwise distance based on conserved hydrophobicity. A scaling method was developed that allowed the required bulk geometric properties of the estimated pairwise distances (mean and mean squared) to mimic those expected in a globular protein. These properties were maintained independently of the composition, length, number or degree of conservation of the original sequences. Despite being a poor estimate for individual distances, the scaled distances were found to be compatible with the native structure and could be weighted highly. While the estimated distances provided a general drive towards hydrophobic packing, more specific structures (including secondary structures and motifs) were induced by regularization towards an ideal form. These constraints were used to refine an outline starting structure (derived only from secondary structure axes) towards a compact form that was sufficiently protein-like for side chains to be added with almost no further adjustment of the α-carbon positions. This process allows rough folds based on abstract representations of protein architecture to be rapidly converted to a form where they can be analysed by the growing number of methods designed to assess molecular models.
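One simple way to make a set of estimated distances reproduce a required mean and mean square is an affine rescaling: fixing the mean and the mean square is equivalent to fixing the mean and the variance. The sketch below is an assumption for illustration, not necessarily the scaling method of the paper, and the target values are inputs the caller must supply (for instance from known globular proteins of the same length).

```python
import math


def scale_to_moments(dists, target_mean, target_mean_sq):
    """Affinely rescale distances so their mean and mean square match the targets.

    Matching the mean and mean square is the same as matching the mean and the
    variance, since variance = mean_square - mean**2.
    """
    n = len(dists)
    mean = sum(dists) / n
    var = sum((d - mean) ** 2 for d in dists) / n
    target_var = target_mean_sq - target_mean ** 2
    if var == 0 or target_var <= 0:
        raise ValueError("need non-constant input and a feasible target")
    a = math.sqrt(target_var / var)  # stretch factor for the spread
    return [target_mean + a * (d - mean) for d in dists]
```

Because the transform is affine, the ranking of the estimated distances is preserved while their bulk statistics become globule-like; note that a very wide input spread could push some rescaled values below zero, which a real implementation would need to guard against.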

5.
Protein sequence alignments can be improved when at least one of the proteins to be aligned has a known 3-D structure. In this work, geometrical constraints extracted from the target fold are evaluated in independent units that deal with complementary structural features. This information is used to set up mutation tables specific to the locally observed structural environments. The resulting partial evaluations are then combined linearly into a global function, which is optimized by dynamic programming. Eventually, a score based on tertiary interactions can be used as a selection criterion to discriminate among a set of suboptimal alignments. The relevance of the scores given by each unit is tested on a representative set of protein families. Finally, a method for combining the different scores is described and its efficiency is evaluated on a few pairs of weakly homologous proteins.
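The scheme above (environment-specific tables per unit, combined linearly, then optimized by dynamic programming) can be sketched as follows. This is a minimal sketch under stated assumptions: the environment classes, table contents, unit weights and the constant gap penalty are all hypothetical, and only the optimal score is returned (no traceback and no suboptimal alignments).

```python
def profile_scores(structure_envs, tables, weights):
    """structure_envs[u][i]: environment class of structure position i seen by unit u.
    tables[u][env][aa]: score for placing amino acid aa in that environment.
    Returns score(i, aa), a linear combination of the per-unit scores."""
    def score(i, aa):
        return sum(w * tables[u][structure_envs[u][i]][aa] for u, w in enumerate(weights))
    return score


def align_to_structure(seq, n_struct, score, gap=-2.0):
    # Global dynamic-programming alignment of seq against n_struct structure positions.
    m = len(seq)
    F = [[0.0] * (n_struct + 1) for _ in range(m + 1)]
    for a in range(1, m + 1):
        F[a][0] = a * gap
    for i in range(1, n_struct + 1):
        F[0][i] = i * gap
    for a in range(1, m + 1):
        for i in range(1, n_struct + 1):
            F[a][i] = max(F[a - 1][i - 1] + score(i - 1, seq[a - 1]),  # place residue
                          F[a - 1][i] + gap,                           # insertion
                          F[a][i - 1] + gap)                           # deletion
    return F[m][n_struct]
```

With a single hypothetical "burial" unit whose environments are `["buried", "exposed"]` and whose table favours L when buried and K when exposed, aligning "LK" onto the two positions scores 3 + 2 = 5, since any gapped alternative costs more.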

6.
A model of the lignin peroxidase LIII of Phlebia radiata was constructed on the basis of the structure of cytochrome c peroxidase (CCP). Because of the low percentage of amino acid identity between CCP and the lignin peroxidase LIII of Phlebia radiata, alignment of the sequences was based on the generation of a template from a knowledge of the 3-D structure of CCP and consensus sequences of lignin peroxidases. This approach gave an alignment in which all the insertions in the lignin peroxidase were placed at loop regions of CCP, with 21.1% identity between these two proteins. The model was constructed using this alignment and the computer program COMPOSER, which assembles the model as a series of rigid fragments derived from CCP and other proteins. Manual intervention was required for some of the longer loop regions. The α-helices forming the structural framework, and especially the haem environment of CCP, are conserved in the LIII model, and the core is closely packed without holes. A possible site of substrate oxidation at the haem edge of LIII is discussed.

7.
The method of simulated annealing can be of use in protein structure prediction by homology modelling, where side chain conformations must be predicted. In this study an attempt has been made to optimize a molecular dynamics method for this purpose. Heating and cooling protocols to maximize the accuracy of the predictions have been developed. The optimized protocol involves cooling from 3000 to 0 K over 20 ps while simultaneously introducing the non-bonded energy term. The use of a 'soft' non-bonded interaction energy term in place of a standard 6–12 potential is found to be important. The reliability of the predictions has been analysed in terms of the environment of the residues (solvent accessibility) and the degree of uncertainty in the structure (number of unknown torsion angles). Depending on these factors, the percentage of unknown side chain torsion angles that are correctly predicted within 30° ranges from 50 to 75%. Potential problems and limitations of the method are discussed.
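The protocol above couples two ramps: the temperature falls from 3000 K to 0 K over 20 ps while the non-bonded term is switched on. A minimal sketch, assuming both ramps are linear (the abstract does not state their shape) and using a quartic repulsion as a stand-in "soft" term (an assumed functional form, not necessarily the one used in the study):

```python
def anneal_schedule(t_ps, total_ps=20.0, t_start=3000.0):
    """Linear schedule (an assumption): temperature falls from t_start to 0 K while
    the weight on the non-bonded term rises from 0 to 1 over total_ps picoseconds.
    Returns (temperature_K, nonbonded_weight)."""
    frac = min(max(t_ps / total_ps, 0.0), 1.0)
    return t_start * (1.0 - frac), frac


def soft_repulsion(r, r0, k=1.0):
    # Assumed soft term: finite even at r = 0, unlike the r**-12 wall of a
    # 6-12 potential, so side chains can pass through each other while hot.
    return k * (r0 ** 2 - r ** 2) ** 2 if r < r0 else 0.0
```

Halfway through the run, `anneal_schedule(10.0)` gives `(1500.0, 0.5)`; the finite value of `soft_repulsion` at overlap is what lets side chains escape poor starting conformations early in the cooling.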

8.
A major problem in predicting protein structure by homology modelling is that the sequence alignment from which the model is built may not be the best one in terms of the correct equivalencing of residues assessed by structural or functional criteria. A useful strategy is to generate and examine a number of suboptimal alignments, as better alignments can often be found away from the optimal one. A procedure to rapidly filter suboptimal alignments based on measurement of core volumes and packing pair potentials is investigated. The approach is benchmarked on three pairs of sequences that are non-trivial to align correctly, namely two immunoglobulin domains, plastocyanin with azurin, and two distant globin sequences. It is shown to be useful for reducing a large ensemble of possible alignments down to a few that correspond more closely to the correct (structure-based) alignment.

9.
In search of the ideal protein sequence   Total citations: 1; self-citations: 0; citations by others: 1
The inverse of a folding problem is to find the ideal sequence that folds into a particular protein structure. This problem has been addressed using the topology fingerprint-based threading algorithm, capable of calculating a score (energy) of an arbitrary sequence–structure pair. At first, the search is conducted by unconstrained minimization of the energy in sequence space. It is shown that using energy as the only design criterion leads to spurious solutions with incorrect amino acid composition. The problem lies in the general features of the protein energy surface as a function of both structure and sequence. The proposed solution is to design the sequence by maximizing the difference between its energy in the desired structure and in other known protein structures. Depending on the size of the database of structures ‘to avoid’, sequences bearing significant similarity to the native sequence of the target protein are obtained using this procedure.
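The energy-gap criterion can be illustrated on a toy model. The sketch below assumes a hypothetical two-letter (H/P) contact potential and represents each structure only by its list of residue contacts; it is not the topology fingerprint threading energy. It scores a sequence by how much lower its energy is in the target than in the best decoy, and improves that gap by greedy single-site mutation.

```python
def energy(seq, contacts, pair_pot):
    # Toy contact energy of seq threaded onto a structure given as (i, j) contacts.
    return sum(pair_pot[(seq[i], seq[j])] for i, j in contacts)


def gap(seq, target, decoys, pair_pot):
    # Larger is better: the target must be lower in energy than every decoy.
    e_target = energy(seq, target, pair_pot)
    return min(energy(seq, d, pair_pot) for d in decoys) - e_target


def design(seq, target, decoys, pair_pot, alphabet):
    """Greedy hill climbing on the energy gap via single-site mutations."""
    seq = list(seq)
    improved = True
    while improved:
        improved = False
        current = gap(seq, target, decoys, pair_pot)
        for pos in range(len(seq)):
            for aa in alphabet:
                if aa == seq[pos]:
                    continue
                trial = seq[:pos] + [aa] + seq[pos + 1:]
                g = gap(trial, target, decoys, pair_pot)
                if g > current:  # accept only strict improvements, so the loop ends
                    seq, current, improved = trial, g, True
    return "".join(seq)
```

With H–H contacts scored −1 and everything else 0, a target whose only contact is (0, 3) and decoys with contacts (0, 1), (1, 2) and (2, 3), starting from "HPPP" the search returns "HPPH": it buries the hydrophobic pair in the target without rewarding any decoy. Minimizing the target energy alone would instead favour all-H sequences, the kind of compositional artefact described above.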

10.
Membrane proteins: from sequence to structure   Total citations: 12; self-citations: 0; citations by others: 12
The prediction of protein structure from sequence has been a long-standing goal of molecular biology. Integral membrane proteins, once abhorred by protein chemists and crystallographers because of their insolubility and stubborn refusal to yield good crystals, now appear to hold great promise for efficient structure prediction and engineering. This is mainly due to the constraints on permissible structures imposed by the lipid environment, and to the apparent uncoupling between an initial membrane targeting and insertion process, which determines the overall topological arrangement of the transmembrane segments, and a subsequent 'condensation' of these segments into a unique folded state. Recent work suggests that the membrane insertion process is controlled by simple sequence elements composed of different combinations of long hydrophobic regions and flanking charged residues. In this review we sketch the most important structural rules relating amino acid sequence to membrane insertion and to the fully folded molecule, and their use for prediction and protein-engineering purposes.

11.
Bovine pancreatic β-trypsin (PDB ID-code: 1TPO), which is registered in the Brookhaven Protein Data Bank (PDB), is encoded by four exons. Homology searches for each exon in the PDB showed that the homologous proteins were tonin (PDB ID-code: 1TON), rat mast cell protease (PDB ID-code: 3RP2_A), kallikrein A (PDB ID-code: 2PKA_B) and kallikrein A (2PKA_B), respectively. Thus, for the three-dimensional structure prediction of 1TPO, a chimera protein was constructed from the three proteins mentioned above, and the 3-D structure prediction was performed using this chimera reference protein. The modelled structure of 1TPO was energetically optimized by molecular mechanics and molecular dynamics simulation and was compared with its X-ray crystal structure registered in the PDB. The root mean square deviations (r.m.s.d.) between the modelled structure and the X-ray structure were 1.66 Å for the main chain atoms and 0.94 Å for the neighbourhood of the active site (a 5 Å sphere around His57, Asp102 and Ser195). Porcine pancreatic elastase (PDB ID-code: 3EST), which is also registered in the PDB, was used as the reference protein, and the modelled structure from 3EST was also compared with the X-ray data. The r.m.s.d. of the main chain atoms and that of the active site were 2.14 Å and 1.18 Å, respectively. These results clearly support the validity of this method using the chimera reference protein.
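The two comparison measures above (r.m.s.d. over a set of atoms, and selection of the atoms lying within a 5 Å sphere of the catalytic residues) can be sketched as follows. This minimal sketch assumes the model and X-ray coordinates are already superposed and matched atom for atom; the least-squares fitting step itself is not shown.

```python
import math


def rmsd(model, reference):
    """Root-mean-square deviation between two equal-length lists of (x, y, z)
    coordinates that are already superposed (no fitting is done here)."""
    if len(model) != len(reference) or not model:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    sq = sum((a - b) ** 2 for p, q in zip(model, reference) for a, b in zip(p, q))
    return math.sqrt(sq / len(model))


def within_sphere(atoms, centre, cutoff=5.0):
    # Indices of atoms no farther than cutoff (in Å) from centre.
    return [i for i, p in enumerate(atoms) if math.dist(p, centre) <= cutoff]
```

To reproduce an active-site figure, one would pick the atoms near His57, Asp102 and Ser195 with `within_sphere` and pass only those matched pairs to `rmsd`.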

12.
Homology modelling has been used to model stefin A based on the X-ray structure of stefin B. Several models have been produced by interactive modelling or by positioning the side chains with a Monte Carlo procedure with simulated annealing. The quality of the models was evaluated by calculating the free energy of hydration, the 3D-1D potential or the buried accessible surface area. Stefin A is a thermostable protein exhibiting two-state denaturation, while stefin B denatures at a temperature 40°C lower and forms a stable molten globule intermediate under mild denaturing conditions. From the tertiary structures, thermodynamic functions were predicted that conform closely to the experimental calorimetric results. Polar and apolar buried accessible surface areas were obtained by structural deconvolution of the thermograms. It is suggested that the basic difference between the stefins is the dominance of hydrophobic interactions in the stabilization of stefin B; their non-specific nature leads to the formation of a molten globule intermediate. Modelling of stefin A predicts increased numbers of hydrogen bonds, which stabilize it and increase the cooperativity of its denaturation.

13.
A comparison has been made between the homology and hydrophobicity profiles of six interleukin amino acid sequences and that of human interleukin 1β (IL-1β), for which a crystal structure exists. The resulting sequence alignment was used to build model structures for three IL-1 sequences, two IL-1β sequences and an interleukin receptor antagonist. Analysis of these structures demonstrates that the interleukin molecule has a strong electric dipole, which is generated by the topological position of the amino acids in the sequence. Electrostatic surface calculations implicate a particular residue (Lys145) as being fundamental to interleukin activity, and this supports site-directed mutagenesis evidence that this residue is required for activity.

14.
Haloalkane dehalogenases catalyse environmentally important dehalogenation reactions. These microbial enzymes are objects of interest for protein engineering studies that attempt to improve their catalytic efficiency or broaden their substrate specificity towards environmental pollutants. This paper presents the results of a comparative study of haloalkane dehalogenases originating from different organisms. Protein sequences and models of the tertiary structures of haloalkane dehalogenases were compared to investigate the protein fold, reaction mechanism and substrate specificity of these enzymes. Haloalkane dehalogenases contain the structural motifs of α/β-hydrolases and epoxidases within their sequences. They contain a catalytic triad with two different topological arrangements. The presence of a structurally conserved oxyanion hole suggests the two-step reaction mechanism previously described for haloalkane dehalogenase from Xanthobacter autotrophicus GJ10. The differences in substrate specificity of haloalkane dehalogenases originating from different species might be related to the size and geometry of the active site and its entrance, and to the efficiency of transition state and halide ion stabilization by active site residues. Structurally conserved motifs identified within the sequences can be used to design specific primers for the experimental screening of haloalkane dehalogenases. Amino acids predicted to be functionally important are possible targets for future site-directed mutagenesis experiments.

15.
The three-dimensional structures of tomato P31 and T10 Cu,Zn superoxide dismutases (SODs) were computer modelled using the structure of the bovine enzyme as a template. In the models, the structurally essential residues retain the positions they occupy in the other Cu,Zn SODs of known 3D structure, and the overall packing of the β-barrel is maintained. ‘Aromatic pairs’ form between newly inserted aromatic residues. The total number of charges changes in the two variants, and some charged residues located near the active site in most Cu,Zn SODs are absent in the tomato enzymes. Calculation of the electrostatic potential field, carried out by numerically solving the Poisson–Boltzmann equation, indicates that in both variants a negative potential field surrounds the whole protein surface except the active site areas, which are characterized by positive potential values, as already observed in the bovine enzyme. This result confirms that coordinated mutations of charged residues have occurred in the evolution of this enzyme, giving rise to a peculiar electrostatic potential distribution common to all members of this protein family.

16.
The automatic identification of motifs associated with a given function is an important challenge for molecular sequence analysis. A method is presented for the extraction of such patterns from large sets of unaligned sequences with related but general function, for example, a set of heat shock proteins. In such a set of proteins there can often be several subfamilies, each characterized by one or more distinct motifs. The aim is to develop computational tools to identify these motifs. The algorithm presented locates high-frequency words of length k with a given number of positions, r, fixed. Statistics for a binomial distribution are used to assess the significance of the words. The high-frequency words are clustered and highly populated clusters retained. The composition of the clusters is displayed graphically. A set of motifs associated with the sequence family can automatically be extracted. The method is benchmarked on a set of 106 heat shock sequences and a set of 257 toxin sequences. It is shown to recover previously identified motifs.
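The word-finding and significance steps can be sketched as below: every length-k window is turned into all words with r positions fixed and the rest wildcards (written "."), each word is counted at most once per sequence, and a word is kept if its count is improbable under a binomial null. As a simplifying assumption, a single probability `p` that a random sequence contains a given word is supplied by the caller; the paper's own estimate of this probability, and the clustering and display steps, are not shown.

```python
from collections import Counter
from itertools import combinations
from math import comb


def words(seq, k, r):
    """All length-k words in seq with r positions fixed and the rest wildcards ('.')."""
    found = set()
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        for fixed in combinations(range(k), r):
            found.add("".join(window[i] if i in fixed else "." for i in range(k)))
    return found


def binomial_tail(n, c, p):
    """P(X >= c) for X ~ Binomial(n, p)."""
    return sum(comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(c, n + 1))


def frequent_words(seqs, k, r, p, alpha=0.01):
    # Count each word once per sequence, then keep words whose count is improbable
    # under a binomial null in which each sequence contains the word with probability p.
    counts = Counter(w for s in seqs for w in words(s, k, r))
    n = len(seqs)
    return {w: c for w, c in counts.items() if binomial_tail(n, c, p) < alpha}
```

For instance, with sequences "GKSTAA", "QGKSTL", "MMGKST" and "AAAAAA", k = r = 3 and p = 0.05, only "GKS" and "KST" (each present in three of the four sequences) pass the 1% threshold; setting r < k lets the same code pick up motifs with variable positions.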

17.
The instabilities of the native structures of mutant proteins with an amino acid exchange are estimated by using the contact energy and the number of contacts for each type of amino acid pair, which were estimated from 18 192 residue–residue contacts observed in 42 crystals of globular proteins. They were then used to evaluate a transition probability matrix of codon substitutions and a log relatedness odds matrix, which is used as a scoring matrix to measure the similarity between protein sequences. To consider amino acid substitutions in homologous proteins, base mutation rates and the effects of the genetic code are also taken into account. The average fitness of an amino acid exchange is approximated to be proportional to the structural stability of the mutant protein, which is then approximated by the average energy change of the protein native structure expected for the amino acid exchange, neglecting the energy change of the denatured state. In global and local homology searches, this scoring matrix tends to yield significantly higher alignment scores than either the unitary matrix or the genetic code matrix, and may also yield higher alignment scores for distantly related protein pairs than MDM78. One of the advantages of this scoring matrix is that the equilibrium frequencies of codons and the base mutation rates can be adjusted.
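The last step, turning a substitution probability matrix into a log-odds scoring matrix, can be sketched in the familiar Dayhoff style: score(i, j) = 10·log10(M_ij / f_j), where M_ij is the probability that i is replaced by j over the chosen evolutionary distance and f_j is the background frequency of j. In this sketch M is taken as given (the derivation of M from contact energies, codon substitutions and base mutation rates described above is not reproduced), and the scale factor of 10 is an assumption.

```python
import math


def mat_mult(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def mat_power(m, n):
    # Transition matrix for n steps of evolutionary distance (n >= 0).
    result = [[1.0 if i == j else 0.0 for j in range(len(m))] for i in range(len(m))]
    for _ in range(n):
        result = mat_mult(result, m)
    return result


def log_odds(m, freqs, scale=10.0):
    """score(i, j) = scale * log10(M[i][j] / f[j]): how much more often i is
    replaced by j than expected by chance from the background frequencies."""
    return [[scale * math.log10(m[i][j] / freqs[j]) for j in range(len(freqs))]
            for i in range(len(freqs))]
```

Raising M to higher powers before taking log-odds gives matrices suited to more distantly related pairs, which is the regime where the abstract reports an advantage over MDM78.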

18.
A relational database of protein structure has been developed to enable rapid and flexible enquiries about the occurrence of many aspects of protein architecture. The coordinates of 294 proteins from the Brookhaven Data Bank have been processed by standard computer programs to generate many additional terms that quantify aspects of protein structure. These terms include solvent accessibility, main-chain and side-chain dihedral angles, and secondary structure. In a relational database, the information is stored in tables, with columns holding the different terms and rows holding the different entries for the terms. The different relational base tables store the information about the protein coordinate set, the different chains in the protein, the amino acid residues and ligands, the atomic coordinates, the salt bridges, the hydrogen bonds, the disulphide bridges and the close tertiary contacts. The database was established under the ORACLE management system. Enquiries are constructed in ORACLE using SQL (structured query language), which is simple to use and alleviates the need for extensive computer programs. A single table can be searched for entries that meet various criteria, e.g. all proteins solved to better than a given resolution. The power of the database emerges when several tables, or the entries in a single table, are cross-correlated. For example, the dihedral angles of proline in the fourth position of an α-helix in high-resolution structures can be rapidly obtained. The structural database provides a powerful tool to obtain empirical rules about protein conformation.
This database of protein structures is part of a joint project between Birkbeck College and Leeds University to establish an integrated data resource of protein sequences and structures (ISIS) that encodes the complex patterns of residues and coordinates that define protein conformation. The entire data resource (ISIS) will provide a system to guide all areas of protein modelling, including structure prediction, site-directed mutagenesis and de novo protein design. The availability of ISIS is described in the paper.
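The two kinds of enquiry mentioned above (a single-table search and a cross-table correlation) can be sketched with SQL. The sketch uses Python's built-in sqlite3 module as a stand-in for ORACLE; the two-table schema is a hypothetical simplification of the base tables described, and the rows are placeholders, not real PDB entries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Hypothetical, simplified schema: one row per protein, one row per residue.
cur.executescript("""
CREATE TABLE protein (code TEXT PRIMARY KEY, resolution REAL);
CREATE TABLE residue (
    code TEXT REFERENCES protein(code),
    res_no INTEGER,
    name TEXT,
    phi REAL,
    psi REAL,
    helix_pos INTEGER  -- position within an alpha-helix, NULL if not helical
);
""")
# Placeholder rows for illustration only (not real structures).
cur.executemany("INSERT INTO protein VALUES (?, ?)",
                [("P001", 1.5), ("P002", 2.8)])
cur.executemany("INSERT INTO residue VALUES (?, ?, ?, ?, ?, ?)",
                [("P001", 10, "PRO", -60.0, -35.0, 4),
                 ("P001", 11, "ALA", -62.0, -41.0, 5),
                 ("P002", 40, "PRO", -58.0, -30.0, 4)])

# Single-table query: structures solved to better than 2.0 Å.
high_res = [row[0] for row in cur.execute(
    "SELECT code FROM protein WHERE resolution < 2.0")]

# Cross-table query: phi/psi of proline at helix position 4 in high-resolution structures.
pro_angles = cur.execute("""
    SELECT r.code, r.res_no, r.phi, r.psi
    FROM residue AS r JOIN protein AS p ON r.code = p.code
    WHERE r.name = 'PRO' AND r.helix_pos = 4 AND p.resolution < 2.0
""").fetchall()
```

The join is what the abstract means by cross-correlating tables: the resolution filter lives in one table and the conformational data in another, and a single declarative query combines them without a custom program.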

19.
A multiple sequence alignment algorithm is described that uses a dynamic programming-based pattern construction method to align a set of homologous sequences based on their common pattern of conserved sequence elements. This pattern-induced multi-sequence alignment (PUMA) algorithm can employ secondary-structure-dependent gap penalties for use in comparative modelling of new sequences when the three-dimensional structure of one or more members of the same family is known. We show that the use of secondary structure information can significantly improve the accuracy of aligning structure boundaries in a set of homologous sequences, even when the structure of only one member of the family is known.
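The idea of secondary-structure-dependent gap penalties can be shown on a plain pairwise global alignment. This is not the PUMA pattern-construction algorithm: it is a minimal sketch in which a gap next to a template residue costs a penalty chosen by that residue's secondary structure state ("H" helix, "E" strand, "C" coil). The match, mismatch and penalty values are hypothetical.

```python
def ss_gap_align(query, template, template_ss, match=2.0, mismatch=-1.0,
                 gap_by_ss=None):
    """Global alignment score in which a gap beside template residue j costs
    gap_by_ss[template_ss[j]], so gaps are pushed out of helices and strands."""
    if gap_by_ss is None:
        # Hypothetical penalties: gaps cheap in coil, expensive in helix/strand.
        gap_by_ss = {"C": -1.0, "H": -4.0, "E": -4.0}
    n, m = len(query), len(template)

    def gap(j):
        # Penalty for a gap next to template residue j (1-based), clipped at the ends.
        return gap_by_ss[template_ss[min(max(j, 1), m) - 1]]

    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    for j in range(1, m + 1):
        F[0][j] = F[0][j - 1] + gap(j)
    for i in range(1, n + 1):
        F[i][0] = F[i - 1][0] + gap(0)
        for j in range(1, m + 1):
            s = match if query[i - 1] == template[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,
                          F[i - 1][j] + gap(j),   # query residue inserted after template j
                          F[i][j - 1] + gap(j))   # template residue j deleted
    return F[n][m]
```

Against template "ABCD" with states "CCHH", the query "ACD" (which needs a deletion in the coil) scores 5.0, whereas "ABD" (which needs one in the helix) scores only 2.0, so the optimum prefers to place indels at structure boundaries and loops.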

20.
The calcitonin gene-related peptide (CGRP) is a 37-residue neuropeptide that causes vasodilatation, increases heart rate and inhibits bone resorption. These effects make it an interesting lead for drug discovery. We have combined current structural and biological information to model the structure of hCGRP-β to be used as a basis for the rational design of novel analogues. Distinct regions of CGRP have been shown to be responsible for the activity of the whole molecule. Thus, the structure of the peptide was modelled in four parts, which were finally combined. A random search of conformational space was performed for the fragments CGRP1–8 and CGRP30–37, which have been shown to be central for receptor activation and binding, respectively. Five low-energy hCGRP-β structures were obtained from the modelled fragments by molecular dynamics. The relevance of the approach was verified by comparing the models with NMR structures of CGRP and calcitonin. The models obtained for the N- and C-terminal fragments should enable the design of novel agonists and antagonists of the CGRP receptor, respectively. Models of the whole molecule may be used in the design of peptides with shortened spacers between the receptor-bound regions. The approach described is applicable to several related peptide hormones, such as growth hormone-releasing hormone and secretin.

