Similar articles
 Found 20 similar articles; search time 11 ms
1.
A relational database of protein structure has been developed to enable rapid and flexible enquiries about the occurrence of many aspects of protein architecture. The coordinates of 294 proteins from the Brookhaven Data Bank have been processed by standard computer programs to generate many additional terms that quantify aspects of protein structure. These terms include solvent accessibility, main-chain and side-chain dihedral angles, and secondary structure. In a relational database, the information is stored in tables with columns holding the different terms and rows holding the different entries for the terms. The different relational base tables store the information about the protein coordinate set, the different chains in the protein, the amino acid residues and ligands, the atomic coordinates, the salt bridges, the hydrogen bonds, the disulphide bridges and the close tertiary contacts. The database was established under the ORACLE management system. Enquiries are constructed in ORACLE using SQL (structured query language), which is simple to use and alleviates the need for extensive computer programs. A single table can be searched for entries that meet various criteria, e.g. all proteins solved to better than a given resolution. The power of the database emerges when several tables, or the entries in a single table, are cross-correlated. For example, the dihedral angles of proline in the fourth position in an α-helix in high-resolution structures can be rapidly obtained. The structural database provides a powerful tool to obtain empirical rules about protein conformation.
This database of protein structures is part of a joint project between Birkbeck College and Leeds University to establish an integrated data resource of protein sequences and structures (ISIS) that encodes the complex patterns of residues and coordinates that define protein conformation. The entire data resource (ISIS) will provide a system to guide all areas of protein modelling, including structure prediction, site-directed mutagenesis and de novo protein design. The availability of ISIS is described in the paper.
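The kind of cross-table enquiry described in this abstract (proline at the fourth position of a helix, restricted to high-resolution structures) can be sketched as follows. This is only an illustration: it uses SQLite through Python's standard `sqlite3` module rather than ORACLE, and the table names, column names and data rows are hypothetical, not the schema of the database described above.

```python
import sqlite3

# Hypothetical, much-simplified schema; the real database holds many more tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE protein (code TEXT PRIMARY KEY, resolution REAL);
CREATE TABLE residue (
    code TEXT REFERENCES protein(code),
    seq_no INTEGER,
    name TEXT,
    sec_struct TEXT,     -- 'H' marks a helical residue in this toy example
    helix_pos INTEGER,   -- position within its helix, NULL outside helices
    phi REAL,
    psi REAL
);
""")

# Made-up rows, for illustration only.
conn.executemany("INSERT INTO protein VALUES (?, ?)",
                 [("AAAA", 1.5), ("BBBB", 2.8)])
conn.executemany("INSERT INTO residue VALUES (?, ?, ?, ?, ?, ?, ?)", [
    ("AAAA", 10, "PRO", "H", 4, -60.0, -40.0),
    ("AAAA", 11, "ALA", "H", 5, -62.0, -41.0),
    ("BBBB", 7, "PRO", "H", 4, -70.0, -35.0),
])


def proline_helix_dihedrals(conn, max_resolution=2.0, position=4):
    """Join the residue and protein tables to select helical prolines
    at a given helix position in structures at or below max_resolution."""
    query = """
        SELECT r.code, r.seq_no, r.phi, r.psi
        FROM residue AS r
        JOIN protein AS p ON p.code = r.code
        WHERE r.name = 'PRO'
          AND r.sec_struct = 'H'
          AND r.helix_pos = ?
          AND p.resolution <= ?
        ORDER BY r.code, r.seq_no
    """
    return conn.execute(query, (position, max_resolution)).fetchall()


rows = proline_helix_dihedrals(conn)
print(rows)  # [('AAAA', 10, -60.0, -40.0)]
```

The join replaces what would otherwise be a dedicated program that reads coordinate files, filters by resolution and then walks each helix.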

2.
A 3-D model of a protein can be constructed from its amino acid sequence and the 3-D structures of one or more homologues by annealing three sets of fragments: the structurally conserved regions, structurally variable regions and the side chains. The method encoded in the computer program COMPOSER was assessed by generating 3-D models of eight proteins whose crystal structures are already known and for which 3-D structures of homologues are available. In the structurally conserved regions, differences between modelled and X-ray structures are smaller than the differences between the X-ray structures of the modelled protein and the homologues used to build the model. When several homologues are used, the contributions of the known structures are weighted, preferably by the square of sequence similarity; this is especially important when the similarities of the homologues to the modelled structure differ greatly. The ‘collar’ extension approach, in which a similar region of different length in a homologue is used to extend the framework, can result in a more accurate model. If known homologues comprise more than one related group of proteins and they are both distantly related to the unknown, then alignment of the sequence to be modelled with each group of homologues facilitates identification of structurally conserved regions of the unknown and leads to an improved model. Models have root mean square differences (r.m.s.d.s) with the structures defined by X-ray analysis of between 0.73 and 1.56 Å for all Cα atoms, for seven of the eight models. For the model of mucor pepsin, where the closest homologue has 33% sequence identity and 20% of the residues are in structurally variable regions, the r.m.s.d. for the framework region is 1.71 Å and the r.m.s.d. for all Cα atoms is 3.47 Å.
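The weighting idea in this abstract (contributions of the known structures weighted by the square of sequence similarity) can be illustrated with a small sketch. It assumes the homologues have already been superposed and that equivalent atoms share the same index; the function name and data layout are hypothetical and are not taken from COMPOSER.

```python
def weighted_framework(coords_by_homologue, similarities):
    """Average equivalent atom positions over homologues, with each
    homologue weighted by the square of its sequence similarity.

    coords_by_homologue: one list of (x, y, z) tuples per homologue,
        already superposed, same atom order in every list (assumption).
    similarities: one similarity value per homologue, e.g. in percent.
    """
    weights = [s * s for s in similarities]
    total = sum(weights)
    n_atoms = len(coords_by_homologue[0])
    framework = []
    for i in range(n_atoms):
        point = tuple(
            sum(w * coords[i][axis]
                for w, coords in zip(weights, coords_by_homologue)) / total
            for axis in range(3)
        )
        framework.append(point)
    return framework


# Two homologues, one atom each; the 60%-similar one dominates the average.
model = weighted_framework(
    [[(0.0, 0.0, 0.0)], [(10.0, 0.0, 0.0)]],
    similarities=[30.0, 60.0],
)
print(model)  # [(8.0, 0.0, 0.0)]
```

With linear weights the same example would give x = 6.67, so squaring shifts the framework noticeably towards the closer homologue when similarities differ greatly.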

3.
Two principal methods of determining the conformation of short pieces of polypeptide backbone in proteins have been developed: using a database of known structures and systematically generating all conformations. In this paper, we compare the effectiveness of these two techniques. The completeness of the database for segments of different lengths is examined and it is found to contain most conformations for segments seven residues long, but to deteriorate rapidly for longer regions. When the database segment is to be incorporated into the rest of a structure, at least seven residues are required to build four new residues, because of the need to position the segment relative to the rest of the structure. It is found that such positioning using flanking residues results in large errors in the inserted region. We conclude that the database method is currently not effective for comparative modeling, even for short segments. The systematic search procedure is found to generate almost all structures of short segments found in proteins. In contrast to the database method, low root mean square error structures are obtained for a set of trial segments embedded in the rest of a protein structure. Thus, it should be considered the method of choice.

4.
Easy adaptation of protein structure to sequence   Total citations: 4, self-citations: 0, citations by others: 4
An investigation into the conservation of coarse, medium and fine grain structural properties has been performed over a data set of 175 protein tertiary structures in 34 different families, each characterized by a common core fold and a library of conserved sites formed for each family. It is shown that, while the conservation of coarse and medium grain properties correlates to the structural deviation between the proteins, fine grain properties are poorly conserved except in functional sites. This flexibility in fine grain properties suggests that folding can be viewed as an optimization process whereby side chains have freedom to position themselves as best as possible given environmental conformational constraints and that, given a basic framework, the local structure is able to adapt easily to sequence variation. The conserved cores of the 34 families are used to estimate a minimal core size of 35% of the fold, consistent with buried residue considerations. Finally, conservation in side chain χ1 torsion angles is combined with structural deviation, sequence deviation and resolution to suggest a set of example structure pairs suitable for testing automatic homology modelling programs.

5.
6.
An algorithm for automatically generating protein topology cartoons   Total citations: 4, self-citations: 0, citations by others: 4
An algorithm is described for automatically generating protein topology cartoons. This algorithm optimally places circles and triangles depicting α-helices and β-strands respectively, giving a pictorial topological summary of any protein structure. β-Sheets, sandwiches and barrels are automatically identified and represented using special templates. The output from this algorithm may be controlled by adjustment of variable weights during the optimization step, giving a preferred result. The rules for generating protein topology cartoons, including consideration of the handedness of local structure motifs, are discussed. The design of this algorithm is completely general and is easily adapted to include further rules that dictate the generation of the cartoons.

7.
Although the conformational states of protein side chains can be described using a library of rotamers, the determination of the global minimum energy conformation (GMEC) of a large collection of side chains, given fixed backbone coordinates, represents a challenging combinatorial problem with important applications in the field of homology modelling. Recently, we have developed a theoretical framework, called the dead-end elimination method, which allows us to identify efficiently rotamers that cannot be members of the GMEC. Such dead-ending rotamers can be iteratively removed from the system under study, thereby tracking down the size of the combinatorial problem. Here we present new developments to the dead-end elimination method that allow us to handle larger proteins and more extensive rotamer libraries. These developments encompass (i) a procedure to determine weight factors in the generalized dead-end elimination theorem, thereby enhancing the elimination of dead-ending rotamers, and (ii) a novel strategy, mainly based on logical arguments derived from the logic pairs theorem, to use dead-ending rotamer pairs in the efficient elimination of single rotamers. These developments are illustrated for proteins of various sizes and the flow of the current method is discussed in detail. The effectiveness of dead-end elimination is increased by two orders of magnitude as compared with previous work. In addition, it now becomes feasible to use extremely detailed libraries. We also provide an appendix in which the validity of the generalized dead-end criterion is shown. Finally, perspectives for further applications which may now become within reach are discussed.
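To make the idea of a dead-ending rotamer concrete, here is a minimal sketch of the basic single-rotamer elimination criterion: a rotamer is removed when even its best-case energy is worse than the worst-case energy of some alternative rotamer at the same position. This is not the generalized, weighted theorem or the pair-based strategy presented in this paper; the data layout and function names are assumptions made for illustration.

```python
def pair(pair_energy, i, r, j, s):
    """Interaction energy between rotamer r at position i and rotamer s at
    position j; energies are stored once, under the key (low, high)."""
    return pair_energy[(i, j)][r][s] if i < j else pair_energy[(j, i)][s][r]


def dead_end_eliminate(self_energy, pair_energy):
    """Iteratively discard rotamers that satisfy the basic DEE criterion.

    self_energy[i][r]: energy of rotamer r at position i with the fixed
        backbone (hypothetical layout).
    pair_energy[(i, j)][r][s]: pairwise energy, stored for i < j.
    Returns one set of surviving rotamer indices per position.
    """
    n = len(self_energy)
    alive = [set(range(len(e))) for e in self_energy]
    changed = True
    while changed:
        changed = False
        for i in range(n):
            others = [j for j in range(n) if j != i]
            for r in sorted(alive[i]):
                # Best case for r: most favourable partner at every position.
                best_r = self_energy[i][r] + sum(
                    min(pair(pair_energy, i, r, j, s) for s in alive[j])
                    for j in others)
                for t in sorted(alive[i]):
                    if t == r:
                        continue
                    # Worst case for t: least favourable partner everywhere.
                    worst_t = self_energy[i][t] + sum(
                        max(pair(pair_energy, i, t, j, s) for s in alive[j])
                        for j in others)
                    if best_r > worst_t:
                        alive[i].discard(r)  # r cannot be part of the GMEC
                        changed = True
                        break
    return alive


# Toy system: two positions with two rotamers each (made-up energies).
self_energy = [[0.0, 10.0], [0.0, 1.0]]
pair_energy = {(0, 1): [[0.0, 1.0], [0.5, 0.0]]}
survivors = dead_end_eliminate(self_energy, pair_energy)
print(survivors)  # [{0}, {0}]
```

In the toy system, removing rotamer 1 at position 0 is what later allows rotamer 1 at position 1 to be eliminated, which shows why the procedure is applied iteratively.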

8.
An investigation of protein subunit and domain interfaces   Total citations: 2, self-citations: 0, citations by others: 2
Protein structures were collected from the Brookhaven Database of tertiary architectures that displayed oligomeric association (24 molecules) or whose polypeptide folding revealed domains (34 proteins). The subunit and domain interfaces for these proteins were respectively examined from the following aspects: percentage water-accessible surface area buried by the respective associations, surface compositions and physical characteristics of the residues involved in the subunit and domain contacts, secondary structural state of the interface amino acids, preferred polar and non-polar interactions, spatial distribution of polar and non-polar residues on the interface surface, same-residue interactions in the oligomeric contacts, and overall cross-section and shape of the contact surfaces. A general, consistent picture emerged for both the domain and subunit interfaces.

9.
The method of simulated annealing can be of use in protein structure prediction by homology modelling where side chain conformations must be predicted. In this study an attempt has been made to optimize a molecular dynamics method for this purpose. Heating and cooling protocols to maximize the accuracy of the predictions have been developed. The optimized protocol involves cooling from 3000 to 0 K over 20 ps while simultaneously introducing the non-bonded energy term. The use of a 'soft' non-bonded interaction energy term in place of a standard 6–12 potential is found to be important. The reliability of the predictions has been analysed in terms of the environment of the residues (solvent accessibility) and the degree of uncertainty in the structure (number of unknown torsion angles). Depending on these factors, the percentage of unknown side chain torsion angles that are correctly predicted within 30° ranges from ∼50 to 75%. Potential problems and limitations of the method are discussed.

10.
To examine the feasibility of a β structure for the pore-lining region of the voltage-gated potassium channel, we have characterized a family of 12 antiparallel β-barrels. Each is comprised of four identical pairs of β-strands organized with approximate 4-fold symmetry about a channel axis. The C- and N-termini of the β-strand pairs are assumed to be at the extracellular end of the channel, and each pair is connected by a hairpin turn at the intracellular end of the channel. The models differ in the residues located in the hairpin turn and in the orientation of the two strands of each pair in the barrel, i.e. whether the C-terminus of a pair is clockwise (CW) or counterclockwise (CCW) from the N-terminus when the channel is viewed from outside the cell. Following known structure precedents and potential energy predictions, the barrel is assumed to be right-twisting in all cases. All models have crowded layers of inward-projecting aromatic sidechains near the center of the channel which could regulate channel selectivity. The models with an odd number of amino acids in the hairpin turn have the advantage of predicting that F433 points into the barrel, but the disadvantage that V438 does not. Of these models, two are most consistent with the external tetraethylammonium (TEA) block data, and of those, one (T439 CCW 3:5) is most consistent with the internal TEA block data.

11.
Using discriminant analysis, three types of protein secondary structure segments (helices, β-strands and coils) are discriminated by amino acid sequence information alone. A variable in the discriminant analysis is defined by the amino acid index used to represent the sequence data and by the calculation method used to extract a feature in this representation. Thus, the three types of secondary structure segments derived from a set of non-homologous proteins from the Protein Data Bank are analyzed by 888 variables, which correspond to the mean, standard deviation, 3.6-residue periodicity and 2-residue periodicity for the numerical profiles determined from 222 published amino acid indices. These variables are combined to obtain the best discrimination of the three types of segments. When up to three variables are combined, the best discrimination rate was 75%. The variables selected consist of the mean of α propensity (or turn propensity), the mean of β propensity, and the 3.6-residue periodicity of hydrophobicity. This variable selection procedure can also be applied to other types of discrimination problem, once groups of sequence data are properly organized.
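The four calculation methods named in this abstract (mean, standard deviation, 3.6-residue periodicity and 2-residue periodicity of a numerical profile) can be sketched as below. The abstract does not give the formula used for periodicity, so this sketch assumes a Fourier-power measure at the corresponding angular frequency; the two-letter toy index is invented for illustration and is not one of the 222 published indices.

```python
import math
from statistics import mean, pstdev


def periodicity(profile, period):
    """Fourier power of the mean-centred profile at a repeat of `period`
    residues (assumed definition, not taken from the paper)."""
    theta = 2.0 * math.pi / period
    centre = mean(profile)
    centred = [v - centre for v in profile]
    c = sum(v * math.cos(k * theta) for k, v in enumerate(centred))
    s = sum(v * math.sin(k * theta) for k, v in enumerate(centred))
    return (c * c + s * s) / len(profile)


def segment_features(segment, index):
    """Map a sequence segment to the four features of its numerical profile."""
    profile = [index[aa] for aa in segment]
    return {
        "mean": mean(profile),
        "sd": pstdev(profile),
        "p3.6": periodicity(profile, 3.6),  # helix-like repeat
        "p2": periodicity(profile, 2.0),    # strand-like alternation
    }


# Toy index: +1 for a hydrophobic residue, -1 for a polar one (hypothetical).
toy_index = {"L": 1.0, "K": -1.0}
features = segment_features("LKLKLKLK", toy_index)
print(features["p2"] > features["p3.6"])  # True
```

A strictly alternating segment scores high on the 2-residue periodicity, which is the pattern expected for a strand with one face exposed; each (index, method) pair of this kind would be one of the 888 candidate variables.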

12.
We present a novel method that predicts transmembrane domains in proteins using solely information contained in the sequence itself. The PRED-TMR algorithm described refines a standard hydrophobicity analysis with a detection of potential termini (‘edges’, starts and ends) of transmembrane regions. This allows one both to discard highly hydrophobic regions not delimited by clear start and end configurations and to confirm putative transmembrane segments not distinguishable by their hydrophobic composition. The accuracy obtained on a test set of 101 non-homologous transmembrane proteins with reliable topologies compares well with that of other popular existing methods. Only a slight decrease in prediction accuracy was observed when the algorithm was applied to all transmembrane proteins of the SwissProt database (release 35). A WWW server running the PRED-TMR algorithm is available at http://o2.db.uoa.gr/PRED-TMR/

13.
Structural models for the eukaryotic cell cycle control protein p34 from human, S.pombe and S.cerevisiae have been derived from the crystallographic coordinates of the cAMP-dependent protein kinase (cAPK) catalytic subunit (active conformation) and compared with the structure of inactive CDK2 apoenzyme. Differences between the p34 and cAPK catalytic sites provide a possible explanation for their different substrate specificities. The p34 models localize Tyr15 and Thr14 close to the sites of catalysis and substrate recognition, where their phosphorylation could inhibit p34 kinase activity by blocking either MgATP or substrate binding. The conserved sequences PSTAIRE and LYLIFEFL are both close to the catalytic site and accessible on the protein surface, available to mediate interactions with other proteins. It is predicted that p34 has an active-site cleft composed almost entirely of sequences common to all protein kinases and sequences unique to the p34 protein family. Genetic and biochemical analyses of p34 have shown that it interacts extensively with a number of other proteins. The model allows the relative disposition of these sites of mutation to each other and to the sites of catalysis and substrate recognition to be appreciated. Surface regions on p34 that are important for function have been identified. These sites identify residues that may interact with p13suc1, cyclin, p107wee1 and p80cdc25.

14.
A method for comparison of protein sequences based on their primary and secondary structure is described. Protein sequences are annotated with predicted secondary structures (using a modified Chou and Fasman method). Two-lettered code sequences are generated (Xx, where X is the amino acid and x is its annotated secondary structure). Sequences are compared with a dynamic programming method (STRALIGN) that includes a similarity matrix for both the amino acids and secondary structures. The similarity value for each paired two-lettered code is a linear combination of similarity values for the paired amino acids and their annotated secondary structures. The method has been applied to eight globin proteins (28 pairs) for which the X-ray structure is known. For protein pairs with high primary sequence similarity (>45%), STRALIGN alignment is identical to that obtained by a dynamic programming method using only primary sequence information. However, alignment of protein pairs with lower primary sequence similarity improves significantly with the addition of secondary structure annotation. Alignment of the pair with the least primary sequence similarity of 16% was improved from 0 to 37% ‘correct’ alignment using this method. In addition, STRALIGN was successfully applied to seven pairs of distantly related cytochrome c proteins, and three pairs of distantly related picornavirus proteins.
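The scoring scheme described here (a two-lettered code per residue, with the pair score a linear combination of amino acid and secondary structure similarity) can be sketched with a small global alignment. This is not STRALIGN itself: the identity-based similarity values, the mixing weight, the linear gap penalty and all function names are assumptions chosen to keep the example short.

```python
def two_letter_codes(sequence, sec_struct):
    """Build codes such as 'Ah' from a sequence and its per-residue
    secondary structure annotation (same length assumed)."""
    return [aa + ss for aa, ss in zip(sequence, sec_struct)]


def combined_score(a, b, weight=0.5):
    """Linear combination of amino acid and secondary structure similarity.
    Identity scores (1 or 0) stand in for real similarity matrices."""
    aa = 1.0 if a[0] == b[0] else 0.0
    ss = 1.0 if a[1] == b[1] else 0.0
    return weight * aa + (1.0 - weight) * ss


def align(seq1, seq2, weight=0.5, gap=-1.0):
    """Needleman-Wunsch global alignment over two-letter codes."""
    n, m = len(seq1), len(seq2)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] + gap
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + combined_score(seq1[i - 1], seq2[j - 1], weight),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out1, out2 = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == (
                score[i - 1][j - 1] + combined_score(seq1[i - 1], seq2[j - 1], weight)):
            out1.append(seq1[i - 1])
            out2.append(seq2[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and (j == 0 or score[i][j] == score[i - 1][j] + gap):
            out1.append(seq1[i - 1])
            out2.append("--")
            i -= 1
        else:
            out1.append("--")
            out2.append(seq2[j - 1])
            j -= 1
    return score[n][m], out1[::-1], out2[::-1]


a = two_letter_codes("ALGV", "hhee")
b = two_letter_codes("AIV", "hee")
total, row1, row2 = align(a, b)
print(total)  # 1.5
print(row1)   # ['Ah', 'Lh', 'Ge', 'Ve']
print(row2)   # ['Ah', '--', 'Ie', 'Ve']
```

In this toy case G and I are different residues, yet they are paired because both are annotated as strand; with `weight=1.0` (sequence only) that pairing would earn nothing, which mirrors the abstract's point that secondary structure annotation helps most at low sequence similarity.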

15.
PRINTS - a protein motif fingerprint database   Total citations: 7, self-citations: 0, citations by others: 7
The PRINTS database of protein ‘fingerprints’ is described. Fingerprints comprise sets of motifs excised from conserved regions of sequence alignments, their diagnostic power or potency being refined by iterative database scanning (in this case the OWL composite sequence database). Generally, the motifs do not overlap, but are separated along a sequence, though they may be contiguous in 3-D space. The use of groups of independent, linearly or spatially separate motifs allows particular protein folds and functionalities to be characterized more flexibly and powerfully than conventional single-component patterns or regular expressions. The current version of the database (4.0) contains 150 entries (encoding >700 motifs), covering a wide range of globular and membrane proteins, modular polypeptides and so on. The growth of the database is influenced by a number of factors, e.g. the use of multiple motifs, the maximization of sequence information through iterative database scanning and the fact that the database searched is a large composite. The information contained within PRINTS is distinct from but complementary to the single consensus expressions stored in the widely used PROSITE dictionary of patterns.

16.
We have performed molecular dynamics simulation of Rhizomucor miehei lipase (Rml) with explicit water molecules present. The simulation was carried out in periodic boundary conditions and conducted for 1.2 ns in order to determine the concerted protein dynamics and to examine how well the essential motions are preserved along the trajectory. Protein motions are extracted by means of the essential dynamics analysis method for different lengths of the trajectory. Motions described by eigenvector 1 converge after approximately 200 ps and only small changes are observed with increasing simulation time. Protein dynamics along eigenvectors with larger indices, however, change with simulation time and generally, with increasing eigenvector index, longer simulation times are required for observing similar protein motions (along a particular eigenvector). Several regions in the protein show relatively large fluctuations and in particular motions in the active site lid and the segments Thr57–Asn63 and the active site hinge region Pro101–Gly104 are seen along several eigenvectors. These motions are generally associated with glycine residues, while no direct correlations are observed between these fluctuations and the positioning of prolines in the protein structure. The partial opening/closing of the lid is an example of induced fit mechanisms seen in other enzymes and could be a general mechanism for the activation of Rml.

17.
In protein engineering and design it is very important that residues can be inspected in their specific environment. A standard relational database system cannot serve this purpose adequately because it cannot handle relations between individual residues. With SCAN3D we introduce a new database system for integrated sequence and structure analysis of proteins. It uses the relational paradigm wherever possible. Its main power, however, stems from the ability to retrieve stretches of consecutive residues with certain properties by comparing a property profile with all stretches of residues in the database, exploiting the ordered character of proteins. In doing so, it bypasses the large number of join operations that would be required by relational database systems. An additional advantage of using property profile matching is that searches can be carried out allowing a pre-set number of mismatches. Also, as the database is read-only, SCAN3D does not need interactive data update mechanisms. Queries typical of a molecular engineering environment are demonstrated with specific examples: analysis of peptides that induce local structure, analysis of site-dependent rotamers and residue-residue contact analysis.
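Property profile matching with a pre-set number of mismatches can be illustrated by sliding a profile along an ordered list of residues. The sketch below is a guess at the general idea only; the residue representation (amino acid, secondary structure code), the wildcard convention and the function names are hypothetical and do not describe SCAN3D's actual implementation.

```python
def matches(residue, pattern):
    """A residue matches a pattern entry when every non-wildcard field agrees.
    None in the pattern acts as a wildcard for that property."""
    return all(p is None or p == value for value, p in zip(residue, pattern))


def scan(chain, profile, max_mismatches=0):
    """Return the start index of every stretch of consecutive residues that
    fits the profile with at most max_mismatches mismatching positions."""
    hits = []
    width = len(profile)
    for start in range(len(chain) - width + 1):
        mismatches = 0
        for residue, pattern in zip(chain[start:start + width], profile):
            if not matches(residue, pattern):
                mismatches += 1
                if mismatches > max_mismatches:
                    break
        else:
            # Loop finished without exceeding the mismatch budget.
            hits.append(start)
    return hits


# Toy chain: (amino acid, secondary structure), C = coil, H = helix.
chain = list(zip("GPSAEPLLK", "CCCHHCHHH"))
# Profile: a proline followed directly by two helical residues.
profile = [("P", None), (None, "H"), (None, "H")]

exact = scan(chain, profile)
relaxed = scan(chain, profile, max_mismatches=1)
print(exact)    # [5]
print(relaxed)  # [1, 2, 5, 6]
```

Because the residues are already stored in chain order, a single pass over the list does the work that a relational system would express as one self-join per profile position.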

18.
A multiple sequence alignment algorithm is described that uses a dynamic programming-based pattern construction method to align a set of homologous sequences based on their common pattern of conserved sequence elements. This pattern-induced multi-sequence alignment (PIMA) algorithm can employ secondary-structure dependent gap penalties for use in comparative modelling of new sequences when the three-dimensional structure of one or more members of the same family is known. We show that the use of secondary structure information can significantly improve the accuracy of aligning structure boundaries in a set of homologous sequences even when the structure of only one member of the family is known.

19.
Bovine pancreatic β-trypsin (PDB ID-code: 1TPO), which is registered in the Brookhaven Protein Data Bank (PDB), consists of four exons. The results of homology searches for each exon in the PDB showed that homologous proteins were tonin (PDB ID-code: 1TON), rat mast cell protease (PDB ID-code: 3RP2_A), kallikrein A (PDB ID-code: 2PKA_B) and kallikrein A (2PKA_B) respectively. Thus, for the three-dimensional structure prediction of 1TPO, a chimera protein was constructed from the three proteins mentioned above and the 3-D structure prediction was performed using this chimera reference protein. The modelled structure of 1TPO was energetically optimized by molecular mechanics and molecular dynamics simulation and was compared with its X-ray crystal structure registered in the PDB. The root mean square deviations (r.m.s.d.) of main chain atoms and the neighbouring active site (5 Å sphere from His57, Asp102 and Ser195) between the modelled structure and the X-ray structure were 1.66 and 0.94 Å respectively. Porcine pancreatic elastase (PDB ID-code: 3EST), which is registered in the PDB, was used as the reference protein and the modelled structure from 3EST was also compared with the X-ray data. The r.m.s.d. of main chain atoms and that of the active site were 2.14 and 1.18 Å respectively. These results clearly support the propriety of this method using the chimera reference protein.
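The r.m.s.d. figures quoted in this abstract compare equivalent atoms in the modelled and X-ray structures. A minimal sketch of that calculation follows, assuming the two coordinate sets are already superposed and listed in the same atom order; the function name and the example coordinates are illustrative only.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally long lists of
    (x, y, z) coordinates, taken pairwise in the order given."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (a - b) ** 2
        for p, q in zip(coords_a, coords_b)
        for a, b in zip(p, q)
    )
    return math.sqrt(squared / len(coords_a))


# Two atoms, each displaced by 1 Å along z (made-up coordinates).
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
xray = [(0.0, 0.0, 1.0), (1.0, 0.0, -1.0)]
print(rmsd(model, xray))  # 1.0
```

Restricting the two lists to atoms within a chosen sphere around the catalytic residues would give an active-site figure of the kind reported alongside the main-chain value.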

20.
The three-dimensional structures of tomato P31 and T10 Cu,Zn superoxide dismutases (SODs) were computer modelled using the structure of the bovine enzyme as a template. The structure-essential residues retain in the models the position occupied in the other Cu,Zn SODs of known 3D structure and the overall packing of the β-barrel is maintained. Formation of ‘aromatic pairs’ occurs between newly inserted aromatic residues. The number of total charges changes in the two variants and some charged residues located in the proximity of the active site in most Cu,Zn SODs disappear in tomato enzymes. Calculation of the electrostatic potential field, carried out by numerically solving the Poisson-Boltzmann equation, indicates that in both variants a negative potential field surrounds all the protein surface except the active site areas, characterized by positive potential values, as already observed in the bovine enzyme. This result confirms that coordinated mutations of charged residues have occurred in the evolution of this enzyme, giving rise to a peculiar electrostatic potential distribution common to all members of this protein family.
