Similar documents
 Found 20 similar documents (search time: 15 ms)
1.
2.
A relational database of protein structure has been developed to enable rapid and flexible enquiries about the occurrence of many aspects of protein architecture. The coordinates of 294 proteins from the Brookhaven Data Bank have been processed by standard computer programs to generate many additional terms that quantify aspects of protein structure. These terms include solvent accessibility, main-chain and side-chain dihedral angles, and secondary structure. In a relational database, the information is stored in tables with columns holding the different terms and rows holding the different entries for the terms. The different relational base tables store the information about the protein coordinate set, the different chains in the protein, the amino acid residues and ligands, the atomic coordinates, the salt bridges, the hydrogen bonds, the disulphide bridges and the close tertiary contacts. The database was established under the ORACLE management system. Enquiries are constructed in ORACLE using SQL (structured query language), which is simple to use and alleviates the need for extensive computer programs. A single table can be searched for entries that meet various criteria, e.g. all proteins solved to better than a given resolution. The power of the database emerges when several tables, or the entries in a single table, are cross-correlated. For example, the dihedral angles of proline in the fourth position in an α-helix in high-resolution structures can be rapidly obtained. The structural database provides a powerful tool to obtain empirical rules about protein conformation.
This database of protein structures is part of a joint project between Birkbeck College and Leeds University to establish an integrated data resource of protein sequences and structures (ISIS) that encodes the complex patterns of residues and coordinates that define protein conformation. The entire data resource (ISIS) will provide a system to guide all areas of protein modelling including structure prediction, site-directed mutagenesis and de novo protein design. The availability of ISIS is described in the paper.
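The kind of cross-table enquiry described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module rather than ORACLE, and the table names, column names and rows (codes TOY1 and TOY2) are hypothetical toy values, not the schema or data of the published database.

```python
import sqlite3

# Hypothetical two-table schema. Table and column names are illustrative
# assumptions, not those of the published database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE protein (code TEXT PRIMARY KEY, resolution REAL);
CREATE TABLE residue (
    code TEXT REFERENCES protein(code),
    name TEXT,               -- three-letter residue name
    helix_position INTEGER,  -- 1-based position within an alpha-helix, NULL if not helical
    phi REAL,
    psi REAL
);
""")

# Toy rows, invented purely so the query has something to return.
conn.executemany("INSERT INTO protein VALUES (?, ?)",
                 [("TOY1", 1.5), ("TOY2", 2.8)])
conn.executemany("INSERT INTO residue VALUES (?, ?, ?, ?, ?)", [
    ("TOY1", "PRO", 4, -60.0, -40.0),
    ("TOY1", "ALA", 4, -62.0, -41.0),
    ("TOY2", "PRO", 4, -70.0, -35.0),
    ("TOY1", "PRO", None, -65.0, 140.0),
])

def proline_helix4_angles(conn, max_resolution=2.0):
    """Return (code, phi, psi) for prolines at helix position 4 in
    structures solved at max_resolution angstroms or better."""
    return conn.execute("""
        SELECT r.code, r.phi, r.psi
        FROM residue r JOIN protein p ON p.code = r.code
        WHERE r.name = 'PRO' AND r.helix_position = 4
          AND p.resolution <= ?
        ORDER BY r.code
    """, (max_resolution,)).fetchall()

rows = proline_helix4_angles(conn)
```

The join between the residue and protein tables is what the abstract calls cross-correlating tables: the residue-level criteria and the structure-level resolution filter are combined in a single statement instead of a dedicated program.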

3.
Amino acid sequence patterns suggested to characterize specific recurrent turn conformations in proteins are tested as to their predictive power in a database containing 75 proteins of known structure. Many of these patterns are found to be associated with local structures that differ from the motifs originally used to derive them. It is therefore concluded that, while they could be useful for improving predictions made by other methods, their stand-alone predictive power is poor. The issue of deriving and validating consensus sequence patterns for use in protein structure prediction is raised.

4.
The instabilities of the native structures of mutant proteins with an amino acid exchange are estimated by using the contact energy and the number of contacts for each type of amino acid pair, which were estimated from 18 192 residue–residue contacts observed in 42 crystals of globular proteins. They were then used to evaluate a transition probability matrix of codon substitutions and a log relatedness odds matrix, which is used as a scoring matrix to measure the similarity between protein sequences. To consider amino acid substitutions in homologous proteins, base mutation rates and the effects of the genetic code are also taken into account. The average fitness of an amino acid exchange is approximated to be proportional to the structural stability of the mutant protein, which is then approximated by the average energy change of the protein native structure expected for the amino acid exchange, with neglect of the energy change of the denatured state. In global and local homology searches, this scoring matrix tends to yield significantly higher alignment scores than either the unitary matrix or the genetic code matrix, and also may yield higher alignment scores for distantly related protein pairs than MDM78. One of the advantages of this scoring matrix is that the equilibrium frequencies of codons and also base mutation rates can be adjusted.

5.
An automatic algorithm for defining topological equivalences in protein structures is presented. The algorithm is based on a dynamic programming technique and self-consistent scoring method. We have used it to align pairs of similar protein structures of several protein families and to identify recurrent structural domains in aspartic proteinase 2APR. Its ability to find suboptimal paths permits a thorough comparison of proteins at each level in the hierarchy of the protein structure: secondary structure, super-secondary structure, domain and entire globular structure. The algorithm has been extended to the structure alignment of ribonucleic acid and can be extended to the structure alignment of any linear polymer.

6.
An artificial neural network system is used for pattern recognition in protein side-chain-side-chain contact maps. A back-propagation network was trained on a set of patterns which are popular in side-chain contact maps of protein structures. Several neural network architectures and different training parameters were tested to decide on the best combination for the neural network. The resulting network can distinguish between original (from protein structures) and randomized patterns with an accuracy of 84.5% and a Matthews' coefficient of 0.72 for the testing set. Applications of this system for protein structure evaluation and refinement are also proposed. Examples include structures obtained after the application of molecular dynamics to crystal structures, structures obtained from X-ray crystallography at various stages of refinement, structures obtained from a de novo folding algorithm and deliberately misfolded structures.
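The two figures of merit quoted in this abstract, accuracy and the Matthews coefficient, are both computed from the four cells of a binary confusion matrix. A minimal sketch follows, using the standard definitions; the function names are illustrative, and the abstract does not give the underlying counts, so none are assumed here.

```python
import math

def accuracy(tp, tn, fp, fn):
    """Fraction of all patterns that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

def matthews_coefficient(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary classifier.

    Ranges from -1 (total disagreement) through 0 (no better than
    chance) to +1 (perfect prediction). Returns 0.0 when any marginal
    total is zero, a common convention for the undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom
```

Unlike accuracy, the Matthews coefficient stays near zero for a classifier that simply predicts the majority class, which is why the two numbers are usually reported together.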

7.
A composite plot for depicting in two dimensions the conformation and the secondary structural features of protein residues has been developed. Instead of presenting the exact values of the main- and side-chain torsion angles (φ, ψ and χ1), it indicates the region in the three-dimensional conformational space to which a residue belongs. Other structural aspects, like the presence of a cis peptide bond and disulfide linkages, are also displayed. The plot may be used to recognize patterns in the backbone and side-chain conformation along a polypeptide chain and to compare protein structures derived from X-ray crystallography, NMR spectroscopy or molecular modelling studies and also to highlight the effect of mutation on structure.

8.
A data bank merging related protein structures and sequences   (total citations: 1; self-citations: 0; citations by others: 1)
A data collection which merges protein structural and sequence information is described. Structural superpositions amongst proteins with similar main-chain fold were performed or collected from the literature. Sequences taken from the protein primary structure databases were associated with the multiple structural alignments providing they were at least 50% homologous in residue identity to one of the structural sequences and at least 50% of the structural sequence residues were alignable. Such restrictions allow reasonable confidence that the primary sequences share the conformation of the tertiary structural templates, except in the less conserved loop regions. Multiple structural superpositions were collected for 38 familial groups containing a total of 209 tertiary structures; 45 structures had no superposable mates and were used individually. Other information is also provided as main-chain and side-chain conformational angles, secondary structural assignments and the like. Wedding the primary and tertiary structural data resulted in an 8-fold increase of data bank sequence entries over those associated with the known three-dimensional architectures alone.

9.
An empirical relationship between occupancy and the atomic displacement parameter of water molecules in protein crystal structures has been found by comparing a set of well refined sperm whale myoglobin crystal structures. The relationship agrees with a series of independent structural features whose impact on water occupancy can easily be predicted as well as with other known data and is independent of the protein fold. The estimation of the water occupancy in protein crystal structures may help in understanding the physico-chemical properties of the protein–solvent interface and can allow the monitoring of the accuracy of the protein crystal structure refinement.

10.
Cleavage-site motifs in mitochondrial targeting peptides   (total citations: 3; self-citations: 0; citations by others: 3)
Although mitochondrial targeting peptides lack a common consensus sequence, a certain bias in the positional distribution of amino acids has recently been found. These patterns seem to be associated with cleavage of the precursor proteins by matrix processing proteases. We have extended the previous studies and found new sequence motifs that are conserved within subgroups of mitochondrial targeting peptides. These motifs have certain common themes, indicating that they are associated with cleavage by one single protease. Two of the conserved patterns have a high predictive value, but even for sequences that do not possess these patterns, a fairly accurate prediction of the cleavage site is shown to be possible. We also suggest that a well-conserved RXY (S/A) pattern may be used to engineer efficiently recognized cleavage sites into uncleaved or artificial mitochondrial targeting peptides.

11.
The automatic identification of motifs associated with a given function is an important challenge for molecular sequence analysis. A method is presented for the extraction of such patterns from large sets of unaligned sequences with related but general function, for example, a set of heat shock proteins. In such a set of proteins there can often be several subfamilies each characterized by one or more distinct motifs. The aim is to develop computational tools to identify these motifs. The algorithm presented locates high frequency words of length k with a given number of positions, r, fixed. Statistics for a binomial distribution are used to assess the significance of the words. The high-frequency words are clustered and highly populated clusters retained. The composition of the clusters is displayed graphically. A set of motifs associated with the sequence family can automatically be extracted. The method is benchmarked on a set of 106 heat shock sequences and a set of 257 toxin sequences. It is shown to recover previously identified motifs.
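The first two steps named in this abstract, counting words of length k with r positions fixed and scoring them against a binomial distribution, can be sketched roughly as below. This is not the authors' implementation: the function names, the "." wildcard notation and the background model (independent residue frequencies, every window treated as an independent trial) are simplifying assumptions made for illustration, and the clustering step is not covered.

```python
from collections import Counter
from itertools import combinations
from math import comb

def count_patterns(seqs, k, r):
    """Count every length-k word with r fixed positions ('.' = wildcard).

    Returns (counts, windows), where windows is the total number of
    length-k windows examined across all sequences.
    """
    counts = Counter()
    windows = 0
    for s in seqs:
        for i in range(len(s) - k + 1):
            word = s[i:i + k]
            windows += 1
            for fixed in combinations(range(k), r):
                pattern = "".join(
                    word[j] if j in fixed else "." for j in range(k)
                )
                counts[pattern] += 1
    return counts, windows

def pattern_probability(pattern, freqs):
    """Chance that a random window matches, assuming independent residues."""
    p = 1.0
    for ch in pattern:
        if ch != ".":
            p *= freqs[ch]
    return p

def binomial_tail(n, c, p):
    """P(X >= c) for X ~ Binomial(n, p): the chance of seeing the word
    at least c times in n windows under the background model."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(c, n + 1))
```

A word would be flagged as high frequency when binomial_tail(windows, counts[pattern], pattern_probability(pattern, freqs)) falls below a chosen threshold. Overlapping windows are not truly independent, so this tail probability is only an approximation.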

12.
An analysis of the geometry of metal binding by carboxylic and carboxamide groups in proteins is presented. Most of the ligands are from aspartic and glutamic acid side chains. Water molecules bound to carboxylate anions are known to interact with oxygen lone-pairs. However, metal ions are also found to approach the carboxylate group along the C–O direction. More metal ions are found to be along the syn than the anti lone-pair direction. This seems to be the result of the stability of the five-membered ring that is formed by the carboxylate anion hydrogen bonded to a ligand water molecule and the metal ion in the syn position. Ligand residues are usually from the helix, turn or regions with no regular secondary structure. Because of the steric interactions associated with bringing all the ligands around a metal center, a calcium ion can bind only near the ends of a helix; a metal, like zinc, with a low coordination number, can bind anywhere in the helix. Based on the analysis of the positions of water molecules in the metal coordination sphere, the sequence of the EF hand (a calcium-binding structure) is discussed.

13.
In aligning homologous protein sequences, it is generally assumed that amino acid substitutions subsequent in time occur independently of amino acid substitutions previous in time, i.e. that patterns of mutation are similar at low and high sequence divergence. This assumption is examined here and shown to be incorrect in an interesting way. Separate mutation matrices were constructed for aligned protein sequence pairs at divergences ranging from 5 to 100 PAM units (point accepted mutations per 100 aligned positions). From these, the corresponding log-odds (Dayhoff) matrices, normalized to 250 PAM units, were constructed. The matrices show that the genetic code influences accepted point mutations strongly at early stages of divergence, while the chemical properties of the side chains dominate at more advanced stages.
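How a log-odds matrix is obtained from aligned sequence pairs can be sketched as follows. This is a generic illustration, not the procedure of the paper: the function name, the 10·log10 scaling and the symmetrization of pair counts are assumptions, and the extrapolation of each matrix to 250 PAM units described in the abstract is not attempted.

```python
import math
from collections import Counter

def log_odds_matrix(aligned_pairs):
    """Build a symmetric log-odds score table from aligned sequence pairs.

    aligned_pairs: iterable of (seq1, seq2) strings of equal length,
    with '-' marking gaps. Returns {(a, b): score} for observed pairs,
    where score = 10 * log10(q_ab / (p_a * p_b)), q_ab is the observed
    frequency of a aligned with b, and p_a is the marginal frequency of a.
    Pairs never observed are simply absent from the result.
    """
    pair_counts = Counter()
    for s1, s2 in aligned_pairs:
        for a, b in zip(s1, s2):
            if a == "-" or b == "-":
                continue  # gapped positions carry no substitution data
            # Split each observation over both orderings so the
            # resulting table is symmetric.
            pair_counts[(a, b)] += 0.5
            pair_counts[(b, a)] += 0.5

    total = sum(pair_counts.values())
    marginal = Counter()
    for (a, _b), count in pair_counts.items():
        marginal[a] += count / total

    return {
        (a, b): 10 * math.log10((count / total) / (marginal[a] * marginal[b]))
        for (a, b), count in pair_counts.items()
    }
```

A positive score means the pair is aligned more often than expected from residue composition alone, and a negative score means less often. Building one such table per divergence bin is what would allow the low-divergence and high-divergence substitution patterns to be compared.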

14.
A new similarity score (-score) is proposed which is able to find the correct protein structure among the very close alternatives and to distinguish between correct and deliberately misfolded structures. This score is based on the general principle 'similar likes similar', and it favors hydrophobic and hydrophilic contacts, and disfavors hydrophobic-to-hydrophilic contacts in proteins. The values of -scores calculated for the high-resolution protein structures from the representative set are compared with those of alternatives: (i) very close alternatives which are only slightly distorted by conformational energy minimization in vacuo; (ii) alternatives with subsequently growing distortions, generated by molecular dynamics simulations in vacuo; (iii) structures derived by molecular dynamics simulation in solvent at 300 K; (iv) deliberately misfolded protein models. In nearly all tested cases the similarity score can successfully distinguish between experimental structure and its alternatives, even if the root mean square displacement of all heavy atoms is less than 1 Å. The confidence interval of the similarity score was estimated using the high-resolution X-ray structures of domain pairs related by non-crystallographic symmetry. The similarity score can be used for the evaluation of the general quality of the protein models, choosing the correct structures among the very close alternatives, characterization of models simulating folding/unfolding, etc.

15.
An analysis of structural instances of low complexity sequence segments   (total citations: 1; self-citations: 0; citations by others: 1)
Amino acid sequence databases contain many low complexity, compositionally biased sequence segments. However, only a limited number of relatively short instances of these segments occur in proteins of known structure. An analysis is presented of structural instances of these low complexity sequence segments in the Brookhaven Protein Data Bank with regard to preferences for sequence composition, secondary structural conformation and the local atomic environment. The complexity varies almost linearly with segment length, reflecting the absence of very long, low complexity segments in the structural database. The low complexity segments identified are not disordered and have temperature factors which are generally the same as the rest of the protein. It is observed that these segments are predominantly exposed and either helical or coiled, in excess of what would be expected by chance. Secondary structure prediction methods perform well in correctly predicting those low complexity segments which are helical but poorly in correctly predicting segments that are strands.

16.
Genetic algorithms are very efficient search mechanisms which mutate, recombine and select amongst tentative solutions to a problem until a near optimal one is achieved. We introduce them as a new tool to study proteins. The identification and motivation for different fitness functions are discussed. The evolution of the zinc finger sequence motif from a random start is modelled. User specified changes of the repressor structure were simulated and critical sites and exchanges for mutagenesis identified. Vast conformational spaces are efficiently searched as illustrated by the ab initio folding of a model protein of a four β-strand bundle. The genetic algorithm simulation which mimicked important folding constraints as overall hydrophobic packaging and a propensity of the betaphilic residues for trans positions achieved a unique fold. Cooperativity in the β-strand regions and a length of 3–5 for the interconnecting loops was critical. Specific interaction sites were considerably less effective in driving the fold.

17.
We have developed a variable gap penalty function for use in the comparison program COMPARER which aligns protein sequences on the basis of their 3-D structures. For deletions and insertions, components are a function of structural features of individual amino acid residues (e.g. secondary structure and accessibility). We have also obtained relative weights for different features used in the comparison by examining the equivalent residues in weight matrices and in alignments for pairs of 3-D structures where the equivalences are relatively unambiguous. We have used the new parameters and the variable gap penalty function in COMPARER to align protein structures in the Brookhaven Data Bank. The variable gap penalty function is useful especially in avoiding gaps in secondary structure elements and the new feature weights give improved alignments. The alignments for both azurins and plastocyanins and N- and C-terminal lobes for aspartic proteinases are discussed.

18.
The three-dimensional structure of a proteolytically modified protein C inhibitor, a member of the serine protease inhibitor superfamily, was constructed with computer graphics based on its amino acid sequence homology with that of the modified α1-antitrypsin whose structure had been elucidated by X-ray crystallography. The intact form of protein C inhibitor was predicted with an α-carbon model based on its hydrophilicity and hydrogen bond pattern. Furthermore, a model of its interaction with activated protein C was constructed based on the structure of the complex between trypsin and its inhibitor, which had been determined by X-ray crystallography.

19.
An analysis of the geometry and the orientation of metal ions bound to histidine residues in proteins is presented. Cations are found to lie in the imidazole plane along the lone pair on the nitrogen atom. Out of the two tautomeric forms of the imidazole ring, the NE2-protonated form is normally preferred. However, when bound to a metal ion the ND1-protonated form is predominant and NE2 is the ligand atom. When the metal coordination is through ND1, steric interactions shift the side chain torsional angle, χ2, from its preferred value of 90° or 270°. The orientation of histidine residues is usually stabilized through hydrogen bonding; the ND1-protonated form of a helical residue can form a hydrogen bond with the carbonyl oxygen atom in the preceding turn of the helix. A considerable number of ligands are found in helices and β-sheets. A helical residue bound to a heme group is usually found near the C-terminus of the helix. Two ligand groups four residues apart in a helix, or two residues apart in a β-strand, are used in many proteins to bind metal ions.

20.
In search of the ideal protein sequence   (total citations: 1; self-citations: 0; citations by others: 1)
The inverse of a folding problem is to find the ideal sequence that folds into a particular protein structure. This problem has been addressed using the topology fingerprint-based threading algorithm, capable of calculating a score (energy) of an arbitrary sequence-structure pair. At first, the search is conducted by unconstrained minimization of the energy in sequence space. It is shown that using energy as the only design criterion leads to spurious solutions with incorrect amino acid composition. The problem lies in the general features of the protein energy surface as a function of both structure and sequence. The proposed solution is to design the sequence by maximizing the difference between its energy in the desired structure and in other known protein structures. Depending on the size of the database of structures 'to avoid', sequences bearing significant similarity to the native sequence of the target protein are obtained using this procedure.
