Similar articles
 20 similar articles found (search time: 0 ms)
1.
Evaluation and improvements in the automatic alignment of protein sequences   Cited by: 1 (self-citations: 0, citations by others: 1)
The accuracy of protein sequence alignment obtained by applying a commonly used global sequence comparison algorithm is assessed. Alignments based on the superposition of the three-dimensional structures are used as a standard for testing the automatic, sequence-based methods. Alignments obtained from the global comparison of five pairs of homologous protein sequences studied gave 54% agreement overall for residues in secondary structures. The inclusion of information about the secondary structure of one of the proteins, in order to limit the number of gaps inserted in regions of secondary structure, improved this figure to 68%. A similarity score of greater than six standard deviation units suggests that an alignment which is greater than 75% correct within secondary structural regions can be obtained automatically for the pair of sequences.
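
The agreement figures quoted in the abstract above compare a sequence-based alignment against a structure-based reference. A minimal sketch of one way to compute such a figure follows; the function names, the gapped-string input format, and the optional `core` set (residue indices of the first sequence that lie in secondary structure) are assumptions for illustration, not the authors' procedure.

```python
def aligned_pairs(row_a, row_b):
    """Return the set of (i, j) residue-index pairs matched by a gapped pairwise alignment."""
    pairs = set()
    i = j = 0
    for a, b in zip(row_a, row_b):
        if a != "-" and b != "-":
            pairs.add((i, j))
        if a != "-":
            i += 1
        if b != "-":
            j += 1
    return pairs


def agreement(test, reference, core=None):
    """Fraction of reference residue pairs that the test alignment reproduces.

    test, reference: (row_a, row_b) tuples of gapped strings.
    core: optional set of residue indices of sequence A (hypothetical input,
          e.g. residues in helices or strands) to which scoring is limited.
    """
    ref = aligned_pairs(*reference)
    if core is not None:
        ref = {p for p in ref if p[0] in core}
    if not ref:
        return 0.0
    return len(aligned_pairs(*test) & ref) / len(ref)


reference = ("ACDE-F", "AC-EGF")   # stand-in for a structural superposition
test = ("ACDEF", "ACEGF")          # stand-in for an automatic alignment
print(agreement(test, reference))
print(agreement(test, reference, core={0, 1, 3}))
```

Restricting the comparison to `core` mirrors the abstract's focus on residues within secondary structure, where the reference alignment is most trustworthy.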

2.
A method for comparison of protein sequences based on their primary and secondary structure is described. Protein sequences are annotated with predicted secondary structures (using a modified Chou and Fasman method). Two-letter code sequences are generated (Xx, where X is the amino acid and x is its annotated secondary structure). Sequences are compared with a dynamic programming method (STRALIGN) that includes a similarity matrix for both the amino acids and secondary structures. The similarity value for each paired two-letter code is a linear combination of similarity values for the paired amino acids and their annotated secondary structures. The method has been applied to eight globin proteins (28 pairs) for which the X-ray structure is known. For protein pairs with high primary sequence similarity (>45%), STRALIGN alignment is identical to that obtained by a dynamic programming method using only primary sequence information. However, alignment of protein pairs with lower primary sequence similarity improves significantly with the addition of secondary structure annotation. Alignment of the pair with the least primary sequence similarity of 16% was improved from 0 to 37% ‘correct’ alignment using this method. In addition, STRALIGN was successfully applied to seven pairs of distantly related cytochrome c proteins, and three pairs of distantly related picornavirus proteins.
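
The scoring idea in the abstract above, a linear combination of amino acid and secondary structure similarity fed into dynamic programming, can be sketched as follows. This is not STRALIGN itself: the identity-based similarities (1 for a match, 0 otherwise), the weight `w`, the linear gap penalty, and all function names are assumptions chosen to keep the example short.

```python
def annotate(residues, structure):
    """Build the two-letter code: a list of (amino acid, secondary structure) pairs."""
    return list(zip(residues, structure))


def pair_score(x, y, w=0.5):
    """Linear combination of amino acid and secondary structure similarity.

    Identity scores stand in for real similarity matrices (an assumption).
    """
    aa = 1.0 if x[0] == y[0] else 0.0
    ss = 1.0 if x[1] == y[1] else 0.0
    return (1.0 - w) * aa + w * ss


def align_score(seq1, seq2, w=0.5, gap=-1.0):
    """Global (Needleman-Wunsch style) alignment score with a linear gap penalty."""
    n, m = len(seq1), len(seq2)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(seq1[i - 1], seq2[j - 1], w),
                score[i - 1][j] + gap,   # residue of seq1 opposite a gap
                score[i][j - 1] + gap,   # residue of seq2 opposite a gap
            )
    return score[n][m]


helical = annotate("ACD", "hhh")
strand = annotate("ACD", "eee")
print(align_score(helical, helical))        # same residues, same structure
print(align_score(helical, strand))         # same residues, different structure
print(align_score(helical, strand, w=0.0))  # structure annotation ignored
```

Setting `w=0.0` reduces the score to primary sequence information only, which corresponds to the baseline the abstract compares against.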

3.
Probabilities of all possible correspondences of residues in aligning two proteins are evaluated by assuming that the statistical weight of each alignment is proportional to the exponent of its total similarity score. Based on such probabilities, a probability alignment that includes the most probable correspondences is proposed. In the cases of highly similar sequence pairs, the probability alignments agree with the maximum similarity alignments that correspond to the alignments with the maximum similarity score. Significant correspondences in the probability alignments are those whose probabilities are >0.5. The probability alignment method is applied to a few protein pairs, and results indicate that such highly probable correspondences in the probability alignments are probably correct correspondences that agree with the structural alignments and that incorrect correspondences in the maximum similarity alignments are usually insignificant correspondences in the probability alignments. The root mean square deviations in superimposition of corresponding residues tend to be smaller for significant correspondences in the probability alignments than for all correspondences in the maximum similarity alignments, indicating that incorrect correspondences in the maximum similarity alignments tend to be insignificant correspondences in probability alignments. This fact is also confirmed in 109 protein pairs that are similar to each other with sequence identities between 90 and 35%. In addition, the probability alignment method may better predict correct correspondences than the maximum similarity alignment method. Probability alignments do, of course, depend on a scoring scheme but are less sensitive to the value of parameters such as gap penalties. The present probability alignment method is useful for constructing reliable alignments based on the probabilities of correspondences and can be used with any scoring scheme.
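
The probabilities described in the abstract above can be illustrated with a forward-backward sum over all global alignments, each weighted by the exponential of its score. The sketch below is a simplified model, not the published method: the match/mismatch scores, the linear gap penalty, the `temperature` scaling, and the function name are all assumptions, and alignments that differ only in the order of adjacent insertions and deletions are counted separately.

```python
import math


def match_probabilities(a, b, match=1.0, mismatch=-1.0, gap=-2.0, temperature=1.0):
    """Return p[i][j], the probability that residue i of a aligns with residue j of b."""
    n, m = len(a), len(b)
    w_gap = math.exp(gap / temperature)

    def w(i, j):
        # Statistical weight of pairing the i-th and j-th residues (1-based).
        s = match if a[i - 1] == b[j - 1] else mismatch
        return math.exp(s / temperature)

    # fwd[i][j]: summed weight of all alignments of a[:i] with b[:j].
    fwd = [[0.0] * (m + 1) for _ in range(n + 1)]
    fwd[0][0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            total = 0.0
            if i > 0 and j > 0:
                total += fwd[i - 1][j - 1] * w(i, j)
            if i > 0:
                total += fwd[i - 1][j] * w_gap
            if j > 0:
                total += fwd[i][j - 1] * w_gap
            fwd[i][j] = total

    # bwd[i][j]: summed weight of all alignments of a[i:] with b[j:].
    bwd = [[0.0] * (m + 1) for _ in range(n + 1)]
    bwd[n][m] = 1.0
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            if i == n and j == m:
                continue
            total = 0.0
            if i < n and j < m:
                total += w(i + 1, j + 1) * bwd[i + 1][j + 1]
            if i < n:
                total += w_gap * bwd[i + 1][j]
            if j < m:
                total += w_gap * bwd[i][j + 1]
            bwd[i][j] = total

    z = fwd[n][m]  # partition function over all alignments
    return [
        [fwd[i][j] * w(i + 1, j + 1) * bwd[i + 1][j + 1] / z for j in range(m)]
        for i in range(n)
    ]


probs = match_probabilities("ACDE", "ACDE")
significant = [(i, j) for i, row in enumerate(probs) for j, p in enumerate(row) if p > 0.5]
print(significant)
```

Keeping only pairs with probability above 0.5 follows the abstract's definition of significant correspondences; because each residue pairs with at most one partner per alignment, at most one such pair can exist per residue.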

4.
A data bank merging related protein structures and sequences   Cited by: 1 (self-citations: 0, citations by others: 1)
A data collection which merges protein structural and sequence information is described. Structural superpositions amongst proteins with similar main-chain fold were performed or collected from the literature. Sequences taken from the protein primary structure databases were associated with the multiple structural alignments providing they were at least 50% homologous in residue identity to one of the structural sequences and at least 50% of the structural sequence residues were alignable. Such restrictions allow reasonable confidence that the primary sequences share the conformation of the tertiary structural templates, except in the less conserved loop regions. Multiple structural superpositions were collected for 38 familial groups containing a total of 209 tertiary structures; 45 structures had no superposable mates and were used individually. Other information is also provided, such as main-chain and side-chain conformational angles, secondary structural assignments and the like. Wedding the primary and tertiary structural data resulted in an 8-fold increase of data bank sequence entries over those associated with the known three-dimensional architectures alone.

5.
In search of the ideal protein sequence   Cited by: 1 (self-citations: 0, citations by others: 1)
The inverse of a folding problem is to find the ideal sequence that folds into a particular protein structure. This problem has been addressed using the topology fingerprint-based threading algorithm, capable of calculating a score (energy) of an arbitrary sequence-structure pair. At first, the search is conducted by unconstrained minimization of the energy in sequence space. It is shown that using energy as the only design criterion leads to spurious solutions with incorrect amino acid composition. The problem lies in the general features of the protein energy surface as a function of both structure and sequence. The proposed solution is to design the sequence by maximizing the difference between its energy in the desired structure and in other known protein structures. Depending on the size of the database of structures ‘to avoid’, sequences bearing significant similarity to the native sequence of the target protein are obtained using this procedure.

6.
Designing amino acid sequences to fold with good hydrophobic cores   Cited by: 3 (self-citations: 0, citations by others: 3)
We present two methods for designing amino acid sequences of proteins that will fold to have good hydrophobic cores. Given the coordinates of the desired target protein or polymer structure, the methods generate sequences of hydrophobic (H) and polar (P) monomers that are intended to fold to these structures. One method designs hydrophobic inside, polar outside; the other minimizes an energy function in a sequence evolution process. The sequences generated by these methods agree at the level of 60–80% of the sequence positions in 20 proteins in the Protein Data Bank. A major challenge in protein design is to create sequences that can fold uniquely, i.e. to a single conformation rather than to many. While an earlier lattice-based sequence evolution method was shown not to design unique folders, our method generates unique folders in lattice model tests. These methods may also be useful in designing other types of foldable polymer not based on amino acids.
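
The "hydrophobic inside, polar outside" rule from the abstract above can be illustrated with a very small sketch. The burial measure used here (distance from the centroid of the coordinates), the fixed count `n_hydrophobic`, and the function name are assumptions for illustration; the abstract does not state how the authors measure burial.

```python
import math


def design_hp(coords, n_hydrophobic):
    """Assign H to the n_hydrophobic most buried positions and P to the rest.

    coords: list of (x, y, z) positions, one per monomer, in chain order.
    Burial is approximated by distance from the centroid (an assumption).
    """
    n = len(coords)
    centroid = tuple(sum(c[k] for c in coords) / n for k in range(3))
    dist = [math.dist(c, centroid) for c in coords]
    buried = set(sorted(range(n), key=lambda i: dist[i])[:n_hydrophobic])
    return "".join("H" if i in buried else "P" for i in range(n))


# Toy target: two monomers near the centre, two far from it.
coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (-3.0, 0.0, 0.0), (0.0, 0.5, 0.0)]
print(design_hp(coords, 2))
```

A real design procedure would use a burial measure suited to the structure (for example neighbour counts or solvent accessibility), and the second method in the abstract replaces this rule with energy minimization over sequences.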

7.
Evolutionarily conserved hydrophobic residues at the core of protein structures are generally assumed to play a structural role in protein folding and stability. Recent studies have suggested that their importance to protein structures is uneven, with a few of them being crucial and the rest being secondary. In this work, we explored the possibility of employing this feature of native structures for discriminating non-native structures from native ones. First, we developed a network tool to quantitatively measure the structural contributions of individual amino acid residues. We systematically applied this method to diverse fold-type sets of native proteins. It was confirmed that this method could capture the essential structural features of native proteins. Next, we applied it to a number of decoy sets of proteins. The results indicate that such an approach indeed identified non-native structures in most test cases. This finding should help in investigating the fundamental problem of protein structure prediction.

8.
Compensating changes in protein multiple sequence alignments   Cited by: 2 (self-citations: 0, citations by others: 2)
A method was developed to identify compensating changes between residues at positions in a multiple sequence alignment. (For example, one position might always contain a positively charged residue when the other is negatively charged and vice versa.) A correlation-based method was used to measure the compensation found in the four residues at a pair of positions in any two sequences in a multiple alignment. All possible sequence pairings were measured at the pair of positions and the resulting matrix analysed to give a measure of cooperativity among the pairs. The basic method was sufficiently flexible to consider a number of amino acid relatedness models based both on scalar and vectorial properties. Pairs of compensating positions were selected by the method and their mean separation (in a protein of known structure) was compared to both the mean pairwise separation over all residues and the pairwise separation over an equivalent sample of pairs of residues selected on the basis of their conservation alone. The latter is an important control that has been omitted from previous studies. The results indicated that, at best, there was a slight effect (of marginal significance) leading to the selection of closer pairs by the compensation measure when compared to the mean of all pairs. However, this was never as good as the simpler measure based on conservation alone, which always found a significant majority of proteins with a sample mean less than the overall mean.

9.
The residue pair preference profile (R3P) method is an inverse folding method that combines environmental profiles and pair preference profiles. The method uses statistical preferences for residue pairs which score the likelihood of finding a profiled residue to be paired with a residue within its local environment. All pairs are characterized by their dihedral angles, secondary structure and number of neighboring residues as a function of residue type. Each residue pair preference is expressed for all 20 amino acids of the profiled residue and is weighted by the compatibility of the environment residue with its own local environment. The R3P method produces an initial profile-sequence alignment which is then refined by converting the initial profile into a profile of a target sequence threaded into the structure of the initial profile. We have tested this method by evaluating alignments of sequences with known 3-D structures using structural superposition alignments as reference. R3P-sequence alignments are 50% correct on average for sequences whose 3-D structure pairs superimpose with an r.m.s. deviation of 1.97 Å. The average improvement in correctness during this iterative refinement is 14%. The R3P-sequence alignments are compared with sequence-sequence and 3-D profile-sequence alignments. When all three methods are combined, on average 50% of the alignments are correct for pairs of 3-D structures that superimpose within 2.12 Å. A 3-D model of HisA is predicted with the combined method.

10.
The use of multiple sequence alignments for secondary structure predictions is analysed. Seven different protein families, containing only sequences of known structure, were considered to provide a range of alignment and prediction conditions. Using alignments obtained by spatial superposition of main chain atoms in known tertiary protein structures allowed a mean improvement of 8% in secondary structure prediction accuracy, when compared to those obtained from the individual sequences. Substitution of these alignments by those determined directly from an automated sequence alignment algorithm showed variations in the prediction accuracy which correlated with the quality of the multiple alignments and distance of the primary sequence. Secondary structure predictions can be reliably improved using alignments from an automatic alignment procedure with a mean increase of 6.8%, giving an overall prediction accuracy of 68.5%, if there is a minimum of 25% sequence identity between all sequences in a family.

11.
A general protein sequence alignment methodology for detecting a priori unknown common structural and functional regions is described. The method proposed in this paper is based on two basic requirements for a meaningful alignment. First, each sequence or segment of a sequence is characterized by a multivariate physicochemical profile. Second, the alignment is performed by considering all the sequences simultaneously, and the algorithm detects those regions that form a set of similar profiles. In order to test the structural meaning of the alignment obtained from the sequences, quantitative comparisons are performed with structurally conserved regions (SCR) determined from the X-ray structures of three serine proteases. Results suggest that the limits of the SCR may be predicted from the similarities between the physicochemical profiles of the sequences. The procedures are not completely automated. The final step requires a visual screening of alternative pathways in order to determine an optimal alignment.

12.
A multiple sequence alignment algorithm is described that uses a dynamic programming-based pattern construction method to align a set of homologous sequences based on their common pattern of conserved sequence elements. This pattern-induced multi-sequence alignment (PIMA) algorithm can employ secondary-structure dependent gap penalties for use in comparative modelling of new sequences when the three-dimensional structure of one or more members of the same family is known. We show that the use of secondary structure information can significantly improve the accuracy of aligning structure boundaries in a set of homologous sequences even when the structure of only one member of the family is known.

13.
Variable gap penalty for protein sequence-structure alignment   Cited by: 1 (self-citations: 0, citations by others: 1)
The penalty for inserting gaps into an alignment between two protein sequences is a major determinant of the alignment accuracy. Here, we present an algorithm for finding a globally optimal alignment by dynamic programming that can use a variable gap penalty (VGP) function of any form. We also describe a specific function that depends on the structural context of an insertion or deletion. It penalizes gaps that are introduced within regions of regular secondary structure, buried regions, straight segments and also between two spatially distant residues. The parameters of the penalty function were optimized on a set of 240 sequence pairs of known structure, spanning the sequence identity range of 20–40%. We then tested the algorithm on another set of 238 sequence pairs of known structures. The use of the VGP function increases the number of correctly aligned residues from 81.0 to 84.5% in comparison with the optimized affine gap penalty function; this difference is statistically significant according to Student's t-test. We estimate that the new algorithm allows us to produce comparative models with an additional approximately 7 million accurately modeled residues in the approximately 1.1 million proteins that are detectably related to a known structure.
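
To illustrate how structural context can steer gap placement, as described in the abstract above, the sketch below runs a global alignment in which each gap position carries its own penalty. It is a deliberately simplified stand-in for the VGP algorithm: penalties are per residue rather than an arbitrary function of the whole gap, and the names `del_pen`, `ins_pen`, and `align_variable_gap`, along with the scoring values, are assumptions.

```python
def align_variable_gap(struct_seq, target_seq, del_pen, ins_pen, match=2.0, mismatch=-1.0):
    """Global alignment with position-dependent gap penalties.

    del_pen[i]: cost of leaving residue i of struct_seq unaligned (length n).
    ins_pen[i]: cost of inserting a target residue after i structure residues (length n + 1).
    Returns (score, aligned structure row, aligned target row).
    """
    n, m = len(struct_seq), len(target_seq)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    move = [[""] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] - del_pen[i - 1]
        move[i][0] = "D"
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] - ins_pen[0]
        move[0][j] = "I"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if struct_seq[i - 1] == target_seq[j - 1] else mismatch
            options = [
                (score[i - 1][j - 1] + s, "M"),           # align the two residues
                (score[i - 1][j] - del_pen[i - 1], "D"),  # skip a structure residue
                (score[i][j - 1] - ins_pen[i], "I"),      # insert a target residue
            ]
            score[i][j], move[i][j] = max(options)

    # Trace back from the bottom-right corner to recover the alignment.
    row_s, row_t = [], []
    i, j = n, m
    while i > 0 or j > 0:
        step = move[i][j]
        if step == "M":
            row_s.append(struct_seq[i - 1])
            row_t.append(target_seq[j - 1])
            i, j = i - 1, j - 1
        elif step == "D":
            row_s.append(struct_seq[i - 1])
            row_t.append("-")
            i -= 1
        else:
            row_s.append("-")
            row_t.append(target_seq[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(row_s)), "".join(reversed(row_t))


# Hypothetical context: position 2 of the structure is a loop (cheap gap),
# the other positions sit in regular secondary structure (expensive gap).
result = align_variable_gap("AAAA", "AAA", del_pen=[5, 5, 1, 5], ins_pen=[5] * 5)
print(result)
```

With identical residues everywhere, the sequence alone cannot decide where the gap belongs, so the placement is determined entirely by the structural penalties.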

14.
An automatic algorithm for defining topological equivalences in protein structures is presented. The algorithm is based on a dynamic programming technique and self-consistent scoring method. We have used it to align pairs of similar protein structures of several protein families and to identify recurrent structural domains in aspartic proteinase 2APR. Its ability to find suboptimal paths permits a thorough comparison of proteins at each level in the hierarchy of the protein structure: secondary structure, super-secondary structure, domain and entire globular structure. The algorithm has been extended to the structure alignment of ribonucleic acid and can be extended to the structure alignment of any linear polymer.

15.
Using genetically engineered mutants of the neutral protease from Bacillus stearothermophilus (BsteNP), it had been shown that the surface-exposed structural motif constituted by Phe63 embedded in a four amino acid hydrophobic pocket is critical for the thermal stability of the thermophilic neutral proteases from Bacilli. To measure the stabilizing contribution of each hydrophobic interaction taking place between Phe63 and the hydrophobic pocket, we grafted this structural motif in the neutral protease from the mesophile Bacillus subtilis (BsubNP). This was accomplished by first creating the Thr63Phe mutant of BsubNP and then generating a series of mutants in which the four amino acids which in thermolysin surround Phe63 and form the hydrophobic pocket were added one after the other. By analysing the thermal stability of each mutant it was found that the 2°C destabilizing effect of the Thr63Phe substitution was completely suppressed by the addition of the four amino acid hydrophobic pocket, each replacement providing a stabilizing contribution of approximately 0.8–1°C. These results are discussed in the light of the peculiar mechanism of thermal inactivation of proteolytic enzymes.

16.
A major problem in predicting protein structure by homology modelling is that the sequence alignment from which the model is built may not be the best one in terms of the correct equivalencing of residues assessed by structural or functional criteria. A useful strategy is to generate and examine a number of suboptimal alignments, as better alignments can often be found away from the optimal. A procedure to rapidly filter suboptimal alignments based on measurement of core volumes and packing pair potentials is investigated. The approach is benchmarked on three pairs of sequences which are non-trivial to align correctly, namely two immunoglobulin domains, plastocyanin with azurin and two distant globin sequences. It is shown to be useful to reduce a large ensemble of possible alignments down to a few which correspond more closely to the correct (structure based) alignment.

17.
We describe an algorithm to predict tertiary structures of small proteins. In contrast to most current folding algorithms, it uses very few energy parameters. Given the secondary structural elements in the sequence (α-helices and β-strands), the algorithm searches the remaining conformational space of a simplified real-space representation of chains to find a minimum energy of an exceedingly simple potential function. The potential is based only on a single type of favorable interaction between hydrophobic residues, an unfavorable excluded volume term of spatial overlaps and, for sheet proteins, an interstrand hydrogen bond interaction. Where appropriate, the known disulfide bonds are constrained by a square-law potential. Conformations are searched by a genetic algorithm. The model predicts reasonably well the known tertiary folds of seven out of the 10 small proteins we consider. We draw two conclusions. First, for the proteins we tested, this exceedingly simple potential function is no worse than others having hundreds of energy parameters in finding the right general tertiary structures. Second, despite its simplicity, the potential function is not the weak link in this algorithm. Differences between our predicted structures and the correct targets can be ascribed to shortcomings in our search strategy. This potential function may be useful for testing other conformational search strategies.

18.
A new similarity score (-score) is proposed which is able to find the correct protein structure among the very close alternatives and to distinguish between correct and deliberately misfolded structures. This score is based on the general principle ‘similar likes similar’, and it favors hydrophobic and hydrophilic contacts, and disfavors hydrophobic-to-hydrophilic contacts in proteins. The values of -scores calculated for the high-resolution protein structures from the representative set are compared with those of alternatives: (i) very close alternatives which are only slightly distorted by conformational energy minimization in vacuo; (ii) alternatives with subsequently growing distortions, generated by molecular dynamics simulations in vacuo; (iii) structures derived by molecular dynamics simulation in solvent at 300 K; (iv) deliberately misfolded protein models. In nearly all tested cases the similarity score can successfully distinguish between experimental structure and its alternatives, even if the root mean square displacement of all heavy atoms is less than 1 Å. The confidence interval of the similarity score was estimated using the high-resolution X-ray structures of domain pairs related by non-crystallographic symmetry. The similarity score can be used for the evaluation of the general quality of the protein models, choosing the correct structures among the very close alternatives, characterization of models simulating folding/unfolding, etc.

19.
The model of the catalytic domain of Aspergillus awamori var. X100 glucoamylase was related to 14 other glucoamylase protein sequences belonging to five subfamilies. Structural features of the different sequences were revealed by multisequence alignment following hydrophobic cluster analysis. The alignment agreed with the hydrophobic microdomains, normally conserved throughout evolution, evaluated from the 3-D model. Saccharomyces and Clostridium glucoamylases lack the α-helix exterior to the catalytic domain. A different catalytic base was found in the Saccharomyces glucoamylase subfamily. The starch binding domain of fungal glucoamylases has identical structural features and substrate interacting residues as the C-terminal domain of models of Bacillus circulans cyclodextrin glucosyltransferases. Three putative N-glycosylation sites were found in the same turns in glucoamylases of different subfamilies. O-Glycosylation is present at different levels in the catalytic domain and in the linker between the catalytic and starch binding domains.

20.
Hydrophobic cluster analysis (HCA) is a protein sequence comparison method based on α-helical representations of the sequences where the size, shape and orientation of the clusters of hydrophobic residues are primarily compared. The effectiveness of HCA has been suggested to originate from its potential ability to focus on the residues forming the hydrophobic core of globular proteins. We have addressed the robustness of the bidimensional representation used for HCA in its ability to detect the regular secondary structure elements of proteins. Various parameters have been studied such as those governing cluster size and limits, the hydrophobic residues constituting the clusters as well as the potential shift of the cluster positions with respect to the position of the regular secondary structure elements. The following results have been found to support the α-helical bidimensional representation used in HCA: (i) there is a positive correlation (clearly above background noise) between the hydrophobic clusters and the regular secondary structure elements in proteins; (ii) the hydrophobic clusters are centred on the regular secondary structure elements; (iii) the pitch of the helical representation which gives the best correspondence is that of an α-helix. The correspondence between hydrophobic clusters and regular secondary structure elements suggests a way to implement variable gap penalties during the automatic alignment of protein sequences.
