Similar literature
20 similar articles found (search time: 218 ms)
1.
Optimal sequence threading can be used to recognize members of a library of protein folds which are closely related in 3-D structure to the native fold of an input test sequence, even when the test sequence is not significantly homologous to the sequence of any member of the fold library. The methods provide an alignment between the residues of the test sequence and the residue positions in a template fold. This alignment optimizes a score function, and the predicted fold is the highest-scoring member of the library of folds. Most score functions contain a pairwise interaction energy term. This, coupled with the need to introduce gaps into the alignment, means that the optimization problem is NP-hard. We report a comparison between two heuristic optimization algorithms used in the literature: double dynamic programming and an iterative algorithm based on the so-called frozen approximation. These are compared in terms of both the ranking of likely folds and the quality of the alignment produced.

2.
A new multiple sequence alignment procedure is presented. Several different multiple alignments are made using differing criteria. Once the sequences have been divided into strongly conserved regions (SCRs) and loosely conserved regions (LCRs), the ‘best’ alignment for each LCR is chosen, independently of the other LCRs, from a selection of possibilities in the multiple alignments. To help make this choice for each LCR, the secondary structure is predicted and shown alongside each different possible alignment. One advantage of this method over automatic, non-interactive methods is that the final alignment is not dependent on the choice of a single set of scoring parameters. Another is that, by allowing interactive choice and by taking account of secondary structural information, the final alignment is based more on biological than on mathematical factors. This method can produce better alignments than any of the initial automatic multiple alignment methods used.

3.
Homology modelling of the human eIF-5A protein has been performed by using a multiple predictions strategy. As the sequence identity between the target and the template proteins is nearly 30%, which is lower than the threshold commonly used to apply the homology modelling method with confidence, we developed a specific predictive scheme by combining different sequence analyses and predictions, as well as model validation by comparison to structural experimental information. The target sequence has been used to find homologues within sequence databases and a multiple alignment has been created. Secondary structure for each single protein has been predicted and compared on the basis of the multiple sequence alignment, in order to evaluate and carefully adjust any gap. Comparative modelling has then been applied to create the model of the protein on the basis of the optimized sequence alignment. The quality of the model has been checked by computational methods and the structural features have been compared to experimental information, giving us a good validation of the reliability of the model and its correspondence to the protein structure in solution. Last, the model was deposited in the Protein Data Bank to be accessible for studies on the structure–function relationships of the human eIF-5A.

4.
A method of protein structure comparison developed previously is extended to incorporate other aspects of protein structure in addition to the inter-atomic vectors on which it was originally based. Each additional aspect, which included hydrogen bonding, solvent exposure, torsional angles and sequence, was introduced separately and evaluated for its ability to improve alignment quality. The components were then combined, suitably weighted, to produce a more holistic comparison method. The method was tested on a group of remotely related β/α type proteins that share a common feature in their overall chain fold. The results indicated that while the original inter-atomic vector component was sufficient to give the correct alignment of most pairs of topologically equivalent proteins, the inclusion of hydrogen bonds, torsion angles and a measure of solvent exposure led to improvements in the more difficult comparisons. Consideration of amino acid properties, including hydrophobicity, had no beneficial effect. The failure of the latter component was not unexpected considering the almost total lack of sequence similarity among the proteins considered.

5.
Multiple sequence alignment is a method for comparing two or more DNA or protein sequences. Most multiple sequence alignment methods rely on pairwise alignment with the Needleman-Wunsch or Smith-Waterman algorithm [Needleman and Wunsch, 1970; Smith and Waterman, 1981] to generate an alignment hierarchy. Therefore, as the number of sequences increases, the runtime increases exponentially. To resolve this problem, this paper presents a multiple sequence alignment method using a parallel processing suffix tree algorithm to search for common subsequences at one time without pairwise alignment. Cross-matched subsequences may arise among the common subsequences found, and these cause inexact matching, so a procedure for masking cross-matching pairs is proposed in this study. The proposed method, improved STC (Suffix Tree Clustering), is summarized as follows: (1) construction of the suffix tree; (2) search and overlap of common subsequences; (3) grouping of subsequence pairs; (4) masking of cross-matching pairs; and (5) clustering of gene sequences. The new method was successfully evaluated with 23 genes in Mus musculus and 22 genes in three species, yielding nine and eight clusters, respectively. This paper was prepared at the 2004 Korea/Japan/Taiwan Chemical Engineering Conference held at Busan, Korea between November 3 and 4, 2004.

6.
Compensating changes in protein multiple sequence alignments   (Cited 2 times: 0 self-citations, 2 by others)
A method was developed to identify compensating changes between residues at positions in a multiple sequence alignment. (For example, one position might always contain a positively charged residue when the other is negatively charged and vice versa.) A correlation-based method was used to measure the compensation found in the four residues at a pair of positions in any two sequences in a multiple alignment. All possible sequence pairings were measured at the pair of positions and the resulting matrix analysed to give a measure of cooperativity among the pairs. The basic method was sufficiently flexible to consider a number of amino acid relatedness models based both on scalar and vectorial properties. Pairs of compensating positions were selected by the method and their mean separation (in a protein of known structure) was compared to both the mean pairwise separation over all residues and the pairwise separation over an equivalent sample of pairs of residues selected on the basis of their conservation alone. The latter is an important control that has been omitted from previous studies. The results indicated that, at best, there was a slight effect (of marginal significance) leading to the selection of closer pairs by the compensation measure when compared to the mean of all pairs. However, this was never as good as the simpler measure based on conservation alone, which always found a significant majority of proteins with a sample mean less than the overall mean.

7.
Three major improvements to a previously described method for automatic protein structure comparison are described. First, a limit to translations for the rigid-body superposition is now assigned according to the dimensions of the structures being compared. Second, examination of the effect of the gap penalty on the derivation of a sequence alignment corresponding to a given structure superposition has led to a method to evaluate alternative structure-based sequence alignments. Third, the pairwise procedure has been generalized to multiple structure alignment. This implementation of rigid-body superposition can recognize well documented distant relationships which hitherto have required consideration of additional features and properties, as well as those relationships between proteins of different sizes. A much larger common scaffold or framework between six globins can be extracted than that obtained using a standard algorithm for multiple structure superposition.

8.
A new method for predicting protein secondary structure from amino acid sequence has been developed. The method is based on multiple sequence alignment of the query sequence with all other sequences of known structure from the Protein Data Bank (PDB) by using BLAST. The fragments of the alignments belonging to proteins from the PDB are then used for further analysis. We have studied various schemes of assigning weights for matching segments and calculated normalized scores to predict one of the three secondary structures: α-helix, β-sheet, or coil. We applied several artificial intelligence techniques, namely decision trees (DT), neural networks (NN) and support vector machines (SVM), to improve the accuracy of predictions and found that SVM gave the best performance. Preliminary data show that combining the fragment mining approach with GOR V (Kloczkowski et al., Proteins 49 (2002) 154-166) for regions of low sequence similarity improves the prediction accuracy.

9.
Consensus engineering has been used to increase the stability of a number of different proteins, either by creating consensus proteins from scratch or by modifying existing proteins so that their sequences more closely match a consensus sequence. In this paper we describe the first application of consensus engineering to the ab initio creation of a novel fluorescent protein. This was based on the alignment of 31 fluorescent proteins with >62% homology to monomeric Azami green (mAG) protein, and used the sequence of mAG to guide amino acid selection at positions of ambiguity. This consensus green protein is extremely well expressed, monomeric and fluorescent with red shifted absorption and emission characteristics compared to mAG. Although slightly less stable than mAG, it is better expressed and brighter under the excitation conditions typically used in single molecule fluorescence spectroscopy or confocal microscopy. This study illustrates the power of consensus engineering to create stable proteins using the subtle information embedded in the alignment of similar proteins and shows that the benefits of this approach may extend beyond stability.

10.
An approach is described for modelling the three-dimensional structure of a protein from the tertiary structures of several homologous proteins that have been determined by X-ray analysis. A method is developed for the simultaneous superposition of several protein molecules and for the calculation of an ‘average structure’ or ‘framework’. Investigation of the convergence properties of this method, in the case of both weighted and unweighted least squares, demonstrates that both give a unique answer and the latter is robust for an homologous family of proteins. Multi-dimensional scaling is used to subgroup the proteins with respect to structural homology. The framework calculated on the basis of the family of homologous proteins, or of an appropriate subgroup, is used to align fragments of the known protein structures of high sequence homology with the unknown. This alignment provides a basis for model building the tertiary structure. Different techniques for using the framework to model the mainchain of various globins and an immunoglobulin domain in the structurally conserved regions are investigated.

11.
A method for comparison of protein sequences based on their primary and secondary structure is described. Protein sequences are annotated with predicted secondary structures (using a modified Chou and Fasman method). Two-lettered code sequences are generated (Xx, where X is the amino acid and x is its annotated secondary structure). Sequences are compared with a dynamic programming method (STRALIGN) that includes a similarity matrix for both the amino acids and secondary structures. The similarity value for each paired two-lettered code is a linear combination of similarity values for the paired amino acids and their annotated secondary structures. The method has been applied to eight globin proteins (28 pairs) for which the X-ray structure is known. For protein pairs with high primary sequence similarity (>45%), STRALIGN alignment is identical to that obtained by a dynamic programming method using only primary sequence information. However, alignment of protein pairs with lower primary sequence similarity improves significantly with the addition of secondary structure annotation. Alignment of the pair with the least primary sequence similarity of 16% was improved from 0 to 37% ‘correct’ alignment using this method. In addition, STRALIGN was successfully applied to seven pairs of distantly related cytochrome c proteins, and three pairs of distantly related picornavirus proteins.
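
The linear-combination score in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: the weight `w`, the identity-based toy similarity functions, the linear gap penalty, and every function name are hypothetical, since the abstract does not give the matrices or parameters actually used by STRALIGN.

```python
def aa_similarity(a: str, b: str) -> float:
    """Toy amino acid similarity (assumption): 1.0 for identity, else 0.0."""
    return 1.0 if a == b else 0.0


def ss_similarity(x: str, y: str) -> float:
    """Toy secondary structure similarity (assumption): 1.0 for identity, else 0.0."""
    return 1.0 if x == y else 0.0


def combined_similarity(code1: str, code2: str, w: float = 0.5) -> float:
    """Score two two-letter codes 'Xx' as a linear combination of both terms."""
    (a, x), (b, y) = code1, code2  # X = amino acid, x = secondary structure state
    return w * aa_similarity(a, b) + (1.0 - w) * ss_similarity(x, y)


def global_alignment_score(seq1, seq2, w: float = 0.5, gap: float = -1.0) -> float:
    """Needleman-Wunsch style score over lists of two-letter codes.

    The linear gap penalty is an assumption made for this sketch.
    """
    n, m = len(seq1), len(seq2)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + combined_similarity(seq1[i - 1], seq2[j - 1], w),
                dp[i - 1][j] + gap,  # gap in seq2
                dp[i][j - 1] + gap,  # gap in seq1
            )
    return dp[n][m]
```

With `w = 1.0` the score reduces to a primary-sequence-only comparison, which mirrors the baseline the abstract compares against.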

12.
Secondary structure prediction for modelling by homology   (Cited 1 time: 0 self-citations, 1 by others)
An improved method of secondary structure prediction has been developed to aid the modelling of proteins by homology. Selected data from four published algorithms are scaled and combined as a weighted mean to produce consensus algorithms. Each consensus algorithm is used to predict the secondary structure of a protein homologous to the target protein and of known structure. By comparison of the predictions to the known structure, accuracy values are calculated and a consensus algorithm chosen as the optimum combination of the composite data for prediction of the homologous protein. This customized algorithm is then used to predict the secondary structure of the unknown protein. In this manner the secondary structure prediction is initially tuned to the required protein family before prediction of the target protein. The method improves statistical secondary structure prediction and can be incorporated into more comprehensive systems such as those involving consensus prediction from multiple sequence alignments. Thirty-one proteins from five families were used to compare the new method to that of Garnier, Osguthorpe and Robson (GOR) and sequence alignment. The improvement over GOR is naturally dependent on the similarity of the homologous protein, varying from a mean of 3% to 7% with increasing alignment significance score.

13.
We have developed a new method for the prediction of protein secondary structure from the amino acid sequence. The method is based on the most recent version (IV) of the standard GOR (J Mol Biol 120 (1978) 97) algorithm. A significant improvement is obtained by combining multiple sequence alignments with the GOR method. Additional improvement in the predictions is obtained by a simple correction of the results when helices or sheets are too short, or if helices and sheets are direct neighbors along the sequence (we require at least one residue of coil state between them). Requiring that the prediction be strong enough, i.e. that the difference between the probability of the predicted (most probable) state and the probability of the second most probable state be larger than a certain minimum value, also significantly improves secondary structure predictions. We have tested our method on 12 different proteins from the Protein Data Bank with known secondary structures. The average quality of the GOR prediction of the secondary structure for these 12 proteins without multiple sequence alignment was 63.4%. The multiple sequence alignments improve the average prediction to 71.9%. The correction for short helices and sheets and for coil states separating sheets and helices further improves the average prediction to 74.4%. Setting a 10% minimum difference between the most probable and the second most probable conformation leads to 77.0% prediction accuracy, while increasing this limit to 20% raises the average accuracy of the secondary structure prediction to 81.2%.
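
The two post-processing ideas in the abstract above (a minimum probability margin and a correction for segments that are too short) can be sketched as below. This is a rough illustration, not the authors' procedure: leaving low-margin residues unassigned (`None`), relabelling short segments as coil, the minimum lengths in `MIN_SEGMENT_LENGTH`, and all names are assumptions.

```python
from itertools import groupby

# Hypothetical minimum run lengths for helix (H) and sheet (E); not from the abstract.
MIN_SEGMENT_LENGTH = {"H": 4, "E": 2}


def confident_states(probs, min_margin=0.10):
    """Keep the most probable state only when it beats the runner-up by more than min_margin.

    probs: list of dicts mapping 'H', 'E', 'C' to probabilities for each residue.
    Residues that fail the margin test are returned as None (an assumption).
    """
    states = []
    for p in probs:
        ranked = sorted(p.items(), key=lambda kv: kv[1], reverse=True)
        (best, p1), (_, p2) = ranked[0], ranked[1]
        states.append(best if p1 - p2 > min_margin else None)
    return states


def relabel_short_segments(states: str) -> str:
    """Replace helix or sheet runs shorter than the assumed minimum with coil ('C')."""
    out = []
    for state, run in groupby(states):
        n = len(list(run))
        too_short = state in MIN_SEGMENT_LENGTH and n < MIN_SEGMENT_LENGTH[state]
        out.append(("C" if too_short else state) * n)
    return "".join(out)
```

Raising `min_margin` trades coverage for accuracy, which is consistent with the abstract's report of higher accuracy at the 20% limit than at 10%.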

14.
A method using protein sequence divergence to predict the three-dimensional structure of the transmembrane domain of seven-helix membrane proteins is described. The key component in the multistep procedure is the calculation of a hydrophilic and lipophilic variability index for each amino acid in an alignment of a family of homologous proteins. The variability profile, a plot of the calculated variability index versus alignment position, can be used to predict a tertiary model of the backbone conformation of the transmembrane domain. This method was applied to bacteriorhodopsin (BR) and the model obtained was compared with the known structure of this protein. Using an alignment of the amino acid sequences of BR and closely related (20% identity) proteins, the boundaries of the transmembrane regions, their secondary structures and orientations inside the membrane bilayer were predicted based on the variability profile. Additional information about the shape of the helix bundle was also obtained from the average variability of each transmembrane helix with the assumption that the helices are packed sequentially and form a closed helix bundle. Correct features of the known structure of BR were found in the model structure, suggesting that a similar strategy can be used to predict transmembrane helices and the packing shape of other membrane proteins with seven transmembrane helices, such as the opsins and other G-protein coupled receptors.

15.
Designing amino acid sequences to fold with good hydrophobic cores   (Cited 3 times: 0 self-citations, 3 by others)
We present two methods for designing amino acid sequences of proteins that will fold to have good hydrophobic cores. Given the coordinates of the desired target protein or polymer structure, the methods generate sequences of hydrophobic (H) and polar (P) monomers that are intended to fold to these structures. One method designs hydrophobic inside, polar outside; the other minimizes an energy function in a sequence evolution process. The sequences generated by these methods agree at the level of 60–80% of the sequence positions in 20 proteins in the Protein Data Bank. A major challenge in protein design is to create sequences that can fold uniquely, i.e. to a single conformation rather than to many. While an earlier lattice-based sequence evolution method was shown not to design unique folders, our method generates unique folders in lattice model tests. These methods may also be useful in designing other types of foldable polymer not based on amino acids.
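
The "hydrophobic inside, polar outside" idea above can be sketched in a few lines. This is an assumption-laden illustration only: the abstract does not say how burial is measured, so distance to the centroid stands in for it here, and `design_hp_sequence` and `hydrophobic_fraction` are hypothetical names.

```python
import math


def design_hp_sequence(coords, hydrophobic_fraction=0.5):
    """Assign H to the monomers closest to the centroid and P to the rest.

    coords: list of (x, y, z) tuples, one per monomer, in chain order.
    Distance to the centroid is used as a crude burial measure (assumption).
    """
    n = len(coords)
    centroid = tuple(sum(c[k] for c in coords) / n for k in range(3))
    dist = [math.dist(c, centroid) for c in coords]
    n_h = int(n * hydrophobic_fraction)  # number of hydrophobic positions
    buried = set(sorted(range(n), key=lambda i: dist[i])[:n_h])
    return "".join("H" if i in buried else "P" for i in range(n))
```

A real design method would likely use a better burial measure, such as neighbor counts or solvent accessibility; the centroid distance is chosen here only because it needs no extra dependencies.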

16.
17.
Although it is well known that significant sequence similarity between proteins is reflected at the structural level, it is commonly assumed that any misaligned regions, as judged by the correct structure-based alignment, are those where the local sequence identity is lower than the global. Recent studies have shown that this is not always the case and that there can exist short stretches of high local identity which are not reflected in the structure-based alignment. An analysis is presented of 290 pairs of homologous proteins with a view to quantifying the occurrence of these misleading local sequence alignments (MLSAs). It is found that such MLSAs are likely if the global sequence identity is less than 40% and can occur even when it is greater than 60%. The results have implications for automated homology modelling and also for the inference of function made by comparison.

18.
The question of protein homology versus analogy arises when proteins share a common function or a common structural fold without any statistically significant amino acid sequence similarity. When two or more proteins do not have similar sequences but share a common fold and the same or closely related function, they are assumed to be homologs, descended from a common ancestor. The problem of homolog identification is compounded in the case of proteins of 100 or fewer amino acids. This is due to a limited number of basic single domain folds and to a likelihood of identifying sequence similarity by chance. The latter arises from two conditions: first, any search of the currently very large protein database is likely to identify short regions of chance match; secondly, a direct sequence comparison among a small set of short proteins sharing a similar fold can detect many similar patterns of hydrophobicity even if the proteins do not descend from a common ancestor. In an effort to identify distant homologs of the many ubiquitin proteins, we have developed a combined structure and sequence similarity approach that attempts to overcome the above limitations of homolog identification. This approach results in the identification of 90 probable ubiquitin-related proteins, including examples from the two prokaryotic domains of life, Archaea and Bacteria. Received December 1, 2002; revised October 22, 2003; accepted October 24, 2003

19.
Many cell functions in all living organisms rely on protein-based molecular recognition involving disorder-to-order transitions upon binding by molecular recognition features (MoRFs). A well accepted computational tool for identifying likely protein-protein interactions is sequence alignment. In this paper, we propose the combination of sequence alignment and disorder prediction as a tool to improve the confidence of identifying MoRF-based protein-protein interactions. The method of reverse sequence alignment is also rationalized here as a novel approach for finding additional interaction regions, leading to the concept of a retro-MoRF, which has the reversed sequence of an identified MoRF. The set of retro-MoRF binding partners likely overlaps the partner sets of the originally identified MoRFs. The high abundance of MoRF-containing intrinsically disordered proteins in nature suggests the possibility that the number of retro-MoRFs could likewise be very high. This hypothesis provides new grounds for exploring the mysteries of protein-protein interaction networks at the genome level.
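
As a small illustration of the retro-MoRF concept above, the sketch below reverses a MoRF sequence and looks for it in a target sequence. The exact-substring search is a deliberate simplification (the abstract refers to sequence alignment, which tolerates mismatches and gaps), and the function names and example sequences are hypothetical.

```python
def retro_sequence(seq: str) -> str:
    """Return the reversed sequence (the retro form of a MoRF)."""
    return seq[::-1]


def find_occurrences(pattern: str, target: str):
    """Return all start positions (0-based) where pattern occurs exactly in target."""
    hits = []
    start = target.find(pattern)
    while start != -1:
        hits.append(start)
        start = target.find(pattern, start + 1)  # allow overlapping hits
    return hits


def find_retro_morf_sites(morf: str, target: str):
    """Locate exact copies of the reversed MoRF in a target sequence."""
    return find_occurrences(retro_sequence(morf), target)
```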

20.
Protein design experiments have shown that the use of specific subsets of amino acids can produce foldable proteins. This prompts the question of whether there is a minimal amino acid alphabet which could be used to fold all proteins. In this work we make an analogy between sequence patterns which produce foldable sequences and those which make it possible to detect structural homologs by aligning sequences, and use it to suggest the possible size of such a reduced alphabet. We estimate that reduced alphabets containing 10–12 letters can be used to design foldable sequences for a large number of protein families. This estimate is based on the observation that there is little loss of the information necessary to pick out structural homologs in a clustered protein sequence database when a suitable reduction of the amino acid alphabet from 20 to 10 letters is made, but that this information is rapidly degraded when further reductions in the alphabet are made.
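
Reducing the amino acid alphabet amounts to mapping each of the 20 residues to a group representative, as sketched below. The particular 10-group partition in `GROUPS` is an illustrative assumption based on broad physicochemical similarity; the abstract does not list the grouping the authors used.

```python
# Hypothetical 10-letter grouping of the 20 standard amino acids (assumption).
GROUPS = ["LVIM", "C", "A", "G", "ST", "P", "FYW", "EDNQ", "KR", "H"]

# Map every amino acid to the first letter of its group.
REDUCE = {aa: group[0] for group in GROUPS for aa in group}


def reduce_sequence(seq: str) -> str:
    """Rewrite a protein sequence in the reduced alphabet."""
    return "".join(REDUCE[aa] for aa in seq.upper())
```

Sequences rewritten this way can be passed to any ordinary alignment routine, which is one way to probe how much homolog-detection signal survives a given reduction.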


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号