Similar articles
20 similar articles found.
1.
Twilight zone of protein sequence alignments   Cited by: 8 (self-citations: 0, citations by others: 8)
Sequence alignments unambiguously distinguish between protein pairs of similar and non-similar structure when the pairwise sequence identity is high (>40% for long alignments). The signal gets blurred in the twilight zone of 20–35% sequence identity. Here, more than a million sequence alignments were analysed between protein pairs of known structures to re-define a line distinguishing between true and false positives for low levels of similarity. Four results stood out. (i) The transition from the safe zone of sequence alignment into the twilight zone is described by an explosion of false negatives. More than 95% of all pairs detected in the twilight zone had different structures. More precisely, above a cut-off roughly corresponding to 30% sequence identity, 90% of the pairs were homologous; below 25% less than 10% were. (ii) Whether or not sequence homology implied structural identity depended crucially on the alignment length. For example, if 10 residues were similar in an alignment of length 16 (>60%), structural similarity could not be inferred. (iii) The 'more similar than identical' rule (discarding all pairs for which percentage similarity was lower than percentage identity) reduced false positives significantly. (iv) Using intermediate sequences for finding links between more distant families was almost as successful: pairs were predicted to be homologous when the respective sequence families had proteins in common. All findings are applicable to automatic database searches.
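
The 'more similar than identical' rule in result (iii) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the abstract: the residue groups in SIMILAR_GROUPS, the choice to count similarity only over non-identical aligned pairs, and the function names.

```python
# Hypothetical similarity groups; the abstract does not say how similarity was scored.
SIMILAR_GROUPS = ["ILVM", "FYW", "KRH", "DE", "NQ", "ST", "AG"]


def alignment_stats(row_a, row_b):
    """Return (aligned_length, pct_identity, pct_similarity) for two gapped rows.

    Columns containing a gap ('-') in either row are ignored. Similarity is
    counted over non-identical pairs only (an assumption of this sketch).
    """
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    pairs = [(x, y) for x, y in zip(row_a, row_b) if x != "-" and y != "-"]
    n = len(pairs)
    if n == 0:
        return 0, 0.0, 0.0
    identical = sum(x == y for x, y in pairs)
    similar = sum(
        x != y and any(x in g and y in g for g in SIMILAR_GROUPS)
        for x, y in pairs
    )
    return n, 100 * identical / n, 100 * similar / n


def passes_more_similar_than_identical(row_a, row_b):
    """Keep a pair unless its percentage similarity is lower than its percentage identity."""
    _, pct_identity, pct_similarity = alignment_stats(row_a, row_b)
    return pct_similarity >= pct_identity
```

For the made-up rows "ILKD" and "IVRE", one of four aligned pairs is identical and three are similar, so the pair is kept; a length cut-off as in result (ii) would have to be applied separately.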

2.
We have studied the question of how much extra predictive power the correlated mutational behaviour of pairs of amino acid residues separated along a sequence has concerning the likelihood of those residues being in contact in the folded protein. The mutational behaviour is deduced from multiple sequence alignments. Our findings are that there is, indeed, some valuable information available from this source and that it is sufficient to make a significant improvement in our ability to predict contacts, when compared with earlier methods that do not take into account the correlations between the mutations. This improvement is approximately twice as large as can be obtained by the more economical method of simply averaging pair preferences over the same sequence alignment. Even when using a method based on pair preferences, a further significant improvement can be made by penalizing more variable regions (on the reasonable assumption that invariant residues are relatively more likely to be in contact), though we have found no way of improving the pair preference method to the extent that it matches the method based on correlated behaviour. Our new method is thought to be the best data-based method of contact prediction developed so far, achieving, on average, an improvement over a random (i.e. information-free) prediction of a factor of five when the number of contacts predicted is chosen to match the number that actually occur.

3.
Judging the significance of alignments is still a major problem in sequence comparison. We present a method to delineate reliable regions within an alignment. This differs from standard approaches in that it does not attempt to attribute one significance value to the alignment as a whole, but assesses alignment quality locally. An algorithm is provided that predicts which residue pairs in an alignment are likely to be correctly matched. The predictions are evaluated by comparison with alignments taken from tertiary structural superpositions.

4.
A methodology is proposed to solve a difficult modeling problem related to the recently sequenced P39 protein. This sequence shares no similarity with any known 3D structure, but a fold is proposed by several threading tools. The difficulty in aligning the target sequence on one of the proposed template structures is overcome by combining the results of several available prediction methods and by refining a rational consensus between them. In silico validation of the obtained model and a preliminary cross-check with experimental features allow us to state that this borderline prediction is at least reasonable. This model raises relevant hypotheses on the main structural features of the protein and allows the design of site-directed mutations. Knowing the genetic context of the P39 reading frame, we are now able to suggest a function for the P39 protein: it would act as a periplasmic substrate-binding protein.

5.
The residue pair preference profile (R3P) method is an inverse folding method that combines environmental profiles and pair preference profiles. The method uses statistical preferences for residue pairs which score the likelihood of finding a profiled residue to be paired with a residue within its local environment. All pairs are characterized by their dihedral angles, secondary structure and number of neighboring residues as a function of residue type. Each residue pair preference is expressed for all 20 amino acids of the profiled residue and is weighted by the compatibility of the environment residue with its own local environment. The R3P method produces an initial profile-sequence alignment which is then refined by converting the initial profile into a profile of a target sequence threaded into the structure of the initial profile. We have tested this method by evaluating alignments of sequences with known 3-D structures using structural superposition alignments as reference. R3P-sequence alignments are 50% correct on average for sequences whose 3-D structure pairs superimpose with an r.m.s. deviation of 1.97 Å. The average improvement in correctness during this iterative refinement is 14%. The R3P-sequence alignments are compared with sequence-sequence and 3-D profile-sequence alignments. When all three methods are combined, on average 50% of the alignments are correct for pairs of 3-D structures that superimpose within 2.12 Å. A 3-D model of HisA is predicted with the combined method.

6.
A major problem in predicting protein structure by homology modelling is that the sequence alignment from which the model is built may not be the best one in terms of the correct equivalencing of residues assessed by structural or functional criteria. A useful strategy is to generate and examine a number of suboptimal alignments as better alignments can often be found away from the optimal. A procedure to filter rapidly suboptimal alignments based on measurement of core volumes and packing pair potentials is investigated. The approach is benchmarked on three pairs of sequences which are non-trivial to align correctly, namely two immunoglobulin domains, plastocyanin with azurin and two distant globin sequences. It is shown to be useful to reduce a large ensemble of possible alignments down to a few which correspond more closely to the correct (structure based) alignment.

7.
Optimal sequence threading can be used to recognize members of a library of protein folds which are closely related in 3-D structure to the native fold of an input test sequence, even when the test sequence is not significantly homologous to the sequence of any member of the fold library. The methods provide an alignment between the residues of the test sequence and the residue positions in a template fold. This alignment optimizes a score function, and the predicted fold is the highest scoring member of the library of folds. Most score functions contain a pairwise interaction energy term. This, coupled with the need to introduce gaps into the alignment, means that the optimization problem is NP-hard. We report a comparison between two heuristic optimization algorithms used in the literature, double dynamic programming and an iterative algorithm based on the so-called frozen approximation. These are compared in terms of both the ranking of likely folds and the quality of the alignment produced.

8.
The automatic identification of motifs associated with a given function is an important challenge for molecular sequence analysis. A method is presented for the extraction of such patterns from large sets of unaligned sequences with related but general function, for example, a set of heat shock proteins. In such a set of proteins there can often be several subfamilies each characterized by one or more distinct motifs. The aim is to develop computational tools to identify these motifs. The algorithm presented locates high frequency words of length k with a given number of positions, r, fixed. Statistics for a binomial distribution are used to assess the significance of the words. The high-frequency words are clustered and highly populated clusters retained. The composition of the clusters is displayed graphically. A set of motifs associated with the sequence family can automatically be extracted. The method is benchmarked on a set of 106 heat shock sequences and a set of 257 toxin sequences. It is shown to recover previously identified motifs.
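
The word-counting and significance steps described above might look roughly like the following Python sketch. It is only an illustration under stated assumptions: the '.' wildcard notation, counting each word once per sequence, the function names, and the use of a caller-supplied background probability p are all hypothetical choices, and the clustering and graphical display steps are not attempted.

```python
from collections import defaultdict
from itertools import combinations
from math import comb


def gapped_words(seq, k, r):
    """Yield each length-k window of seq with r positions kept and the rest masked as '.'."""
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        for fixed in combinations(range(k), r):
            yield "".join(c if i in fixed else "." for i, c in enumerate(window))


def count_supporting_sequences(seqs, k, r):
    """Map each gapped word to the number of sequences that contain it at least once."""
    support = defaultdict(int)
    for seq in seqs:
        for word in set(gapped_words(seq, k, r)):
            support[word] += 1
    return dict(support)


def binomial_tail(n, m, p):
    """P(X >= m) for X ~ Binomial(n, p): the chance that m or more of n sequences
    contain a word whose per-sequence background probability is p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(m, n + 1))
```

With the toy input ["ACDE", "ACFE", "GGGG"], k = 3 and r = 2, the word "AC." is supported by two of the three sequences; how p is estimated from residue frequencies is left open because the abstract does not specify it.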

9.
Restriction enzymes (REases) are commercial reagents commonly used in DNA manipulations and mapping. They are regarded as very attractive models for studying protein-DNA interactions and valuable targets for protein engineering. Their amino acid sequences usually show no similarities to other proteins, with rare exceptions of other REases that recognize identical or very similar sequences. Hence, they are extremely hard targets for structure prediction and modeling. NlaIV is a Type II REase, which recognizes the interrupted palindromic sequence GGNNCC (where N indicates any base) and cleaves it in the middle, leaving blunt ends. NlaIV shows no sequence similarity to other proteins and virtually nothing is known about its sequence-structure-function relationships. Using protein fold recognition, we identified a remote relationship between NlaIV and EcoRV, an extensively studied REase, which recognizes the GATATC sequence and whose crystal structure has been determined. Using the 'FRankenstein's monster' approach we constructed a comparative model of NlaIV based on the EcoRV template and used it to predict the catalytic and DNA-binding residues. The model was validated by site-directed mutagenesis and analysis of the activity of the mutants in vivo and in vitro as well as structural characterization of the wild-type enzyme and two mutants by circular dichroism spectroscopy. The structural model of the NlaIV-DNA complex suggests regions of the protein sequence that may interact with the 'non-specific' bases of the target and thus it provides insight into the evolution of sequence specificity in restriction enzymes and may help engineer REases with novel specificities. Before this analysis was carried out, neither the three-dimensional fold of NlaIV, its evolutionary relationships, nor its catalytic or DNA-binding residues were known. Hence our analysis may be regarded as a paradigm for studies aiming at reducing 'white spaces' on the evolutionary landscape of sequence-function relationships by combining bioinformatics with simple experimental assays.

10.
Amino acid sequence patterns suggested to characterize specific recurrent turn conformations in proteins are tested as to their predictive power in a database containing 75 proteins of known structure. Many of these patterns are found to be associated with local structures that differ from the motifs originally used to derive them. It is therefore concluded that, while they could be useful for improving predictions made by other methods, their stand-alone predictive power is poor. The issue of deriving and validating consensus sequence patterns for use in protein structure prediction is raised.

11.
A mutant of bovine pancreatic DNase I containing two additional residues in a loop next to C173 has been expressed in Escherichia coli, purified and characterized biochemically. Modelling studies suggest that the inserted arginine and glutamate side chains of the modified loop sequence C173-R-E-G-T-V176 could contact the bases 3' to the cleaved bond in the major groove of a bound DNA, and that up to 10 bp could interact with the enzyme and potentially influence its cutting rate. The loop insertion mutant has an 800-fold lower specific activity than wild-type and shows overall cleavage characteristics similar to bovine pancreatic DNase I. Compared with the wild-type enzyme, the mutant shows a strongly enhanced preference for cutting the inverted repeat: 5'-GACTT A AAGTC-3' CTGAA T TTCAG or close variants thereof. Unexpectedly for a minor groove binding protein, the preferred cutting sites in opposite strands are staggered by 1 bp in the 5' direction, causing the cleavage of a TA and a TT step, respectively. This finding demonstrates that the sequence context is relatively more important for the cutting frequency than the nature of the dinucleotide step of the cleaved bond, and clearly shows that base recognition is involved in determining the sequence selectivity of the mutant. The importance of the sequence 5' to the cleaved bond for the cutting rate suggests that the additional major groove contacts may require a distortion of the DNA associated with a higher energy barrier, resulting in an increased selectivity for flexible DNA sequences and a lower overall activity of the mutant enzyme.

12.
Phosphoenolpyruvate carboxylase (PEPC) catalyzes the irreversible carboxylation of phosphoenolpyruvate (PEP) and plays a crucial role in fixing atmospheric CO2 in C4 and CAM plants. The enzyme is widespread in plants and bacteria and mostly regulated allosterically by both positive and negative effectors. Archaeal PEPCs (A-PEPCs) have unique characteristics in allosteric regulation and molecular mass, distinct from their bacterial and eukaryote homologues, and their amino acid sequences have become available only recently. In this paper, we generated a structure-based alignment of archaeal, bacterial and eukaryote PEPCs and built comparative models using a combination of fold recognition, sequence and structural analysis tools. Our comparative modeling analysis identified A-PEPC-specific strong interactions between the two loops involved in both allostery and catalysis, which explained why A-PEPC is not influenced by any allosteric activators. We also found that the side-chain located three residues before the C-terminus appears to play a key role in determining the sensitivity to allosteric inhibitors. In addition to these unique features, we revealed how archaeal, bacterial and eukaryote PEPCs would share a common catalytic mechanism and adopt a similar mode of tetramer formation, despite their divergent sequences. Our novel observations will help design more efficient molecules for ecological and industrial use.

13.
Amino acid substitution tables are used to estimate the extent to which amino acids in families of homologous proteins are exposed to the solvent. The approach depends on the comparison of difference environment-dependent tables for solvent accessible/inaccessible residues with amino acid substitutions at each position in an aligned set of sequences. The periodicity in the predicted accessible/inaccessible residues is calculated using a Fourier transform procedure modified from that used to calculate hydrophobic moments. α-Helices are identified from the characteristic periodicities and the solvent accessible face of the helix is defined. The initial helix predictions are refined using rules for identifying the N- and C-termini of helices from sequence alignments. These rules have been defined from a study of protein structures. The combined method correctly predicts 79% of the residues in helices and incorrectly predicts only 12% of the nonhelical residues as helical. In addition, since the method is reliable at predicting the correct number of helices in the correct position in the sequence and since it also predicts the internal face of each helix, the results can be used to postulate 3-D arrangements of the secondary structure elements.
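
As a rough illustration of the periodicity step, the sketch below computes the squared Fourier amplitude of a per-residue score at a chosen angular period, in the same spirit as a hydrophobic-moment calculation. The abstract does not describe how the authors modified that procedure, so this is the unmodified textbook form, and the function name, the input scores, and the 100 degrees-per-residue helix angle (3.6 residues per turn) are assumptions of this sketch.

```python
import math

HELIX_ANGLE_DEG = 100.0  # 360 degrees / 3.6 residues per turn of an ideal alpha-helix


def periodicity_power(values, angle_deg):
    """Squared Fourier amplitude of per-residue values at angle_deg per residue.

    values could be, for example, +1 for a predicted accessible position and
    -1 for a predicted inaccessible one (a hypothetical encoding).
    """
    omega = math.radians(angle_deg)
    re = sum(v * math.cos(j * omega) for j, v in enumerate(values))
    im = sum(v * math.sin(j * omega) for j, v in enumerate(values))
    return re * re + im * im
```

A window whose scores repeat every 3.6 residues gives a large value at HELIX_ANGLE_DEG and a small one at, say, 160 degrees, an angle often associated with strand-like alternation; the thresholds needed to call a helix would have to be calibrated separately.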

14.
The Escherichia coli aspartate receptor is a dimer with two transmembrane sequences per monomer that connect a periplasmic ligand binding domain to a cytoplasmic signaling domain. The method of 'hydrophobic-biased' random mutagenesis, that we describe here, was used to construct mutant aspartate receptors in which either the entire transmembrane sequence or seven residues near the center of the transmembrane sequence were replaced with hydrophobic and polar random residues. Some of these receptors responded to aspartate in an in vivo chemotaxis assay, while others did not. The acceptable substitutions included hydrophobic to polar residues, small to larger residues, and large to smaller residues. However, one mutant receptor that had only a few hydrophobic substitutions did not respond to aspartate. These results add to our understanding of sequence specificity in the transmembrane regions of proteins with more than one transmembrane sequence. This work also demonstrates a method of constructing families of mutant proteins containing random residues with chosen characteristics.

15.
A model of the lignin peroxidase LIII of Phlebia radiata was constructed on the basis of the structure of cytochrome c peroxidase (CCP). Because of the low percentage of amino acid identity between the CCP and the lignin peroxidase LIII of Phlebia radiata, alignment of the sequences was based on the generation of a template from a knowledge of the 3-D structure of CCP and consensus sequences of lignin peroxidases. This approach gave an alignment in which all the insertions in the lignin peroxidase were placed at loop regions of CCP, with a 21.1% identity for these two proteins. The model was constructed using this alignment and the computer program COMPOSER, which assembles the model as a series of rigid fragments derived from CCP and other proteins. Manual intervention was required for some of the longer loop regions. The α-helices forming the structural framework, and especially the haem environment of CCP, are conserved in the LIII model and the core is close packed without holes. A possible site of the substrate oxidation at the haem edge of LIII is discussed.

16.
A method using protein sequence divergence to predict the three-dimensional structure of the transmembrane domain of seven-helix membrane proteins is described. The key component in the multistep procedure is the calculation of a hydrophilic and lipophilic variability index for each amino acid in an alignment of a family of homologous proteins. The variability profile, a plot of the calculated variability index versus alignment position, can be used to predict a tertiary model of the backbone conformation of the transmembrane domain. This method was applied to bacteriorhodopsin (BR) and the model obtained was compared with the known structure of this protein. Using an alignment of the amino acid sequences of BR and closely related (20% identity) proteins, the boundaries of the transmembrane regions, their secondary structures and orientations inside the membrane bilayer were predicted based on the variability profile. Additional information about the shape of the helix bundle was also obtained from the average variability of each transmembrane helix with the assumption that the helices are packed sequentially and form a closed helix bundle. Correct features of the known structure of BR were found in the model structure, suggesting that a similar strategy can be used to predict transmembrane helices and the packing shape of other membrane proteins with seven transmembrane helices, such as the opsins and other G-protein coupled receptors.

17.
A mechanism by which ligand binding to the extracellular domain of a growth factor receptor causes activation of its cytoplasmic tyrosine kinase domain is that binding promotes receptor dimerization. Recently we proposed a model in which dimerization of the transmembrane α-helices in one member of this family, rat neu, is mediated by the presence of three specific residues. This paper shows that a similar sequence motif is observed in 18 of the 20 transmembrane α-helices of the tyrosine kinase family of growth factor receptors. The motif encompasses a five residue segment in which position 0 (P0) requires a small side chain (Gly, Ala, Ser, Thr or Pro), P3 an aliphatic side chain (Ala, Val, Leu or Ile) and P4 only the smallest side chains (Gly or Ala). In addition other features of the transmembrane sequences are reported. It is concluded that the dimerization of transmembrane α-helices may be a general mechanism of tyrosine kinase activation in this family of growth factor receptors.
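
The five-residue motif described above translates directly into a pattern over one-letter amino acid codes. The sketch below is one possible encoding, assuming P1 and P2 are unconstrained (the abstract places no requirement on them); the function name and the example sequence are hypothetical.

```python
import re

# P0: Gly, Ala, Ser, Thr or Pro; P1, P2: any residue (assumed); P3: Ala, Val, Leu or Ile; P4: Gly or Ala.
# The lookahead lets overlapping occurrences be reported.
MOTIF = re.compile(r"(?=([GASTP]..[AVLI][GA]))")


def find_dimerization_motifs(sequence):
    """Return (start_index, segment) for every five-residue segment matching the motif."""
    return [(m.start(), m.group(1)) for m in MOTIF.finditer(sequence.upper())]
```

For the made-up segment "LLSVVAGLL", the only hit is "SVVAG" starting at index 2 (0-based). A real scan would be restricted to predicted transmembrane helices, since short patterns like this match often by chance.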

18.
A new multiple sequence alignment procedure is presented. Several different multiple alignments are made using differing criteria. Having divided the sequences into strongly conserved regions (SCRs) and loosely conserved regions (LCRs), the 'best' alignment for each LCR is chosen, independently of the other LCRs, from a selection of possibilities in the multiple alignments. To help make this choice for each LCR, the secondary structure is predicted and shown alongside each different possible alignment. One advantage of this method over automatic, non-interactive methods, is that the final alignment is not dependent on the choice of a single set of scoring parameters. Another is that, by allowing interactive choice and by taking account of secondary structural information, the final alignment is based more on biological rather than mathematical factors. This method can produce better alignments than any of the initial automatic multiple alignment methods used.

19.
Structural models of the variable domains of the murine anti-2-phenyloxazolone IgG (Ox1 idiotype) and its somatic variant, which has higher affinity to the hapten 2-phenyloxazolone, were constructed by computer-aided model building using known structures of highly homologous immunoglobulins as templates. Molecular dynamics simulations were used to dock the hapten between the VL and VH domains. The hapten is predicted to bind to slightly different sites in the two models. Hypotheses concerning the role of a number of preferred mutations in anti-oxazolone variants are presented. These can be tested by mutagenesis and crystallography. In particular, the higher binding affinities of the different antibody variants are shown to correlate with better complementarity of electrostatics. The molecular dynamics simulations also suggest that two mobile tryptophans at the mouth of the pocket may play an important role in the binding of hapten.

20.
The loop exchange mutant chymosin 155–164 rhizopuspepsin was expressed in Trichoderma reesei and exported into the medium to yield a correctly folded and active product. The biochemical characterization and crystal structure determination at 2.5 Å resolution confirm that the mutant enzyme adopts a native fold. However, the conformation of the mutated loop is unlike that in native rhizopuspepsin and involves the chelation of a water molecule in the loop. Kinetic analysis using two synthetic peptide substrates (six and 15 residues long) and the natural substrate, milk, revealed a reduction in the activity of the mutant enzyme with respect to the native when acting on both the long peptide substrate and milk. This may be a consequence of the different charge distribution of the mutated loop, its increased size and/or its different conformation.
