Similar literature
 Found 20 similar articles; search took 31 ms
1.
Sequence weighting techniques are aimed at balancing redundant observed information from subsets of similar sequences in multiple alignments. Traditional approaches apply the same weight to all positions of a given sequence, hence equal efficiency of phylogenetic changes is assumed along the whole sequence. This restrictive assumption is not required for the new method PSIC (position-specific independent counts) described in this paper. The number of independent observations (counts) of an amino acid type at a given alignment position is calculated from the overall similarity of the sequences that share the amino acid type at this position with the help of statistical concepts. This approach allows the fast computation of position-specific sequence weights even for alignments containing hundreds of sequences. The PSIC approach has been applied to profile extraction and to the fold family assignment of protein sequences with known structures. Our method was shown to be very productive in finding distantly related sequences and more powerful than Hidden Markov Models or the profile methods in WiseTools and PSI-BLAST in many cases. The profile extraction routine is available on the WWW (http://www.bork.embl-heidelberg.de/PSIC or http://www.imb.ac.ru/PSIC).

2.
Does a backwardly read protein sequence have a unique native state?   Total citations: 2, self-citations: 0, citations by others: 2
Amino acid sequences of native proteins are generally not palindromic. Nevertheless, the protein molecule obtained as a result of reading the sequence backwards, i.e. a retro-protein, obviously has the same amino acid composition and the same hydrophobicity profile as the native sequence. The important questions which arise in the context of retro-proteins are: does a retro-protein fold to a well defined native-like structure as natural proteins do and, if the answer is positive, does a retro-protein fold to a structure similar to the native conformation of the original protein? In this work, the fold of retro-protein A, originated from the retro-sequence of the B domain of Staphylococcal protein A, was studied. As a result of lattice model simulations, it is conjectured that the retro-protein A also forms a three-helix bundle structure in solution. It is also predicted that the topology of the retro-protein A three-helix bundle is that of the native protein A, rather than that corresponding to the mirror image of native protein A. Secondary structure elements in the retro-protein do not exactly match their counterparts in the original protein structure; however, the amino acid side chain contact pattern of the hydrophobic core is partly conserved.
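The claim that a retro-sequence keeps the composition and the hydrophobicity profile of the native sequence can be checked with a few lines of Python. This is a minimal sketch, not the lattice model from the paper: the toy sequence, the window length of 3 and the choice of the Kyte-Doolittle hydropathy scale are all assumptions made for illustration. Note that the profile of the retro-sequence is the native profile read in the opposite direction.

```python
from collections import Counter
from math import isclose

# Kyte-Doolittle hydropathy values (assumed scale; the abstract names none).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def retro(sequence):
    """Return the sequence read backwards (the retro-sequence)."""
    return sequence[::-1]


def hydropathy_profile(sequence, window=3):
    """Mean hydropathy over a sliding window along the sequence."""
    values = [KYTE_DOOLITTLE[aa] for aa in sequence]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


native = "MKTLLVAGE"  # arbitrary toy sequence, not protein A
backward = retro(native)

# Composition is identical because reversal only permutes residues.
same_composition = Counter(native) == Counter(backward)

# The retro profile equals the native profile in reverse order.
mirrored_profile = all(
    isclose(a, b, abs_tol=1e-9)
    for a, b in zip(
        hydropathy_profile(backward),
        reversed(hydropathy_profile(native)),
    )
)
print(same_composition, mirrored_profile)
```

The comparison uses `isclose` because the window sums are accumulated in a different order for the two directions, which can change the last bits of a floating-point result.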

3.
Degenerate codon libraries are frequently used in protein engineering and evolution studies but are often limited to targeting a small number of positions to adequately limit the search space. To mitigate this, codon degeneracy can be limited using heuristics or previous knowledge of the targeted positions. To automate design of libraries given a set of amino acid sequences, an algorithm (LibDesign) was developed that generates a set of possible degenerate codon libraries, their resulting size, and their score relative to a user-defined scoring function. A gene library of a specified size can then be constructed that is representative of the given amino acid distribution or that includes specific sequences or combinations thereof. LibDesign provides a new tool for automated design of high-quality protein libraries that more effectively harness existing sequence-structure information derived from multiple sequence alignment or computational protein design data.
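To make the notion of library size concrete, the sketch below expands degenerate codons written in IUPAC nucleotide codes and counts how many DNA and protein variants a library contains. It is not LibDesign and does not reproduce its scoring function; the function names `expand_codon` and `library_stats` are hypothetical, and only the standard genetic code is assumed.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def expand_codon(degenerate):
    """List every concrete codon encoded by one degenerate codon."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in degenerate))]


def library_stats(codons):
    """Return (DNA variants, distinct stop-free protein variants)."""
    dna_size = 1
    protein_size = 1
    for codon in codons:
        expanded = expand_codon(codon)
        residues = {CODON_TABLE[c] for c in expanded} - {"*"}
        dna_size *= len(expanded)
        protein_size *= len(residues)
    return dna_size, protein_size


# Two fully randomized positions using the common NNK scheme.
print(library_stats(["NNK", "NNK"]))
```

Restricting a position to a narrower code (for example `RST` instead of `NNK`) shrinks the DNA library multiplicatively, which is why limiting degeneracy lets more positions be targeted within the same search space.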

4.
In the tobamovirus coat protein family, amino acid residues at some spatially close positions are found to be substituted in a coordinated manner [Altschuh et al. (1987) J. Mol. Biol., 193, 693]. Therefore, these positions show an identical pattern of amino acid substitutions when amino acid sequences of these homologous proteins are aligned. Based on this principle, coordinated substitutions have been searched for in three additional protein families: serine proteases, cysteine proteases and the haemoglobins. Coordinated changes have been found in all three protein families mostly within structurally constrained regions. This method works with a varying degree of success depending on the function of the proteins, the range of sequence similarities and the number of sequences considered. By relaxing the criteria for residue selection, the method was adapted to cover a broader range of protein families and to study regions of the proteins having weaker structural constraints. The information derived by these methods provides a general guide for engineering of a large variety of proteins to analyse structure–function relationships.
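The principle of "identical pattern of substitutions" can be illustrated by comparing alignment columns. In this hedged sketch, two variable columns count as coordinated when a change in one always coincides with a change in the other, that is, when both columns split the sequences into the same groups. The strict-identity criterion, the function names and the toy alignment are assumptions; the relaxed criteria mentioned in the abstract are not modelled.

```python
from itertools import combinations


def column_pattern(column):
    """Relabel residues by order of first appearance, e.g. KKRR -> (0, 0, 1, 1)."""
    first_seen = {}
    return tuple(first_seen.setdefault(res, len(first_seen)) for res in column)


def coordinated_pairs(alignment):
    """Return index pairs of variable columns with identical substitution patterns."""
    columns = list(zip(*alignment))
    patterns = {
        i: column_pattern(col)
        for i, col in enumerate(columns)
        if len(set(col)) > 1  # skip fully conserved columns
    }
    return [
        (i, j)
        for i, j in combinations(sorted(patterns), 2)
        if patterns[i] == patterns[j]
    ]


# Toy alignment (hypothetical sequences of equal length).
alignment = [
    "AKLDE",
    "AKLDE",
    "ARLEE",
    "ARMEQ",
]
print(coordinated_pairs(alignment))
```

In the toy alignment, columns 1 and 3 change together (K/D versus R/E), as do columns 2 and 4, while column 0 is conserved and therefore ignored.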

5.
Restriction enzymes (REases) are commercial reagents commonly used in DNA manipulations and mapping. They are regarded as very attractive models for studying protein-DNA interactions and valuable targets for protein engineering. Their amino acid sequences usually show no similarities to other proteins, with rare exceptions of other REases that recognize identical or very similar sequences. Hence, they are extremely hard targets for structure prediction and modeling. NlaIV is a Type II REase, which recognizes the interrupted palindromic sequence GGNNCC (where N indicates any base) and cleaves it in the middle, leaving blunt ends. NlaIV shows no sequence similarity to other proteins and virtually nothing is known about its sequence-structure-function relationships. Using protein fold recognition, we identified a remote relationship between NlaIV and EcoRV, an extensively studied REase, which recognizes the GATATC sequence and whose crystal structure has been determined. Using the 'FRankenstein's monster' approach we constructed a comparative model of NlaIV based on the EcoRV template and used it to predict the catalytic and DNA-binding residues. The model was validated by site-directed mutagenesis and analysis of the activity of the mutants in vivo and in vitro as well as structural characterization of the wild-type enzyme and two mutants by circular dichroism spectroscopy. The structural model of the NlaIV-DNA complex suggests regions of the protein sequence that may interact with the 'non-specific' bases of the target and thus it provides insight into the evolution of sequence specificity in restriction enzymes and may help engineer REases with novel specificities. Before this analysis was carried out, neither the three-dimensional fold of NlaIV, its evolutionary relationships or its catalytic or DNA-binding residues were known. 
Hence our analysis may be regarded as a paradigm for studies aiming at reducing 'white spaces' on the evolutionary landscape of sequence-function relationships by combining bioinformatics with simple experimental assays.
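The recognition rule stated above (GGNNCC, with N any base, cut in the middle to leave blunt ends) maps directly onto a pattern search. The sketch below is only an illustration of that rule on a made-up DNA string; the function names are hypothetical and nothing here reflects the structural modelling in the paper. Because GGNNCC is its own reverse complement, scanning one strand is sufficient.

```python
import re


def find_sites(dna, site="GGNNCC"):
    """Return 0-based start positions of every (possibly overlapping) site."""
    pattern = "".join("[ACGT]" if base == "N" else base for base in site)
    # A lookahead lets the search report overlapping occurrences as well.
    return [m.start() for m in re.finditer(f"(?=({pattern}))", dna.upper())]


def blunt_cut_positions(dna, site="GGNNCC"):
    """Return the index just after the cut for a blunt cut in the site's middle."""
    half = len(site) // 2
    return [start + half for start in find_sites(dna, site)]


dna = "ATGGATCCTTGGCGCCAA"  # made-up test string
print(find_sites(dna))
print(blunt_cut_positions(dna))
```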

6.
The evaluation of calculated protein structures is an important step in the protein design cycle. Known criteria for this assessment of proteins are the polar and apolar, accessible and buried surface area, electrostatic interactions and other interactions between the protein atoms (e.g. HO, S-S), atomic packing, analysis of amino acid environment and surface charge distribution. We show that a powerful test of accuracy of protein structure can be derived by analysing the water contact of atoms and additionally taking into account their polarity. On the basis of estimated reference values of the polar fraction of typical globular proteins with known structure (mean, SD and distribution), the evaluation of misfolded structures can be improved significantly. The reference values are derived by moving windows of different length (3–99 amino acid residues) over the amino acid sequence. Model proteins, which are included in the Brookhaven protein structure databank, deliberately misfolded proteins, hypothetical proteins and predicted protein structures are diagnosed as at least partially incorrectly folded. The local fault, mostly observed, is that polar groups are buried too frequently in the interior of the protein. The database-derived quantities are useful in screening the designed proteins prior to experimentation and may also be useful in the assessment of errors in the experimentally determined protein structures.

7.
Phosphoenolpyruvate carboxylase (PEPC) catalyzes the irreversible carboxylation of phosphoenolpyruvate (PEP) and plays a crucial role in fixing atmospheric CO2 in C4 and CAM plants. The enzyme is widespread in plants and bacteria and mostly regulated allosterically by both positive and negative effectors. Archaeal PEPCs (A-PEPCs) have unique characteristics in allosteric regulation and molecular mass, distinct from their bacterial and eukaryote homologues, and their amino acid sequences have become available only recently. In this paper, we generated a structure-based alignment of archaeal, bacterial and eukaryote PEPCs and built comparative models using a combination of fold recognition, sequence and structural analysis tools. Our comparative modeling analysis identified A-PEPC-specific strong interactions between the two loops involved in both allostery and catalysis, which explained why A-PEPC is not influenced by any allosteric activators. We also found that the side-chain located three residues before the C-terminus appears to play a key role in determining the sensitivity to allosteric inhibitors. In addition to these unique features, we revealed how archaeal, bacterial and eukaryote PEPCs would share a common catalytic mechanism and adopt a similar mode of tetramer formation, despite their divergent sequences. Our novel observations will help design more efficient molecules for ecological and industrial use.

8.
Improving protein stability in unnatural and suboptimal environments is a promising application of protein engineering technology. Carefully designed amino acid alterations may lead to dramatic positive effects on the stability of proteins under highly perturbing conditions, such as in non-aqueous solvents. Applications of biocatalysts and proteins with specific binding capabilities in the chemical industry have been severely limited by constraints placed on the solvent environment. With the advent of convenient methods for altering the amino acid composition and even synthesizing entirely new protein molecules, it is worthwhile to consider engineering proteins for stability in non-aqueous solvents. In order to identify the features that a protein would need for stability in organic media, we have been studying the structure and properties of the hydrophobic protein crambin. Crambin is unique in that it is soluble and stable in very high concentrations of polar organic solvents. Crambin and its water-soluble homologs offer a powerful demonstration of protein engineering for non-aqueous solvents. This paper describes the structural features that contribute to crambin's special properties. Based on these observations and consideration of how non-aqueous solvents affect the interactions important in protein folding, a set of rules for designing non-aqueous solvent-stable proteins is proposed.

9.
The sequences of four-α-helical bundle proteins are characterized by a pattern of hydrophilic and hydrophobic amino acids which is repeated every seven residues. At each position of the heptad repeat there are specific constraints on the amino acid properties which result from the topology of the tertiary motif. These constraints give rise to patterns of amino acid distribution which are distinct from those of other proteins. The distributions in each of the heptad positions have been determined by a statistical analysis of structural and sequence data derived from seven families of aligned protein sequences. The constitution of each position is dominated by a very small number of different amino acids, with the core positions consisting overwhelmingly of Leu and Ala. The positional preferences of the individual amino acids can be generally interpreted in terms of residue properties and topological constraints. The potential for four-α-helix bundle folding is reflected primarily in the pattern of residue occurrence in the heptad and not in the overall amino acid composition of the protein. Possible applications of this analysis in structure predictions, sequence alignments and in the rational design and engineering of four-α-helical bundle proteins are discussed.

10.
The Escherichia coli aspartate receptor is a dimer with two transmembrane sequences per monomer that connect a periplasmic ligand binding domain to a cytoplasmic signaling domain. The method of 'hydrophobic-biased' random mutagenesis, that we describe here, was used to construct mutant aspartate receptors in which either the entire transmembrane sequence or seven residues near the center of the transmembrane sequence were replaced with hydrophobic and polar random residues. Some of these receptors responded to aspartate in an in vivo chemotaxis assay, while others did not. The acceptable substitutions included hydrophobic to polar residues, small to larger residues, and large to smaller residues. However, one mutant receptor that had only a few hydrophobic substitutions did not respond to aspartate. These results add to our understanding of sequence specificity in the transmembrane regions of proteins with more than one transmembrane sequence. This work also demonstrates a method of constructing families of mutant proteins containing random residues with chosen characteristics.

11.
This paper describes peptide analogs and the design strategy that were used to facilitate the final construction of a de novo-designed protein (ALIN) whose stable tertiary fold has been determined recently by NMR spectroscopy. Previous studies have suggested that the main problem in the de novo design of proteins is the attainment of a protein with a defined fold. To effectively overcome this mainchain multiconformation problem, three related steps, with experimental evaluation of the design hypotheses for each step, were pursued in the design process. Firstly, 15-residue sequences with experimentally verified high helicities were selected for the helical regions. Secondly, hydrophobic and electrostatic interhelical interactions as well as an interhelical disulfide bridge were designed to favor an antiparallel configuration of the helix axis. Finally, a loop with sufficient flexibility was inserted to stabilize the helices in the desired orientation. To assess the design strategy, peptides corresponding to each design step were synthesized and their structures verified experimentally by far-UV CD. As anticipated, ALIN was the most helical, and the SS-bridged dimeric peptides were more helical than their monomeric counterparts. The van't Hoff enthalpy change for ALIN computed from the CD denaturation curve and assuming a two-state model was 50 kJ/mol, a value close to that observed for helical coiled-coils. Overall, this report shows that small, simple proteins can be built using the current knowledge of protein structures.

12.
In search of the ideal protein sequence   Total citations: 1, self-citations: 0, citations by others: 1
The inverse of a folding problem is to find the ideal sequence that folds into a particular protein structure. This problem has been addressed using the topology fingerprint-based threading algorithm, capable of calculating a score (energy) of an arbitrary sequence-structure pair. At first, the search is conducted by unconstrained minimization of the energy in sequence space. It is shown that using energy as the only design criterion leads to spurious solutions with incorrect amino acid composition. The problem lies in the general features of the protein energy surface as a function of both structure and sequence. The proposed solution is to design the sequence by maximizing the difference between its energy in the desired structure and in other known protein structures. Depending on the size of the database of structures ‘to avoid’, sequences bearing significant similarity to the native sequence of the target protein are obtained using this procedure.

13.
Making an alignment of the amino acid sequences is an essential step in the prediction of an unknown protein structure by model building from the known structure of a protein of the same family. To improve the accuracy of the alignments, we introduced the concept of hydrophobic core scores, which restrains putting insertions/deletions in the hydrophobic core regions of the protein. Eight pairs of protein sequences were aligned by this method, and the quality of the alignments was assessed by reference to those obtained by the structural superposition. The introduction of the hydrophobic core scores derived from the knowledge of the tertiary structure of one of each pair resulted in an improvement of the accuracy of the alignments. The quality of the alignment was found to depend on the homology of the protein sequences.

14.
The hydrophobic part of the solvent-accessible surface of a typical monomeric globular protein consists of a single, large interconnected region formed from faces of apolar atoms and constituting ~60% of the solvent-accessible surface area. Therefore, the direct delineation of the hydrophobic surface patches on an atom-wise basis is impossible. Experimental data indicate that, in a two-state hydration model, a protein can be considered to be unified with its first hydration shell in its interaction with bulk water. We show that, if the surface area occupied by water molecules bound at polar protein atoms as generated by AUTOSOL is removed, only about two-thirds of the hydrophobic part of the protein surface remains accessible to bulk solvent. Moreover, the organization of the hydrophobic part of the solvent-accessible surface experiences a drastic change, such that the single interconnected hydrophobic region disintegrates into many smaller patches, i.e. the physical definition of a hydrophobic surface region as unoccupied by first hydration shell water molecules can distinguish between hydrophobic surface clusters and small interconnecting channels. It is these remaining hydrophobic surface pieces that probably play an important role in intra- and intermolecular recognition processes such as ligand binding, protein folding and protein–protein association in solution conditions. These observations have led to the development of an accurate and quick analytical technique for the automatic determination of hydrophobic surface patches of proteins. This technique is not aggravated by the limiting assumptions of the methods for generating explicit water hydration positions. Formation of the hydrophobic surface regions owing to the structure of the first hydration shell can be computationally simulated by a small radial increment in solvent-accessible polar atoms, followed by calculation of the remaining exposed hydrophobic patches.
We demonstrate that a radial increase of 0.35–0.50 Å resembles the effect of tightly bound water on the organization of the hydrophobic part of the solvent-accessible surface.

15.
Functioning of proteins efficiently at the solid-liquid interface is critical to not only biological but also modern man-made systems such as ELISA, liposomes and biosensors. Anchoring hydrophilic proteins poses a major challenge in this regard. Lipid modification, N-acyl-S-diacylglyceryl-Cys, providing an N-terminal hydrophobic membrane anchor is a viable solution that bacteria have successfully evolved but remains unexploited. Based on the current understanding of this ubiquitous and unique bacterial lipid modification it is possible to use Escherichia coli, the popular recombinant protein expression host, for converting a non-lipoprotein to a lipoprotein with a hydrophobic anchor at the N-terminal end. We report two strategies applicable to non-lipoproteins (with or without signal sequences) employing minimal sequence change. Taking periplasmic Shigella apyrase as an example, its signal sequence was engineered to include a lipobox, an essential determinant for lipid modification, or its mature sequence was fused to the signal sequence of abundant outer membrane lipoprotein, Lpp. Lipid modification was proved by membrane localization, electrophoretic mobility shift and mass spectrometric analysis. Substrate specificity and specific activity measurements indicated functional integrity after modification. In conclusion, a convenient protein engineering strategy for converting non-lipoprotein to lipoprotein for commercial application has been devised and tested successfully.

16.
In the TNC family of Ca-binding proteins (calmodulin, parvalbumin, intestinal calcium binding protein and troponin C) ~70 well-conserved amino acid sequences and six crystal structures are known. We find a clear correlation between residue contacts in the structures and residue conservation in the sequences: residues with strong sidechain–sidechain contacts in the three-dimensional structure tend to be the more conserved in the sequence. This is one way to quantify the intuitive notion of the importance of sidechain interactions for maintaining protein three-dimensional structure in evolution and may usefully be taken into account in planning point mutations in protein engineering.

17.
Independence divergence-generated binary trees of amino acids   Total citations: 1, self-citations: 0, citations by others: 1
The discovery of the relationship between amino acids is important in terms of the replacement ability, as used in protein engineering homology studies, and gaining a better understanding of the roles which various properties of the residues play in the creation of a unique, stable, 3-D protein structure. Amino acid sequences of proteins edited by evolution are anything but random. The measure of nonrandomness, i.e. the level of editing, can be characterized by an independence divergence value. This parameter is used to generate binary tree relationships between amino acids. The relationships of residues presented in this paper are based on protein building features and not on the physico-chemical characteristics of amino acids. This approach is not biased by the tautology present in all sequence similarity-based relationship studies. The roles which various physico-chemical characteristics play in the determination of the relationships between amino acids are also discussed.

18.
A method for comparison of protein sequences based on their primary and secondary structure is described. Protein sequences are annotated with predicted secondary structures (using a modified Chou and Fasman method). Two lettered code sequences are generated (Xx, where X is the amino acid and x is its annotated secondary structure). Sequences are compared with a dynamic programming method (STRALIGN) that includes a similarity matrix for both the amino acids and secondary structures. The similarity value for each paired two-lettered code is a linear combination of similarity values for the paired amino acids and their annotated secondary structures. The method has been applied to eight globin proteins (28 pairs) for which the X-ray structure is known. For protein pairs with high primary sequence similarity (>45%), STRALIGN alignment is identical to that obtained by a dynamic programming method using only primary sequence information. However, alignment of protein pairs with lower primary sequence similarity improves significantly with the addition of secondary structure annotation. Alignment of the pair with the least primary sequence similarity of 16% was improved from 0 to 37% ‘correct’ alignment using this method. In addition, STRALIGN was successfully applied to seven pairs of distantly related cytochrome c proteins, and three pairs of distantly related picornavirus proteins.
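The scoring idea described above, a linear combination of amino acid and secondary structure similarity fed into dynamic programming, can be sketched as a small global alignment. This is not STRALIGN itself: the match/mismatch values of +1/-1, the linear gap penalty, the weight `w` and all function names are assumptions, and a real implementation would use proper similarity matrices for both alphabets.

```python
def encode(sequence, structure):
    """Build two-letter codes 'Xx' from residues and per-residue structure labels."""
    return [aa + ss for aa, ss in zip(sequence, structure)]


def pair_score(a, b, w=0.5):
    """Linear combination of toy residue and secondary-structure similarities."""
    aa = 1.0 if a[0] == b[0] else -1.0  # stand-in for an amino acid matrix
    ss = 1.0 if a[1] == b[1] else -1.0  # stand-in for a structure matrix
    return (1 - w) * aa + w * ss


def align(x, y, w=0.5, gap=-1.0):
    """Global alignment; returns (score, list of aligned index pairs)."""
    n, m = len(x), len(y)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(x[i - 1], y[j - 1], w),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
    # Trace back from the corner to recover which positions were paired.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        if score[i][j] == score[i - 1][j - 1] + pair_score(x[i - 1], y[j - 1], w):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return score[n][m], pairs[::-1]


# Toy input: 'h' = helix, 'c' = coil (labels are illustrative).
x = encode("ACDE", "hhcc")
y = encode("ACDE", "hhcc")
print(align(x, y))
```

Setting `w=0` reduces the score to primary sequence information only, which mirrors the baseline the abstract compares against; raising `w` lets matching secondary structure compensate for residue mismatches in weakly similar pairs.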

19.
Optimal sequence threading can be used to recognize members of a library of protein folds which are closely related in 3-D structure to the native fold of an input test sequence, even when the test sequence is not significantly homologous to the sequence of any member of the fold library. The methods provide an alignment between the residues of the test sequence and the residue positions in a template fold. This alignment optimizes a score function, and the predicted fold is the highest scoring member of the library of folds. Most score functions contain a pairwise interaction energy term. This, coupled with the need to introduce gaps into the alignment, means that the optimization problem is NP hard. We report a comparison between two heuristic optimization algorithms used in the literature, double dynamic programming and an iterative algorithm based on the so-called frozen approximation. These are compared in terms of both the ranking of likely folds and the quality of the alignment produced.

20.
The application of the mean force field in protein mutant stability prediction is explored. Based on protein main chain characteristics, including polar fraction, accessibility and dihedral angles, the mean force field was constructed to evaluate the compatibility between an amino acid residue and its environment, from which a position-dependent protein mutant profile was constructed. At each position along a protein sequence, the native residue was replaced by the other 19 types of amino acid residues. The matches were evaluated by energies from mean force field calculation, from which a mutant profile along the protein sequence was derived. General characteristics of such a profile were analyzed. Mutant stabilities for two sets of mutants in two proteins were found to be reasonable compared with experimental data, which indicates that the present method can act as a guide in protein engineering and as an effective scoring matrix in protein sequence–structure alignment studies.
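The scanning procedure described above (replace the native residue at each position with the other 19 amino acids and score every substitution) can be sketched as follows. The mean force field itself is not given in the abstract, so `toy_energy` is a purely hypothetical stand-in that rewards hydrophobic residues in buried positions and polar residues in exposed ones; only the structure of the scan is meant to be illustrative.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AVILMFWC")  # crude classification, assumed for the toy energy


def toy_energy(environment, residue):
    """Hypothetical compatibility energy: -1 for a good fit, +1 for a poor fit."""
    buried = environment == "b"
    good_fit = buried == (residue in HYDROPHOBIC)
    return -1.0 if good_fit else 1.0


def mutant_profile(sequence, energy):
    """For each position, map each of the 19 substitutions to its energy change."""
    profile = []
    for pos, native in enumerate(sequence):
        native_energy = energy(pos, native)
        profile.append({
            aa: energy(pos, aa) - native_energy
            for aa in AMINO_ACIDS
            if aa != native
        })
    return profile


sequence = "LK"      # toy sequence
environments = "be"  # 'b' = buried, 'e' = exposed (hypothetical labels)
profile = mutant_profile(
    sequence, lambda pos, aa: toy_energy(environments[pos], aa)
)
print(profile[0]["D"], profile[0]["I"])
```

A positive entry marks a substitution predicted to be less compatible with its environment than the native residue; passing the energy as a callable keeps the scan independent of whichever potential is plugged in.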
