Similar articles
20 similar articles found (search time: 31 ms)
1.
Is there value in constructing side chains while searching protein conformational space during an ab initio simulation? If so, what is the most computationally efficient method for constructing these side chains? To answer these questions, four published approaches were used to construct side chain conformations on a range of near-native main chains generated by ab initio protein structure prediction methods. The accuracy of these approaches was compared with a naive approach that selects the most frequently observed rotamer for a given amino acid to construct side chains. An all-atom conditional probability discriminatory function is useful at selecting conformations with overall low all-atom root mean square deviation (r.m.s.d.) and the discrimination improves on sets that are closer to the native conformation. In addition, the naive approach performs as well as more sophisticated methods in terms of the percentage of

2.
Database-derived potentials, compiled from frequencies of sequence and structure features, are often used for scoring the compatibility of protein sequences and conformations. It is often believed that these scores correspond to differences in free energy with, in addition, a term containing the partition function of the system. Since this function does not depend on the conformation, the potentials are considered to be valid for scoring the compatibility of different conformations with a given sequence (‘forward folding’), but not of sequences with a given structure (‘inverted folding’). This interpretation is questioned here. It is argued that when many-body effects, which dominate frequencies compiled from the protein database, are corrected for, the potentials approximate a physically meaningful free energy difference from which the partition function term cancels out. It is the difference between the free energy of a given sequence in a specific conformation and that of the same sequence in a denatured-like state. Two examples of denatured-like states are discussed. Depending on the considered state, the free energy difference reduces to the commonly used scoring scheme, or contains additional terms that depend on the sequence. In both cases, all the terms can be derived from sequence-structure frequencies in the database. Such free energy difference, commonly defined as the folding free energy, is a measure of protein stability and can be used for scoring both forward and inverted protein folding. The implications for the use of knowledge-based potentials in protein structure prediction are described. Finally, the difficulty of designing tests that could validate the proposed approach, and the inherent limitations of such tests, are discussed.

3.
Mutations in the gene encoding a de novo methyltransferase, DNMT3B, lead to an autosomal recessive Immunodeficiency, Centromeric instability and Facial anomalies (ICF) syndrome. To analyse the protein structure and consequences of ICF-causing mutations, we modelled the structure of the DNMT3B methyltransferase domain based on Haemophilus haemolyticus protein in complex with the cofactor AdoMet and the target DNA sequence. The structural model has a two-subdomain fold where the DNA-binding region is situated between the subdomains on a surface cleft having positive electrostatic potential. The smaller subdomains of the methyltransferases differ in length and sequences and therefore only the target recognition domain loop was modelled to show the location of an ICF-causing mutation. Based on the model, the DNMT3B recognizes the GC sequence and flips the cytosine from the double-stranded DNA to the catalytic pocket. The amino acids in the cofactor and target cytosine binding sites and also the electrostatic properties of the binding pockets are conserved. In addition, a registry of all known ICF-causing mutations, DNMT3Bbase, was constructed. The structural principles of the pathogenic mutations based on the modelled structure and the analysis of

4.
A variety of different methods to generate diverse proteins, including random mutagenesis and recombination, are currently available and most of them accumulate the mutations on the target gene of a protein, whose sequence space remains unchanged. On the other hand, a pool of diverse genes, which is generated by random insertions, deletions and exchange of the homologous domains with different lengths in the target gene, would present the protein lineages resulting in new fitness landscapes. Here we report a method to generate a pool of protein variants with different sequence spaces by employing green fluorescent protein (GFP) as a model protein. This process, designated functional salvage screen (FSS), comprises the following procedures: a defective GFP template expressing no fluorescence is first constructed by genetically disrupting a predetermined region(s) of the protein and a library of GFP variants is generated from the defective template by incorporating the randomly fragmented genomic DNA from Escherichia coli into the defined region(s) of the target gene, followed by screening of the functionally salvaged, fluorescence-emitting GFPs. Two approaches, sequence-directed and PCR-coupled methods, were attempted to generate the library of GFP variants with new sequences derived from the genomic segments of E. coli. The functionally salvaged GFPs were selected and analyzed in terms of the sequence space and functional properties. The results demonstrate that the functional salvage process not only can be a simple and effective method to create protein lineages with new sequence spaces, but also can be useful in elucidating the involvement of a specific region(s) or domain(s) in the structure and function of protein.

5.
The quality of three-dimensional homology models derived from protein sequences provides an independent measure of the suitability of a protein sequence for a certain fold. We have used automated homology modeling and model assessment tools to identify putative nuclear hormone receptor ligand-binding domains in the genome of Caenorhabditis elegans. Our results indicate that the availability of multiple crystal structures is crucial to obtaining useful models in this receptor family. The majority of annotated mammalian nuclear hormone receptors could be assigned to a ligand-binding domain fold by using the best model derived from any of four template structures. This strategy also assigned the ligand-binding domain fold to a number of C. elegans sequences without prior annotation. Interestingly, the retinoic acid receptor crystal structure contributed most to the number of sequences that could be assigned to a ligand-binding domain fold. Several causes for this can be suggested, including the high quality of this protein structure in terms of our assessment tools, similarity between the biological function or ligand of this receptor and the modeled genes and gene duplication in C. elegans.

6.
Using the well-characterized antibody McPC603 as a model, we had found that the Fv fragment can be isolated from Escherichia coli as a functional protein in good yields, whereas the amount of the correctly folded Fab fragment of the same antibody produced under identical conditions is significantly lower. In this paper, we analyse the reasons for this difference. We found that a variety of signal sequences function in the secretion of the isolated chains of the Fab fragment or in the co-secretion of both chains in E. coli. The low yield of functional Fab fragment is not caused by inefficient expression or secretion in E. coli, but by inefficient folding and/or assembly in the periplasm. We compared the folding yields for the Fv and the Fab fragment in the periplasm under various conditions. Several diagnostic framework variants were constructed and their folding yields measured. The results show that substitutions affecting cis-proline residues and those affecting various disulphide bonds in the protein are by themselves insufficient to dramatically change the partitioning of the folding pathway to the native structure, and the cause must lie in a facile aggregation of folding intermediates common to all structural variants. However, all structural variants could be obtained in native form, demonstrating the general utility of the secretory expression strategy.

7.
In search of the ideal protein sequence   Total citations: 1, self-citations: 0, citations by others: 1
The inverse of a folding problem is to find the ideal sequence that folds into a particular protein structure. This problem has been addressed using the topology fingerprint-based threading algorithm, capable of calculating a score (energy) of an arbitrary sequence-structure pair. At first, the search is conducted by unconstrained minimization of the energy in sequence space. It is shown that using energy as the only design criterion leads to spurious solutions with incorrect amino acid composition. The problem lies in the general features of the protein energy surface as a function of both structure and sequence. The proposed solution is to design the sequence by maximizing the difference between its energy in the desired structure and in other known protein structures. Depending on the size of the database of structures ‘to avoid’, sequences bearing significant similarity to the native sequence of the target protein are obtained using this procedure.

8.
The SBASE domain library: a collection of annotated protein segments   Total citations: 2, self-citations: 0, citations by others: 2
SBASE is a database of annotated protein domain sequences representing various structural, functional, ligand binding and topogenic segments of proteins. The current release of SBASE contains 27 211 entries which are provided with standardized names in order to facilitate retrieval. SBASE is cross-referenced to the major protein and nucleic acid databanks as well as to the PROSITE catalog of protein sequence patterns [Bairoch, A. (1992) Nucleic Acids Res., 20, Suppl., 2013–2118]. SBASE can be used to establish domain homologies through database search using programs such as FASTA [Lipman and Pearson (1985) Science, 227, 1436–1441], FASTDB [Brutlag et al. (1990) Comp. Appl. Biosci., 6, 237–245] or BLAST3 [Altschul and Lipman (1990) Proc. Natl. Acad. Sci. USA, 87, 5509–5513], which is especially useful in the case of loosely defined domain types for which efficient consensus patterns cannot be established. The use of SBASE is illustrated on the DNA binding protein Brain-4. The database and a set of search and retrieval tools are freely available on request to the authors or by anonymous ‘ftp’ file transfer from <ftp.icgeb.trieste.it>.

9.
We developed a new method which searches for sequence segments responsible for the recognition of a given chemical structure. These segments are detected as those locally conserved among a sequence to be analyzed (target sequence) and a set of sequences (reference sequences). Reference sequences are the sequences of functionally related proteins, ligands of which contain a common chemical substructure in their molecular structures. ‘Similarity graphing’ cuts target sequences into segments, aligns them pairwise with the reference sequences, calculates the degree of similarity for each alignment, and shows graphically cumulative similarity values on the target sequence. Any locally conserved regions, short or long in length and weak or strong in similarity, are detected at their optimal conditions by adjusting three parameters. The ‘enzyme-reaction database’ contains chemical structures and their related enzymes. When a chemical substructure is input into the database, sequences of the enzymes related to the input substructure are systematically searched from the NBRF sequence database and output as reference sequences. Examples of analysis using similarity graphing in combination with the enzyme-reaction database showed a great potentiality in the systematic analysis of the relationships between sequences and molecular recognitions for protein engineering.

10.
A general protein sequence alignment methodology for detecting a priori unknown common structural and functional regions is described. The method proposed in this paper is based on two basic requirements for a meaningful alignment. First, each sequence or segment of a sequence is characterized by a multivariate physicochemical profile. Second, the alignment is performed by considering all the sequences simultaneously, and the algorithm detects those regions that form a set of similar profiles. In order to test the structural meaning of the alignment obtained from the sequences, quantitative comparisons are performed with structurally conserved regions (SCR) determined from the X-ray structures of three serine proteases. Results suggest that the limits of the SCR may be predicted from the similarities between the physicochemical profiles of the sequences. The procedures are not completely automated. The final step requires a visual screening of alternative pathways in order to determine an optimal alignment.

11.
In vitro molecular evolution is regarded as a hill-climbing on a fitness landscape in sequence space, where the ‘fitness’ is a quantitative measure of a certain physicochemical property of a biopolymer. We analyzed a ‘cross-section’ of the enzymatic activity landscape of dihydrofolate reductase (DHFR) by using a method of analysis of a fitness landscape. We limited the sequence space of interest to the five-dimensional sequence space whose coordinates correspond to the 1st, 16th, 20th, 42nd and 92nd sites in the DHFR sequence. Thirty-six mutants mapped into the limited sequence space were taken in the analysis. As a result, the cross-section is of the rough Mt Fuji type based on the mutational additivity. The ratio of the mean slope to the roughness is 2.8 and the Z-score of the original ratio against a distribution of random references is 7.0, which indicates a large statistical significance. The existence of such a cross-section was discussed in terms of the occurrence probability of sets of five sites distantly separated from each other on the DHFR 3D structure. Our results support the effectiveness of the evolution strategy which exploits the accumulation of advantageous single point mutations in such a cross-section.

12.
Computational sequence design methods are used to engineer proteins with desired properties such as increased thermal stability and novel function. In addition, these algorithms can be used to identify an envelope of sequences that may be compatible with a particular protein fold topology. In this regard, we hypothesized that sequence-property prediction, specifically secondary structure, could be significantly enhanced by using a large database of computationally designed sequences. We performed a large-scale test of this hypothesis with 6511 diverse protein domains and 50 designed sequences per domain. After analysis of the inherent accuracy of the designed sequences database, we realized that it was necessary to put constraints on what fraction of the native sequence should be allowed to change. With mutational constraints, accuracy was improved vs. no constraints, but the diversity of designed sequences, and hence effective size of the database, was moderately reduced. Overall, the best three-state prediction accuracy (Q(3)) that we achieved was nearly a percentage point improved over using a natural sequence database alone, well below the theoretical possibility for improvement of 8-10 percentage points. Furthermore, our nascent method was used to augment the state-of-the-art PSIPRED program by a percentage point.

13.
Lei Huang, Polymer, 2006, 47(5): 1755-1762
Starting from a kinetically foldable criterion for designing fast-folding structures, we have investigated the foldabilities of all possible sequences coded in two letters through an exhaustive enumeration of model chains of a 16-mer protein that we performed using a simple off-lattice model. From a set of 32,896 sequences, we found only 145 sequences that were foldable. Through a comparison of the geometrical similarities of those foldable sequences, we reduced the corresponding 145 native structures to a structural set of 69 good candidates for target structures in the de novo design of fast-folding sequences. We make the following conclusions: (1) A preferred proportion of compositions exists for sequence design. (2) Foldable sequences having different numbers of hydrophobic residues possess very similar sequences. (3) The stability of some special structures toward mutations may be the origin of common protein structures; our results demonstrate that the presence of hydrophobic residues in certain positions of a sequence can result in firm and mutation-resistant skeletons. It appears that a simple, but robust, chain topology and structural symmetry lead to high designability.
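
The figure of 32,896 sequences can be checked with a short sketch. It assumes, as one plausible reading that the abstract itself does not state, that a two-letter 16-mer and its reverse describe the same chain and are counted once. The letters H and P and the function name are illustrative only.

```python
from itertools import product


def count_unique_sequences(length: int, alphabet: str = "HP") -> int:
    """Count sequences over `alphabet`, treating a sequence and its reverse as one."""
    seen = set()
    for letters in product(alphabet, repeat=length):
        seq = "".join(letters)
        # Canonical form: the lexicographically smaller of the sequence and its reverse.
        seen.add(min(seq, seq[::-1]))
    return len(seen)


print(count_unique_sequences(16))  # 32896
```

Under this assumption the total is (2^16 + 2^8) / 2, because the 2^8 palindromic sequences are their own reverse and every other sequence pairs with exactly one partner.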

14.
Does a backwardly read protein sequence have a unique native state?   Total citations: 2, self-citations: 0, citations by others: 2
Amino acid sequences of native proteins are generally not palindromic. Nevertheless, the protein molecule obtained as a result of reading the sequence backwards, i.e. a retro-protein, obviously has the same amino acid composition and the same hydrophobicity profile as the native sequence. The important questions which arise in the context of retro-proteins are: does a retro-protein fold to a well defined native-like structure as natural proteins do and, if the answer is positive, does a retro-protein fold to a structure similar to the native conformation of the original protein? In this work, the fold of retro-protein A, originated from the retro-sequence of the B domain of Staphylococcal protein A, was studied. As a result of lattice model simulations, it is conjectured that the retro-protein A also forms a three-helix bundle structure in solution. It is also predicted that the topology of the retro-protein A three-helix bundle is that of the native protein A, rather than that corresponding to the mirror image of native protein A. Secondary structure elements in the retro-protein do not exactly match their counterparts in the original protein structure; however, the amino acid side chain contact pattern of the hydrophobic core is partly conserved.
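
The statement about composition and hydrophobicity can be illustrated with a minimal sketch. The sequence below is made up for illustration (it is not the B domain of protein A), the Kyte-Doolittle hydropathy scale and the window of 3 are assumptions, and "the same profile" is taken to mean the original profile read in the opposite direction.

```python
import math
from collections import Counter

# Kyte-Doolittle hydropathy values (assumed scale; the abstract does not name one).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def retro_sequence(seq: str) -> str:
    """Return the sequence read backwards."""
    return seq[::-1]


def hydropathy_profile(seq: str, window: int = 3) -> list:
    """Sliding-window mean hydropathy along the sequence."""
    values = [KYTE_DOOLITTLE[aa] for aa in seq]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def profiles_mirror(seq: str, window: int = 3) -> bool:
    """True if the retro-sequence profile equals the original profile reversed."""
    forward = hydropathy_profile(seq, window)
    backward = hydropathy_profile(retro_sequence(seq), window)
    # Compare with a tolerance because the window sums are added in a different order.
    return all(math.isclose(a, b, abs_tol=1e-9)
               for a, b in zip(forward[::-1], backward))


toy = "MKTLLVAGDEFW"  # hypothetical sequence, for illustration only
print(Counter(toy) == Counter(retro_sequence(toy)))  # same composition
print(profiles_mirror(toy))                          # mirrored profile
```

Both checks hold for any sequence, which is why composition and hydropathy alone cannot distinguish a protein from its retro-protein.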

15.
Quantifying the local reliability of a sequence alignment   Total citations: 4, self-citations: 0, citations by others: 4
We present a method for attributing a measure of reliability to a residue pair in an optimal alignment of two protein sequences. Validation based on a database of structurally correct alignments [Pascarella and Argos (1992) Protein Engng, 5, 121–137] shows that correctly aligned parts of a sequence alignment systematically receive high scores in this measure. The higher the sequence similarity between two sequences, the larger is the fraction found of the correct parts of the alignment. We used these observations to design a program that draws a reliability curve along an optimal alignment reflecting the chances for each residue pair to be aligned correctly.

16.
A methodology is proposed to solve a difficult modeling problem related to the recently sequenced P39 protein. This sequence shares no similarity with any known 3D structure, but a fold is proposed by several threading tools. The difficulty in aligning the target sequence on one of the proposed template structures is overcome by combining the results of several available prediction methods and by refining a rational consensus between them. In silico validation of the obtained model and a preliminary cross-check with experimental features allow us to state that this borderline prediction is at least reasonable. This model raises relevant hypotheses on the main structural features of the protein and allows the design of site-directed mutations. Knowing the genetic context of the P39 reading frame, we are now able to suggest a function for the P39 protein: it would act as a periplasmic substrate-binding protein.

17.
The automatic identification of motifs associated with a given function is an important challenge for molecular sequence analysis. A method is presented for the extraction of such patterns from large sets of unaligned sequences with related but general function, for example, a set of heat shock proteins. In such a set of proteins there can often be several subfamilies each characterized by one or more distinct motifs. The aim is to develop computational tools to identify these motifs. The algorithm presented locates high frequency words of length k with a given number of positions, r, fixed. Statistics for a binomial distribution are used to assess the significance of the words. The high-frequency words are clustered and highly populated clusters retained. The composition of the clusters is displayed graphically. A set of motifs associated with the sequence family can automatically be extracted. The method is benchmarked on a set of 106 heat shock sequences and a set of 257 toxin sequences. It is shown to recover previously identified motifs.
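
The word-counting and significance step can be sketched roughly as follows. This is not the authors' implementation: the masking character, the background model (independent residues with frequencies taken from the input), the treatment of each window as one binomial trial, and all function names are assumptions, and the clustering step is omitted.

```python
import math
from collections import Counter
from itertools import combinations


def gapped_words(seq, k, r):
    """Yield each length-k window with r positions kept and the rest masked as '.'."""
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        for fixed in combinations(range(k), r):
            yield "".join(c if i in fixed else "." for i, c in enumerate(window))


def binomial_tail(n, p, x):
    """P(X >= x) for X ~ Binomial(n, p), summed in log space; needs 0 < p < 1."""
    total = 0.0
    for i in range(x, n + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_term)
    return min(total, 1.0)


def score_words(sequences, k, r):
    """Return (tail probability, word, count) tuples, most significant first."""
    background = Counter("".join(sequences))
    total_residues = sum(background.values())
    # Each window is treated as one trial; overlap between windows is ignored.
    n_windows = sum(max(len(s) - k + 1, 0) for s in sequences)
    counts = Counter(w for s in sequences for w in gapped_words(s, k, r))
    scored = []
    for word, count in counts.items():
        # Chance that a random window matches the fixed positions of this word.
        p = math.prod(background[c] / total_residues for c in word if c != ".")
        tail = 1.0 if p >= 1.0 else binomial_tail(n_windows, p, count)
        scored.append((tail, word, count))
    return sorted(scored)


# Hypothetical toy sequences, for illustration only.
toy_sequences = ["MKACDAGCDE", "TTACDWAQCD", "ACDPLACDGG"]
for tail, word, count in score_words(toy_sequences, k=4, r=3)[:5]:
    print(word, count, tail)
```

A word such as `A.CD` fixes three of four positions, so related motifs that differ at the masked position are counted together before any clustering.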

18.
Naturally-occurring phytases having the required level of thermostability for application in animal feeding have not been found in nature thus far. We decided to de novo construct consensus phytases using primary protein sequence comparisons. A consensus enzyme based on 13 fungal phytase sequences had normal catalytic properties, but showed an unexpected 15–22°C increase in unfolding temperature compared with each of its parents. As a first step towards understanding the molecular basis of increased heat resistance, the crystal structure of consensus phytase was determined and compared with that of Aspergillus niger phytase. Aspergillus niger phytase unfolds at much lower temperatures. In most cases, consensus residues were indeed expected, based on comparisons of both three-dimensional structures, to contribute more to phytase stabilization than non-consensus amino acids. For some consensus amino acids, predicted by structural comparisons to destabilize the protein, mutational analysis was performed. Interestingly, these consensus residues in fact increased the unfolding temperature of the consensus phytase. In summary, for fungal phytases apparently an unexpected direct link between protein sequence conservation and protein stability exists.

19.
A model of the 3-D structure of a major house dust mite allergen Der p I associated with hypersensitivity reactions in humans was built from its amino acid sequence and its homology to three known structures, papain, actinidin and papaya proteinase Ω of the cysteine proteinase family. Comparative modelling using COMPOSER was used to arrive at an initial model. This was refined using interactive graphics and energy minimization with the AMBER force field incorporated in SYBYL (Tripos Associates). Compatibility of the Der p I amino acid sequence with the cysteine proteinase fold was checked using an environment-dependent amino acid propensity table incorporated into a new program HARMONY with a variable length windowing facility. A five-residue window was used to probe local conformational integrity. Propensities were derived from a structural alignment database of homologous proteins using a robust entropy-driven smoothing procedure. Der p I shares essential structural and mechanistic features with other papain-like cysteine proteinases, including cathepsin B. The active-site thiolate-imidazolium ion pair comprises the side chains of Cys34 and His170. A cystine disulfide not present in other known structures bridges residue 4 of an N-terminal extension and the core residue 117. Two conserved disulfide bridges are formed by residues 31 and 71 and residues 65 and 103. Model building of peptide substrate analogue complexes suggests a preference for phenylalanyl or basic residues at the P2 position, whilst selectivity may be of minor importance at the S1 subsite. The electrostatic influences on the Der p I active-site ion pair and extended peptide binding region are markedly different from those in known structures. A highly immunogenic surface exposed region (residues 107–131), comprising several overlapping T cell epitope sites, has no shared sequence identity with human liver cathepsin B and contains three insertion-deletion sites.
The structure provides a basis for testing the substrate specificity of Der p I and the potential role of proteinase activity in hypersensitivity reactions. These studies may offer a new treatment strategy by hyposensitization with inactive mutants or mutants with significantly altered proteinase activity, either alone or complexed with antibody.

20.
We propose a new assessment, called the best-five test, for the pseudo-energy potential empirically derived from the protein structural database. The object of the test is the three-dimensional (3D) profiles of proteins, which are directly connected to the pseudo-energy potentials. In the 3D profile, the fitness of each amino acid type is ranked at each residue site of a protein. A site whose native residue type is ranked within the best-five out of 20 amino acids is regarded as satisfactory and the ratio of the satisfactory sites over all the sites of all the proteins examined is indicative of the efficiency of the pseudo-energy potential employed. We applied the test to our potential function consisting of four terms; side-chain packing, hydration, backbone hydrogen-bonding and local conformation, by setting various kinds of definitions for each term. Through this test, the validity of the minus average operation is confirmed, where the energy level of potential functions is adjusted by referring to the random-environmental state of the proteins. Especially in the side-chain packing function, the success ratio increases from about 30 to 50% with this operation. Failure without the operation is ascribed to bulky hydrophobic residues, which almost always occupy higher ranking positions in the 3D profile table. A maximum success ratio of 55.6% was attained with the final potential set consisting of the above four terms. The efficiency of the final set was further checked in the fold-recognition test for distantly related proteins. The best-five test is a new use of the 3D profile table for assessing the ability of the pseudo-energy potentials.
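
The success ratio described above can be sketched in a few lines. The data layout (one dictionary of pseudo-energies per residue site), the convention that a lower pseudo-energy means a better fit, and the function name are assumptions made for illustration; the toy profile is invented and does not come from the study.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def best_five_ratio(profile, native_sequence, top_n=5):
    """Fraction of sites whose native residue ranks within the best `top_n`.

    profile: one dict per residue site, mapping amino acid -> pseudo-energy
             (assumed convention: lower energy means better fit).
    """
    if len(profile) != len(native_sequence):
        raise ValueError("profile and native sequence must have the same length")
    satisfactory = 0
    for site_energies, native in zip(profile, native_sequence):
        # Rank all 20 residue types at this site, best (lowest energy) first.
        ranked = sorted(AMINO_ACIDS, key=lambda aa: site_energies[aa])
        if native in ranked[:top_n]:
            satisfactory += 1
    return satisfactory / len(profile)


# Invented toy profile: every site ranks residues in alphabetical order.
toy_site = {aa: float(i) for i, aa in enumerate(AMINO_ACIDS)}
toy_profile = [toy_site] * 4
print(best_five_ratio(toy_profile, "AFGY"))  # 0.5
```

To cover several proteins at once, as in the abstract, the per-site profiles and native sequences of all proteins can simply be concatenated before calling the function.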
