Similar Articles
20 similar articles found
1.
Designing amino acid sequences to fold with good hydrophobic cores   Total citations: 3, self-citations: 0, citations by others: 3
We present two methods for designing amino acid sequences of proteins that will fold to have good hydrophobic cores. Given the coordinates of the desired target protein or polymer structure, the methods generate sequences of hydrophobic (H) and polar (P) monomers that are intended to fold to these structures. One method designs hydrophobic inside, polar outside; the other minimizes an energy function in a sequence evolution process. The sequences generated by these methods agree at the level of 60–80% of the sequence positions in 20 proteins in the Protein Data Bank. A major challenge in protein design is to create sequences that can fold uniquely, i.e. to a single conformation rather than to many. While an earlier lattice-based sequence evolution method was shown not to design unique folders, our method generates unique folders in lattice model tests. These methods may also be useful in designing other types of foldable polymer not based on amino acids.
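The "hydrophobic inside, polar outside" rule can be illustrated as a burial test on the target coordinates. This is a minimal sketch, not the authors' method. It assumes one C-alpha coordinate per residue, and the `cutoff` radius and `min_neighbors` threshold are hypothetical parameters chosen for the example.

```python
import math

def neighbor_counts(coords, cutoff=10.0):
    """For each residue, count the other C-alpha atoms within `cutoff` angstroms."""
    return [
        sum(1 for j, b in enumerate(coords) if j != i and math.dist(a, b) <= cutoff)
        for i, a in enumerate(coords)
    ]

def design_hp(coords, min_neighbors, cutoff=10.0):
    """Assign H to buried residues (many neighbours) and P to exposed ones."""
    return "".join("H" if n >= min_neighbors else "P"
                   for n in neighbor_counts(coords, cutoff))

# Toy example: one central residue surrounded by six residues 3.8 angstroms away.
coords = [(0, 0, 0), (3.8, 0, 0), (-3.8, 0, 0), (0, 3.8, 0),
          (0, -3.8, 0), (0, 0, 3.8), (0, 0, -3.8)]
print(design_hp(coords, min_neighbors=4, cutoff=5.0))  # HPPPPPP
```

A real design run would choose the threshold from the target's size and shape. The count-based burial measure here is only a simple stand-in for a proper solvent-accessibility calculation.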

2.
The directed evolution of proteins has benefited greatly from site-specific methods of diversification such as saturation mutagenesis. These techniques target diversity to a number of chosen positions that are usually non-contiguous in the protein's primary structure. However, the number of targeted positions can be large, leading to impractically large library sizes in which almost all library variants are inactive and the likelihood of selecting desirable properties is extremely small. We describe a versatile combinatorial method for the partial diversification of large sets of residues. Our library oligonucleotides comprise randomized codons that are flanked by wild-type sequences. Adding these oligonucleotides to an assembly PCR of wild-type gene fragments incorporates the randomized cassettes, at their target sites, into the reassembled gene. Varying the oligonucleotide concentration resulted in library variants that carry different average numbers of mutated positions, comprising a random subset of the entire set of diversified codons. This method, dubbed Incorporating Synthetic Oligos via Gene Reassembly (ISOR), was used to create libraries of a cytosine-C5 methyltransferase in which 45 individual positions were randomized. One library, containing an average of 5.6 mutated residues per gene, was selected, and mutants with wild-type-like activities were isolated. We also created libraries of serum paraoxonase PON1 harboring insertions and deletions (indels) in various areas surrounding the active site. Screening these libraries yielded a range of mutants with altered substrate specificities and indicated that certain regions of this enzyme have a surprisingly high tolerance to indels.

3.
In the tobamovirus coat protein family, amino acid residues at some spatially close positions are found to be substituted in a coordinated manner [Altschuh et al. (1987) J. Mol. Biol., 193, 693]. Therefore, these positions show an identical pattern of amino acid substitutions when amino acid sequences of these homologous proteins are aligned. Based on this principle, coordinated substitutions have been searched for in three additional protein families: serine proteases, cysteine proteases and the haemoglobins. Coordinated changes have been found in all three protein families, mostly within structurally constrained regions. This method works with a varying degree of success depending on the function of the proteins, the range of sequence similarities and the number of sequences considered. By relaxing the criteria for residue selection, the method was adapted to cover a broader range of protein families and to study regions of the proteins having weaker structural constraints. The information derived by these methods provides a general guide for engineering of a large variety of proteins to analyse structure–function relationships.
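The core idea, alignment columns whose substitution patterns coincide across the homologues, can be illustrated with a strict-identity check. This sketch assumes a gap-free alignment and exact pattern identity. It does not implement the relaxed selection criteria described above, and the function names are hypothetical. Two columns share a pattern when they split the sequences into the same groups, even if the residues themselves differ (e.g. A/G at one position and L/V at another).

```python
from collections import defaultdict

def column_pattern(column):
    """Relabel residues by order of first appearance, e.g. 'AAGG' and 'LLVV' -> (0, 0, 1, 1)."""
    first_seen = {}
    return tuple(first_seen.setdefault(res, len(first_seen)) for res in column)

def coordinated_positions(alignment):
    """Group alignment columns (0-based) that show an identical substitution pattern."""
    groups = defaultdict(list)
    for j in range(len(alignment[0])):
        pattern = column_pattern([seq[j] for seq in alignment])
        if max(pattern) == 0:
            continue  # invariant column: no substitutions to compare
        groups[pattern].append(j)
    return [cols for cols in groups.values() if len(cols) > 1]

alignment = ["ALKE", "ALRE", "GVKD", "GVRD"]
print(coordinated_positions(alignment))  # [[0, 1, 3]]
```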

4.
Combinatorial libraries of synthetic DNA are increasingly being used to identify and evolve proteins with novel folds and functions. An effective strategy for maximizing the diversity of these libraries relies on the assembly of large genes from smaller fragments of synthetic DNA. To optimize library assembly and screening, it is desirable to remove from the synthetic libraries any sequences that contain unintended frameshifts or stop codons. Although genetic selection systems can be used to accomplish this task, the tendency of individual segments to yield misfolded or aggregated products can decrease the effectiveness of these selections. Furthermore, individual protein domains may misfold when removed from their native context. We report the development and characterization of an in vivo system to preselect sequences that encode uninterrupted gene segments regardless of the foldedness of the encoded polypeptide. In this system, the inserted synthetic gene segment is separated from an intein/thymidylate synthase (TS) reporter domain by a polyasparagine linker, thereby permitting the TS reporter to fold and function independently of the folding and function of the segment-encoded polypeptide. TS-deficient Escherichia coli host cells survive on selective medium only if the insert is uninterrupted and in-frame, thereby allowing selection and amplification of desired sequences. We demonstrate that this system can be used as a highly effective preselection tool for the production of large, diverse and high-quality libraries of de novo protein sequences.

5.
Cassette mutagenesis is a method of protein engineering which generates a wide diversity of genetic variants that can be subjected to either selection or screening. As long as the target sequence to be modified is kept short (corresponding to four to six amino acids), complete combinatorial libraries can be produced. A major problem arises when longer peptides are to be engineered for desired functions. In such situations the production of a limited collection of variants can be helpful; thus, biased random mutagenesis and ‘doping schemes’ have been reported previously. Here we describe a computer algorithm that enables the determination of the degree of phosphoramidite contamination of nucleotide precursor reservoirs. Through simulation of biological translation, the algorithm allows the prediction of the effect of contamination levels on the number of mutations to occur for any given peptide sequence. In this study the cholinergic binding site was used as a model sequence (22 amino acids). Considerations, based on the computer program, are discussed regarding the efficient design of phage-display combinatorial libraries.
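The forward direction of that prediction, from a contamination level to an expected number of amino acid changes, can be sketched by enumerating every codon a doped synthesis can produce and translating it. This is a sketch under stated assumptions, not the published program. It assumes the standard genetic code and a single per-base contamination probability `c`, and it assumes a contaminated coupling inserts each of the three other bases with equal probability. Stop codons count as changes.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codon index = 16*first + 4*second + third over the order T, C, A, G.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

def codon_outcomes(codon, c):
    """Probability of every codon synthesised when each base is contaminated with probability c.

    Assumption: a contaminated coupling inserts one of the three other bases uniformly.
    """
    per_base = [[(b, 1 - c if b == intended else c / 3) for b in BASES] for intended in codon]
    outcomes = {}
    for combo in product(*per_base):
        seq = "".join(b for b, _ in combo)
        prob = 1.0
        for _, q in combo:
            prob *= q
        outcomes[seq] = outcomes.get(seq, 0.0) + prob
    return outcomes

def expected_substitutions(dna, c):
    """Expected number of codons translated to a different residue (stops count as changes)."""
    total = 0.0
    for start in range(0, len(dna) - len(dna) % 3, 3):
        codon = dna[start:start + 3]
        wild_type = CODON_TABLE[codon]
        total += sum(p for mutant, p in codon_outcomes(codon, c).items()
                     if CODON_TABLE[mutant] != wild_type)
    return total

print(round(expected_substitutions("ATG", 0.1), 3))  # 0.271: Met has one codon, so 1 - 0.9**3
```

Running this over the DNA for a 22-residue target at several values of `c` gives the kind of contamination-versus-mutation curve the abstract describes. Synonymous codons make the result depend on which codons were chosen, not only on the peptide.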

6.
A new efficient in vitro mutagenesis method for the generation of complete random mutant libraries, containing all possible single base substitution mutations in a cloned gene, is described. The method is based on controlled use of polymerases. Four populations of DNA molecules are first generated by primer elongation so that they terminate randomly, but always just before a known type of base (before A, C, G or T respectively). Each of the four populations is then mutagenized in a separate misincorporation reaction, where the correct base can now be omitted. The regeneration of wild-type sequences can thus be efficiently avoided. Also, the misincorporating nucleotide concentrations can be optimized to give the three possible single mutations in close to equal ratio. The mutagenesis can be precisely localized within a predetermined target region of any size, and vector sequences remain intact. We have mutagenized the DNA coding for the α-fragment of Escherichia coli β-galactosidase, and identified 176 different base substitution mutations by sequencing. The present method gives mutant yields of 40–60% when the mutants contain about one amino acid change per protein molecule. All types of base substitution mutations can be generated and deletions are rare. The efficiency of this method permits the use of relatively elaborate screening systems to isolate mutants of either structural genes or regulatory regions.

7.
The sequences of four-α-helical bundle proteins are characterized by a pattern of hydrophilic and hydrophobic amino acids which is repeated every seven residues. At each position of the heptad repeat there are specific constraints on the amino acid properties which result from the topology of the tertiary motif. These constraints give rise to patterns of amino acid distribution which are distinct from those of other proteins. The distributions in each of the heptad positions have been determined by a statistical analysis of structural and sequence data derived from seven families of aligned protein sequences. The constitution of each position is dominated by a very small number of different amino acids, with the core positions consisting overwhelmingly of Leu and Ala. The positional preferences of the individual amino acids can be generally interpreted in terms of residue properties and topological constraints. The potential for four-α-helix bundle folding is reflected primarily in the pattern of residue occurrence in the heptad and not in the overall amino acid composition of the protein. Possible applications of this analysis in structure predictions, sequence alignments and in the rational design and engineering of four-α-helical bundle proteins are discussed.

8.
An amino acid index is a set of 20 numerical values representing any of the different physicochemical and biochemical properties of amino acids. As a follow-up to the previous study, we have increased the size of the database, which currently contains 402 published indices, and re-performed the single-linkage cluster analysis. The results basically confirmed the previous findings. Another important feature of amino acids that can be represented numerically is the similarity between them. Thus, a similarity matrix, also called a mutation matrix, is a set of 20×20 numerical values used for protein sequence alignments and similarity searches. We have collected 42 published matrices, performed hierarchical cluster analyses and identified several clusters corresponding to the nature of the data set and the method used for constructing the mutation matrix. Further, we have tried to reproduce each mutation matrix by the combination of amino acid indices in order to understand which properties of amino acids are reflected most. There was a relationship between the PAM units of Dayhoff's mutation matrix and the volume and hydrophobicity of amino acids. The database of 402 amino acid indices and 42 amino acid mutation matrices is made publicly available on the Internet.
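Single-linkage clustering of indices can be sketched with a correlation-based distance. This sketch assumes the distance between two indices is 1 − |r|, where r is the Pearson correlation over the amino acids; that is an assumption, not necessarily the measure used in the study. Cutting a single-linkage tree at a fixed height is the same as taking connected components of the graph that links indices with |r| at or above the cut, so a union-find pass is enough. The toy indices below have four values each instead of 20, purely to keep the example short.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length value lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def single_linkage_clusters(indices, min_abs_r=0.8):
    """Single-linkage clusters with distance 1 - |r|, cut at distance 1 - min_abs_r."""
    parent = {name: name for name in indices}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    for a, b in combinations(indices, 2):
        if abs(pearson(indices[a], indices[b])) >= min_abs_r:
            parent[find(a)] = find(b)
    groups = {}
    for name in indices:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())

toy = {"hyd1": [1, 2, 3, 4], "hyd2": [2, 4, 6, 8.5],
       "neg": [4, 3, 2, 1], "other": [1, 3, 1, 3]}
print(single_linkage_clusters(toy))  # [['hyd1', 'hyd2', 'neg'], ['other']]
```

Using |r| places perfectly anti-correlated indices (such as a hydrophobicity scale and its sign-flipped twin) in one cluster. That is a design choice of this sketch.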

9.
Amino acid composition of protein termini are biased in different manners   Total citations: 1, self-citations: 0, citations by others: 1
An exhaustive statistical analysis of the amino acid sequences at the carboxyl (C) and amino (N) termini of proteins and of coding nucleic acid sequences at the 5' side of the stop codons was undertaken. At the N ends, Met and Ala residues are over-represented at the first (+1) position whereas at positions 2 and 5 Thr is preferred. These peculiarities at N-termini are most probably related to the mechanism of initiation of translation (for Met) and to the mechanisms governing the life-span of proteins via regulation of their degradation (for Ala and Thr). We assume that the C-terminal bias facilitates fixation of the C ends on the protein globule by a preference for charged and Cys residues. The terminal biases, a novel feature of protein structure, have to be taken into account when molecular evolution, three-dimensional structure, initiation and termination of translation, protein folding and life-span are concerned. In addition, the bias of protein termini composition is an important feature which should be considered in protein engineering experiments.

10.
The instabilities of the native structures of mutant proteins with an amino acid exchange are estimated by using the contact energy and the number of contacts for each type of amino acid pair, which were estimated from 18,192 residue–residue contacts observed in 42 crystals of globular proteins. They were then used to evaluate a transition probability matrix of codon substitutions and a log relatedness odds matrix, which is used as a scoring matrix to measure the similarity between protein sequences. To consider amino acid substitutions in homologous proteins, base mutation rates and the effects of the genetic code are also taken into account. The average fitness of an amino acid exchange is approximated to be proportional to the structural stability of the mutant protein, which is then approximated by the average energy change of the protein native structure expected for the amino acid exchange, neglecting the energy change of the denatured state. In global and local homology searches, this scoring matrix tends to yield significantly higher alignment scores than either the unitary matrix or the genetic code matrix, and may also yield higher alignment scores for distantly related protein pairs than MDM78. One of the advantages of this scoring matrix is that the equilibrium frequencies of codons and also base mutation rates can be adjusted.

11.
The automatic identification of motifs associated with a given function is an important challenge for molecular sequence analysis. A method is presented for the extraction of such patterns from large sets of unaligned sequences with related but general function, for example, a set of heat shock proteins. In such a set of proteins there can often be several subfamilies each characterized by one or more distinct motifs. The aim is to develop computational tools to identify these motifs. The algorithm presented locates high-frequency words of length k with a given number of positions, r, fixed. Statistics for a binomial distribution are used to assess the significance of the words. The high-frequency words are clustered and highly populated clusters retained. The composition of the clusters is displayed graphically. A set of motifs associated with the sequence family can automatically be extracted. The method is benchmarked on a set of 106 heat shock sequences and a set of 257 toxin sequences. It is shown to recover previously identified motifs.
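The word-counting and binomial-significance steps can be sketched as follows; the clustering and display steps are omitted. This is an illustration under assumptions, not the published algorithm. The sketch assumes "r positions fixed" means a length-k window in which r residues are kept and the rest are wildcards (`.`), and that each word is counted at most once per sequence. It also assumes a background model built from pooled residue frequencies and the mean sequence length. The `alpha` cut-off is hypothetical and no multiple-testing correction is applied.

```python
import math
from collections import Counter
from itertools import combinations

def patterns_in(seq, k, r):
    """All length-k words of `seq` with exactly r positions kept ('.' = wildcard)."""
    found = set()
    for start in range(len(seq) - k + 1):
        word = seq[start:start + k]
        for fixed in combinations(range(k), r):
            found.add("".join(word[i] if i in fixed else "." for i in range(k)))
    return found

def binomial_tail(n, p, x):
    """P(X >= x) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(x, n + 1))

def frequent_words(seqs, k, r, alpha=1e-3):
    """Return (word, number of sequences containing it, p-value), most significant first."""
    counts = Counter()
    for s in seqs:
        counts.update(patterns_in(s, k, r))  # each word counted once per sequence
    residues = Counter("".join(seqs))
    total = sum(residues.values())
    freq = {a: n / total for a, n in residues.items()}
    windows = max(1, round(total / len(seqs)) - k + 1)
    hits = []
    for word, observed in counts.items():
        p_word = math.prod(freq[a] for a in word if a != ".")
        p_seq = 1 - (1 - p_word) ** windows  # chance the word occurs somewhere in one sequence
        p_value = binomial_tail(len(seqs), p_seq, observed)
        if p_value < alpha:
            hits.append((word, observed, p_value))
    return sorted(hits, key=lambda h: h[2])

seqs = ["AKLMWCMWEERT", "WCMWPPGSTQNV", "GGHHWCMWIKLD",
        "STWCMWAAGGEE", "QRNDWCMWHHII", "LLKKPPWCMWDE"]
print(frequent_words(seqs, 4, 4)[0][:2])  # ('WCMW', 6)
```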

12.
Haloalkane dehalogenases catalyse environmentally important dehalogenation reactions. These microbial enzymes represent objects of interest for protein engineering studies, attempting to improve their catalytic efficiency or broaden their substrate specificity towards environmental pollutants. This paper presents the results of a comparative study of haloalkane dehalogenases originating from different organisms. Protein sequences and the models of tertiary structures of haloalkane dehalogenases were compared to investigate the protein fold, reaction mechanism and substrate specificity of these enzymes. Haloalkane dehalogenases contain the structural motifs of α/β-hydrolases and epoxidases within their sequences. They contain a catalytic triad with two different topological arrangements. The presence of a structurally conserved oxyanion hole suggests the two-step reaction mechanism previously described for haloalkane dehalogenase from Xanthobacter autotrophicus GJ10. The differences in substrate specificity of haloalkane dehalogenases originating from different species might be related to the size and geometry of the active site and its entrance, and to the efficiency of transition-state and halide-ion stabilization by active-site residues. Structurally conserved motifs identified within the sequences can be used for the design of specific primers for the experimental screening of haloalkane dehalogenases. Those amino acids which were predicted to be functionally important represent possible targets for future site-directed mutagenesis experiments.

13.
The evaluation of calculated protein structures is an important step in the protein design cycle. Known criteria for this assessment of proteins are the polar and apolar, accessible and buried surface area, electrostatic interactions and other interactions between the protein atoms (e.g. H···O, S–S), atomic packing, analysis of amino acid environment and surface charge distribution. We show that a powerful test of accuracy of protein structure can be derived by analysing the water contact of atoms and additionally taking into account their polarity. On the basis of estimated reference values of the polar fraction of typical globular proteins with known structure (mean, SD and distribution), the evaluation of misfolded structures can be improved significantly. The reference values are derived by moving windows of different length (3–99 amino acid residues) over the amino acid sequence. Model proteins, which are included in the Brookhaven protein structure databank, deliberately misfolded proteins, hypothetical proteins and predicted protein structures are diagnosed as at least partially incorrectly folded. The local fault most often observed is that polar groups are buried too frequently in the interior of the protein. The database-derived quantities are useful in screening the designed proteins prior to experimentation and may also be useful in the assessment of errors in the experimentally determined protein structures.
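The moving-window test can be sketched as a z-score against reference statistics. This is a minimal sketch with several assumptions. Per-residue water-contact areas are taken as already computed, split into polar and total. The reference mean and SD for the chosen window length would come from known structures; the numbers in the example are invented. Windows are flagged when their polar fraction is unusually low, on the assumption that buried polar groups lower the polar share of the water-contacting surface. The `z_cut` threshold is hypothetical.

```python
def flag_windows(polar_area, total_area, window, ref_mean, ref_sd, z_cut=-2.0):
    """Return (start, polar_fraction, z) for windows whose polar fraction is unusually low."""
    flagged = []
    for start in range(len(total_area) - window + 1):
        total = sum(total_area[start:start + window])
        if total == 0:
            continue  # fully buried window: no water contact to score
        fraction = sum(polar_area[start:start + window]) / total
        z = (fraction - ref_mean) / ref_sd
        if z < z_cut:
            flagged.append((start, fraction, z))
    return flagged

# Invented per-residue areas: the last three residues expose no polar surface.
polar = [5, 5, 5, 0, 0, 0]
total = [10] * 6
print([start for start, _, _ in flag_windows(polar, total, 3, ref_mean=0.5, ref_sd=0.1)])  # [2, 3]
```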

14.
Site-directed protein recombination as a shortest-path problem   Total citations: 2, self-citations: 0, citations by others: 2
Protein function can be tuned using laboratory evolution, in which one rapidly searches through a library of proteins for the properties of interest. In site-directed recombination, n crossovers are chosen in an alignment of p parents to define a set of p(n + 1) peptide fragments. These fragments are then assembled combinatorially to create a library of p^(n+1) proteins. We have developed a computational algorithm to enrich these libraries in folded proteins while maintaining an appropriate level of diversity for evolution. For a given set of parents, our algorithm selects crossovers that minimize the average energy of the library, subject to constraints on the length of each fragment. This problem is equivalent to finding the shortest path between nodes in a network, for which the global minimum can be found efficiently. Our algorithm has a running time of O(N^3 p^2 + N^2 n) for a protein of length N. Adjusting the constraints on fragment length generates a set of optimized libraries with varying degrees of diversity. By comparing these optima for different sets of parents, we rapidly determine which parents yield the lowest energy libraries.
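The shortest-path framing can be sketched as a layered graph in which node (m, i) means "m fragments cover residues 0 to i". An edge joins (m, i) to (m + 1, j) when the fragment length j − i lies within the allowed bounds. This sketch does not reproduce the authors' energy function. It assumes the library energy splits into independent per-fragment terms supplied by a hypothetical `fragment_cost(i, j)`. Under that assumption, dynamic programming over the layers finds the global minimum.

```python
import math

def best_crossovers(length, n_crossovers, fragment_cost, min_len, max_len):
    """Choose crossovers minimising the summed fragment cost (a shortest path in a layered DAG).

    Node (m, i): m fragments cover residues [0, i). Edge (m, i) -> (m + 1, j) exists when
    min_len <= j - i <= max_len and has weight fragment_cost(i, j).
    """
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    inf = math.inf
    layers = n_crossovers + 1  # number of fragments
    dist = [[inf] * (length + 1) for _ in range(layers + 1)]
    back = [[None] * (length + 1) for _ in range(layers + 1)]
    dist[0][0] = 0.0
    for m in range(layers):
        for i in range(length + 1):
            if dist[m][i] == inf:
                continue
            for j in range(i + min_len, min(i + max_len, length) + 1):
                cand = dist[m][i] + fragment_cost(i, j)
                if cand < dist[m + 1][j]:
                    dist[m + 1][j] = cand
                    back[m + 1][j] = i
    if dist[layers][length] == inf:
        raise ValueError("no crossover set satisfies the fragment-length constraints")
    starts, j = [], length
    for m in range(layers, 0, -1):
        j = back[m][j]
        starts.append(j)
    starts.reverse()  # [0, c1, ..., cn]
    return dist[layers][length], starts[1:]

# Hypothetical cost that prefers fragments of exactly 4 residues.
print(best_crossovers(12, 2, lambda i, j: (j - i - 4) ** 2, min_len=2, max_len=8))  # (0.0, [4, 8])
```

Widening or narrowing `min_len` and `max_len` changes which libraries are feasible. This mirrors the trade-off between energy and diversity that the abstract describes.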

15.
A computer program, which runs on MS-DOS personal computers, is described that assists in the design of synthetic genes coding for proteins. The goal of the program is the design of a gene which (i) contains as many unique restriction sites as possible and (ii) uses a specific codon usage. The gene designed according to the criteria above is (i) suitable for ‘modular mutagenesis’ experiments and (ii) optimized for expression. The program ‘reverse-translates’ protein sequences into degenerate DNA sequences, generates a map of potential restriction sites and locates sequence positions where unique restriction sites can be accommodated. The nucleic acid sequence is then ‘refined’ according to a specific codon usage to remove any degeneracy. Unique restriction sites, if potentially present, can be ‘forced’ into the degenerate nucleic acid sequence by using ‘priority codes’ assigned to different restriction sequences.
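The step that locates positions where a restriction site can be accommodated can be sketched with exact synonymous-codon sets. This sketch does not reproduce the program's degenerate (IUPAC) representation, its codon-usage refinement or its priority codes, and the function names are hypothetical. It assumes the standard genetic code and checks only the forward strand. A site fits at a nucleotide offset if every codon it overlaps has at least one synonymous codon that matches the site bases falling inside that codon. Codons do not overlap, so each one can be checked independently.

```python
BASES = "TCAG"
# Standard genetic code; codon index = 16*first + 4*second + third over the order T, C, A, G.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
SYNONYMS = {}
for i, a in enumerate(BASES):
    for j, b in enumerate(BASES):
        for k, c in enumerate(BASES):
            SYNONYMS.setdefault(AMINO_ACIDS[16 * i + 4 * j + k], []).append(a + b + c)

def site_fits(protein, site, start):
    """Can `site` be written at nucleotide offset `start` using synonymous codons only?"""
    end = start + len(site)
    if start < 0 or end > 3 * len(protein):
        return False
    for codon_idx in range(start // 3, (end - 1) // 3 + 1):
        codon_start = 3 * codon_idx
        # bases of this codon that the site pins down: {offset in codon: required base}
        pinned = {pos - codon_start: site[pos - start]
                  for pos in range(max(start, codon_start), min(end, codon_start + 3))}
        if not any(all(codon[off] == base for off, base in pinned.items())
                   for codon in SYNONYMS[protein[codon_idx]]):
            return False
    return True

def possible_sites(protein, site):
    """All nucleotide offsets where `site` can be introduced without changing the protein."""
    return [s for s in range(3 * len(protein) - len(site) + 1)
            if site_fits(protein, site, s)]

print(possible_sites("MEF", "GAATTC"))  # [3]: EcoRI fits as GAA (Glu) + TTC (Phe)
```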

16.
Sequence weighting techniques are aimed at balancing redundant observed information from subsets of similar sequences in multiple alignments. Traditional approaches apply the same weight to all positions of a given sequence, hence equal efficiency of phylogenetic changes is assumed along the whole sequence. This restrictive assumption is not required for the new method PSIC (position-specific independent counts) described in this paper. The number of independent observations (counts) of an amino acid type at a given alignment position is calculated from the overall similarity of the sequences that share the amino acid type at this position with the help of statistical concepts. This approach allows the fast computation of position-specific sequence weights even for alignments containing hundreds of sequences. The PSIC approach has been applied to profile extraction and to the fold family assignment of protein sequences with known structures. Our method was shown to be very productive in finding distantly related sequences and more powerful than Hidden Markov Models or the profile methods in WiseTools and PSI-BLAST in many cases. The profile extraction routine is available on the WWW (http://www.bork.embl-heidelberg.de/PSIC or http://www.imb.ac.ru/PSIC).

17.
In search of the ideal protein sequence   Total citations: 1, self-citations: 0, citations by others: 1
The inverse of a folding problem is to find the ideal sequence that folds into a particular protein structure. This problem has been addressed using the topology fingerprint-based threading algorithm, capable of calculating a score (energy) of an arbitrary sequence–structure pair. At first, the search is conducted by unconstrained minimization of the energy in sequence space. It is shown that using energy as the only design criterion leads to spurious solutions with incorrect amino acid composition. The problem lies in the general features of the protein energy surface as a function of both structure and sequence. The proposed solution is to design the sequence by maximizing the difference between its energy in the desired structure and in other known protein structures. Depending on the size of the database of structures ‘to avoid’, sequences bearing significant similarity to the native sequence of the target protein are obtained using this procedure.

18.
Consensus engineering has been used to increase the stability of a number of different proteins, either by creating consensus proteins from scratch or by modifying existing proteins so that their sequences more closely match a consensus sequence. In this paper we describe the first application of consensus engineering to the ab initio creation of a novel fluorescent protein. This was based on the alignment of 31 fluorescent proteins with >62% homology to monomeric Azami green (mAG) protein, and used the sequence of mAG to guide amino acid selection at positions of ambiguity. This consensus green protein is extremely well expressed, monomeric and fluorescent with red-shifted absorption and emission characteristics compared to mAG. Although slightly less stable than mAG, it is better expressed and brighter under the excitation conditions typically used in single molecule fluorescence spectroscopy or confocal microscopy. This study illustrates the power of consensus engineering to create stable proteins using the subtle information embedded in the alignment of similar proteins and shows that the benefits of this approach may extend beyond stability.
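Building a consensus while letting a reference sequence settle ambiguous positions can be sketched column by column. This sketch makes several assumptions. The alignment is taken as gap-free and of equal length to the reference. The rule used here counts a column as ambiguous when the top residue is tied or does not exceed a hypothetical `min_fraction` of the sequences. The study's actual ambiguity criterion is not given in the abstract.

```python
from collections import Counter

def consensus_with_reference(alignment, reference, min_fraction=0.5):
    """Column-wise consensus; ambiguous columns fall back to the reference residue."""
    out = []
    for j, ref_res in enumerate(reference):
        ranked = Counter(seq[j] for seq in alignment).most_common()
        top_res, top_n = ranked[0]
        tied = len(ranked) > 1 and ranked[1][1] == top_n
        if not tied and top_n / len(alignment) > min_fraction:
            out.append(top_res)   # clear majority
        else:
            out.append(ref_res)   # ambiguous: use the reference (e.g. mAG) residue
    return "".join(out)

alignment = ["MKVE", "MRVE", "MKIE", "MRVD"]
print(consensus_with_reference(alignment, reference="MRID"))  # MRVE
```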

19.
We tested whether it is possible to alter the substrate specificity of cholesterol oxidase for similarly sized sterols, i.e. cholesterol, β-sitosterol and stigmasterol. Using existing X-ray crystal structures, we made a model of the predicted Michaelis complex of cholesterol and cholesterol oxidase. Based on this model, we identified five residues that are in direct contact with the steroid tail: Met58, Leu82, Val85, Met365 and Phe433. We prepared seven mutant libraries that contained the codon NYS (N = A, C, G, T; Y = C, T; S = C, G) at one, two or three of the targeted positions by cassette mutagenesis. The libraries were screened for catalytic activity against three different sterols under k_cat*/K_m* conditions with 25 mol% sterol/DOPC unilamellar vesicles. The results of our screens suggest that specific packing interactions are not realized in the transition state of binding and that loss of active site water may be the predominant source of binding energy.
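The residues a degenerate codon such as NYS can encode follow from expanding its IUPAC letters and translating each concrete codon. This short sketch assumes the standard genetic code and includes only a subset of IUPAC codes. It shows that NYS covers 16 codons encoding nine residues (A, F, I, L, M, P, S, T, V), with no stop codons and no charged residues. Three NYS positions therefore give 16^3 = 4096 codon combinations.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codon index = 16*first + 4*second + third over the order T, C, A, G.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}
# Subset of the IUPAC nucleotide ambiguity codes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT",
         "S": "CG", "W": "AT", "K": "GT", "M": "AC", "N": "ACGT"}

def expand(degenerate_codon):
    """All concrete codons matched by a degenerate codon such as 'NYS'."""
    return ["".join(bases) for bases in product(*(IUPAC[x] for x in degenerate_codon))]

def encoded_residues(degenerate_codon):
    """Sorted one-letter codes ('*' = stop) encoded by a degenerate codon."""
    return sorted({CODON_TABLE[c] for c in expand(degenerate_codon)})

print(len(expand("NYS")), "".join(encoded_residues("NYS")))  # 16 AFILMPSTV
```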

20.
Construction of stabilized proteins by combinatorial consensus mutagenesis   Total citations: 4, self-citations: 0, citations by others: 4
We constructed stabilized variants of β-lactamase (BLA) from Enterobacter cloacae by combinatorial recruitment of consensus mutations. By aligning the sequences of 38 BLA homologs, we identified 29 positions where the E. cloacae gene differs from the consensus sequence of lactamases and constructed combinatorial libraries using mixtures of mutagenic oligonucleotides encompassing all 29 positions. Screening of 90 random isolates from these libraries identified 15 variants with significantly increased thermostability. The stability of these isolates suggests that all tested mutations make additive contributions to protein stability. A statistical analysis of sequence and stability data identified 11 mutations that made stabilizing contributions and eight mutations that destabilized the protein. A second-generation library recombining these 11 stabilizing mutations led to the identification of BLA variants that showed further stabilization. The most stable variant had a midpoint of thermal denaturation (Tm) that was 9.1 °C higher than the starting molecule and contained eight consensus mutations. Incubation of three stabilized BLA variants with several proteases showed that all tested isolates have significantly increased resistance to proteolysis. Our data demonstrate that combinatorial consensus mutagenesis (CCM) allows the rapid generation of protein variants with improved thermal and proteolytic stability.
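Finding the positions where a target differs from its homolog consensus, and the size of the resulting binary library, can be sketched as follows. This sketch assumes the target and homologs share one gap-free alignment, and it uses a plain majority rule where ties go to the residue seen first. The published consensus procedure may differ. Each consensus position is either kept wild-type or mutated, so k positions give 2^k combinations. For the 29 positions above, that is 2^29 ≈ 5.4 × 10^8.

```python
from collections import Counter

def consensus(alignment):
    """Most common residue in each column; ties go to the residue seen first."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*alignment))

def consensus_mutations(target, homologs):
    """Mutations (e.g. 'R2K', 1-based) that would move `target` to the homolog consensus."""
    cons = consensus(homologs)
    return [f"{t}{i + 1}{c}" for i, (t, c) in enumerate(zip(target, cons))
            if t != c and "-" not in (t, c)]

homologs = ["MKVLE", "MKILE", "MRVLE", "MKVAE"]
muts = consensus_mutations("MRIAD", homologs)
print(muts, 2 ** len(muts))  # ['R2K', 'I3V', 'A4L', 'D5E'] 16
```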


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP No. 09084417