Similar articles
20 similar articles found
1.
2.
We introduce a completely automatic and objective procedure for the comparison of protein structures. A genetic algorithm is used to search for a near-optimal solution of the rigid-body superposition of two whole protein structures. The specification of an initial set of equivalences is not required. Topological equivalences in the final structural alignment are defined by a conventional dynamic programming routine, which is commonly used to compare protein sequences. A least-squares fitting algorithm is then used to optimize the fit over the final set of equivalences. We have applied our method to the comparison of ribonucleic acid structures, as well as protein structures. The structural alignments are generally consistent with those previously published. In fact, on most occasions our method defines at least the same number of topological equivalences as other procedures, but always with a lower r.m.s. distance between them.
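As a rough illustration of the equivalence step, the sketch below aligns two Cα traces that are assumed to be already superposed; the genetic-algorithm search and the final least-squares refit described above are not shown. The pair score (a hypothetical 3.5 Å cutoff minus the inter-atomic distance, with free gaps) is an assumption for illustration, not the authors' scoring function.

```python
import math

def equivalences(a, b, cutoff=3.5):
    """Pair residues of two already-superposed C-alpha traces by dynamic programming.

    a, b: lists of (x, y, z) tuples in a common frame. The pair score
    cutoff - distance and the free gaps are illustrative assumptions.
    Returns (i, j) index pairs in sequence order.
    """
    n, m = len(a), len(b)

    def pair_score(i, j):
        return cutoff - math.dist(a[i], b[j])

    # score[i][j] = best total score for the prefixes a[:i] and b[:j]
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair_score(i - 1, j - 1),
                              score[i - 1][j],   # skip a residue of a
                              score[i][j - 1])   # skip a residue of b
    # Trace back; a diagonal step with a positive score is an equivalence.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        s = pair_score(i - 1, j - 1)
        if s > 0 and score[i][j] == score[i - 1][j - 1] + s:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j]:
            i -= 1
        else:
            j -= 1
    return pairs[::-1]

def rmsd(a, b, pairs):
    """Root-mean-square distance over the equivalenced pairs (no refit)."""
    return math.sqrt(sum(math.dist(a[i], b[j]) ** 2 for i, j in pairs) / len(pairs))

a = [(0, 0, 0), (3.8, 0, 0), (7.6, 0, 0), (11.4, 0, 0)]
b = [(0.5, 0, 0), (7.7, 0, 0), (11.0, 0, 0)]
pairs = equivalences(a, b)
print(pairs, round(rmsd(a, b, pairs), 3))  # [(0, 0), (2, 1), (3, 2)] 0.374
```

Because gaps cost nothing here, a pair only enters the alignment when it lies closer than the cutoff; a full procedure would then refit the superposition on these pairs.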

3.
Restriction enzymes (REases) are commercial reagents commonly used in DNA manipulations and mapping. They are regarded as very attractive models for studying protein-DNA interactions and valuable targets for protein engineering. Their amino acid sequences usually show no similarities to other proteins, with the rare exception of other REases that recognize identical or very similar sequences. Hence, they are extremely hard targets for structure prediction and modeling. NlaIV is a Type II REase, which recognizes the interrupted palindromic sequence GGNNCC (where N indicates any base) and cleaves it in the middle, leaving blunt ends. NlaIV shows no sequence similarity to other proteins and virtually nothing is known about its sequence-structure-function relationships. Using protein fold recognition, we identified a remote relationship between NlaIV and EcoRV, an extensively studied REase, which recognizes the GATATC sequence and whose crystal structure has been determined. Using the 'FRankenstein's monster' approach we constructed a comparative model of NlaIV based on the EcoRV template and used it to predict the catalytic and DNA-binding residues. The model was validated by site-directed mutagenesis and analysis of the activity of the mutants in vivo and in vitro, as well as by structural characterization of the wild-type enzyme and two mutants by circular dichroism spectroscopy. The structural model of the NlaIV-DNA complex suggests regions of the protein sequence that may interact with the 'non-specific' bases of the target; thus it provides insight into the evolution of sequence specificity in restriction enzymes and may help engineer REases with novel specificities. Before this analysis was carried out, neither the three-dimensional fold of NlaIV, nor its evolutionary relationships, nor its catalytic or DNA-binding residues were known.
Hence our analysis may be regarded as a paradigm for studies aiming at reducing 'white spaces' on the evolutionary landscape of sequence-function relationships by combining bioinformatics with simple experimental assays.

4.
Predictions of protein secondary structure using current methods are often unrealistic, i.e. the predicted α-helices or β-strands are too short. To improve the realism, various heuristic ‘filtering’ or ‘smoothing’ methods are used. They are more or less intuitive and are based on ad hoc corrections. We present a regularization method to obtain a realistic secondary structure from predicted propensities. It is based on the known dynamic programming algorithm and is quite objective. It can be used with any prediction method which yields propensities. The regularized predictions conserve the overall prediction accuracy well and improve the ‘protein-likeness’ of the prediction.
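One way to read regularization by dynamic programming is a segment-level DP that picks the highest-scoring H/E/C string subject to minimum segment lengths. The sketch below assumes the propensities behave like probabilities (scored as summed logarithms) and uses hypothetical minimum lengths of 4 (helix), 2 (strand) and 1 (coil); the paper's actual scoring may differ.

```python
import math

def regularize(props, min_len=None):
    """Best H/E/C string whose segments respect minimum lengths.

    props: list of dicts mapping 'H', 'E', 'C' to per-residue propensities (> 0).
    min_len: hypothetical minimum segment length per state.
    """
    min_len = min_len or {"H": 4, "E": 2, "C": 1}
    states = list(min_len)
    n = len(props)
    # Prefix sums of log-propensity per state, so any segment scores in O(1).
    pre = {s: [0.0] for s in states}
    for p in props:
        for s in states:
            pre[s].append(pre[s][-1] + math.log(p[s]))
    neg = float("-inf")
    best = [{s: neg for s in states} for _ in range(n + 1)]  # best[i][s]: last segment s ends at i
    back = [{s: None for s in states} for _ in range(n + 1)]
    for i in range(1, n + 1):
        for s in states:
            for start in range(0, i - min_len[s] + 1):
                seg = pre[s][i] - pre[s][start]
                if start == 0:
                    cand, prev = seg, None
                else:
                    # The previous segment must be in a different state.
                    prev = max((t for t in states if t != s), key=lambda t: best[start][t])
                    cand = best[start][prev] + seg
                if cand > best[i][s]:
                    best[i][s], back[i][s] = cand, (start, prev)
    # Trace the best final state back to position 0.
    s, out, i = max(states, key=lambda t: best[n][t]), [], n
    while i > 0:
        start, prev = back[i][s]
        out.append(s * (i - start))
        i, s = start, prev
    return "".join(reversed(out))

props = [{"H": 0.7, "E": 0.1, "C": 0.2}] * 2 + [{"H": 0.4, "E": 0.1, "C": 0.5}] \
        + [{"H": 0.7, "E": 0.1, "C": 0.2}] * 2
print(regularize(props))  # HHHHH (the per-residue maximum would be HHCHH)
```

The isolated coil residue is absorbed because a helix shorter than the assumed minimum is not allowed.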

5.
In recent protein structure prediction research there has been a great deal of interest in using amino acid interaction preferences (e.g. contact potentials or potentials of mean force) to align (‘thread’) a protein sequence to a known structural motif. An important open question is whether a polynomial time algorithm for finding the globally optimal threading is possible. We identify the two critical conditions governing this question: (i) variable-length gaps are admitted into the alignment, and (ii) interactions between amino acids from the sequence are admitted into the score function. We prove that if both these conditions are allowed then the protein threading decision problem (does there exist a threading with a score ≤ K?) is NP-complete (in the strong sense, i.e. is not merely a number problem) and the related problem of finding the globally optimal protein threading is NP-hard. Therefore, no polynomial time algorithm is possible (unless P = NP). This result augments existing proofs that the direct protein folding problem is NP-complete by providing the corresponding proof for the ‘inverse’ protein folding problem. It provides a theoretical basis for understanding algorithms currently in use and indicates that computational strategies from other NP-complete problems may be useful for predictive algorithms.

6.
We performed a systematic exploration of the use of structural information derived from small angle X-ray scattering (SAXS) measurements to improve fold recognition. SAXS data provide the Fourier transform of the histogram of atomic pair distances (pair distribution function) for a given protein and hence can serve as a structural constraint on methods used to determine the native conformational fold of the protein. Here we used it to construct a similarity-based fitness score with which to evaluate candidate structures generated by a threading procedure. In order to combine the SAXS scores with the standard energy scores and other 1D profile-based scores used in threading, we made use both of a linear regression method and of a neural network-based technique to obtain optimal combined fitness scores and applied them to the ranking of candidate structures. Our results show that the use of SAXS data with gapless threading significantly improves the performance of fold recognition.
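The sketch below shows the kind of comparison such a fitness score involves: it computes a normalized pair-distance histogram (a discrete stand-in for the pair distribution function P(r)) for a candidate model and scores it against a reference curve. The bin width, distance range and squared-difference score are assumptions; in practice the reference P(r) would be derived from the measured scattering curve, and the paper's combination with threading scores is not reproduced.

```python
import math
from itertools import combinations

def pair_histogram(coords, bin_width=2.0, max_r=40.0):
    """Normalized histogram of all pairwise distances: a discrete P(r) curve."""
    nbins = int(max_r / bin_width)
    counts = [0] * nbins
    for p, q in combinations(coords, 2):
        k = int(math.dist(p, q) / bin_width)
        if k < nbins:          # ignore pairs beyond max_r
            counts[k] += 1
    total = sum(counts) or 1   # avoid dividing by zero for tiny inputs
    return [c / total for c in counts]

def saxs_fitness(candidate, reference_pr, bin_width=2.0, max_r=40.0):
    """Hypothetical score: minus the squared difference between P(r) curves."""
    pr = pair_histogram(candidate, bin_width, max_r)
    return -sum((x - y) ** 2 for x, y in zip(pr, reference_pr))

square = [(0, 0, 0), (3.8, 0, 0), (0, 3.8, 0), (3.8, 3.8, 0)]
line = [(0, 0, 0), (3.8, 0, 0), (7.6, 0, 0), (11.4, 0, 0)]
reference = pair_histogram(square)   # stands in for the measured curve
print(saxs_fitness(square, reference) > saxs_fitness(line, reference))  # True
```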

7.
With the advance of modern molecular biology it has become increasingly clear that few cellular processes are unaffected by protein phosphorylation. Computational identification of phosphorylation sites is therefore very helpful for accelerating the functional characterization of the huge number of protein sequences available from genomic and proteomic studies. Using a genetic-algorithm-integrated neural network (GANN), a new bioinformatics method named GANNPhos has been developed to predict phosphorylation sites in proteins. With a genetic algorithm optimizing the weight values within the network, GANNPhos achieved accuracies of 81.1, 76.7 and 73.3% in predicting phosphorylated S, T and Y sites, respectively. When benchmarked against back-propagation neural network and support vector machine algorithms, GANNPhos gives better performance, suggesting that the GANN program can be used for other prediction tasks in the field of protein bioinformatics.
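The abstract does not give the GANNPhos architecture or input encoding, so the following is only a toy illustration of the underlying idea: a genetic algorithm (truncation selection, uniform crossover, Gaussian mutation and elitism, all assumed choices) searches the weights of a single threshold unit on made-up two-dimensional data instead of training it by back-propagation.

```python
import random

def predict(w, x):
    """Single threshold unit: w[0] is the bias, w[1:] the input weights."""
    z = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    return 1 if z > 0 else 0

def accuracy(w, data):
    return sum(predict(w, x) == y for x, y in data) / len(data)

def evolve(data, n_weights, pop_size=30, generations=40, seed=0):
    """Search network weights with a genetic algorithm instead of back-propagation."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1) for _ in range(n_weights)] for _ in range(pop_size)]
    history = []  # best accuracy in each generation
    for _ in range(generations):
        pop.sort(key=lambda w: accuracy(w, data), reverse=True)
        history.append(accuracy(pop[0], data))
        parents = pop[: pop_size // 2]                  # truncation selection
        children = [pop[0]]                             # elitism: keep the best
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(genes) for genes in zip(a, b)]        # uniform crossover
            children.append([g + rng.gauss(0, 0.3) for g in child])  # Gaussian mutation
        pop = children
    best = max(pop, key=lambda w: accuracy(w, data))
    return best, history

# Hypothetical toy data: label 1 when x + y > 1.
data = [((0.1, 0.2), 0), ((0.9, 0.8), 1), ((0.3, 0.4), 0),
        ((0.7, 0.6), 1), ((0.2, 0.9), 1), ((0.6, 0.1), 0)]
best, history = evolve(data, n_weights=3)
print(history[-1], accuracy(best, data))
```

Elitism guarantees that the best accuracy never decreases from one generation to the next.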

8.
A genetic algorithm (GA) combined with a tabu search (TA) has been applied as a minimization method to locate the appropriate associated sites for some biomolecular systems. In our docking procedure, surface complementarity and energetic complementarity of a ligand with its receptor are considered separately in a two-stage docking method. The first stage finds a set of potential associated sites, mainly based on surface complementarity, using a genetic algorithm combined with a tabu search. This step corresponds to finding the potential binding sites where pharmacophores will bind. In the second stage, several hundred GA minimization steps are performed for each associated site derived from the first stage, mainly based on energetic complementarity. After both stages, we can offer several solutions of associated sites for every complex. In this paper, seven biomolecular systems, including five bound complexes and two unbound complexes, were chosen from the Protein Data Bank (PDB) to test our method. The calculated results were very encouraging: the hybrid minimization algorithm successfully reaches the correct solutions near the best binding modes for these protein complexes. The docking results not only predict the bound complexes very well, but also give a relatively accurate complexed conformation for unbound systems. For the five bound complexes, the results show that surface complementarity is enough to find the precise binding modes; the top solution from the tabu list generally corresponds to the correct binding mode. For the two unbound complexes, it seems more difficult to obtain the correct binding conformations because of the conformational changes upon binding. The predicted results show that the correct binding mode also corresponds to a relatively large surface complementarity score. In these two test cases, the correct solution can be found among the top several solutions from the tabu list.
For unbound complexes, the interaction energy from energetic complementarity is very important; it can be used to filter the solutions obtained from surface complementarity. After evaluation of the energetic complementarity, conformations and orientations close to the crystallographically determined structures are recovered. In most cases, the smallest root mean square distance (r.m.s.d.) among the GA-plus-TA solutions is relatively small. Our automatic docking program is thus a general-purpose tool among the procedures used for the theoretical study of molecular recognition.

9.
A methodology is proposed to solve a difficult modeling problem related to the recently sequenced P39 protein. This sequence shares no similarity with any known 3D structure, but a fold is proposed by several threading tools. The difficulty in aligning the target sequence on one of the proposed template structures is overcome by combining the results of several available prediction methods and by refining a rational consensus between them. In silico validation of the obtained model and a preliminary cross-check with experimental features allow us to state that this borderline prediction is at least reasonable. This model raises relevant hypotheses on the main structural features of the protein and allows the design of site-directed mutations. Knowing the genetic context of the P39 reading frame, we are now able to suggest a function for the P39 protein: it would act as a periplasmic substrate-binding protein.

10.
An automatic algorithm for defining topological equivalences in protein structures is presented. The algorithm is based on a dynamic programming technique and a self-consistent scoring method. We have used it to align pairs of similar protein structures of several protein families and to identify recurrent structural domains in aspartic proteinase 2APR. Its ability to find suboptimal paths permits a thorough comparison of proteins at each level in the hierarchy of protein structure: secondary structure, super-secondary structure, domain and entire globular structure. The algorithm has been extended to the structure alignment of ribonucleic acid and can be extended to the structure alignment of any linear polymer.

11.
We present a statistical analysis of protein structures based on interatomic Cα distances. The overall distance distributions reflect in detail the contents of sequence-specific substructures maintained by local interactions (such as α-helices) and longer range interactions (such as disulfide bridges and β-sheets). We also show that a volume scaling of the distances makes distance distributions for protein chains of different length superimposable. Distance distributions were also calculated specifically for amino acids separated by a given number of residues. Specific features in these distributions are visible for sequence separations of up to 20 amino acid residues. A simple representation, which preserves most of the information in the distance distributions, was obtained using only six parameters. The parameters give rise to canonical distance intervals, and when coarse-grained distance constraints are predicted by methods such as data-driven artificial neural networks, these should preferably be selected from these intervals. We discuss the use of the six parameters for determining or reconstructing 3-D protein structures.
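A minimal sketch of the two operations described above: grouping Cα distances by sequence separation, and rescaling them for chain length. The N^(1/3) scaling is an assumption (it follows if chain volume grows linearly with the number of residues N); the abstract does not give the exact form.

```python
import math
from collections import defaultdict

def distances_by_separation(ca, max_sep=20):
    """Group C-alpha distances d(i, j) by sequence separation k = j - i (k <= max_sep)."""
    by_sep = defaultdict(list)
    for i in range(len(ca)):
        for j in range(i + 1, min(i + max_sep, len(ca) - 1) + 1):
            by_sep[j - i].append(math.dist(ca[i], ca[j]))
    return by_sep

def volume_scaled(distances, chain_length):
    """Divide by N**(1/3), assuming chain volume grows linearly with length N."""
    scale = chain_length ** (1 / 3)
    return [d / scale for d in distances]

# Hypothetical straight chain with 3.0 A spacing, just to show the bookkeeping.
ca = [(3.0 * k, 0.0, 0.0) for k in range(5)]
by_sep = distances_by_separation(ca)
print(by_sep[1], by_sep[4])  # [3.0, 3.0, 3.0, 3.0] [12.0]
```

A histogram of each `by_sep[k]` list gives the separation-specific distributions; histograms of the scaled distances are what would be compared across chains of different length.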

12.
A major problem in predicting protein structure by homology modelling is that the sequence alignment from which the model is built may not be the best one in terms of the correct equivalencing of residues assessed by structural or functional criteria. A useful strategy is to generate and examine a number of suboptimal alignments, as better alignments can often be found away from the optimal. A procedure to rapidly filter suboptimal alignments based on measurement of core volumes and packing pair potentials is investigated. The approach is benchmarked on three pairs of sequences which are non-trivial to align correctly, namely two immunoglobulin domains, plastocyanin with azurin, and two distant globin sequences. It is shown to be useful for reducing a large ensemble of possible alignments to a few which correspond more closely to the correct (structure-based) alignment.

13.
In search of the ideal protein sequence (total citations: 1; self-citations: 0; citations by others: 1)
The inverse of a folding problem is to find the ideal sequence that folds into a particular protein structure. This problem has been addressed using the topology fingerprint-based threading algorithm, capable of calculating a score (energy) of an arbitrary sequence-structure pair. At first, the search is conducted by unconstrained minimization of the energy in sequence space. It is shown that using energy as the only design criterion leads to spurious solutions with incorrect amino acid composition. The problem lies in the general features of the protein energy surface as a function of both structure and sequence. The proposed solution is to design the sequence by maximizing the difference between its energy in the desired structure and in other known protein structures. Depending on the size of the database of structures ‘to avoid’, sequences bearing significant similarity to the native sequence of the target protein are obtained using this procedure.
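The effect described above can be reproduced in a toy two-letter (H/P) model in which structures are reduced to hypothetical contact lists and each H-H contact scores -1. This stands in for the paper's topology-fingerprint threading energy, and the search is exhaustive rather than a minimization. Minimizing the energy alone returns an all-hydrophobic sequence, while maximizing the gap to a decoy structure recovers a specific pattern.

```python
from itertools import product

def energy(seq, contacts):
    """Toy HP energy: -1 for each contact between two hydrophobic (H) residues."""
    return -sum(1 for i, j in contacts if seq[i] == seq[j] == "H")

def all_sequences(n):
    return ("".join(s) for s in product("HP", repeat=n))

def design_energy_only(n, target):
    """Lowest energy in the target structure, ignoring every other structure."""
    return min(all_sequences(n), key=lambda s: energy(s, target))

def design_with_gap(n, target, decoys):
    """Largest energy gap between the best-fitting decoy and the target."""
    return max(all_sequences(n),
               key=lambda s: min(energy(s, d) for d in decoys) - energy(s, target))

target = [(0, 5), (1, 4)]        # hypothetical contacts of the desired fold
decoys = [[(0, 2), (3, 5)]]      # hypothetical structure 'to avoid'
print(design_energy_only(6, target))        # HHHHHH
print(design_with_gap(6, target, decoys))   # HHPPHH
```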

14.
In the course of molecular modeling or mutant prediction one often wants quick answers to questions such as: ‘Are there any residues in a beta-strand that point into an internal cavity, and are highly mutable?’; ‘Are there large polar residues in a helix that make a contact with a hydrophobic residue in a sheet, and don't make the maximal number of hydrogen bonds?’ or ‘Which hydrophobic residues are in a helix with a large hydrophobic moment, and make a contact with a co-factor, but at the same time still have a large accessible surface?’ I describe here a method to get answers to these kinds of questions in a very quick and easy manner. The method is partly based on the principles used in the design of relational databases, and its mode of operation is similar to the query methods used in a relational database environment. Although designed for aiding in molecular modeling, its applicability is much more general. The method has been implemented as part of a large molecular modeling package which copes with the numerous problems in systematic handling of protein structures, e.g. residue numbering. This also implies that many normal tools such as graphical analyses, I/O facilities, etc. are available on-line.
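The paper's query language is not shown; as a rough analogy, the first example question can be written as a conjunction of predicates over a table of residue records, much like a WHERE clause. The record fields, values and the 0.5 mutability threshold are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Residue:
    chain: str
    number: str            # text, so insertion codes such as "27A" survive
    name: str
    sec: str               # 'H' helix, 'E' strand, 'C' coil
    into_cavity: bool
    variability: float     # hypothetical 0..1 mutability from an alignment
    accessibility: float   # accessible surface in square angstroms

def select(table, *conditions):
    """Rows satisfying every condition, like SQL: SELECT * WHERE c1 AND c2 ..."""
    return [r for r in table if all(c(r) for c in conditions)]

table = [
    Residue("A", "27", "VAL", "E", True, 0.8, 5.0),
    Residue("A", "27A", "LEU", "E", False, 0.9, 40.0),
    Residue("A", "50", "ALA", "H", True, 0.7, 2.0),
]
# 'Residues in a beta-strand that point into a cavity and are highly mutable'
hits = select(table,
              lambda r: r.sec == "E",
              lambda r: r.into_cavity,
              lambda r: r.variability > 0.5)
print([r.chain + r.number for r in hits])  # ['A27']
```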

15.
Src homology 2 (SH2) domains are small protein modules of ~100 amino acids that are found in many proteins involved in intracellular signal transduction. They mediate protein-protein interactions and modulate enzyme activity by their ability to bind to specific sequence patterns that contain a phosphorylated tyrosine. As the three-dimensional structures of the phosphatidylinositol (PI) 3-kinase, Lck, Src and Abl SH2 domains have been shown to be similar, we have modelled other SH2 domains that show distinct sequence specificity to allow comparative analysis of SH2-phosphopeptide interactions. The SH2 domains of PLC-γ (N-terminal), Nck, Grb2, GAP and Abl have been model-built with high-affinity phosphopeptides fitted into the putative binding sites. For each SH2 domain a detailed analysis of the peptide-protein interaction was performed. It is apparent that specificity is mainly conferred by three to five residues downstream from the phosphotyrosine residue (Y*), especially, although not exclusively, peptide position Y* + 3. The SH2 pocket that binds the Y* + 3 residue is mainly composed of three sections: part of strand βE going into loop EF, part of helix αB, and loop BG. The residues that constitute the Y* + 3 binding pocket show variability that seems to determine which amino acid binds preferentially. Residue position βE4 seems to play a vital role in SH2 specificity. This study shows that the development of modelling protocols for SH2 domains whose structure has not been determined can prove very useful in predicting which residues confer the affinity and binding specificity of these domains towards distinct phosphotyrosine-containing sequences.

16.
Accurate assignments of secondary structures in proteins are crucial for a useful comparison with theoretical predictions. Three major programs which automatically determine the location of helices and strands are used for this purpose, namely DSSP, P-Curve and Define. Their results have been compared for a non-redundant database of 154 proteins. On a residue-per-residue basis, the percentage match score is only 63% between the three methods. While these methods agree on the overall number of residues in each of the three states (helix, strand or coil), they differ on the number of helices or strands, thus implying a wide discrepancy in the length of assigned structural elements. Moreover, the length distribution of helices and strands points to the existence of artefacts inherent to each assignment algorithm. To overcome these difficulties a consensus assignment is proposed where each residue is assigned to the state determined by at least two of the three methods. With this assignment the artefacts of each algorithm are attenuated. The residues assigned to the same state by all three methods are better predicted than the others. This assignment will thus be useful for analysing the success rate of prediction methods more accurately.
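The two-out-of-three rule can be sketched as a per-residue majority vote over three assignment strings reduced to H/E/C. The abstract does not say what happens when all three methods disagree; falling back to coil is an assumption here.

```python
from collections import Counter

def consensus(dssp, pcurve, define, fallback="C"):
    """Assign each residue the state given by at least two of the three methods."""
    if not len(dssp) == len(pcurve) == len(define):
        raise ValueError("all three assignments must cover the same residues")
    out = []
    for states in zip(dssp, pcurve, define):
        state, votes = Counter(states).most_common(1)[0]
        out.append(state if votes >= 2 else fallback)  # no majority: assumed coil
    return "".join(out)

print(consensus("HHHEECH", "HHCEECE", "CHHECEC"))  # HHHEECC
```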

17.
Relatively little has been known about the structure of alpha-helical membrane proteins, since until recently few structures had been crystallized. These limited data have restricted structural analyses to the prediction of secondary structure, rather than tertiary folds. To address this, this paper describes an analysis of the 23 available membrane protein structures. A number of findings are made that are of particular relevance to transmembrane helix packing: (1) on average, lipid-tail-accessible transmembrane residues are significantly more hydrophobic and less conserved than buried residues, and they contain different residue types; (2) charged residues are not always buried and, when accessible to membrane lipid tails, few are paired with another charge; instead they often interact with phospholipid head-groups or with other residue types; (3) a significant proportion of lipid-tail-accessible charged and polar residues form hydrogen bonds only with residues one turn away in the same helix (intra-helix); (4) pore-lining residues are usually hydrophobic, and it is difficult to distinguish them from buried residues in terms of either residue type or conservation; and (5) information was gained about the proportion of helices that tend to contribute to lining a pore and the resulting pore diameter. These findings are discussed with relevance to the prediction of membrane protein 3D structure.

18.
The EcoRV DNA methyltransferase (M·EcoRV) is an N6-adenine methyltransferase. We have used two different programs to predict the secondary structure of M·EcoRV. The resulting consensus prediction was tested by a mutant profiling analysis. To increase the reliability of the prediction and to obtain a secondary structure prediction for some ambiguously predicted regions, 29 neutral mutations of M·EcoRV were generated by five cycles of random mutagenesis and selection for active variants. The predicted consensus secondary structure elements could be aligned to the common topology of the structures of the catalytic domains of M·HhaI and M·TaqI. In a complementary approach we have isolated nine catalytically inactive single mutants. Five of these mutants contain an amino acid exchange within the catalytic domain of M·EcoRV (Val20Ala, Lys81Arg, Cys192Arg, Asp193Gly, Trp231Arg). The Trp231Arg mutant binds DNA similarly to wild-type M·EcoRV, but is catalytically inactive; hence this mutant behaves like a bona fide active site mutant. According to the structure prediction, Trp231 is located in a loop at the putative active site of M·EcoRV. The other inactive mutants were insoluble. They contain amino acid exchanges within the conserved amino acid motifs X, III or IV of M·EcoRV, confirming the importance of these regions.

19.
Sequence weighting techniques are aimed at balancing redundant observed information from subsets of similar sequences in multiple alignments. Traditional approaches apply the same weight to all positions of a given sequence; hence equal efficiency of phylogenetic changes is assumed along the whole sequence. This restrictive assumption is not required for the new method PSIC (position-specific independent counts) described in this paper. The number of independent observations (counts) of an amino acid type at a given alignment position is calculated from the overall similarity of the sequences that share the amino acid type at this position with the help of statistical concepts. This approach allows the fast computation of position-specific sequence weights even for alignments containing hundreds of sequences. The PSIC approach has been applied to profile extraction and to the fold family assignment of protein sequences with known structures. Our method was shown to be very productive in finding distantly related sequences and more powerful than Hidden Markov Models or the profile methods in WiseTools and PSI-BLAST in many cases. The profile extraction routine is available on the WWW (http://www.bork.embl-heidelberg.de/PSIC or http://www.imb.ac.ru/PSIC).

20.
An algorithm for predicting protein α/β-sheet topologies from secondary structure and topological folding rules (constraints) has been developed and implemented in Prolog. This algorithm (CBS1) is based on constraint satisfaction and employs forward-pruned breadth-first search and rotational invariance. CBS1 showed a 37-fold increase in efficiency over an exhaustive generate-and-test algorithm giving the same solution for a typical sheet of five strands whose topology was predicted from secondary structure with four topological folding constraints. Prolog specifications of a range of putative protein folding rules were then used to (i) replicate published protein topology predictions and (ii) validate these rules against known protein structures of nucleotide-binding domains. This demonstrated that (i) manual techniques for topology prediction can lead to non-exhaustive search and (ii) most of these protein folding principles were violated by specific proteins. Various extensions to the algorithm are discussed.
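CBS1 itself is written in Prolog; the Python sketch below only illustrates the search strategy named above. It builds strand orders left to right breadth-first, discards any partial order that already violates a rule (forward pruning), and keeps one order of each mirror-image pair (a simple form of rotational invariance). The single rule type, "strands a and b must be neighbours", is hypothetical, and strand direction is ignored.

```python
def adjacent(a, b):
    """Hypothetical rule: strands a and b must be neighbours in the sheet."""
    def ok(order):
        pa = order.index(a) if a in order else None
        pb = order.index(b) if b in order else None
        if pa is not None and pb is not None:
            return abs(pa - pb) == 1
        placed = pa if pa is not None else pb
        # With only one of the pair placed, its partner can still become its
        # neighbour only if it sits at the growing (right-hand) end.
        return placed is None or placed == len(order) - 1
    return ok

def sheet_orders(n, rules):
    """Forward-pruned breadth-first search over spatial orders of strands 1..n."""
    frontier = [[]]
    for _ in range(n):
        frontier = [order + [s]
                    for order in frontier
                    for s in range(1, n + 1)
                    if s not in order and all(ok(order + [s]) for ok in rules)]
    # An order and its reverse describe the same sheet seen from the other side.
    return [order for order in frontier if order[0] <= order[-1]]

print(sheet_orders(4, [adjacent(1, 2), adjacent(2, 3)]))
# [[1, 2, 3, 4], [3, 2, 1, 4]]
```

For four strands with these two rules, only 2 of the 24 possible orders survive, and most partial orders are discarded before they are completed.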


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417