Similar Literature
1.
The success rates reported for secondary structural class prediction with different methods are contradictory. On the one hand, the problem of recognizing the secondary structural class of a protein from its amino acid composition alone appears completely solved by simply applying a jury decision with an elliptically scaled distance function: Chou and coworkers have repeatedly published prediction accuracies near 100% (see Crit. Rev. Biochem. Mol. Biol. 30:275-349, 1995). On the other hand, traditional secondary structure prediction techniques achieve success rates of about 70% for the per-residue secondary structural state and about 75% for structural class, and then only with extensive input information (full sequence of the query protein, its amino acid composition and length, multiple alignments with homologous sequences). In this article, we resolve the paradox by considering (1) the definition of secondary structural class, (2) how representative the test set of protein tertiary structures is of the current state of the Protein Data Bank (PDB), and (3) the real impact of amino acid composition on secondary structural class. We formulate three objective criteria for a reasonable definition of secondary structural classes and show that only the criterion of Nakashima et al. (J. Biochem. 99:153-162, 1986) complies with all of them. Only this definition matches the distribution of secondary structural content in representative PDB subsets, whereas other criteria leave many proteins (up to 65% of all PDB entries) simply unassigned. We critically review specialized secondary structural class prediction methods, especially those of Chou and coworkers, which claim almost 100% accuracy using only amino acid composition, and resolve the paradox that these accuracies exceed those of secondary structure prediction from multiple alignments.
We show (i) that these techniques rely on a preselection of test sets that removes irregular proteins and other proteins without any class assignment (about 35% of all PDB entries), and (ii) that even for preselected representative test sets, the success rate drops to 60% or lower for a 4-type classification (alpha, beta, alpha + beta, alpha/beta). Prediction accuracy falls to about 50% if the secondary structural class definition of Nakashima et al. is applied and only a few irregular proteins are preselected and removed from automatically generated, representative subsets of the PDB. We applied two new vector decomposition methods for predicting secondary structural content from amino acid composition alone, with and without consideration of amino acid compositional coupling in the learning set of tertiary structures, respectively, to the problem of class prediction, and achieved about 60% correct assignment among four classes (alpha, beta, mixed, irregular), as good as single-sequence-based secondary structure prediction methods such as GORIII and COMBI. Our results demonstrate that 60% correctness is the upper limit for a 4-type class prediction from amino acid composition alone for an unknown query protein, and that consideration of compositional coupling does not improve prediction success. The prediction program SSCP, which offers secondary structural class assignment for query compositions and sequences, has been made available as a World Wide Web and e-mail service.
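As a rough illustration of the two ingredients discussed above, the sketch below computes a 20-dimensional amino acid composition vector and assigns one of four classes (alpha, beta, mixed, irregular) from helix and strand fractions. The threshold defaults are hypothetical placeholders, not the published criterion of Nakashima et al., and the function names are illustrative only.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(sequence):
    """Fraction of each of the 20 standard amino acids in a sequence."""
    counts = Counter(c for c in sequence.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}

def structural_class(helix_frac, strand_frac, helix_min=0.15, strand_min=0.10):
    """Four-way class from secondary structure content.

    helix_min and strand_min are hypothetical thresholds for illustration.
    """
    has_helix = helix_frac >= helix_min
    has_strand = strand_frac >= strand_min
    if has_helix and has_strand:
        return "mixed"
    if has_helix:
        return "alpha"
    if has_strand:
        return "beta"
    return "irregular"

comp = composition("MKTAYIAKQR")
print(round(comp["K"], 2))            # 0.2
print(structural_class(0.40, 0.05))   # alpha
```

In a real predictor the helix and strand fractions would themselves be estimated from the composition vector; that step is not shown here.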

2.
MOTIVATION: Evolutionary models of amino acid sequences can be adapted to incorporate structure information, and protein structure biologists can use phylogenetic relationships among species to improve prediction accuracy. RESULTS: A computer program called PASSML ('Phylogeny and Secondary Structure using Maximum Likelihood') has been developed to implement an evolutionary model that combines protein secondary structure and amino acid replacement. The model is related to that of Dayhoff and co-workers, but we distinguish eight categories of structural environment: alpha helix, beta sheet, turn and coil, each further classified according to solvent accessibility (buried or exposed). The model of sequence evolution for each of the eight categories is a Markov process with discrete states in continuous time, and the organization of structure along protein sequences is described by a hidden Markov model. This paper describes the PASSML software and illustrates how it allows both the reconstruction of phylogenies and the prediction of secondary structure from aligned amino acid sequences. AVAILABILITY: PASSML 'ANSI C' source code and the example data sets described here are available at http://ng-dec1.gen.cam.ac.uk/hmm/Passml.html and 'downstream' Web pages. CONTACT: P.Lio@gen.cam.ac.uk
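To make the hidden Markov model idea concrete, the sketch below decodes the most probable sequence of hidden structural states with the Viterbi algorithm. It is a generic illustration, not PASSML: it uses three states instead of eight, made-up probabilities, and simple per-residue emissions, whereas PASSML's emission terms come from an evolutionary model evaluated on alignment columns.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most probable hidden-state path for an observation sequence.

    Works in log space; all probabilities passed in must be > 0.
    """
    if not observations:
        return []
    # best[t][s]: log-probability of the best path ending in state s at position t
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
             for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda r: best[t - 1][r] + math.log(trans_p[r][s]))
            best[t][s] = (best[t - 1][prev] + math.log(trans_p[prev][s])
                          + math.log(emit_p[s][observations[t]]))
            back[t][s] = prev
    # Trace back from the best final state.
    state = max(states, key=lambda s: best[-1][s])
    path = [state]
    for t in range(len(observations) - 1, 0, -1):
        state = back[t][state]
        path.append(state)
    return path[::-1]

# Toy 3-state model (helix H, strand E, coil C) with made-up probabilities;
# observations are hypothetical residue classes: h = helix-favouring,
# b = strand-favouring, c = coil-favouring.
STATES = ["H", "E", "C"]
START = {s: 1 / 3 for s in STATES}
TRANS = {r: {s: 0.8 if r == s else 0.1 for s in STATES} for r in STATES}
EMIT = {
    "H": {"h": 0.7, "b": 0.1, "c": 0.2},
    "E": {"h": 0.1, "b": 0.7, "c": 0.2},
    "C": {"h": 0.2, "b": 0.2, "c": 0.6},
}

print("".join(viterbi("hhhhbbbb", STATES, START, TRANS, EMIT)))  # HHHHEEEE
```

The "sticky" transition matrix (0.8 to stay, 0.1 to switch) is what makes the decoder prefer contiguous segments over residue-by-residue switching.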

3.
A quantitative procedure is described for comparing the secondary structures of homologous proteins. Standard predictive methods are used to generate probability profiles from pairs of homologous amino acid sequences; correlation coefficients (R) are then computed between the profiles of each sequence pair for alpha-helix (R(alpha)), extended structure (R(beta)), turn (R(t)), and coil (R(c)). R values exceed 0.2 for correctly aligned homologous sequences, whereas unrelated or incorrectly aligned sequences give R values near zero. Lack of correlation in a segment of otherwise well-correlated sequences is used to identify structural divergence, which is then evaluated graphically using difference profiles. A combination of these techniques correctly predicts the secondary structural differences between melittin or beta-endorphin and their respective synthetic analogs. The method is potentially useful for describing evolutionary changes in protein secondary structure as well as for designing peptide analogs.
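The comparison above reduces to a Pearson correlation between two aligned probability profiles. The sketch below computes R and, as an assumption not taken from the abstract, scans fixed-size windows to flag segments whose local R drops below the 0.2 level quoted for correct alignments. The profiles are assumed to be precomputed by some prediction method; the window size is a hypothetical choice.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient R between two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile carries no correlation signal
    return sxy / math.sqrt(sxx * syy)

def low_correlation_windows(x, y, window=5, threshold=0.2):
    """Start indices (0-based) of windows whose local R is below threshold."""
    return [
        i for i in range(len(x) - window + 1)
        if pearson(x[i:i + window], y[i:i + window]) < threshold
    ]

# Two hypothetical helix-probability profiles that diverge at the end.
profile_a = [1, 2, 3, 4, 5, 6]
profile_b = [1, 2, 3, 6, 5, 4]
print(low_correlation_windows(profile_a, profile_b, window=3))  # [3]
```

The paper itself evaluates divergence graphically with difference profiles; the window scan is just one simple way to localize a poorly correlated segment.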

4.
The eukaryotic acidic ribosomal P proteins, unlike the standard r-proteins, which are rapidly degraded in the cytoplasm, form a large cytoplasmic pool that exchanges with the ribosome-bound proteins during translation. The native structure of the P proteins in solution is therefore an essential determinant of the protein-protein interactions that take place in the exchange process. In this work, the structure of the ribosomal acidic protein YP2beta from Saccharomyces cerevisiae has been investigated by fluorescence spectroscopy, circular dichroism (CD), nuclear magnetic resonance (NMR), and sedimentation equilibrium techniques. We have established that YP2beta has 22% alpha-helical secondary structure and a noncompact tertiary structure under physiological conditions (pH 7.0 and 25 °C): the hydrophobic core of the protein appears to be solvent-exposed, and very low cooperativity is observed for heat- or urea-induced denaturation. Moreover, the 1H-NMR spectra show small signal dispersion, and virtually all the amide protons exchange with the solvent on a very short time scale, which is characteristic of an open structure. At low pH, YP2beta maintains its secondary structure content, but there is no evidence of tertiary structure. 2,2,2-Trifluoroethanol (TFE) induces more alpha-helical structure but also disrupts any remaining trace of the tertiary fold. These results indicate that YP2beta may have a flexible structure in the cytoplasmic pool, with some of the characteristics of a "molten globule", and also point to the physiological relevance of such flexible protein states in processes other than protein folding.

5.
Human profilin is a 15-kDa protein that plays a major role in the signaling pathway leading to cytoskeletal rearrangement. Essentially complete assignment of the 1H, 13C, and 15N resonances of human profilin has been made by analysis of multidimensional, double- and triple-resonance nuclear magnetic resonance (NMR) experiments. The deviations of the 13C alpha and 13C beta chemical shifts from their respective random coil values were analyzed and correlate well with the secondary structure determined from the NMR data. Twenty structures of human profilin were refined in the program X-PLOR using a total of 1186 experimentally derived conformational restraints. The structures converged to a root-mean-square distance deviation of 1.5 Å for the backbone atoms. The resulting conformational ensemble indicates that human profilin is an alpha/beta protein composed of a seven-stranded antiparallel beta-sheet and three helices. The secondary structure elements of human profilin are quite similar to those found in Acanthamoeba profilin I [Archer, S. J., Vinson, V. K., Pollard, T. D., & Torchia, D. A. (1993), Biochemistry 32, 6680-6687], suggesting that the three-dimensional structure of Acanthamoeba profilin I should be analogous to that determined here for human profilin. The structure determination of human profilin has facilitated the sequence alignment of lower eukaryotic and human profilins and provides a framework within which the various functionalities of profilin can be explored. At least one element of the actin-binding region of human profilin is an alpha-helix. Two mechanisms by which phosphatidylinositol 4,5-bisphosphate can interfere with actin binding by human profilin are proposed.
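The chemical-shift analysis mentioned above compares observed 13C alpha and 13C beta shifts with random-coil values. A common way to use both is the combined secondary shift (observed minus random coil for C-alpha, minus the same quantity for C-beta), which tends to be positive in helices and negative in strands. The sketch below applies that rule; the random-coil values must be supplied by the caller, and the cutoff and minimum run length are hypothetical settings, not those used for profilin.

```python
def secondary_shifts(observed, random_coil):
    """Per-residue deviation (ppm) of observed shifts from random-coil values."""
    return [obs - rc for obs, rc in zip(observed, random_coil)]

def assign_from_shifts(ca_obs, ca_rc, cb_obs, cb_rc, cutoff=1.0, min_run=3):
    """Label residues H/E/C from the combined index dCA - dCB (ppm).

    cutoff and min_run are illustrative assumptions.
    """
    d_ca = secondary_shifts(ca_obs, ca_rc)
    d_cb = secondary_shifts(cb_obs, cb_rc)
    raw = []
    for a, b in zip(d_ca, d_cb):
        combined = a - b  # helix: dCA > 0, dCB < 0; strand: the reverse
        raw.append("H" if combined > cutoff else "E" if combined < -cutoff else "C")
    # Demote H/E runs shorter than min_run to coil.
    labels = list(raw)
    i = 0
    while i < len(raw):
        j = i
        while j < len(raw) and raw[j] == raw[i]:
            j += 1
        if raw[i] != "C" and j - i < min_run:
            labels[i:j] = ["C"] * (j - i)
        i = j
    return "".join(labels)

# Hypothetical deviations: random-coil values set to zero for simplicity.
print(assign_from_shifts([3, 3, 3, 0, -2, -2, -2], [0] * 7, [0] * 7, [0] * 7))
```

Requiring a minimum run length is a simple guard against calling isolated residues helical or extended from a single noisy shift.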

6.
This paper discusses the implementation of a three-dimensional (3D) structure motif search for proteins. Each protein structure is represented by a set of secondary structure elements (SSEs) comprising alpha-helix segments and beta-strand segments. Each SSE is further reduced to a two-node graph consisting of the starting amino acid residue, the ending residue, and a pseudo-bond between them. The search algorithm is based on a graph-theoretical clique-finding algorithm that has been used for 3D substructure searching in small organic molecules. The program SS3D-P2 was validated using proteins with well-known 3D motifs: it correctly found the Greek key motif, which consists of four antiparallel beta-strands, within the eye lens protein crystallin. The program was also successfully applied to search a protein structure database from the Protein Data Bank for a more complex 3D motif, the TIM-type beta-barrel.
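A minimal sketch of clique-based SSE matching, assuming each SSE is stored as a type plus start and end coordinates: nodes of a correspondence graph pair a query SSE with a same-type target SSE, edges join node pairs whose start/end distances agree within a tolerance, and a maximal clique covering every query SSE is a match. The compatibility test and tolerance are assumptions, not SS3D-P2's published criteria, and the clique finder is the basic Bron-Kerbosch algorithm without pivoting.

```python
import math
from itertools import combinations

def compatible(q1, q2, t1, t2, tol):
    """True if query pair (q1, q2) and target pair (t1, t2) match geometrically."""
    pairs = [("start", "start"), ("end", "end"), ("start", "end"), ("end", "start")]
    return all(
        abs(math.dist(q1[a], q2[b]) - math.dist(t1[a], t2[b])) <= tol
        for a, b in pairs
    )

def correspondence_graph(query, target, tol=1.0):
    """Adjacency sets over nodes (i, j): query SSE i matched to target SSE j."""
    nodes = [(i, j) for i, q in enumerate(query) for j, t in enumerate(target)
             if q["type"] == t["type"]]
    adj = {n: set() for n in nodes}
    for (i, j), (k, l) in combinations(nodes, 2):
        if i != k and j != l and compatible(query[i], query[k], target[j], target[l], tol):
            adj[(i, j)].add((k, l))
            adj[(k, l)].add((i, j))
    return adj

def bron_kerbosch(r, p, x, adj, out):
    """Collect every maximal clique (basic Bron-Kerbosch, no pivoting)."""
    if not p and not x:
        out.append(r)
        return
    for v in list(p):
        bron_kerbosch(r | {v}, p & adj[v], x & adj[v], adj, out)
        p = p - {v}
        x = x | {v}

def find_motif(query, target, tol=1.0):
    """Sorted (query, target) index pairs of a full match, or None."""
    adj = correspondence_graph(query, target, tol)
    cliques = []
    bron_kerbosch(set(), set(adj), set(), adj, cliques)
    best = max(cliques, key=len, default=set())
    return sorted(best) if len(best) == len(query) else None

QUERY = [
    {"type": "E", "start": (0, 0, 0), "end": (10, 0, 0)},
    {"type": "E", "start": (2, 5, 0), "end": (10, 5, 0)},
]
TARGET = [
    {"type": "H", "start": (50, 50, 50), "end": (60, 50, 50)},
    {"type": "E", "start": (100, 0, 0), "end": (110, 0, 0)},
    {"type": "E", "start": (102, 5, 0), "end": (110, 5, 0)},
]
print(find_motif(QUERY, TARGET))  # [(0, 1), (1, 2)]
```

Because only inter-element distances are compared, the search is independent of where the motif sits in the target's coordinate frame.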

7.
A number of methods exist for predicting protein secondary structure from primary sequence. One method identifies variable charged residues and conserved hydrophobic residues within large multiple alignments as indicators of outside and inside sites, respectively, in the protein structure. These sites are then manually fitted to secondary structure templates to generate a secondary structure prediction. Using the existing theoretical basis of this method, we present an algorithm (STAMA) that automatically carries out the initial variation/conservation analysis of the alignment. We also test the accuracy of complete predictions made by manually fitting the STAMA-derived assignments to structure templates, using five large multiple alignments, each including a protein of known structure. On average, the method predicted only 57% of residues in the correct secondary structure, and was only as accurate as predictions made with the established, automated method of Garnier, Osguthorpe and Robson (1978) applied to a single sequence. When used in conjunction with other secondary structure prediction methods, however, the resulting consensus predictions were very accurate: 78% of the elements (alpha-helices or beta-strands) for which a consensus could be obtained were predicted correctly. The algorithm presented here, together with the assessment of its prediction accuracy, should enable informed general use of this predictive approach.
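The variation/conservation step described above can be sketched as a per-column scan of the alignment. In this illustration, a column whose residues are all hydrophobic is marked as a likely inside site, and a column containing charged residues with at least a minimum number of distinct residue types is marked as a likely outside site. The residue groupings and the variety threshold are assumptions, not STAMA's actual rules, and the manual template-fitting step is not modeled.

```python
HYDROPHOBIC = set("AVILMFWC")   # assumed residue grouping
CHARGED = set("DEKR")           # assumed residue grouping

def classify_columns(alignment, min_variety=3):
    """Label each column 'i' (inside), 'o' (outside) or '.' (no call).

    'i': every non-gap residue is hydrophobic (conserved hydrophobic column).
    'o': the column contains a charged residue and at least min_variety
         distinct residue types (variable, charged column).
    """
    if len({len(s) for s in alignment}) != 1:
        raise ValueError("aligned sequences must have equal length")
    labels = []
    for column in zip(*alignment):
        residues = [c for c in column if c != "-"]
        if not residues:
            labels.append(".")
        elif all(c in HYDROPHOBIC for c in residues):
            labels.append("i")
        elif any(c in CHARGED for c in residues) and len(set(residues)) >= min_variety:
            labels.append("o")
        else:
            labels.append(".")
    return "".join(labels)

print(classify_columns(["LKDA", "IEDA", "VRDA"]))  # io.i
```

An alternating pattern of 'i' and 'o' labels with a period near 3.6 residues would then suggest an amphipathic helix, while alternation every other residue suggests a strand; that interpretation is left to the template-fitting stage.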

8.
A new, efficient method is proposed and evaluated for assembling protein tertiary structure from known, loosely encoded secondary structure restraints and sparse information about exact side-chain contacts. The method is based on a new, very simple reduced model of protein structure and dynamics, in which the protein is described as a lattice chain connecting side-chain centers of mass rather than C-alpha atoms. The model has implicit built-in multibody correlations that simulate short- and long-range packing preferences, hydrogen-bonding cooperativity, and a mean-force potential describing hydrophobic interactions. Owing to the simplicity of the protein representation and of the model force field, the Monte Carlo algorithm is at least an order of magnitude faster than previously published Monte Carlo algorithms for structure assembly. In contrast to existing algorithms, the new method requires fewer tertiary restraints for successful fold assembly: on average, one for every seven residues compared with one for every four residues. For example, for smaller proteins such as the B domain of protein G, the resulting structures have a coordinate root-mean-square deviation (cRMSD) of about 3 Å from the experimental structure; for myoglobin, structures with a backbone cRMSD of 4.3 Å are produced; and for a 247-residue TIM barrel, the cRMSD of the resulting folds is about 6 Å. As would be expected, increasing the number of tertiary restraints improves the accuracy of the assembled structures. The reliability and robustness of the new method should enable its routine application in model-building protocols based on various (very sparse) experimentally derived structural restraints.
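To put the two restraint densities in perspective, the arithmetic below applies them to a chain the length of the 247-residue TIM barrel. This is only an illustration; the abstract does not state how many restraints were actually used for that protein.

```latex
\frac{247}{7} \approx 35.3 \;\Rightarrow\; \text{about 35 restraints}
\qquad \text{vs.} \qquad
\frac{247}{4} = 61.75 \;\Rightarrow\; \text{about 62 restraints}
```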

9.
The backbone dynamics of the pleckstrin homology (PH) domain from dynamin were studied by 15N NMR relaxation (R1 and R2) and steady-state heteronuclear 15N{1H} nuclear Overhauser effect (NOE) measurements at 500 and 600 MHz, at protein concentrations of 1.7 mM and 300 μM, and by molecular dynamics (MD) simulations. The analysis was performed using the model-free approach, which was extended to account for the observed partial (equilibrium) dimerization of the protein at NMR concentrations. A model is developed that takes into account both rapid monomer-dimer exchange and anisotropy of the overall rotation of the dimer. The data show complex dynamics of the dynamin PH domain. Internal motions in elements of the secondary structure are restricted, as inferred from the high value of the order parameter (S² ≈ 0.9) and from the local correlation time (< 100 ps). Of the four extended loop regions that are disordered in the NMR-derived solution structure of the protein, loops beta1/beta2 and beta5/beta6 are involved in large-amplitude motions (S² down to 0.2 to 0.3) on the subnanosecond-to-nanosecond time scale. Reorientation of loops beta3/beta4 and beta6/beta7, in contrast, is restricted, with order parameters S² ≈ 0.9 more typical of the protein core. These loops, however, are involved in much slower motions, resulting in conformational exchange on a microsecond-to-submillisecond time scale. The motions of the terminal regions (residues 1 to 10 and 122 to 125) are practically unrestricted (S² down to 0.05, characteristic times on the nanosecond time scale), suggesting that these parts of the sequence do not participate in the protein fold. The analysis shows that the 15N relaxation data become more sensitive to protein microdynamic parameters (S², τloc) as the protein molecular mass (τc) increases.
The use of negative values of the steady-state 15N{1H} NOEs as an indicator of residues not belonging to the folded structure is suggested. The amplitudes of local motion observed in the MD simulation are in good agreement with the NMR data for the amide NH groups located in the protein core.
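For orientation, the simplest (isotropic) form of the model-free spectral density is J(ω) = (2/5)[S²τc/(1 + ω²τc²) + (1 − S²)τ/(1 + ω²τ²)], with 1/τ = 1/τc + 1/τe. The sketch below evaluates it. It does not include the monomer-dimer exchange or anisotropic rotation that the authors added, and the numerical values in the example are illustrative only.

```python
import math

def model_free_j(omega, s2, tau_c, tau_e):
    """Lipari-Szabo spectral density J(omega), isotropic overall tumbling.

    omega: angular frequency (rad/s); tau_c: overall correlation time (s);
    tau_e: effective internal correlation time (s); s2: order parameter S^2.
    """
    tau = 1.0 / (1.0 / tau_c + 1.0 / tau_e)  # 1/tau = 1/tau_c + 1/tau_e
    overall = s2 * tau_c / (1.0 + (omega * tau_c) ** 2)
    internal = (1.0 - s2) * tau / (1.0 + (omega * tau) ** 2)
    return 0.4 * (overall + internal)  # 0.4 = 2/5

# Illustrative numbers only: 8 ns tumbling, 50 ps internal motion, evaluated at
# the 15N frequency of a 600 MHz (1H) spectrometer (about 60.8 MHz).
omega_n = 2 * math.pi * 60.8e6
for label, s2 in [("core-like, S2 = 0.9", 0.9), ("mobile loop, S2 = 0.2", 0.2)]:
    print(label, model_free_j(omega_n, s2, 8e-9, 50e-12))
```

For a rigid site (S² = 1) the internal term vanishes and J(0) reduces to 2τc/5, which is why larger, more slowly tumbling proteins spread the relaxation contributions of S² and the internal correlation time further apart.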

11.
As the structural database continues to expand, new methods are required to analyse and compare protein structures. Whereas the recognition, comparison, and classification of folds is now more or less a solved problem, tools for studying constellations of small numbers of residues are few and far between. In this paper, two programs are described for the analysis of spatial motifs in protein structures. The first, SPASM, can be used to find occurrences of a motif consisting of arbitrary main-chain and/or side-chain atoms in a database of protein structures. The program also has a unique capability to carry out "fuzzy pattern matching" with relaxed requirements on the types of some or all of the matching residues. The second program, RIGOR, scans a single protein structure for occurrences of any of a set of predefined motifs from a database. In one application, spatial motif recognition combined with profile analysis enabled the assignment of the structural and functional class of an uncharacterised hypothetical protein in the sequence database. In another application, the occurrence of short left-handed helical segments in protein structures was investigated, and such segments were found to be fairly common. Potential applications of the techniques presented here lie in the analysis of (newly determined) structures, in comparative structural analysis, in the design and engineering of novel functional sites, and in the prediction of the structure and function of uncharacterised proteins.
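The idea of fuzzy spatial motif matching can be illustrated with a brute-force search: each motif position lists a set of acceptable residue types (the "fuzzy" part) and a reference position, and a hit is any ordered choice of distinct structure residues with acceptable types and matching pairwise distances. This is not SPASM's algorithm; it is an exhaustive sketch suitable only for tiny inputs, and the distance tolerance is an assumption.

```python
import math
from itertools import combinations, permutations

def match_motif(motif, structure, tol=1.0):
    """Brute-force search for a small spatial motif.

    motif:     list of (allowed_residue_names, (x, y, z)); a set of names
               gives 'fuzzy' matching on residue type.
    structure: list of (residue_name, (x, y, z)), e.g. one C-alpha per residue.
    Returns index tuples into structure whose residue types are allowed and
    whose pairwise distances match the motif's within tol (Angstrom).
    """
    hits = []
    for combo in permutations(range(len(structure)), len(motif)):
        if any(structure[k][0] not in allowed for k, (allowed, _) in zip(combo, motif)):
            continue
        if all(
            abs(math.dist(motif[a][1], motif[b][1])
                - math.dist(structure[combo[a]][1], structure[combo[b]][1])) <= tol
            for a, b in combinations(range(len(motif)), 2)
        ):
            hits.append(combo)
    return hits

# Toy three-residue motif with relaxed types at the first and third positions.
MOTIF = [({"SER", "THR"}, (0, 0, 0)), ({"HIS"}, (4, 0, 0)), ({"ASP", "GLU"}, (4, 3, 0))]
STRUCTURE = [("GLY", (10, 10, 10)), ("THR", (0, 0, 0)), ("HIS", (4, 0, 0)),
             ("GLU", (4, 3, 0)), ("SER", (20, 0, 0))]
print(match_motif(MOTIF, STRUCTURE))  # [(1, 2, 3)]
```

The number of permutations grows very quickly with structure size, which is why practical tools prune candidates by residue type and distance before enumerating combinations.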

12.
We describe a method for predicting the three-dimensional (3-D) structure of proteins from their sequence alone. The method is based on the electrostatic screening model for the stability of the protein main-chain conformation. The free energy of a protein as a function of its conformation is obtained from a potential-of-mean-force analysis of high-resolution X-ray protein structures. The free energy function is simple and contains only 44 fitted coefficients. The free energy is minimized by a torsion-space Monte Carlo procedure using the concept of hierarchic condensation. The Monte Carlo minimization procedure is applied to predict the secondary, super-secondary, and native 3-D structures of 12 proteins with 28-110 amino acids. The 3-D structures of the majority of local secondary and super-secondary structures are predicted accurately. This result suggests that control over the formation of native-like local structure is distributed along the entire protein sequence. The native 3-D structure is predicted correctly for 3 of 12 proteins composed mainly of alpha-helices. The method fails to predict the native 3-D structure of proteins with predominantly beta secondary structure. We suggest that hierarchic condensation is not an appropriate procedure for simulating the folding of proteins made up primarily of beta-strands. The method has proved accurate in predicting local secondary and super-secondary structures in the blind ab initio 3-D prediction experiment.
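The minimization step above uses torsion-space Monte Carlo. The sketch below shows only the generic Metropolis loop over torsion angles: perturb one angle, always accept downhill moves, accept uphill moves with Boltzmann probability, and remember the best conformation seen. The energy function is a hypothetical toy with one preferred angle per torsion; the electrostatic screening free energy and hierarchic condensation are not implemented, and step size and kT are illustrative settings.

```python
import math
import random

def metropolis_minimize(energy, angles, steps=5000, step_deg=20.0, kT=0.5, seed=0):
    """Metropolis Monte Carlo in torsion space; returns best angles and energy.

    energy: callable mapping a list of torsion angles (degrees) to a float.
    step_deg and kT are illustrative settings, not values from the paper.
    """
    rng = random.Random(seed)
    current = list(angles)
    e_cur = energy(current)
    best, e_best = list(current), e_cur
    for _ in range(steps):
        trial = list(current)
        k = rng.randrange(len(trial))  # perturb one torsion at a time
        # Random step, wrapped back into [-180, 180).
        trial[k] = (trial[k] + rng.uniform(-step_deg, step_deg) + 180.0) % 360.0 - 180.0
        e_trial = energy(trial)
        # Accept downhill moves always, uphill moves with Boltzmann probability.
        if e_trial <= e_cur or rng.random() < math.exp(-(e_trial - e_cur) / kT):
            current, e_cur = trial, e_trial
            if e_cur < e_best:
                best, e_best = list(current), e_cur
    return best, e_best

def toy_energy(angles, target=-60.0):
    """Hypothetical energy, minimal (zero) when every angle equals target."""
    return sum(1.0 - math.cos(math.radians(a - target)) for a in angles)

best_angles, best_energy = metropolis_minimize(toy_energy, [180.0] * 4)
print([round(a) for a in best_angles], round(best_energy, 3))
```

Tracking the best conformation separately from the current one matters because, at finite temperature, the Metropolis walk keeps accepting some uphill moves and need not end at its lowest point.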

14.
The structure of the 129-residue protein hen lysozyme has been determined in solution by two-dimensional 1H nuclear magnetic resonance methods. A total of 1158 NOE distance restraints and 68 phi and 24 chi1 dihedral angle restraints were employed in conjunction with distance geometry and simulated annealing procedures. The overall C-alpha root-mean-square deviation from the average for 16 calculated structures is 1.8 (± 0.2) Å, but when 14 residues in exposed disordered regions are excluded, this value reduces to 1.3 (± 0.2) Å. Regions of secondary structure, and the four alpha-helices in particular, are well defined (C-alpha root-mean-square deviation 0.8 (± 0.3) Å for helices). The main-chain fold is closely similar to structures of the protein in the crystalline state. Furthermore, many of the internal side-chains are found in well-defined conformational states in the solution structures, and these correspond well with the conformational states found in the crystal. The generally high level of definition of the main chain and of many internal side-chains in the solution structures is reinforced by the results of an analysis of coupling constants and ring current shifts. Many side-chains on the surface, however, are highly disordered among the set of solution structures. In certain cases this disorder has been shown to be dynamic in origin by examination of 3J(alpha-beta) coupling constants.
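The ensemble statistic quoted above (RMSD from the average structure, with and without disordered regions) can be sketched as follows. The sketch assumes the models have already been superimposed, so no fitting is done here. It reports the mean and the population standard deviation over models, which may differ from the exact convention used in the paper.

```python
import math
import statistics

def rmsd_from_average(ensemble, exclude=()):
    """RMSD of each model from the ensemble's average coordinates.

    ensemble: list of models, each a list of (x, y, z) for the same atoms
              (e.g. one C-alpha per residue), ALREADY superimposed.
    exclude:  atom indices to leave out (e.g. exposed disordered regions).
    Returns (mean, population standard deviation) over the models.
    """
    excluded = set(exclude)
    keep = [i for i in range(len(ensemble[0])) if i not in excluded]
    n_models = len(ensemble)
    average = [
        tuple(sum(model[i][d] for model in ensemble) / n_models for d in range(3))
        for i in keep
    ]
    values = []
    for model in ensemble:
        sq = sum(math.dist(model[i], avg) ** 2 for i, avg in zip(keep, average))
        values.append(math.sqrt(sq / len(keep)))
    return statistics.mean(values), statistics.pstdev(values)

# Two toy models; the third atom plays the role of a disordered residue.
model_a = [(0, 0, 0), (1, 0, 0), (5, 0, 0)]
model_b = [(0, 0, 2), (1, 0, 2), (5, 0, 10)]
print(rmsd_from_average([model_a, model_b]))              # all atoms
print(rmsd_from_average([model_a, model_b], exclude=(2,)))  # ordered atoms only
```

As in the lysozyme ensemble, dropping a few poorly defined positions can lower the ensemble RMSD substantially because each deviation enters squared.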

15.
The nuclear magnetic resonance (NMR) structure of the 15 kDa pathogenesis-related protein P14a, which displays antifungal activity and is induced in tomato leaves in response to pathogen infection, was determined using 15N/13C doubly labeled and unlabeled protein samples. In all, 2030 conformational constraints were collected as input for the distance geometry program DIANA. After energy minimization with the program OPAL, the 20 best conformers had an average root-mean-square deviation relative to the mean coordinates of 0.88 Å for the backbone atoms N, C(alpha) and C', and 1.30 Å for all heavy atoms. P14a contains four alpha-helices (I to IV) comprising residues 4 to 17, 27 to 40, 64 to 72 and 93 to 98; a short 3(10)-helix of residues 73 to 75 directly following helix III; and a mixed, four-stranded beta-sheet with topology +3x, -2x, +1, containing residues 24 to 25, 53 to 58, 104 to 111 and 117 to 124. These regular secondary structure elements form a novel, complex alpha + beta topology in which alpha-helices I, III and IV and the 3(10)-helix are located above the plane defined by the beta-sheet, and alpha-helix II lies below this plane. The alpha-helices and beta-strands are thus arranged in three stacked layers, which are stabilized by two distinct hydrophobic cores associated with the two layer interfaces, giving rise to an "alpha-beta-alpha sandwich". The three-dimensional structure of P14a provides initial leads for identifying the so far unknown active sites and mode of action of the protein, which is of direct interest for the generation of transgenic plants with improved host defense properties.

16.
Galactosyltransferases are enzymes that transfer galactose from UDP-Gal to various acceptors, either with retention of the anomeric configuration to form alpha1,2-, alpha1,3-, alpha1,4-, and alpha1,6-linkages, or with inversion of the anomeric configuration to form beta1,3-, beta1,4-, and beta1-ceramide linkages. During the last few years, several (c)DNA sequences coding for galactosyltransferases have become available. We have retrieved these sequences and conducted sequence similarity studies. On the basis of both the nature of the reaction catalyzed and protein sequence identity, these enzymes can be classified into twelve groups. Using a sensitive graphics method for protein comparison, conserved structural features were found in some of the galactosyltransferase groups and in other classes of glycosyltransferases, resulting in the definition of five families. The lengths and locations of the conserved regions, as well as the invariant residues, are described for each family. In addition, the DxD motif, which may be important for substrate recognition and/or catalysis, is shown to occur in all families but one.
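Locating the DxD motif mentioned above is a simple sequence scan, where x stands for any residue. The sketch below uses a regular-expression lookahead so that overlapping occurrences (as in DADAD) are all reported. The presence of the pattern alone says nothing about function; in the study it is interpreted in the context of conserved regions.

```python
import re

def find_dxd(sequence):
    """1-based start positions of every D-x-D motif (overlaps included)."""
    # A zero-width lookahead lets overlapping motifs such as 'DADAD' match twice.
    return [m.start() + 1 for m in re.finditer(r"(?=D.D)", sequence.upper())]

print(find_dxd("MKDVDAG"))  # [3]
print(find_dxd("DADAD"))    # [1, 3]
```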

18.
We describe a program, tRNAscan-SE, which identifies 99-100% of transfer RNA genes in DNA sequence while giving fewer than one false positive per 15 gigabases. Two previously described tRNA detection programs are used as fast, first-pass prefilters to identify candidate tRNAs, which are then analyzed by a highly selective tRNA covariance model. This work represents a practical application of RNA covariance models, which are general, probabilistic secondary structure profiles based on stochastic context-free grammars. tRNAscan-SE searches at approximately 30 000 bp/s. Additional extensions to tRNAscan-SE detect unusual tRNA homologues such as selenocysteine tRNAs, tRNA-derived repetitive elements and tRNA pseudogenes.
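Covariance models score sequence and base pairing together with a probabilistic context-free grammar. As a much simpler illustration of the nested, context-free pairing structure they capture, the sketch below implements the classic Nussinov base-pair maximization by dynamic programming. It is not a covariance model and has no probabilities; the minimum hairpin loop length is an assumption.

```python
def nussinov(seq, min_loop=3):
    """Maximum number of nested Watson-Crick/G-U base pairs (Nussinov algorithm).

    min_loop: minimum number of unpaired bases enclosed by a pair (assumed 3).
    """
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
    s = seq.upper().replace("T", "U")  # accept DNA input
    n = len(s)
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = max(best[i + 1][j], best[i][j - 1])  # i or j left unpaired
            if (s[i], s[j]) in pairs:
                score = max(score, best[i + 1][j - 1] + 1)  # i pairs with j
            for k in range(i + 1, j):  # split into two adjacent substructures
                score = max(score, best[i][k] + best[k + 1][j])
            best[i][j] = score
    return best[0][n - 1] if n else 0

print(nussinov("GGGAAAUCC"))  # 3
```

Each cell depends only on smaller enclosed or adjacent intervals, which is the same nesting assumption that lets stochastic context-free grammars model RNA stems, although real covariance models replace the simple pair count with position-specific probabilities.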

19.
A computer program (SANDOCK) has been developed for the automated docking of small ligands to a target protein. It uses a guided matching algorithm to fit ligand atoms into the protein binding pocket. The protein is described by a modified Lee-Richards dotted surface, with each dot coded by chemical property and accessibility. Orientations of the ligand in the active site are generated such that chemical and shape complementarity between the ligand and the active-site cavity is fulfilled. The generated fits are evaluated with scoring functions that account for van der Waals, hydrophobic and hydrogen-bonding interactions. This newly developed docking program can efficiently screen very large databases in a reasonable time and has been used to successfully identify novel ligands. The X-ray structure of a thrombin-ligand complex predicted by SANDOCK is described. The ligand binds to thrombin with a Kd of 65 μM, and its observed binding mode deviates from the one predicted by SANDOCK by an rmsd of 0.7 Å over all ligand atoms.
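The pose comparison quoted above (rmsd over all ligand atoms between predicted and observed binding modes) is usually computed in place, without superimposing the ligands, because both poses already share the protein's coordinate frame. A minimal sketch, assuming atoms are listed in the same order in both poses; symmetry-equivalent atoms are not handled.

```python
import math

def pose_rmsd(predicted, observed):
    """In-place RMSD (Angstrom) between two poses of the same ligand.

    Both arguments list (x, y, z) per atom in the same order and the same
    protein frame; no superposition is applied.
    """
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("poses must list the same, non-zero number of atoms")
    total = sum(math.dist(p, o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(total / len(predicted))

print(pose_rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 1), (1, 0, 1)]))  # 1.0
```

Superimposing first would hide a rigid-body shift of the whole ligand within the pocket, which is exactly the error a docking program can make.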

20.
Describes the use of concept mapping to develop a pictorial multivariate conceptual framework of staff views of a program of supported employment (SE) for individuals with severe mental illness. The SE program involves extended individualized supported employment for clients through a mobile job support worker who maintains contact with the client after job placement and supports the client in various ways. Participants were 14 staff members of a psychiatric rehabilitation agency with assignments associated with the SE program. They brainstormed a large number of specific program activity statements (N = 96), sorted and rated the statements, and interpreted the map that was produced through multidimensional scaling and hierarchical cluster analysis. The resulting map enabled identification of 4 issues that should be included in any theory of SE programs: the specific activity sequences that characterize the program itself, the pattern of local program evolution, the definition of program staff roles, and the influence of key contextual factors such as the client's family or the program's administrative structure. The implications of concept mapping methodology for theory development and program evaluation are considered. (PsycINFO Database Record (c) 2010 APA, all rights reserved)
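Before multidimensional scaling and cluster analysis, the participants' sorts have to be aggregated into a single similarity matrix. A minimal sketch, assuming the usual concept-mapping aggregation in which each cell counts how many participants placed two statements in the same pile and each participant places every statement exactly once. The scaling and clustering steps themselves are not shown.

```python
def cooccurrence_matrix(sorts, n_statements):
    """Group similarity matrix for concept mapping.

    sorts: one entry per participant; each entry is a list of piles, and each
           pile is a list of statement indices (0-based).
    Cell [i][j] counts the participants who put statements i and j together.
    """
    matrix = [[0] * n_statements for _ in range(n_statements)]
    for piles in sorts:
        for pile in piles:
            for i in pile:
                for j in pile:
                    matrix[i][j] += 1
    return matrix

# Two hypothetical participants sorting three statements.
sorts = [[[0, 1], [2]], [[0], [1, 2]]]
for row in cooccurrence_matrix(sorts, 3):
    print(row)
```

The diagonal equals the number of participants; a dissimilarity input for scaling can then be taken as that number minus each cell.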


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP Registration No. 09084417