Similar literature
20 similar documents found (search time: 46 ms)
1.
A new algorithm is reported which builds an alignment between two protein structures. The algorithm involves a combinatorial extension (CE) of an alignment path defined by aligned fragment pairs (AFPs) rather than the more conventional techniques using dynamic programming and Monte Carlo optimization. AFPs, as the name suggests, are pairs of fragments, one from each protein, which confer structure similarity. AFPs are based on local geometry, rather than global features such as orientation of secondary structures and overall topology. Combinations of AFPs that represent possible continuous alignment paths are selectively extended or discarded thereby leading to a single optimal alignment. The algorithm is fast and accurate in finding an optimal structure alignment and hence suitable for database scanning and detailed analysis of large protein families. The method has been tested and compared with results from Dali and VAST using a representative sample of similar structures. Several new structural similarities not detected by these other methods are reported. Specific one-on-one alignments and searches against all structures as found in the Protein Data Bank (PDB) can be performed via the Web at http://cl.sdsc.edu/ce.html.

2.
Evaluation and improvements in the automatic alignment of protein sequences   Cited by: 1 (self-citations: 0; citations by others: 1)
The accuracy of protein sequence alignment obtained by applying a commonly used global sequence comparison algorithm is assessed. Alignments based on the superposition of the three-dimensional structures are used as a standard for testing the automatic, sequence-based methods. Alignments obtained from the global comparison of five pairs of homologous protein sequences studied gave 54% agreement overall for residues in secondary structures. The inclusion of information about the secondary structure of one of the proteins, in order to limit the number of gaps inserted in regions of secondary structure, improved this figure to 68%. A similarity score of greater than six standard deviation units suggests that an alignment which is greater than 75% correct within secondary structural regions can be obtained automatically for the pair of sequences.
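
The "standard deviation units" mentioned in this abstract can be illustrated with a short Python sketch. The abstract does not say how the background distribution is built; the sketch assumes it is a list of alignment scores from unrelated or shuffled sequence pairs, and the function name `sd_units` is hypothetical.

```python
import statistics

def sd_units(observed_score, background_scores):
    """Express an alignment score in standard deviation units.

    background_scores is assumed (not stated in the abstract) to hold
    scores for unrelated or shuffled sequence pairs.
    """
    mean = statistics.mean(background_scores)
    sd = statistics.stdev(background_scores)  # sample standard deviation
    return (observed_score - mean) / sd

# Under this reading, a value above 6 is the threshold the abstract
# associates with alignments that are more than 75% correct.
print(sd_units(10, [1, 2, 3]))
```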

3.
A method for comparison of protein sequences based on their primary and secondary structure is described. Protein sequences are annotated with predicted secondary structures (using a modified Chou and Fasman method). Two-lettered code sequences are generated (Xx, where X is the amino acid and x is its annotated secondary structure). Sequences are compared with a dynamic programming method (STRALIGN) that includes a similarity matrix for both the amino acids and secondary structures. The similarity value for each paired two-lettered code is a linear combination of similarity values for the paired amino acids and their annotated secondary structures. The method has been applied to eight globin proteins (28 pairs) for which the X-ray structure is known. For protein pairs with high primary sequence similarity (>45%), STRALIGN alignment is identical to that obtained by a dynamic programming method using only primary sequence information. However, alignment of protein pairs with lower primary sequence similarity improves significantly with the addition of secondary structure annotation. Alignment of the pair with the least primary sequence similarity of 16% was improved from 0 to 37% ‘correct’ alignment using this method. In addition, STRALIGN was successfully applied to seven pairs of distantly related cytochrome c proteins, and three pairs of distantly related picornavirus proteins.
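
The scoring idea in this abstract (a linear combination of amino acid and secondary structure similarity, fed into dynamic programming) can be sketched in Python as follows. This is not the STRALIGN implementation: the weight `w`, the identity-based toy similarity values, the linear gap penalty, and all function names are assumptions made for illustration, since the abstract gives neither the matrices nor the coefficients.

```python
# Illustrative only: toy similarity values, not the STRALIGN matrices.
def aa_sim(a, b):
    """Toy amino acid similarity: 1 for identity, 0 otherwise."""
    return 1.0 if a == b else 0.0

def ss_sim(x, y):
    """Toy secondary structure similarity: 1 for the same state, 0 otherwise."""
    return 1.0 if x == y else 0.0

def pair_score(p, q, w=0.5):
    """Linear combination for two-letter codes such as 'Ah' (Ala in a helix).

    w is a hypothetical weight on the secondary structure term.
    """
    return (1.0 - w) * aa_sim(p[0], q[0]) + w * ss_sim(p[1], q[1])

def global_align_score(s, t, gap=-1.0, w=0.5):
    """Needleman-Wunsch global alignment score over lists of two-letter codes."""
    n, m = len(s), len(t)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap          # leading gaps in t
    for j in range(1, m + 1):
        dp[0][j] = j * gap          # leading gaps in s
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + pair_score(s[i - 1], t[j - 1], w),  # match
                dp[i - 1][j] + gap,                                    # gap in t
                dp[i][j - 1] + gap,                                    # gap in s
            )
    return dp[n][m]
```

Setting `w=0` reduces the score to a sequence-only comparison, which mirrors the abstract's contrast between alignments with and without secondary structure annotation.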

4.
In search of the ideal protein sequence   Cited by: 1 (self-citations: 0; citations by others: 1)
The inverse of a folding problem is to find the ideal sequence that folds into a particular protein structure. This problem has been addressed using the topology fingerprint-based threading algorithm, capable of calculating a score (energy) of an arbitrary sequence-structure pair. At first, the search is conducted by unconstrained minimization of the energy in sequence space. It is shown that using energy as the only design criterion leads to spurious solutions with incorrect amino acid composition. The problem lies in the general features of the protein energy surface as a function of both structure and sequence. The proposed solution is to design the sequence by maximizing the difference between its energy in the desired structure and in other known protein structures. Depending on the size of the database of structures ‘to avoid’, sequences bearing significant similarity to the native sequence of the target protein are obtained using this procedure.

5.
In order to detect a motif of local structures in different protein conformations, the Delaunay tessellation is applied to protein structures represented by C(alpha) atoms only. By the Delaunay tessellation the interior space of the protein is uniquely divided up into Delaunay tetrahedra whose vertices are the C(alpha) atom positions. Some edges of the tetrahedra are virtual bonds connecting adjacent residues' C(alpha) atoms along the polypeptide chain and others indicate interactions between residues nearest neighbouring in space. The rules are proposed to assign a code, i.e., a string of digits, to each tetrahedron to characterize the local structure constructed by the vertex residues of one relevant tetrahedron and four surrounding it. Many sets comprised of the local structures with the same code are obtained from 293 proteins, each of which has relatively low sequence similarity with the others. The local structures in each set are similar enough to each other to represent a motif. Some of them are parts of secondary or supersecondary structures, and others are irregular, but definite structures. The method proposed here can find motifs of local structures in the Protein Data Bank much more easily and rapidly than other conventional methods, because they are represented by codes. The motifs detected in this method can provide more detailed information about specific interactions between residues in the local structures, because the edges of the Delaunay tetrahedra are regarded to express interactions between residues nearest neighbouring in space.

6.
We present an algorithm that is able to propose compact models of protein 3D structures, only starting from the prediction of the nature and length of regular secondary structures. Helices are modeled by cylinders and sheets by helicoid surfaces, all strands of a sheet being considered as a single block. It means that relative topology of the strands inside one sheet is a prerequisite. Loops are only considered as constraints, given by the maximal distance between their C extremities according to their sequence length. Unconnected regular secondary structures are reduced to a single point, the center of their hydrophobic faces. These centers are then repeatedly moved in order to obtain a compact hydrophobic core. To prevent secondary structures from interpenetrating, a repulsive term is introduced in the function whose minimization leads to the compact structure. This RUSSIA (Rigid Unconnected Secondary Structure Assembly) algorithm has the advantage of relying on a small number of variables and therefore many initial conformations can be tested. Flexibility is produced in the following way: helices or sheets are allowed to rotate around the direction leading to the center of the model; residues in a sheet can slide along the main direction of the strand where they are embedded. RUSSIA is fast and simple and it produces on a test set several neighbor good models with an r.m.s. to the native structures in the range 1.4–3.7 Å. These models can be further treated by statistical potentials used in threading approaches in order to detect the best candidate. The limits of the present method are the following: small proteins with few secondary structures are excluded; multidomain proteins must be split into several compact globular domains from their sequences; sheets of more than five strands and completely buried helices are not treated. In this first paper the algorithm is developed and in Part II, which follows, some applications are presented and the program is evaluated.
Received July 25, 2003; revised October 25, 2003; accepted October 30, 2003

7.
An algorithm for predicting protein α/β-sheet topologies from secondary structure and topological folding rules (constraints) has been developed and implemented in Prolog. This algorithm (CBS1) is based on constraint satisfaction and employs forward pruned breadth-first search and rotational invariance. CBS1 showed a 37-fold increase in efficiency over an exhaustive generate-and-test algorithm giving the same solution for a typical sheet of five strands whose topology was predicted from secondary structure with four topological folding constraints. Prolog specifications of a range of putative protein folding rules were then used to (i) replicate published protein topology predictions and (ii) validate these rules against known protein structures of nucleotide-binding domains. This demonstrated that (i) manual techniques for topology prediction can lead to non-exhaustive search and (ii) most of these protein folding principles were violated by specific proteins. Various extensions to the algorithm are discussed.

8.
We have developed a variable gap penalty function for use in the comparison program COMPARER, which aligns protein sequences on the basis of their 3-D structures. For deletions and insertions, components are a function of structural features of individual amino acid residues (e.g. secondary structure and accessibility). We have also obtained relative weights for different features used in the comparison by examining the equivalent residues in weight matrices and in alignments for pairs of 3-D structures where the equivalences are relatively unambiguous. We have used the new parameters and the variable gap penalty function in COMPARER to align protein structures in the Brookhaven Data Bank. The variable gap penalty function is useful especially in avoiding gaps in secondary structure elements and the new feature weights give improved alignments. The alignments for both azurins and plastocyanins and N- and C-terminal lobes for aspartic proteinases are discussed.

9.
Fold recognition methods aim to use the information in the known protein structures (the targets) to identify that the sequence of a protein of unknown structure (the probe) will adopt a known fold. This paper highlights that the structural similarities sought by these methods can be divided into two types: remote homologues and analogues. Homologues are the result of divergent evolution and often share a common function. We define remote homologues as those that are not easily detectable by sequence comparison methods alone. Analogues do not have a common ancestor and generally do not have a common function. Several sets of empirical matrices for residue substitution, secondary structure conservation and residue accessibility conservation have previously been derived from aligned pairs of remote homologues and analogues (Russell et al., J. Mol. Biol., 1997, 269, 423-439). Here a method for fold recognition, FOLDFIT, is introduced that uses these matrices to match the sequences, secondary structures and residue accessibilities of the probe and target. The approach is evaluated on distinct datasets of analogous and remotely homologous folds. The accuracy of FOLDFIT with the different matrices on the two datasets is contrasted to results from another fold recognition method (THREADER) and to searches using mutation matrices in the absence of any structural information. FOLDFIT identifies at top rank 12 out of 18 remotely homologous folds and five out of nine analogous folds. The average alignment accuracies for residue and secondary structure equivalencing are much higher for homologous folds (residue approximately 42%, secondary structure approximately 78%) than for analogous folds (approximately 12%, approximately 47%). Sequence searches alone can be successful for several homologues in the testing sets but nearly always fail for the analogues. These results suggest that the recognition of analogous and remotely homologous folds should be assessed separately.
This study has implications for the development and comparative evaluation of fold recognition algorithms.

10.
The hydration properties of a protein are important determinants of its structure and function. Here, modular neural networks are employed to predict ordered hydration sites using protein sequence information. First, secondary structure and solvent accessibility are predicted from sequence with two separate neural networks. These predictions are used as input together with protein sequences for networks predicting hydration of residues, backbone atoms and sidechains. These networks are trained with protein crystal structures. The prediction of hydration is improved by adding information on secondary structure and solvent accessibility and, using actual values of these properties, residue hydration can be predicted to 77% accuracy with a Matthews coefficient of 0.43. However, predicted property data with an accuracy of 60-70% result in less than half the improvement in predictive performance observed using the actual values. The inclusion of property information allows a smaller sequence window to be used in the networks to predict hydration. It has a greater impact on the accuracy of hydration site prediction for backbone atoms than for sidechains and for non-polar than polar residues. The networks provide insight into the mutual interdependencies between the location of ordered water sites and the structural and chemical characteristics of the protein residues.
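
The two performance figures quoted in this abstract, accuracy and the Matthews coefficient, are both computed from the confusion matrix of a binary predictor. A minimal Python sketch of the standard definitions follows; the function names are hypothetical, and the paper's own confusion-matrix counts are not given in the abstract, so none are reproduced here.

```python
import math

def matthews_coefficient(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)
```

The coefficient ranges from -1 to 1, with 0 corresponding to chance-level prediction. Because accuracy alone can look high when one class dominates, reporting both values gives a more balanced picture.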

11.
Protein similarity estimations can be achieved using reduced dimensional representations and we describe a new application for the generation of two-dimensional maps from the three-dimensional structure. The code for the dimensionality reduction is based on the concept of pseudo-random generation of two-dimensional coordinates and Monte Carlo-like acceptance criteria for the generated coordinates. A new method for calculating protein similarity is developed by introducing a distance-dependent similarity field. Similarity of two proteins is derived from similarity field indices between amino acids based on various criteria such as hydrophobicity, residue replacement factors and conformational similarity, each showing a one factor Gaussian dependence. Results on comparisons of misfolded protein models with data sets of correctly folded structures show that discrimination between correctly folded and misfolded structures is possible. Tests were carried out on five different proteins, comparing a misfolded protein structure with members of the same topology, architecture, family and domain according to the CATH classification.

12.
Proteins with similar folds often display common patterns of residue variability. A widely discussed question is how these patterns can be identified and deconvoluted to predict protein structure. In this respect, correlated mutation analysis (CMA) has shown considerable promise. CMA compares multiple members of a protein family and detects residues that remain constant or mutate in tandem. Often this behavior points to structural or functional interdependence between residues. CMA has been used to predict pairs of amino acids that are distant in the primary sequence but likely to form close contacts in the native three-dimensional structure. Until now these methods have used evolutionary or biophysical models to score the fit between residues. We wished to test whether empirical methods, derived from known protein structures, would provide useful predictive power for CMA. We analyzed 672 known protein structures, derived contact likelihood scores for all possible amino acid pairs, and used these scores to predict contacts. We then tested the method on 118 different protein families for which structures have been solved to atomic resolution. The mean performance was almost seven times better than random prediction. Used in concert with secondary structure prediction, the new CMA method could supply restraints for predicting still undetermined structures.
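
One plausible way to derive an empirical "contact likelihood score" for amino acid pairs is a log-odds ratio of how often a pair occurs in contact versus how often it occurs at all. The abstract does not give the authors' formula, so the Python sketch below is an assumption throughout: the log-odds form, the pseudocount, and the function name `contact_log_odds` are all illustrative.

```python
import math
from collections import Counter

def contact_log_odds(contact_pairs, all_pairs, pseudocount=1.0):
    """Log-odds that an amino acid pair is in contact versus background.

    contact_pairs and all_pairs are iterables of (aa1, aa2) tuples;
    pair order is ignored. The pseudocount avoids log(0) for unseen pairs.
    """
    def normalise(pairs):
        return Counter(tuple(sorted(p)) for p in pairs)

    contacts, background = normalise(contact_pairs), normalise(all_pairs)
    keys = set(contacts) | set(background)
    k = len(keys)
    c_total = sum(contacts.values())
    b_total = sum(background.values())
    scores = {}
    for key in keys:
        p_contact = (contacts[key] + pseudocount) / (c_total + pseudocount * k)
        p_background = (background[key] + pseudocount) / (b_total + pseudocount * k)
        scores[key] = math.log(p_contact / p_background)
    return scores
```

A positive score marks a pair that is seen in contact more often than its background frequency would suggest; in a CMA setting such scores could be looked up for pairs of alignment columns that mutate in tandem.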

13.
Molecular simulations able to exactly represent solvated charged proteins are helpful in understanding protein dynamics, structure and function. In the present study we have used two different starting structures of papain (a typical, stable, globular protein of intermediate net charge) and different modeling procedures to evaluate some effects of counterions in simulations. A number of configurations have been generated and relaxed for each system by various combinations of constrained simulated annealing and molecular dynamics procedures, using the AMBER force field. The analysis of trajectories shows that the simulations of solvated proteins are moderately sensitive to the presence of counterions. However, this sensitivity is highly dependent on the starting model and different procedures of equilibration used. The neutralized systems tend to evince smaller root mean square deviations regardless of the system investigated and the simulation procedure used. The results of parameterized fitting of the simulated structures to the crystallographic data, giving quantitative measure of the total charge influence on the stability of various elements of the secondary structure, revealed a clear scatter of different reactions of various systems' secondary structures to counterion addition: some systems apparently were stabilized when neutralized, while the others were not. Thus, one cannot unequivocally state, despite consideration of specific simulation conditions, whether protein secondary structures are more stable when they have neutralized charges. This suggests that caution should be taken when claiming the stabilizing effect of counterions in simulations other than those involving small, unstable polypeptides or highly charged proteins.

14.
15.
We present a comprehensive analysis of amino acid substitution patterns (sets of residues in a position of a multiple alignment) and conservation of physicochemical properties in alignments of protein sequences. Of the one million possible substitution patterns, only a few hundred account for the majority of aligned positions. Very similar distributions of substitution patterns are observed in all but one of the diverse databases of multiple alignments. In these substitution patterns we analyzed the conservation of 511 physicochemical and steric amino acid properties. Highest conservation was observed in those steric and transfer free energy-related properties that are crucial for folding. The best conserved steric properties include the minimal width of the side chains and their interactions with other residues. Among the hydrophobicity-related properties, charge and those properties that provide information on propensities to form secondary structures or side chain conformation, appear to be better conserved than pure hydrophobicity measures. Physicochemical sequence analysis based on the most conserved properties is expected to aid searching a protein sequence query against a database of multiple alignments, prediction of secondary and tertiary structures and protein engineering.

16.
Giuseppe Colacicco. Lipids, 1970, 5(7): 636-649
The influence of lipid and protein on the properties of the air-water interface is analyzed with the view to formulate a mechanism of interaction of protein with lipid monolayers. The increase in surface pressure (ΔΠ) and the quantity of protein incorporated in the lipid film after injection of protein under lipid monolayers were studied as a function of both lipid structure and protein structure. With rabbit γ-globulin, the values of ΔΠ were cholesterol > phosphatidyl choline > sphingomyelin. Similar results were obtained with ribonuclease, lysozyme and serum albumin. The quantities of protein found in films of either cholesterol or phosphatidyl choline (egg lecithin) were much larger than those calculated from a geometric model in which a protein monolayer occupies the area made available by the compressed lipid. Arguments are produced against penetration based on simple mechanisms of compressibility of the lipid film. The mechanisms operating in the incorporation of protein into lipid monolayers are grouped into three categories: (a) free penetration, typical of lecithin; (b) binding-mediated penetration, typical of cholesterol and some glycosphingolipids; and (c) binding-inhibited penetration, typical of the albumin-ganglioside system and a specific lipid hapten-antibody system. A model is described in which nonspecific protein interacts with polymeric lecithin structures (surface micelles). In the sequence of events X → Y → Z, the globular protein X is activated into the expanded or extended form Y by contact with the lipid and then restructured into a compact form Z with release of water and free energy. The resulting lipid-protein assembly has a mosaic structure in which lipid and protein polar surfaces are exposed to water.
Accessibility of lecithin to phospholipase A is consistent with the model and with current views on the state of protein in biological membranes; according to such views, protein is more likely structured inside the lipid milieu and not simply denatured on the lipid-water interface.

17.
18.
Relatively little has been known about the structure of alpha-helical membrane proteins, since until recently few structures had been crystallized. These limited data have restricted structural analyses to the prediction of secondary structure, rather than tertiary folds. In order to address this, this paper describes an analysis of the 23 available membrane protein structures. A number of findings are made that are of particular relevance to transmembrane helix packing: (1) on average lipid-tail-accessible transmembrane residues are significantly more hydrophobic, less conserved and contain different residue types to buried residues; (2) charged residues are not always buried and, when accessible to membrane lipid tails, few are paired with another charge and instead they often interact with phospholipid head-groups or with other residue types; (3) a significant proportion of lipid-tail-accessible charged and polar residues form hydrogen bonds only with residues one turn away in the same helix (intra-helix); (4) pore-lining residues are usually hydrophobic and it is difficult to distinguish them from buried residues in terms of either residue type or conservation; and (5) information was gained about the proportion of helices that tend to contribute to lining a pore and the resulting pore diameter. These findings are discussed with relevance to the prediction of membrane protein 3D structure.

19.
The fusion protein of respiratory syncytial virus (RSV-F) is responsible for fusion of virion with host cells and infection of neighbouring cells through the formation of syncytia. A three-dimensional model structure of RSV-F was derived by homology modelling from the structure of the equivalent protein in Newcastle disease virus (NDV). Despite very low sequence homology between the two structures, most features of the model appear to have high credibility, although a few small regions in RSV-F whose secondary structure is predicted to be different to that in NDV are likely to be poorly modelled. The organization of individual residues identified in escape mutants against monoclonal antibodies correlates well with known antigenic sites. The location of residues involved in point mutations in several drug-resistant variants is also examined.

20.
The interaction between lead and yeast hexokinase has been studied. Lead provokes a large variation in the aggregation state of the protein, forming bigger structures of high molecular mass. This phenomenon is characterized by a small modification in the tridimensional structure and a great variation in the secondary structure. There is a loss in α‐helix which is compensated by an enhancement in β‐sheet. The polypeptide chain is more stable in the β‐sheet structure corresponding to the aggregate forms. During this change the enzyme maintains a high level of activity in the monomer and also in the aggregate form. This implies that the enzyme function is not greatly affected by the change, and active sites are retained without important modifications. According to kinetic measurements the ATP site is more affected than the glucose site. There is a mixed type inhibition with a main competitive component when glucose acts as a variable substrate. © 2001 Society of Chemical Industry
