Similar Literature
20 similar documents found; search time 15 ms
1.
Eukaryotic ribosomal proteins are highly conserved across widely divergent species, suggesting that strong functional constraints prevent divergence of important amino acid motifs. On this basis, an evolutionary approach could be used to identify putative functional motifs. We obtained the DNA sequence of the ribosomal protein L18 from the evolutionarily divergent protozoan parasite Trypanosoma brucei. Analysis of this sequence showed 46% and 43% identity with the human and yeast sequences, respectively, and 30% of amino acid residues were identical across all the species analysed. From these data, amino acids essential to the structure and function of ribosomal protein L18 can be inferred, which could provide valuable information for molecular modelling and mutational studies.
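The two statistics quoted above (pairwise identity and the share of positions identical across all species) can be computed from a multiple alignment roughly as follows. This is a minimal sketch, assuming pre-aligned sequences with `-` as the gap character and counting pairwise identity only over columns where neither sequence has a gap; other normalisations are in use, and the toy sequences are hypothetical, not the study's data.

```python
GAP = "-"


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned sequences.

    Only columns where neither sequence has a gap are counted (an assumed
    convention; other normalisations are also in use).
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != GAP and y != GAP]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


def percent_conserved(alignment: list[str]) -> float:
    """Percent of alignment columns in which every sequence has the same residue."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    conserved = sum(
        1
        for column in zip(*alignment)
        if column[0] != GAP and all(r == column[0] for r in column)
    )
    return 100.0 * conserved / length


# Hypothetical toy alignment (not real ribosomal protein sequences)
alignment = ["MKV-LA", "MKVELA", "MRVELS"]
print(percent_identity(alignment[1], alignment[2]))
print(percent_conserved(alignment))
```

With this toy alignment, the last two sequences share about 66.7% identity and half of the columns are identical in all three sequences.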

2.
Similar conserved structures appear in apparently unrelated protein families. For example, methods of protein comparison indicate that the insulin superfamily is evolutionarily related to the alpha-conotoxins of marine fish-hunting snails. To reach statistical significance, the A-chains of different insulins, insulin-like growth factors, relaxins, and insulin-related peptides from invertebrates were used for comparison. These data were compared with sequences from randomly chosen proteins. The alpha-conotoxins show identity scores of up to 37.5% and similarity of up to 56.2% with members of the insulin superfamily. These scores are comparable to the values obtained by comparing the relaxin and insulin/IGF sequences. The data clearly show that the identity and similarity values obtained in the comparison with the insulins are significantly higher than the scores of randomly chosen protein primary structures. According to our calculations, this hormone system, which regulates metabolism and growth in vertebrates, and the toxin-receptor system share a common evolutionary ancestor. However, this statistical approach has to be substantiated at the gene level.
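One common way to judge whether an identity score is higher than expected by chance is to compare it with scores from randomised sequences. The sketch below swaps in shuffled copies of one segment as the background, rather than the randomly chosen proteins used in the study; the function names, trial count, seed, and toy segments are illustrative assumptions.

```python
import random
import statistics


def identity(a: str, b: str) -> float:
    """Percent identity of two equal-length, gap-free peptide segments."""
    if len(a) != len(b) or not a:
        raise ValueError("segments must be non-empty and of equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def shuffle_z_score(a: str, b: str, trials: int = 1000, seed: int = 0) -> float:
    """Z-score of identity(a, b) against b compared with shuffled copies of a.

    Shuffling preserves the residue composition of `a`, so the background
    reflects composition alone rather than sequence order.
    """
    rng = random.Random(seed)
    observed = identity(a, b)
    residues = list(a)
    background = []
    for _ in range(trials):
        rng.shuffle(residues)  # a fresh random permutation each trial
        background.append(identity("".join(residues), b))
    mean = statistics.mean(background)
    sd = statistics.pstdev(background)
    if sd == 0:
        raise ValueError("background scores have zero variance")
    return (observed - mean) / sd


# Hypothetical toy segments for illustration only
print(shuffle_z_score("ACDEFGHIKLMN", "ACDKFGHLKLMQ", trials=500))
```

A large positive z-score means the observed identity sits well above what composition alone would produce; it does not by itself establish common ancestry, which is why the abstract calls for confirmation at the gene level.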

3.
We describe a database of protein structure alignments for homologous families. The database, HOMSTRAD, currently contains 130 protein families and 590 aligned structures, which were selected on the basis of the quality of the X-ray analysis and the accuracy of the structure. For each family, the database provides a structure-based alignment derived using COMPARER and annotated with JOY in a special format that represents the local structural environment of each amino acid residue. HOMSTRAD also provides a set of superposed atomic coordinates obtained using MNYFIT, which can be viewed with a graphical user interface or used for comparative modeling studies. The database is freely available on the World Wide Web at http://www-cryst.bioc.cam.ac.uk/~homstrad/, with search facilities and links to other databases.

4.
In a similar manner to sequence database searching, it is also possible to compare three-dimensional protein structure. Such methods can be extremely useful because a structural similarity may represent a distant evolutionary relationship that is undetectable by sequence analysis. In this review, we summarise the most popular structure comparison methods, show how they can be used for database searching, and then describe some of the most advanced attempts to develop comprehensive protein structure classifications. With such data, it is possible to identify distant evolutionary relationships, provide libraries of unique folds for structure prediction, estimate the total number of folds that exist, and investigate the preference for certain types of structures over others.

5.
HSSP is a derived database merging three-dimensional (3-D) structural and one-dimensional (1-D) sequence information. For each protein of known 3-D structure in the Protein Data Bank (PDB), the database has a multiple sequence alignment of all available homologues and a sequence profile characteristic of the family. The list of homologues is the result of a search of Swissprot using a position-weighted dynamic programming method for sequence profile alignment (MaxHom). The database is updated frequently. The listed homologues are very likely to have the same 3-D structure as the PDB protein to which they have been aligned. As a result, the database is not only a database of aligned sequence families but also a database of implied secondary and tertiary structures, covering 27% of all Swissprot-stored sequences.

6.
Matrix metalloproteinases: structures, evolution, and diversification   Total citations: 1; self-citations: 0; citations by others: 1
A comprehensive sequence alignment of 64 members of the matrix metalloproteinase (MMP) family has been performed, first for the entire sequences and subsequently for the catalytic and hemopexin-like domains. The 64 MMPs were selected from plants, invertebrates, and vertebrates. The analyses revealed as many as 23 distinct subfamilies of these proteins. Information from the sequence alignments was correlated with structures, both crystallographic and computational, of the catalytic domains of the 23 representative members of the MMP family. The metal-binding sites and two loops containing variable amino acid sequences, which are important for substrate interactions, are surveyed and discussed. The collective data support the proposal that the assembly of the domains into multidomain enzymes was likely an early evolutionary event. This was followed by diversification, perhaps in parallel among the MMPs, on a subsequent evolutionary time scale. The analysis indicates that a retrograde structural simplification may have accounted for the evolution of MMPs with simple domain constituents, such as matrilysin, from the larger and more elaborate enzymes.

7.
We apply a simple method for aligning protein sequences on the basis of a 3D structure, on a large scale, to the proteins in the scop classification of fold families. This allows us to assess, understand, and improve our automatic method against an objective, manually derived standard, a type of comprehensive evaluation that has not yet been possible for other structural alignment algorithms. Our basic approach directly matches the backbones of two structures, using repeated cycles of dynamic programming and least-squares fitting to determine an alignment that minimizes coordinate difference. Because of its simplicity, our method can be readily modified to take into account additional features of protein structure, such as the orientation of side chains or the location-dependent cost of opening a gap. Our basic method, augmented by such modifications, can find reasonable alignments for all but 1.5% of the known structural similarities in scop, i.e., all but 32 of the 2,107 superfamily pairs. We discuss the specific protein structural features that make these 32 pairs so difficult to align and show how our procedure effectively partitions the relationships in scop into different categories, depending on which aspects of protein structure are involved (e.g., whether or not consideration of side-chain orientation is necessary for proper alignment). We also show how our pairwise alignment procedure can be extended to generate a multiple alignment for a group of related structures. We have compared these alignments in detail with corresponding manual ones culled from the literature. We find good agreement (to within 95% for the core regions), and the detailed comparison highlights how particular protein structural features (such as certain strands) are problematic to align, giving somewhat ambiguous results. With these improvements and systematic tests, our procedure should be useful for the development of scop and the future classification of protein folds.
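The dynamic-programming half of the cycle described above can be sketched for two structures that have already been superposed. This is an illustrative sketch, not the authors' program: the scoring function `1 / (1 + (d/d0)**2)`, the constant gap penalty, and `d0 = 3.0` Å are assumptions, and the least-squares refitting step that alternates with it is omitted.

```python
import math


def dp_structure_align(coords_a, coords_b, gap=-1.0, d0=3.0):
    """Global DP alignment of two pre-superposed C-alpha traces.

    Assumed scoring: s(i, j) = 1 / (1 + (d_ij / d0) ** 2), where d_ij is the
    distance between residue i of A and residue j of B, plus a constant gap
    penalty. Returns the aligned residue index pairs (i, j).
    """
    n, m = len(coords_a), len(coords_b)

    def score(i, j):
        d = math.dist(coords_a[i], coords_b[j])
        return 1.0 / (1.0 + (d / d0) ** 2)

    # F[i][j] is the best score for aligning A[:i] with B[:j]
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + score(i - 1, j - 1),  # match residues
                F[i - 1][j] + gap,                      # residue of A against a gap
                F[i][j - 1] + gap,                      # residue of B against a gap
            )

    # Trace back from the bottom-right cell to recover matched pairs
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if math.isclose(F[i][j], F[i - 1][j - 1] + score(i - 1, j - 1), abs_tol=1e-9):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif math.isclose(F[i][j], F[i - 1][j] + gap, abs_tol=1e-9):
            i -= 1
        else:
            j -= 1
    return pairs[::-1]


# Toy traces: B is A with one extra, distant residue inserted after position 1
a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
b = a[:2] + [(5.7, 10.0, 0.0)] + a[2:]
print(dp_structure_align(a, b))
```

In a full iterative scheme, the returned pairs would feed a least-squares superposition, and the alignment would be recomputed on the refitted coordinates until it stops changing.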

8.
Two homologous sequences that have diverged beyond the point where their homology can be recognised by a simple direct comparison can be related through a third sequence that is suitably intermediate between the two. High scores for the sequence matches between the first and third sequences and between the second and third sequences imply that the first and second sequences are related, even though their own match score is low. We have tested the usefulness of this idea using a database that contains the sequences of 971 protein domains whose structures are known and whose residue identities with each other are some 40% or less (PDB40D). On the basis of sequence and structural information, 2143 pairs of these sequences are known to have an evolutionary relationship. FASTA, in an all-against-all comparison of the sequences in the database, detected 320 (15%) of these relationships, with three false positives (a 1% error rate). Using intermediate sequences found by FASTA matches of PDB40D sequences to those in the large non-redundant OWL database, we could detect 550 evolutionary relationships with an error rate of 1%. Thus the intermediate sequence procedure increases the ability to recognise evolutionary relationships among the PDB40D sequences by 70%.
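The intermediate-sequence idea reduces to a transitive check over a table of pairwise scores. A minimal sketch, assuming scores are stored in a dictionary keyed by unordered pairs and a single hypothetical score threshold; the study itself used FASTA scores against the OWL database, which this sketch does not reproduce.

```python
from itertools import combinations


def intermediate_links(scores, threshold):
    """Relationships found directly and via intermediate sequences.

    scores: dict mapping frozenset({x, y}) to a pairwise match score.
    A pair (a, b) is inferred when a-c and b-c both reach `threshold` for
    some intermediate c, even though a-b itself does not.
    """
    direct = {pair for pair, s in scores.items() if s >= threshold}
    partners = {}
    for pair in direct:
        x, y = tuple(pair)
        partners.setdefault(x, set()).add(y)
        partners.setdefault(y, set()).add(x)
    inferred = set()
    for linked in partners.values():
        # every two sequences that both match the same intermediate
        for a, b in combinations(sorted(linked), 2):
            if frozenset((a, b)) not in direct:
                inferred.add((a, b))
    return direct, inferred


# Hypothetical scores: A and B are related only through C
scores = {
    frozenset(("A", "C")): 40,
    frozenset(("B", "C")): 35,
    frozenset(("A", "B")): 12,
}
direct, inferred = intermediate_links(scores, threshold=30)
print(sorted(tuple(sorted(p)) for p in direct))
print(inferred)
```

Here A and B fall below the threshold on their own, but both match C strongly, so the pair is recovered through the intermediate.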

9.
10.
Previously proposed methods for protein secondary structure prediction from multiple sequence alignments do not efficiently extract the evolutionary information that these alignments contain. The predictions of these methods are less accurate than they could be, because of their failure to consider explicitly the phylogenetic tree that relates aligned protein sequences. As an alternative, we present a hidden Markov model approach to secondary structure prediction that more fully uses the evolutionary information contained in protein sequence alignments. A representative example is presented, and three experiments are performed that illustrate how the appropriate representation of evolutionary relatedness can improve inferences. We explain why similar improvement can be expected in other secondary structure prediction methods and indeed any comparative sequence analysis method.

11.
A method was developed to compare protein structures and to combine them into a multiple structure consensus. Previous methods of multiple structure comparison have only concatenated pairwise alignments or produced a consensus structure by averaging coordinate sets. The current method is a fusion of the fast structure comparison program SSAP and the multiple sequence alignment program MULTAL. As in MULTAL, structures are progressively combined, producing intermediate consensus structures that are compared directly to each other and to all remaining single structures. This leads to a hierarchic "condensation," continually evaluated in the light of the emerging conserved core regions. Following the SSAP approach, all interatomic vectors were retained, with well-conserved regions distinguished by coherent vector bundles (the structural equivalent of a conserved sequence position). Each bundle of vectors is summarized by a resultant, whereas vector coherence is captured in an error term, which is the only distinction between conserved and variable positions. Resultant vectors are used directly in the comparison, which is weighted by their error values, giving greater importance to the matching of conserved positions. The resultant vectors and their errors can also be used directly in molecular modeling. Applications of the method were assessed by the quality of the resulting sequence alignments, phylogenetic tree construction, and databank scanning with the consensus. Visual assessment of the structural superpositions and consensus structure for various well-characterized families confirmed that the consensus had identified a reasonable core.
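A bundle of equivalent interatomic vectors can be reduced to a resultant and an error term in a few lines. The sketch assumes the resultant is the component-wise mean and the error is the root-mean-square deviation of the vectors from it, and uses a hypothetical `1 / (1 + error)` weight; the published method's exact definitions may differ.

```python
import math


def summarize_bundle(vectors):
    """Reduce a bundle of equivalent 3-D interatomic vectors to (resultant, error).

    Assumed definitions: the resultant is the component-wise mean vector and
    the error is the root-mean-square distance of the vectors from it.
    """
    if not vectors:
        raise ValueError("bundle must contain at least one vector")
    n = len(vectors)
    resultant = tuple(sum(v[k] for v in vectors) / n for k in range(3))
    error = math.sqrt(sum(math.dist(v, resultant) ** 2 for v in vectors) / n)
    return resultant, error


def match_weight(error, scale=1.0):
    """Hypothetical weight: coherent (low-error) bundles count more in a comparison."""
    return 1.0 / (1.0 + error / scale)


coherent = [(1.0, 2.0, 0.0), (1.1, 2.0, 0.0), (0.9, 2.0, 0.0)]
diffuse = [(1.0, 2.0, 0.0), (-1.0, 0.5, 2.0), (3.0, -1.0, 1.0)]
for bundle in (coherent, diffuse):
    resultant, error = summarize_bundle(bundle)
    print(resultant, round(error, 3), round(match_weight(error), 3))
```

The coherent bundle gets a small error and a weight close to 1, mirroring the idea that conserved positions should dominate the comparison.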

12.
A rigorous Bayesian analysis is presented that unifies protein sequence-structure alignment and recognition. Given a sequence, explicit formulae are derived to select (1) its globally most probable core structure from a structure library; (2) its globally most probable alignment to a given core structure; (3) its most probable joint core structure and alignment chosen globally across the entire library; and (4) its most probable individual segments, secondary structure, and super-secondary structures across the entire library. The computations involved are NP-hard in the general case (3D-3D). Fast exact recursions for the restricted sequence singleton-only (1D-3D) case are given. Conclusions include: (a) the most probable joint core structure and alignment is not necessarily the most probable alignment of the most probable core structure, but rather maximizes the product of core and alignment probabilities; (b) use of a sequence-independent linear or affine gap penalty may result in the highest-probability threading not having the lowest score; (c) selecting the most probable core structure from the library (core structure selection or fold recognition only) involves comparing probabilities summed over all possible alignments of the sequence to the core, and not comparing individual optimal (or near-optimal) sequence-structure alignments; and (d) assuming uninformative priors, core structure selection is equivalent to comparing the ratio of two global means.
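Conclusion (c) can be illustrated with toy numbers: choosing a core by the probability summed over all alignments can disagree with choosing it by its single best alignment. The probabilities below are invented for illustration, equal prior probabilities are assumed for the cores, and the function names are hypothetical.

```python
def select_by_summed_probability(alignment_probs):
    """Choose the core whose probabilities summed over all alignments are highest."""
    return max(alignment_probs, key=lambda core: sum(alignment_probs[core]))


def select_by_best_alignment(alignment_probs):
    """Choose the core that contains the single most probable alignment."""
    return max(alignment_probs, key=lambda core: max(alignment_probs[core]))


# Invented joint probabilities P(sequence, alignment | core), three alignments per core
alignment_probs = {
    "core_A": [0.30, 0.01, 0.01],  # one sharp optimum
    "core_B": [0.20, 0.15, 0.10],  # many reasonably good alignments
}
print(select_by_summed_probability(alignment_probs))  # core_B (0.45 vs 0.32)
print(select_by_best_alignment(alignment_probs))      # core_A (0.30 vs 0.20)
```

With equal core priors, the posterior probability of a core is proportional to the summed value, so `core_B` is the more probable fold even though `core_A` owns the single best alignment.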

13.
A new method has been developed to detect remote relationships between protein sequences and known three-dimensional structures, based on direct energy calculations and without reliance on statistics. The likelihood of a residue occupying a given position on the structural template was represented by an estimate of the stabilization free energy, made after explicit prediction of the substituted side-chain conformation. The profile matrix derived from these energy values, modified by increasing the residue self-exchange values, successfully predicted the compatibility of heat-shock protein and globin sequences with the three-dimensional structures of actin and phycocyanin, respectively, in a full protein sequence databank search. The high sensitivity of the method makes it a unique tool for predicting the three-dimensional fold for the rapidly growing number of protein sequences.

14.
Using a variety of techniques, including sequence alignment, secondary structure prediction, molecular mechanics and molecular dynamics, we have constructed a model for the three-dimensional structure of P-450arom (human aromatase) based on that of P-450cam, the only cytochrome P-450 enzyme for which the crystal structure is known. The predicted structure is in good agreement with current experimental data: both direct, from site-directed mutagenesis studies, and indirect, from consideration of the structures and activities of known substrates and inhibitors.

15.
Factor VIIa (FVIIa) is a soluble four-domain plasma serine protease coagulation factor that forms a tight complex with the two extracellular domains of the transmembrane protein tissue factor in the initiating step of blood coagulation. To date, there is no crystal structure for free FVIIa. X-ray and neutron scattering data in solution were obtained for free FVIIa and for the complex between FVIIa and soluble tissue factor (sTF), for comparison with crystal structures of the FVIIa-sTF complex and of free factor IXa (FIXa). The solution structure of free FVIIa derived from the scattering data is consistent with the extended domain arrangement of FVIIa seen in the crystal structure of its complex with sTF, but is incompatible with the bent, less extended domain conformation seen in the FIXa crystal structure. The FVIIa scattering curve is also compatible with a subset of 317 possible extended structures derived from a constrained automated conformational search of 15 625 FVIIa domain models. Thus, the scattering data support extended domain models for FVIIa free in solution. Similar analyses showed that the structures of the FVIIa-sTF complex derived from solution scattering and from crystallography were in good agreement. An automated constrained search for allowed structures of the complex in solution, based on scattering curves, showed that only a small family of compact models gave good agreement, namely those in which FVIIa and sTF interact closely over a large surface area. The general utility of this approach for structural analysis of heterodimeric complexes in solution is discussed. Analytical ultracentrifugation data and the modeling of these data were consistent with the scattering results. It is concluded that in solution FVIIa has an extended or elongated domain structure, which allows rapid interaction with sTF over a large surface area to form a high-affinity complex.

16.
The three-dimensional structure of bacterial sphingomyelinase (SMase) was predicted using a protein fold recognition method; the search of a library of known structures showed that the SMase sequence is highly compatible with the mammalian DNase I structure, which suggested that SMase adopts a structure similar to that of DNase I. The amino acid sequence alignment based on the prediction revealed that, despite the lack of overall sequence similarity (less than 10% identity), the residues of DNase I involved in hydrolysis of the phosphodiester bond, including two histidine residues of the active center (His 134 and His 252), are conserved in SMase. In addition, a conserved pentapeptide sequence motif was found that includes two catalytically critical residues, Asp 251 and His 252. A sequence database search showed that the motif is highly specific to mammalian DNase I and bacterial SMase. The functional roles of SMase residues identified by the sequence comparison were consistent with the results of mutant studies. Two Bacillus cereus SMase mutants (H134A and H252A) were constructed by site-directed mutagenesis; both mutations completely abolished catalytic activity. A model of the SMase-sphingomyelin complex was built to investigate how SMase specifically recognizes its substrate. The model suggested that a set of residues conserved among bacterial SMases, including Trp 28 and Phe 55, might be important in substrate recognition. The predicted structural similarity and the conservation of the functionally important residues strongly suggest a distant evolutionary relationship between bacterial SMase and mammalian DNase I. These two phosphodiesterases must have acquired specificity for different substrates in the course of evolution.

17.
18.
A novel computer modeling approach suitable for the structural analysis of small bioactive peptides has been developed. This approach involves identifying conformational patterns in the protein structure data bank on the basis of sequence homology with the bioactive peptide. The models built on the basis of this homology and sharing common conformational patterns are analyzed under structural constraints derived from the activity data of various synthetic analogs of the peptide. Application of this procedure to gonadotropin-releasing hormone (GnRH) resulted in a library of possible structures for GnRH, nine of which shared a common beta-turn. Further analysis of the structures containing the beta-turn motif, in the context of the structure-activity data, led to a model for the active conformation of GnRH. The topology of the putative receptor binding site of the hormone is defined by a contiguous surface formed through an appropriate juxtaposition of the N-terminal pGlu1, the guanidyl group of Arg8, the aromatic side chain of Trp3, and the C-terminal Gly10-NH2.

19.
A number of methods exist for the prediction of protein secondary structure from primary sequence. One method identifies variable charged and conserved hydrophobic residues within large multiple alignments as a means of indicating outside and inside sites, respectively, in the protein structure. These sites are then manually fitted to secondary structure templates to generate a secondary structure prediction. Using the existing theoretical basis of this method, we present an algorithm (STAMA) that automatically carries out the initial variation/conservation analysis of the alignment. We also test the accuracy of complete predictions carried out by manually fitting the STAMA-derived assignments to structure templates, using five large multiple alignments, each including a protein of known structure. On average, the method predicted only 57% of residues in the correct secondary structure, and it was only as accurate as predictions made with the established, automated method of Garnier, Osguthorpe and Robson (1978) applied to a single sequence. When used in conjunction with other secondary structure prediction methods, however, the resulting consensus predictions were very accurate: 78% of the elements (alpha helices or beta strands) for which a consensus could be obtained were predicted correctly. The algorithm presented here, together with the assessment of the accuracy of predictions generated by this method, should allow this predictive approach to be used widely and on an informed basis.
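The automatic variation/conservation step can be sketched as a per-column classifier. The residue classes, the rule that a conserved hydrophobic column contains only hydrophobic residues, and the `min_distinct` threshold for calling a charged column variable are assumptions for illustration, not the published STAMA criteria.

```python
HYDROPHOBIC = set("AVILMFWC")  # assumed hydrophobic class
CHARGED = set("DEKR")          # assumed charged class


def classify_columns(alignment, min_distinct=3):
    """Label each alignment column 'i' (inside), 'o' (outside) or '.' (no call).

    Assumed rules: a column whose residues are all hydrophobic is treated as
    conserved hydrophobic (inside); a column with at least `min_distinct`
    different residues, one or more of them charged, is variable charged (outside).
    """
    labels = []
    for column in zip(*alignment):
        residues = {r for r in column if r != "-"}  # ignore gaps
        if residues and residues <= HYDROPHOBIC:
            labels.append("i")
        elif residues & CHARGED and len(residues) >= min_distinct:
            labels.append("o")
        else:
            labels.append(".")
    return "".join(labels)


# Hypothetical four-column alignment of three sequences
print(classify_columns(["LKEA", "LDSA", "IRQG"]))
```

The resulting string of inside/outside calls is what would then be fitted, by hand in the described method, to secondary structure templates.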

20.
A combinatorial sequence space (CSS) model was introduced to represent sequences as sets of overlapping k-tuples of some fixed length, which correspond to points in the CSS. The aim was to analyze the clusterization of protein sequences in the CSS and to test various hypotheses about the possible evolutionary basis of this clusterization. The authors developed an easy-to-use technique that can reveal and analyze such clusterization in a multidimensional CSS. Application of the technique revealed an unexpectedly high clusterization of the points in the CSS corresponding to k-tuples from known proteins. The clusterization could not be explained by nonuniform amino acid frequencies or by the influence of homologous data. None of the possible evolutionary and structural factors tested could explain the observed clusterization either. It appeared as if certain protein sequence variations occurred and were fixed early in the course of evolution. Subsequent evolution (predominantly neutral) allowed only a limited number of changes and permitted new variants, which led to the preservation of certain k-tuples during the course of evolution. This is consistent with the theory of exon shuffling and protein block structure evolution. Possible applications of the sequence space features found are also discussed.
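Representing a sequence as overlapping k-tuples, and counting how often each tuple occurs across a set of proteins, is the starting point for the kind of clusterization analysis described above. A minimal sketch, assuming a 20-letter amino-acid alphabet; the function names and toy sequences are hypothetical, and the statistical tests used in the study are not reproduced.

```python
from collections import Counter


def k_tuples(sequence, k):
    """Overlapping k-tuples of a sequence, i.e. its points in the sequence space."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def tuple_occupancy(sequences, k, alphabet_size=20):
    """Count k-tuple occurrences and the fraction of possible tuples observed."""
    counts = Counter()
    for seq in sequences:
        counts.update(k_tuples(seq, k))
    occupied_fraction = len(counts) / alphabet_size ** k
    return counts, occupied_fraction


# Hypothetical toy sequences
counts, fraction = tuple_occupancy(["MKVLAAG", "MKVLSAG"], k=3)
print(counts.most_common(2), fraction)
```

Clustering of the kind reported would show up as some tuples occurring far more often than a uniform spread over the 20^k possible points would predict.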

Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417