Similar literature
 20 similar articles found (search time: 15 ms)
1.
We apply a simple method for aligning protein sequences on the basis of a 3D structure, on a large scale, to the proteins in the scop classification of fold families. This allows us to assess, understand, and improve our automatic method against an objective, manually derived standard, a type of comprehensive evaluation that has not yet been possible for other structural alignment algorithms. Our basic approach directly matches the backbones of two structures, using repeated cycles of dynamic programming and least-squares fitting to determine an alignment minimizing coordinate difference. Because of its simplicity, our method can be readily modified to take into account additional features of protein structure such as the orientation of side chains or the location-dependent cost of opening a gap. Our basic method, augmented by such modifications, can find reasonable alignments for all but 1.5% of the known structural similarities in scop, i.e., all but 32 of the 2,107 superfamily pairs. We discuss the specific protein structural features that make these 32 pairs so difficult to align and show how our procedure effectively partitions the relationships in scop into different categories, depending on what aspects of protein structure are involved (e.g., depending on whether or not consideration of side-chain orientation is necessary for proper alignment). We also show how our pairwise alignment procedure can be extended to generate a multiple alignment for a group of related structures. We have compared these alignments in detail with corresponding manual ones culled from the literature. We find good agreement (to within 95% for the core regions), and detailed comparison highlights how particular protein structural features (such as certain strands) are problematical to align, giving somewhat ambiguous results. With these improvements and systematic tests, our procedure should be useful for the development of scop and the future classification of protein folds.
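The dynamic-programming half of the cycle described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the two backbones have already been superposed (the least-squares fitting step is omitted), and the function name `align_by_distance`, the similarity formula, and the values of `m`, `d0`, and `gap_penalty` are choices made for this sketch rather than details taken from the abstract.

```python
def align_by_distance(dist, gap_penalty=10.0, m=20.0, d0=2.24):
    """Globally align two backbones from a matrix of inter-residue distances.

    dist[i][j] is the distance (in angstroms) between residue i of structure A
    and residue j of structure B after some superposition. The scoring
    parameters are illustrative assumptions. Returns matched (i, j) pairs.
    """
    n, k = len(dist), len(dist[0])
    # Turn distances into similarities: close atoms score high, far atoms near 0.
    sim = [[m / (1.0 + (d / d0) ** 2) for d in row] for row in dist]

    score = [[0.0] * (k + 1) for _ in range(n + 1)]
    move = [[""] * (k + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = -gap_penalty * i
        move[i][0] = "up"
    for j in range(1, k + 1):
        score[0][j] = -gap_penalty * j
        move[0][j] = "left"

    for i in range(1, n + 1):
        for j in range(1, k + 1):
            options = [
                (score[i - 1][j - 1] + sim[i - 1][j - 1], "diag"),  # match i with j
                (score[i - 1][j] - gap_penalty, "up"),              # residue i unmatched
                (score[i][j - 1] - gap_penalty, "left"),            # residue j unmatched
            ]
            # max() keeps the first option on ties, so matches are preferred.
            score[i][j], move[i][j] = max(options, key=lambda o: o[0])

    # Trace back from the bottom-right corner to recover the matched pairs.
    pairs = []
    i, j = n, k
    while i > 0 or j > 0:
        step = move[i][j]
        if step == "diag":
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif step == "up":
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return pairs


# Toy case: structure B has one extra residue (index 1) relative to structure A.
FAR, NEAR = 10.0, 0.5
toy = [
    [NEAR, FAR, FAR, FAR],
    [FAR, FAR, NEAR, FAR],
    [FAR, FAR, FAR, NEAR],
]
toy_pairs = align_by_distance(toy)
```

In a full cycle, the matched pairs would be fed to a least-squares superposition, the distance matrix recomputed, and the alignment repeated until it stops changing.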

2.
The sequences of related proteins can diverge beyond the point where their relationship can be recognised by pairwise sequence comparisons. In attempts to overcome this limitation, methods have been developed that use as a query, not a single sequence, but sets of related sequences or a representation of the characteristics shared by related sequences. Here we describe an assessment of three of these methods: the SAM-T98 implementation of a hidden Markov model procedure; PSI-BLAST; and the intermediate sequence search (ISS) procedure. We determined the extent to which these procedures can detect evolutionary relationships between the members of the sequence database PDBD40-J. This database, derived from the structural classification of proteins (SCOP), contains the sequences of proteins of known structure whose sequence identities with each other are 40% or less. The evolutionary relationships that exist between those that have low sequence identities were found by the examination of their structural details and, in many cases, their functional features. For nine false positive predictions out of a possible 432,680, i.e. at a false positive rate of about 1/50,000, SAM-T98 found 35% of the true homologous relationships in PDBD40-J, whilst PSI-BLAST found 30% and ISS found 25%. Overall, this is about twice the number of PDBD40-J relations that can be detected by the pairwise comparison procedures FASTA (17%) and GAP-BLAST (15%). For distantly related sequences in PDBD40-J, those pairs whose sequence identity is less than 30%, SAM-T98 and PSI-BLAST detect three times the number of relationships found by the pairwise methods.
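The rounded figures quoted in this abstract can be checked with a few lines of arithmetic. The snippet below is only a worked check using the numbers stated above; the variable names are invented for illustration, and averaging the percentages within each group is an assumption about how "about twice" might be read.

```python
# Figures as stated in the abstract.
false_positives = 9
possible_false_pairs = 432_680

# "about 1/50,000": one false positive per this many possible false pairs.
one_in = possible_false_pairs / false_positives

# Percentage of true homologous relationships found by each method.
detected_percent = {
    "SAM-T98": 35,
    "PSI-BLAST": 30,
    "ISS": 25,
    "FASTA": 17,
    "GAP-BLAST": 15,
}
multi_sequence = ["SAM-T98", "PSI-BLAST", "ISS"]
pairwise = ["FASTA", "GAP-BLAST"]

mean_multi = sum(detected_percent[name] for name in multi_sequence) / len(multi_sequence)
mean_pairwise = sum(detected_percent[name] for name in pairwise) / len(pairwise)

# Assumption: compare group means to see how close "about twice" is.
ratio = mean_multi / mean_pairwise
```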

3.
CATH--a hierarchic classification of protein domain structures   (total citations: 1; self-citations: 0; citations by others: 1)
BACKGROUND: Protein evolution gives rise to families of structurally related proteins, within which sequence identities can be extremely low. As a result, structure-based classifications can be effective at identifying unanticipated relationships in known structures and in optimal cases function can also be assigned. The ever increasing number of known protein structures is too large to classify all proteins manually; therefore, automatic methods are needed for fast evaluation of protein structures. RESULTS: We present a semi-automatic procedure for deriving a novel hierarchical classification of protein domain structures (CATH). The four main levels of our classification are protein class (C), architecture (A), topology (T) and homologous superfamily (H). Class is the simplest level, and it essentially describes the secondary structure composition of each domain. In contrast, architecture summarises the shape revealed by the orientations of the secondary structure units, such as barrels and sandwiches. At the topology level, sequential connectivity is considered, such that members of the same architecture might have quite different topologies. When structures belonging to the same T-level have suitably high similarities combined with similar functions, the proteins are assumed to be evolutionarily related and put into the same homologous superfamily. CONCLUSIONS: Analysis of the structural families generated by CATH reveals the prominent features of protein structure space. We find that nearly a third of the homologous superfamilies (H-levels) belong to ten major T-levels, which we call superfolds, and furthermore that nearly two-thirds of these H-levels cluster into nine simple architectures. A database of well-characterised protein structure families, such as CATH, will facilitate the assignment of structure-function/evolution relationships to both known and newly determined protein structures.
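The four-level hierarchy described in this abstract maps naturally onto a small record type. The sketch below is an illustration only: the dotted four-number encoding, the class name `CathNode`, and the example codes are assumptions made here, since the abstract names the levels but does not specify how they are written.

```python
from dataclasses import dataclass

LEVELS = ("class", "architecture", "topology", "homologous superfamily")


@dataclass(frozen=True)
class CathNode:
    """One domain's position in a C.A.T.H hierarchy (hypothetical encoding)."""

    class_: int
    architecture: int
    topology: int
    superfamily: int

    @classmethod
    def parse(cls, code: str) -> "CathNode":
        # Assumed format: four dot-separated integers, one per level.
        parts = code.split(".")
        if len(parts) != 4:
            raise ValueError(f"expected four levels, got {code!r}")
        c, a, t, h = (int(p) for p in parts)
        return cls(c, a, t, h)

    def shared_level(self, other: "CathNode") -> str:
        """Name the deepest level at which two domains still agree."""
        mine = (self.class_, self.architecture, self.topology, self.superfamily)
        theirs = (other.class_, other.architecture, other.topology, other.superfamily)
        deepest = "none"
        for name, x, y in zip(LEVELS, mine, theirs):
            if x != y:
                break
            deepest = name
        return deepest


# Example codes are made up for illustration.
a = CathNode.parse("3.40.50.300")
b = CathNode.parse("3.40.50.720")
c = CathNode.parse("1.10.8.10")
```

Comparing levels from the top down reflects the nesting in the abstract: two domains can share a topology without being placed in the same homologous superfamily, but not the reverse.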

4.
We report the latest release (version 1.4) of the CATH protein domains database (http://www.biochem.ucl.ac.uk/bsm/cath). This is a hierarchical classification of 13 359 protein domain structures into evolutionary families and structural groupings. We currently identify 827 homologous families in which the proteins have both structural similarity and sequence and/or functional similarity. These can be further clustered into 593 fold groups and 32 distinct architectures. Using our structural classification and associated data on protein functions, stored in the database (EC identifiers, SWISS-PROT keywords and information from the Enzyme database and literature) we have been able to analyse the correlation between the 3D structure and function. More than 96% of folds in the PDB are associated with a single homologous family. However, within the superfolds, three or more different functions are observed. Considering enzyme functions, more than 95% of clearly homologous families exhibit either single or closely related functions, as demonstrated by the EC identifiers of their relatives. Our analysis supports the view that determining structures, for example as part of a 'structural genomics' initiative, will make a major contribution to interpreting genome data.

5.
Two homologous sequences, which have diverged beyond the point where their homology can be recognised by a simple direct comparison, can be related through a third sequence that is suitably intermediate between the two. High scores, for a sequence match between the first and third sequences and between the second and the third sequences, imply that the first and second sequences are related even though their own match score is low. We have tested the usefulness of this idea using a database that contains the sequences of 971 protein domains whose structures are known and whose residue identities with each other are some 40% or less (PDB40D). On the basis of sequence and structural information, 2143 pairs of these sequences are known to have an evolutionary relationship. FASTA, in an all-against-all comparison of the sequences in the database, detected 320 (15%) of these relationships as well as three false positives (i.e. 1% error rate). Using intermediate sequences found by FASTA matches of PDB40D sequences to those in the large non-redundant OWL database we could detect 550 evolutionary relationships with an error rate of 1%. This means the intermediate sequence procedure increases the ability to recognise the evolutionary relationships amongst the PDB40D sequences by 70%.
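The intermediate-sequence idea in this abstract amounts to linking two query sequences whenever both match a common third sequence. The sketch below illustrates that logic under simplifying assumptions: the function `infer_related_pairs`, the toy identifiers, and the idea that "high-scoring" hits have already been thresholded into sets are all invented here and are not the authors' procedure. The last lines recompute the percentages quoted in the abstract.

```python
from itertools import combinations


def infer_related_pairs(direct_hits, intermediate_hits):
    """Combine direct matches with matches implied by a shared intermediate.

    direct_hits: set of frozenset({a, b}) pairs found by direct comparison.
    intermediate_hits: dict mapping each query id to the set of intermediate
    sequence ids it matched with a high score (thresholding assumed done).
    """
    related = set(direct_hits)
    for a, b in combinations(sorted(intermediate_hits), 2):
        # A shared high-scoring intermediate implies that a and b are related.
        if intermediate_hits[a] & intermediate_hits[b]:
            related.add(frozenset((a, b)))
    return related


# Toy data with hypothetical identifiers.
direct = {frozenset(("d1", "d2"))}
via = {"d1": {"x9"}, "d2": {"x9"}, "d3": {"x9", "x4"}, "d4": {"x7"}}
pairs = infer_related_pairs(direct, via)

# Figures as stated in the abstract.
known_pairs = 2143
fasta_detected = 320
intermediate_detected = 550
direct_fraction = fasta_detected / known_pairs
relative_gain = (intermediate_detected - fasta_detected) / fasta_detected
```

With the abstract's numbers, `direct_fraction` is about 0.15 and `relative_gain` is about 0.72, consistent with the quoted 15% and the rounded 70% improvement.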

6.
In a similar manner to sequence database searching, it is also possible to compare three-dimensional protein structures. Such methods can be extremely useful because a structural similarity may represent a distant evolutionary relationship that is undetectable by sequence analysis. In this review, we summarise the most popular structure comparison methods, show how they can be used for database searching, and then describe some of the most advanced attempts to develop comprehensive protein structure classifications. With such data, it is possible to identify distant evolutionary relationships, provide libraries of unique folds for structure prediction, estimate the total number of folds that exist, and investigate the preference for certain types of structures over others.

7.
As the structural database continues to expand, new methods are required to analyse and compare protein structures. Whereas the recognition, comparison, and classification of folds is now more or less a solved problem, tools for the study of constellations of small numbers of residues are few and far between. In this paper, two programs are described for the analysis of spatial motifs in protein structures. The first, SPASM, can be used to find the occurrence of a motif consisting of arbitrary main-chain and/or side-chains in a database of protein structures. The program also has a unique capability to carry out "fuzzy pattern matching" with relaxed requirements on the types of some or all of the matching residues. The second program, RIGOR, scans a single protein structure for the occurrence of any of a set of pre-defined motifs from a database. In one application, spatial motif recognition combined with profile analysis enabled the assignment of the structural and functional class of an uncharacterised hypothetical protein in the sequence database. In another application, the occurrence of short left-handed helical segments in protein structures was investigated, and such segments were found to be fairly common. Potential applications of the techniques presented here lie in the analysis of (newly determined) structures, in comparative structural analysis, in the design and engineering of novel functional sites, and in the prediction of structure and function of uncharacterised proteins.

8.
A method for protein structure prediction has been developed, which evaluates the compatibility of an amino acid sequence with known 3-dimensional structures and identifies the most likely structure. The method was applied to a large number of sequences in a database, and the structures of the following proteins were predicted: (1) shikimate kinase (SKase), (2) the hydrophilic subunit of mannose permease (IIABMan), (3) rat tyrosine aminotransferase (Tyr AT), and (4) threonine dehydratase (TDH). The functional and evolutionary implications of the predictions are discussed. (1) The structural similarity between SKase and adenylate kinase was predicted. Alignment of their sequences reveals that the ATP-binding type A sequence motif and 2 ATP-binding arginine residues are conserved. The prediction suggests a similarity in their functional mechanisms as well as an evolutionary relationship. (2) The structural similarity between IIABMan and galactose/glucose-binding protein (GGBP) was predicted. The IIA and IIB domains are aligned with the N- and C-terminal domains of GGBP, respectively. The 2 phosphorylated residues, His 10 and His 175, of IIABMan are threaded onto loops located in the substrate-binding cleft of GGBP. The prediction accounts for the phosphoryl transfer from His 10 to His 175, and to the sugar substrate. (3) The structural similarity between rat Tyr AT and Escherichia coli aspartate AT was predicted, as well as (4) the structural similarity between TDH and the tryptophan synthase beta subunit. Predictions (3) and (4) support the previous predictions based on observations of the functional similarities between the proteins.

9.
The three dimensional structures for representatives of nearly half of all protein families are now available in public databases. Thus, no matter which protein one investigates, it is increasingly likely that the 3D structure of a homolog will be known and may reveal unsuspected structure-function relationships. The goal of Entrez's 3D-structure database is to make this information accessible and usable by molecular biologists (http://www.ncbi.nlm.nih.gov/Entrez). To this end Entrez provides two major analysis tools, a search engine based on sequence and structure 'neighboring' and an integrated visualization system for sequence and structure alignments. From a protein's sequence 'neighbors' one may rapidly identify other members of a protein family, including those where 3D structure is known. By comparing aligned sequences and/or structures in detail, using the visualization system, one may identify conserved features and perhaps infer functional properties. Here we describe how these analysis tools may be used to investigate the structure and function of newly discovered proteins, using the PTEN gene product as an example.

10.
The relationships between intersubjectivity and attachment are beginning to be explored within the psychoanalytic and developmental literature. We contribute to this comparative effort by exploring the different evolutionary origins of attachment and intersubjectivity. Five interlocking themes are central to this article. First, from an evolutionary perspective, attachment and intersubjectivity serve different functions. The main function of attachment is to seek protection, whereas the main function of intersubjectivity is to communicate, at intuitive and automatic levels, with members of the same species and to facilitate social understanding. Second, to survive in changing and highly competitive environments, an evolutionary strategy emerged among our human ancestors based on developing high levels of cooperation within small bands of hunters and gatherers. In turn, high levels of cooperation and social complexity put selective pressures toward developing effective modes of communication and more complex forms of social understanding (mindreading/mentalizing/intersubjective abilities). These abilities far surpass mindreading abilities among our closest Great Ape relatives. Third, we provide further evidence for this hypothesis showing that in comparison with other Great Apes, young children show qualitatively different levels of collaboration and altruism. Fourth, we provide an overview of the development of attachment and intersubjective abilities during the first 2 years of life that support the hypothesis of a cooperative origin of intersubjectivity. Fifth, we return to the main theme of this article showing three ways in which attachment and intersubjective abilities can be distinguished. We conclude by exploring some clinical implications of this cooperative–intersubjective model of human development. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

11.
Three short protein sequences have been guided by computer to folds resembling their crystal structures. Initially, peptide fragment conformations ranging in size from 9 to 25 residues were selected from a database of known protein structures. A fragment was selected if it was compatible with a segment of the sequence to be folded, as judged by three-dimensional profile scores. By linking the selected fragment conformations together, hundreds of trial structures were generated of the same length and sequence as the protein to be folded. These starting trial structures were then improved by an evolutionary algorithm. Selection pressure for improving the structures was provided by an energy function that was designed to guide the conformational search procedure toward the correct structure. We find that by evolution of only 400 structures for fewer than 1400 generations, the overall fold of some small helical proteins can be computed from the sequence, with deviations from observed structures of 2.5-4.0 Å for Cα atoms.

12.
Recently, sex differences in the structures of the human hypothalamus and adjacent brain structures have been observed that seem to be related to gender, to gender problems such as transsexuality, and to sexual orientation, that is, heterosexuality and homosexuality. Although these observations have yet to be confirmed, and their exact functional implications are far from clear, they open up a whole new field of physiological structural-functional relationships in human brain research that has so far focused mainly on such relationships in pathology.

13.
ProClass is a protein family database that organizes non-redundant sequence entries into families defined collectively by PROSITE patterns and PIR superfamilies. By combining global similarities and functional motifs into a single classification scheme, ProClass helps to reveal domain and family relationships and classify multi-domain proteins. The database currently consists of more than 120 000 sequence entries, approximately 60% of which are classified into about 3500 families. To maximize family information retrieval, the database provides links to various protein family/domain and structural class databases and contains multiple motif alignments of all PROSITE patterns as well as global alignments of PIR superfamilies. The motif sequences are retrieved from both PIR-International and SWISS-PROT databases, including a large number of new members detected by our GeneFIND family identification system. ProClass can be used to support full-scale genomic annotation, because of its high classification rate. The ProClass database is available for on-line search and record retrieval from our WWW server at http://diana.uthct.edu/proclass.html

14.
Aminoacyl-tRNA synthetases (AARSs) are the key components of the protein biosynthesis machinery. They are responsible for maintaining the fidelity of transfer of genetic information from DNA into protein. The database is a compilation of amino acid sequences of all aminoacyl-tRNA synthetases known to date. It contains 422 primary structures of the AARSs available as separate entries or alignments of related proteins. The database is available via the World Wide Web at http://rose.man.poznan.pl/aars/index.html

15.
The parasitic bacterium Mycoplasma genitalium has a small, reduced genome with close to a basic set of genes. As a first step toward determining the families of protein domains that form the products of these genes, we have used the multiple sequence programs PSI-BLAST and GEANFAMMER to match the sequences of the 467 gene products of M. genitalium to the sequences of the domains that form proteins of known structure [Protein Data Bank (PDB) sequences]. PDB sequences (274) match all of 106 M. genitalium sequences and some parts of another 85; thus, 41% of its total sequences are matched in all or part. The evolutionary relationships of the PDB domains that match M. genitalium are described in the structural classification of proteins (SCOP) database. Using this information, we show that the domains in the matched M. genitalium sequences come from 114 superfamilies and that 58% of them have arisen by gene duplication. This level of duplication is more than twice that found by using pairwise sequence comparisons. The PDB domain matches also describe the domain structure of the matched sequences: just over a quarter contain one domain and the rest have combinations of two or more domains.
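The 41% coverage figure in this abstract follows directly from the counts it gives. The lines below are a worked check only; the variable names are chosen for illustration.

```python
# Counts as stated in the abstract.
total_gene_products = 467
fully_matched = 106   # sequences matched over their whole length
partly_matched = 85   # sequences matched over some part

matched = fully_matched + partly_matched
coverage = matched / total_gene_products

print(f"{matched} of {total_gene_products} sequences matched ({coverage:.1%})")
```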

16.
17.
Psychology from the standpoint of a generalist.   (total citations: 1; self-citations: 0; citations by others: 1)
Describes the tenets of a liberalized scientific psychology. Such a science is empirical, deterministic, and analytic. Psychology is the science of behavior. Mentalistic concepts are inferences from behavior, and they play a centrally important role. Intuition, common sense, and personal experience provide hypotheses for this science. The elementist–holist controversy disappears with the understanding that the wholes of science differ at different levels of analysis. Free will can be brought within the scope of determinism. Overt behavior is the product of potentials laid down by nature–nurture interactions and conditions of the moment. Behavior is so complexly determined that individual uniqueness is an expected consequence. In this scheme of things, scientific values control the science of psychology, and humanistic values control the actions of the psychologists who create this science and apply it. Over the years, the process of change in psychology has been evolutionary rather than revolutionary. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

18.
19.
Structural classifications aid the interpretation of proteins by describing degrees of structural and evolutionary relatedness. They have also recently revealed strikingly skewed distributions at all levels; for example, a small number of folds are far more common than others, and just a few superfamilies are known to have diverged widely. The classifications also provide an indication of the total number of superfamilies in nature.

20.
The present paper describes AMmtDB, a database collecting the multi-aligned sequences of vertebrate mitochondrial genes coding for proteins and tRNAs, as well as the multiple alignment of the mammalian mtDNA main regulatory region (D-loop) sequences. The genes coding for proteins are multi-aligned based on the translated sequences and both the nucleotide and amino acid multi-alignments are provided. As far as the genes coding for tRNAs are concerned, the multi-alignments based on the primary and the secondary structures are both provided; for the mammalian D-loop multi-alignments we report the conserved regions of the entire D-loop (CSB1, CSB2, CSB3, the central region, ETAS1 and ETAS2) as defined by Sbisà et al. [Gene (1997), 205, 125-140]. A flatfile format for AMmtDB has been designed allowing its implementation in SRS (http://bio-www.ba.cnr.it:8000/BioWWW/#AMMTDB ). Data selected through SRS can be managed using GeneDoc or other programs for the management of multi-aligned data depending on the user's operating system. The multiple alignments have been produced with CLUSTALV and PILEUP programs and then carefully optimized manually.
