Similar literature
 20 similar documents found (search time: 31 ms)
1.
Most non-communicable diseases are associated with dysfunction of proteins or protein complexes. The relationship between sequence and structure has been analyzed for a long time, and the analysis of sequence organization into domains and motifs remains an active research area. Here, we propose a mathematical method for revealing the hierarchical organization of protein sequences. The method is based on the pentapeptide as a unit of protein sequences. Employing the frequency of occurrence of pentapeptides in sequences of natural proteins and a special mathematical approach, this method revealed a hierarchical structure in the protein sequence. The method was applied to 24,647 non-homologous protein sequences with sizes ranging from 50 to 400 residues from the NRDB90 database. Statistical analysis of the branching points of the graphs revealed 11 characteristic values of y (the width of the inscribed function), showing the relationship of these multiple fragments of the sequences. Several examples illustrate how fragments of the protein spatial structure correspond to the elements of the hierarchical structure of the protein sequence. This methodology provides a promising basis for a mathematically-based classification of the elements of the spatial organization of proteins. Elements of the hierarchical structure of different levels of the hierarchy can be used to solve biotechnological and medical problems.
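The abstract above does not say how pentapeptide occurrence frequencies are tallied, so the following is only a minimal sketch under one assumption: every overlapping five-residue window in each sequence is counted once. The function names `pentapeptide_counts` and `pentapeptide_frequencies` are hypothetical and are not taken from the paper.

```python
from collections import Counter


def pentapeptide_counts(sequences, k=5):
    """Count overlapping k-residue windows (k=5 gives pentapeptides).

    Assumption: windows overlap and every sequence is weighted equally;
    the original method may normalise differently.
    """
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        # A sequence shorter than k contributes no windows.
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def pentapeptide_frequencies(sequences, k=5):
    """Convert raw window counts into relative frequencies."""
    counts = pentapeptide_counts(sequences, k)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {peptide: n / total for peptide, n in counts.items()}
```

A table built this way could serve as the input to whatever downstream analysis derives the hierarchy; that later step is not described in enough detail here to sketch.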

2.
Okadaic acid (OA) is a marine polyether cytotoxin that was first isolated from the marine sponge Halichondria okadai. OA is a potent inhibitor of protein serine/threonine phosphatases (PP) 1 and 2A, and the structural basis of phosphatase inhibition has been well investigated. However, the role and mechanism of OA retention in the marine sponge have remained elusive. We have solved the crystal structure of okadaic acid binding protein 2.1 (OABP2.1) isolated from H. okadai; it has strong affinity for OA and limited sequence homology to other proteins. The structure revealed that OABP2.1 consists of two α‐helical domains, with the OA molecule deeply buried inside the protein. In addition, the global fold of OABP2.1 was unexpectedly similar to that of aequorin, a jellyfish photoprotein. The presence of structural homologues suggested that, by using similar protein scaffolds, marine invertebrates have developed diverse survival systems adapted to their living environments.

3.
4.
Homology modelling of the human eIF-5A protein has been performed by using a multiple predictions strategy. As the sequence identity between the target and the template proteins is nearly 30%, which is lower than the commonly used threshold to apply with confidence the homology modelling method, we developed a specific predictive scheme by combining different sequence analyses and predictions, as well as model validation by comparison to structural experimental information. The target sequence has been used to find homologues within sequence databases and a multiple alignment has been created. Secondary structure for each single protein has been predicted and compared on the basis of the multiple sequence alignment, in order to evaluate and adjust carefully any gap. Therefore, comparative modelling has been applied to create the model of the protein on the basis of the optimized sequence alignment. The quality of the model has been checked by computational methods and the structural features have been compared to experimental information, giving us a good validation of the reliability of the model and its correspondence to the protein structure in solution. Last, the model was deposited in the Protein Data Bank to be accessible for studies on the structure–function relationships of the human eIF-5A.
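The figure of roughly 30% sequence identity in the abstract above comes from a target-template alignment. As a hedged illustration, the sketch below computes percent identity from two pre-aligned strings. It assumes gaps are written as `-` and that gapped columns are excluded from the denominator; other conventions (for example, dividing by the shorter sequence length) give different numbers, and the abstract does not say which one the authors used. The function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over the ungapped columns of a pairwise alignment.

    Assumption: both inputs are already aligned (equal length) and
    columns containing a gap in either sequence are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0
```

For example, `percent_identity("ACDE-G", "ACNEFG")` compares five ungapped columns, four of which match.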

5.
A predicted three-dimensional structure of the two N-terminal extracellular domains of human CD4 antigen, a cell surface glycoprotein, is reported. This region of CD4, particularly the first domain, has been identified as containing the binding region for the envelope gp120 protein of the human immunodeficiency virus. The model was predicted based on the sequence homology of each domain with the variable light chain of immunoglobulins. The framework β-sheet regions were taken from the crystal coordinates of REI. For one region in the first domain of CD4 there was an ambiguity in the alignment with REI and two alternate models are presented. Loops connecting the framework were modeled from fragments selected from a database of main chain coordinates from all known protein structures. Residues identified as involved in binding gp120 have been located in several other studies within the first domain of CD4. Epitopes from eight monoclonal antibodies have been mapped onto residues in both domains. Competition of these antibodies with each other and with gp120 can be interpreted from the structural model.

6.
Restriction enzymes (REases) are commercial reagents commonly used in DNA manipulations and mapping. They are regarded as very attractive models for studying protein-DNA interactions and valuable targets for protein engineering. Their amino acid sequences usually show no similarities to other proteins, with rare exceptions of other REases that recognize identical or very similar sequences. Hence, they are extremely hard targets for structure prediction and modeling. NlaIV is a Type II REase, which recognizes the interrupted palindromic sequence GGNNCC (where N indicates any base) and cleaves it in the middle, leaving blunt ends. NlaIV shows no sequence similarity to other proteins and virtually nothing is known about its sequence-structure-function relationships. Using protein fold recognition, we identified a remote relationship between NlaIV and EcoRV, an extensively studied REase, which recognizes the GATATC sequence and whose crystal structure has been determined. Using the 'FRankenstein's monster' approach we constructed a comparative model of NlaIV based on the EcoRV template and used it to predict the catalytic and DNA-binding residues. The model was validated by site-directed mutagenesis and analysis of the activity of the mutants in vivo and in vitro as well as structural characterization of the wild-type enzyme and two mutants by circular dichroism spectroscopy. The structural model of the NlaIV-DNA complex suggests regions of the protein sequence that may interact with the 'non-specific' bases of the target and thus it provides insight into the evolution of sequence specificity in restriction enzymes and may help engineer REases with novel specificities. Before this analysis was carried out, neither the three-dimensional fold of NlaIV, its evolutionary relationships or its catalytic or DNA-binding residues were known. 
Hence our analysis may be regarded as a paradigm for studies aiming at reducing 'white spaces' on the evolutionary landscape of sequence-function relationships by combining bioinformatics with simple experimental assays.
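The abstract above states that NlaIV recognises GGNNCC (N being any base) and cuts in the middle of the site, leaving blunt ends. A small sketch of locating such sites in a DNA string follows. It assumes an unambiguous A/C/G/T alphabet and a linear sequence; the names `NLAIV_SITE`, `find_sites` and `blunt_cut_positions` are hypothetical. Because GGNNCC is its own reverse complement, scanning one strand is enough.

```python
import re

# Lookahead so that overlapping sites are all reported.
# Assumption: N matches exactly one of A, C, G, T (no IUPAC ambiguity codes).
NLAIV_SITE = re.compile(r"(?=(GG[ACGT]{2}CC))")


def find_sites(dna):
    """Return (start_index, matched_site) for every GGNNCC occurrence."""
    dna = dna.upper()
    return [(m.start(), m.group(1)) for m in NLAIV_SITE.finditer(dna)]


def blunt_cut_positions(dna):
    """Index of the first base after each cut.

    The cut is placed in the middle of the 6-bp site (GGN^NCC),
    following the description in the abstract.
    """
    return [start + 3 for start, _ in find_sites(dna)]
```

On the input `"AAGGATCCTT"` the only site is `GGATCC` starting at index 2, so the cut falls before index 5.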

7.
8.
Membrane fusion is essential for many biological processes. Though there have been many structure and fusion studies of cellular and viral fusion proteins in recent years, their functional mechanism remains elusive. In particular, the structural modes of operation of the transmembrane domains and viral fusion peptides of fusion proteins during membrane fusion have not been elucidated, although work on de novo designed fusogenic peptides suggested that conformational flexibility was necessary. In addition, the use of different and incompatible measurement criteria has made a comparative overview difficult. Here, we report a systematic structural analysis of viral fusion peptides from different fusion protein classes and transmembrane domains of viral and cellular fusion proteins by using circular dichroism spectroscopy. The data that were obtained demonstrate that class I viral fusion peptides show a structural flexibility between helix and irregular secondary structures, whereas fusion peptides of class II viral fusion proteins are characterized by a stable random coil and turn structure. Thus, conformational flexibility does not seem to be a universal criterion for the fusion activity of a fusion peptide. On the contrary, the transmembrane domains of fusion proteins are distinguished by a structural flexibility between helix and sheet structure that is similar to de novo designed unnatural peptides with high fusion activities (M. W. Hofmann et al. PNAS 2004, 101, 14776-14781). Thus, the conformational behavior of the fusogenic unnatural peptides most closely resembles that of fusion protein transmembrane domains, and allows them to be used to gain a deeper understanding of the membrane fusion process.

9.
The nacreous layer of molluscan shells consists of a highly organised, layered structure comprising calcium carbonate aragonite crystals, each surrounded by an organic matrix. In the Japanese pearl oyster Pinctada fucata, the Pif protein from the nacreous layer functions in aragonite binding, and plays a key role in nacre formation. Here, we investigated whether the blue mussel Mytilus galloprovincialis also has a protein with similar functions in the nacreous layer. By using a calcium carbonate-binding assay, we identified the novel protein blue mussel shell protein (BMSP) 100 that can bind calcium carbonate crystals of both aragonite and calcite. When the entire sequence of a cDNA encoding BMSP 100 was determined, it was found that BMSP is a preproprotein consisting of a signal peptide and two proteins, BMSP 120 and BMSP 100. BMSP 120 contains four von Willebrand factor A (VWA) domains and one chitin-binding domain, thus suggesting that it has a role in maintaining structure within the matrix. Immunohistochemical analysis revealed that BMSP 100 is present throughout the nacreous layer with dense localisation in the myostracum. Posttranslational modification analysis indicated that BMSP 100 is phosphorylated and glycosylated. These results suggest that there is a common molecular mechanism between P. fucata and M. galloprovincialis that underlies the nacreous layer formation.

10.
11.
Modelling protein unfolding: hen egg-white lysozyme   Total citations: 1, self-citations: 0, citations by others: 1
A novel modelling procedure, which rapidly unfolds a protein by enhancing solvent penetration of its core, was used to investigate the unfolding pathway of hen egg-white lysozyme. Early on the unfolding pathway there is a dramatic disruption of the tertiary contacts within the protein, which decouples its domains. Subsequently, the helical domain slowly loses its compactness and the helices fluctuate rapidly. The protein then adopts a 'molten globule-like' structure in which the native beta-sheet is essentially intact. The modelled structures have properties similar to those of lysozyme's experimentally characterized partially folded states and provide insight into its complex (un)folding process. The sequence of unfolding events shows how the unfolding pathway of a multidomain protein may be most similar to its fastest, but not necessarily its dominant, folding pathway.

12.
Background: For decades, the rate of solving new biomolecular structures has been exceeding that at which their manual classification and feature characterisation can be carried out efficiently. Therefore, a new comprehensive and holistic tool for their examination is needed. Methods: Here we propose the Biological Sequence and Structure Network (BioS2Net), which is a novel deep neural network architecture that extracts both sequential and structural information of biomolecules. Our architecture consists of four main parts: (i) a sequence convolutional extractor, (ii) a 3D structure extractor, (iii) a 3D structure-aware sequence temporal network, as well as (iv) a fusion and classification network. Results: We have evaluated our approach using two protein fold classification datasets. BioS2Net achieved a 95.4% mean class accuracy on the eDD dataset and a 76% mean class accuracy on the F184 dataset. The accuracy of BioS2Net obtained on the eDD dataset was comparable to results achieved by previously published methods, confirming that the algorithm described in this article is a top-class solution for protein fold recognition. Conclusions: BioS2Net is a novel tool for the holistic examination of biomolecules of known structure and sequence. It is a reliable tool for protein analysis and their unified representation as feature vectors.
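The abstract above reports "mean class accuracy" without defining it. The sketch below assumes the common reading, in which accuracy is computed separately for each true class and the per-class values are averaged without weighting, so that small fold classes count as much as large ones. This is an assumption about the metric, not a description of the BioS2Net evaluation code, and the function name `mean_class_accuracy` is hypothetical.

```python
from collections import defaultdict


def mean_class_accuracy(y_true, y_pred):
    """Unweighted mean of per-class accuracy (per-class recall).

    Assumption: only classes that occur in y_true contribute to the mean.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have equal length")
    if not y_true:
        raise ValueError("at least one labelled example is required")
    total = defaultdict(int)    # examples seen per true class
    correct = defaultdict(int)  # correctly predicted examples per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)
```

With four examples of class `a` all predicted correctly and one example of class `b` predicted as `a`, plain accuracy is 0.8 while this metric is 0.5, which shows why the two should not be compared directly.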

13.
The glycoprotein P-selectin belongs to the selectin family of cell adhesion molecules. In this study, we cloned the full-length cDNA of P-selectin from zebrafish (Danio rerio) by the method of rapid amplification of cDNA ends polymerase chain reaction (RACE-PCR). Zebrafish P-selectin cDNA is 2,800 bp and encodes a putative 868 amino acid protein with a theoretical molecular weight of 122.36 kDa and isoelectric point of 6.27. A signal peptide of 25 amino acids is predicted at the N-terminus of the putative protein. All structural domains involved in P-selectin function are conserved in the putative protein. The amino acid sequence of zebrafish P-selectin is 37% to 39% identical to that of mammalian P-selectins. Real-time quantitative PCR and whole-mount in situ hybridization analysis revealed that P-selectin was expressed in early embryonic development, that its expression increased from 0.2 hpf (1-cell stage) to 72 hpf, and that its expression was significantly upregulated within 30 minutes of ADP induction. The results indicate that the structure of P-selectin protein is highly conserved among species and that zebrafish P-selectin plays an important role in early embryonic development and probably has a similar biological function to mammalian P-selectins.

14.
Huntington’s disease (HD) is a rare neurodegenerative and autosomal dominant disorder. HD is caused by a mutation in the gene coding for huntingtin (Htt). The result is the production of a mutant Htt with an abnormally long polyglutamine repeat that leads to pathological Htt aggregates. Although the structure of human Htt has been determined, albeit at low resolution, its functions and how they are performed are largely unknown. Moreover, there is little information on the structure and function of Htt in other organisms. The comparison of Htt homologs can help to understand if there is a functional conservation of domains in the evolution of Htt in eukaryotes. In this work, through a computational approach, Htt homologs from lower eukaryotes have been analysed, identifying ordered domains and modelling their structure. Based on the structural models, a putative function for most of the domains has been predicted. A putative C. elegans Htt-like protein has also been analysed following the same approach. The results obtained support the notion that this protein is an orthologue of human Htt.

15.
Intrinsically-disordered regions lack a well-defined 3D structure, but play key roles in determining the function of many proteins. Although predictors of disorder have been shown to achieve relatively high rates of correct classification of these segments, improvements over the years have been slow, and accurate methods are needed that are capable of accommodating the ever-increasing amount of structurally-determined protein sequences to try to boost predictive performances. In this paper, we propose a predictor for short disordered regions based on bidirectional recurrent neural networks and tested by rigorous five-fold cross-validation on a large, non-redundant dataset collected from MobiDB, a new comprehensive source of protein disorder annotations. The system exploits sequence and structural information in the forms of frequency profiles, predicted secondary structure and solvent accessibility and direct disorder annotations from homologous protein structures (templates) deposited in the Protein Data Bank. The contributions of sequence, structure and homology information result in large improvements in predictive accuracy. Additionally, the large scale of the training set leads to low false positive rates, making our system a robust and efficient way to address high-throughput disorder prediction.

16.
Tens of thousands of terpenoids are present in both terrestrial and marine plants, as well as fungi. In the last 5-10 years, however, it has become evident that terpenes are also produced by numerous bacteria, especially soil-dwelling Gram-positive organisms such as Streptomyces and other Actinomycetes. Although some microbial terpenes, such as geosmin, the degraded sesquiterpene responsible for the smell of moist soil, the characteristic odor of the earth itself, have been known for over 100 years, few terpenoids have been identified by classical structure- or activity-guided screening of bacterial culture extracts. In fact, the majority of cyclic terpenes from bacterial species have only recently been uncovered by the newly developed techniques of "genome mining". In this new paradigm for biochemical discovery, bacterial genome sequences are first analyzed with powerful bioinformatic tools, such as the BLASTP program or Profile Hidden Markov models, to screen for and identify conserved protein sequences harboring a characteristic set of universally conserved functional domains typical of all terpene synthases. Of particular importance is the presence of variants of two universally conserved domains, the aspartate-rich DDXX(D/E) motif and the NSE/DTE triad, (N/D)DXX(S/T)XX(K/R)(D/E). Both domains have been implicated in the binding of the essential divalent cation, typically Mg(2+), that is required for cyclization of the universal acyclic terpene precursors, such as farnesyl and geranyl diphosphate. The low level of overall sequence similarity among terpene synthases, however, has so far precluded any simple correlation of protein sequence with the structure of the cyclized terpene product. The actual biochemical function of a cryptic bacterial (or indeed any) terpene synthase must therefore be determined by direct experiment. 
Two common approaches are (i) incubation of the expressed recombinant protein with acyclic allylic diphosphate substrates and identification of the resultant terpene hydrocarbon or alcohol and (ii) in vivo expression in engineered bacterial hosts that can support the production of terpene metabolites. One of the most attractive features of the coordinated application of genome mining and biochemical characterization is that the discovery of natural products is directly coupled to the simultaneous discovery and exploitation of the responsible biosynthetic genes and enzymes. Bacterial genome mining has proved highly rewarding scientifically, already uncovering more than a dozen newly identified cyclic terpenes (many of them unique to bacteria), as well as several novel cyclization mechanisms. Moreover, bioinformatic analysis has identified more than 120 presumptive genes for bacterial terpene synthases that are now ripe for exploration. In this Account, we review a particularly rich vein we have mined in the genomes of two model Actinomycetes, Streptomyces coelicolor and Streptomyces avermitilis, from which the entire set of terpenoid biosynthetic genes and pathways have now been elucidated. In addition, studies of terpenoid biosynthetic gene clusters have revealed a wealth of previously unknown oxidative enzymes, including cytochromes P450, non-heme iron-dependent dioxygenases, and flavin monooxygenases. We have shown that these enzymes catalyze a variety of unusual biochemical reactions, including two-step ketonization of methylene groups, desaturation-epoxidation of secondary methyl groups, and pathway-specific Baeyer-Villiger oxidations of cyclic ketones.
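The Account above names two conserved sequence signatures of terpene synthases, the aspartate-rich DDXX(D/E) motif and the NSE/DTE triad written as (N/D)DXX(S/T)XX(K/R)(D/E). As a rough illustration of the screening step only, the sketch below turns those two patterns into regular expressions and scans a protein sequence. The authors describe BLASTP and profile hidden Markov models for genome mining, so a plain regex scan is a swapped-in simplification, and the names `MOTIFS` and `scan_motifs` are hypothetical. It assumes X means any one of the 20 standard amino acids.

```python
import re

# Assumption: X is any single standard amino acid.
AA = "[ACDEFGHIKLMNPQRSTVWY]"

# Lookaheads allow overlapping hits to be reported.
MOTIFS = {
    "DDXX(D/E)": re.compile(rf"(?=(DD{AA}{{2}}[DE]))"),
    "NSE/DTE": re.compile(rf"(?=([ND]D{AA}{{2}}[ST]{AA}{{2}}[KR][DE]))"),
}


def scan_motifs(protein):
    """Return {motif_name: [(start_index, matched_text), ...]}."""
    protein = protein.upper()
    return {
        name: [(m.start(), m.group(1)) for m in pattern.finditer(protein)]
        for name, pattern in MOTIFS.items()
    }
```

A hit for both motifs only flags a candidate. As the Account itself stresses, the cyclization product cannot be inferred from sequence and has to be determined by direct experiment.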

17.
A new approach has been developed to reduce multiple protein structures obtained from NMR structure analysis to a smaller number of representative structures which still reflect the structural diversity of the data sets. The method, based on the clustering of similar structures, has been tested in the homology model building of the structure of Sox-5, a sequence-specific DNA-binding protein belonging to the high mobility group (HMG) nuclear proteins family. Sox (SRY box) genes are the autosomal genes related to the sex-determining SRY, Y chromosomal gene. The Sox-5 protein, encoded by one of the SRY-related genes, displays a 29% sequence identity with the HMG1 B-box domain whose structure, determined previously by NMR, has been used in our study to predict the structure of Sox-5. Two independent ensembles of HMG1 structures, each represented by closely related coordinate sets, were used. Nine representative structures for HMG1 were subsequently selected as starting points for the modelling of Sox-5. The model of the protein shows close similarity to the HMG1 fold, with differences at the secondary structure level located mainly in α-helices 1 and 3. A left-handed, three residue per turn polyproline II helix, forming a conserved polyproline II/α-helix supersecondary motif, was identified in the N-terminal region of Sox-5 and other HMG boxes.

18.
[Structure: see text]. FHA domains are protein modules that switch signals in diverse biological pathways by monitoring the phosphorylation of threonine residues of target proteins. As part of the effort to gain insight into cellular avoidance of cancer, FHA domains involved in the cellular response to DNA damage have been especially well-characterized. The complete protein where the FHA domain resides and the interaction partners determine the nature of the signaling. Thus, a key biochemical question is how do FHA domains pick out their partners from among thousands of alternatives in the cell? This Account discusses the structure, affinity, and specificity of FHA domains and the formation of their functional structure. Although FHA domains share sequence identity at only five loop residues, they all fold into a beta-sandwich of two beta-sheets. The conserved arginine and serine of the recognition loops recognize the phosphorylation of the threonine targeted. Side chains emanating from loops that join beta-strand 4 with 5, 6 with 7, or 10 with 11 make specific contacts with amino acids of the ligand that tailor sequence preferences. Many FHA domains choose a partner in extended conformation, somewhat according to the residue three after the phosphothreonine in sequence (pT + 3 position). One group of FHA domains chooses a short carboxylate-containing side chain at pT + 3. Another group chooses a long, branched aliphatic side chain. A third group prefers other hydrophobic or uncharged polar side chains at pT + 3. However, another FHA domain instead chooses on the basis of pT - 2, pT - 3, and pT + 1 positions. An FHA domain from a marker of human cancer instead chooses a much longer protein fragment that adds a beta-strand to its beta-sheet and that presents hydrophobic residues from a novel helix to the usual recognition surface. 
This novel recognition site and more remote sites for the binding of other types of protein partners were predicted for the entire family of FHA domains by a bioinformatics approach. The phosphopeptide-dependent dynamics of an FHA domain, SH2 domain, and PTB domain suggest a common theme: rigid, preformed binding surfaces support van der Waals contacts that provide favorable binding enthalpy. Despite the lack of pronounced conformational changes in FHA domains linked to binding events, more subtle adjustments may be possible. In the one FHA domain tested, phosphothreonine peptide binding is accompanied by increased flexibility just outside the binding site and increased rigidity across the beta-sandwich. The folding of the same FHA domain progresses through near-native intermediates that stabilize the recognition loops in the center of the phosphoprotein-binding surface; this may promote rigidity in the interface and affinity for targets phosphorylated on threonine.

19.
A huge quantity of gene and protein sequences have become available during the post-genomic era, and information about genetic variations, including amino acid substitutions and SNPs, is accumulating rapidly. To understand the effects of these changes, it is often essential to apply bioinformatics tools. Where there is a lack of homologous sequences or a three-dimensional structure, it becomes essential to predict the effects of mutations based solely on protein sequence information. Several computational methods utilizing machine learning techniques have been developed. These predictions generally use the 20-alphabet amino acid code to train the model. With limited available data, the 20-alphabet amino acid features may introduce so many parameters that the model becomes over-fitted. To decrease the number of parameters, we propose a physicochemical feature-based method to forecast the effects of amino acid substitutions on protein stability. Protein structure alterations caused by mutations can be classified as stabilizing or destabilizing. Based on experimental folding-unfolding free energy (DeltaDeltaG) values, we trained a support vector machine with a cleaned data set. The physicochemical properties of the mutated residues, the number of neighboring residues in the primary sequence and the temperature and pH were used as input attributes. Different kernel functions, attributes and window sizes were optimized. An average accuracy of 80% was obtained in cross-validation experiments.

20.
Binary discontinuous compact protein domains   Total citations: 6, self-citations: 0, citations by others: 6
Few methods exist that identify discontinuous protein domains containing more than one polypeptide chain. This paper describes a new method for locating such discontinuous domains based on their compactness, and applies the methodology to locate the most compact domains in bovine pancreatic trypsin inhibitor, ribonuclease, cytochrome c and myoglobin. The compactness of all binary discontinuous peptide combinations is first exhaustively evaluated. Several screening steps are then used to locate those compact units that represent global minima of compactness. Since domains are generally taken to be large, mutually exclusive structures that span most of the protein's sequence, compact domains were found by examining all compact units (both continuous and discontinuous) to locate two or three units that span most of the protein's sequence, have little mutual overlap and good overall compactness. Compact domains compare well with domains found by other methods and with experimental evidence that may differentiate domain structure. The strongest experimental evidence for the existence of compact discontinuous domains comes from the work of Oas and Kim [(1988) Nature, 336, 42–48] where a peptide that corresponds almost exactly to a compact domain has been synthesized and shown to have native-like structure in solution.

