Similar articles
Found 20 similar articles (search time: 31 ms)
1.
We examine in this paper one of the expected consequences of the hypothesis that modern proteins evolved from random heteropeptide sequences. Specifically, we investigate the lengthwise distributions of amino acids in a set of 1,789 protein sequences with little sequence identity using the run test statistic (ro) of Mood (1940, Ann. Math. Stat. 11, 367-392). The probability density of ro for a collection of random sequences has mean = 0 and variance = 1 [the N(0,1) distribution] and can be used to measure the tendency of amino acids of a given type to cluster together in a sequence relative to that of a random sequence. We implement the run test using binary representations of protein sequences in which the amino acids of interest are assigned a value of 1 and all others a value of 0. We consider individual amino acids and sets of various combinations of them based upon hydrophobicity (4 sets), charge (3 sets), volume (4 sets), and secondary structure propensity (3 sets). We find that any sequence chosen randomly has a 90% or greater chance of having a lengthwise distribution of amino acids that is indistinguishable from the random expectation regardless of amino acid type. We regard this as strong support for the random-origin hypothesis. However, we do observe significant deviations from the random expectation, as might be expected after billions of years of evolution. Two important global trends are found: (1) Amino acids with a strong alpha-helix propensity show a strong tendency to cluster whereas those with beta-sheet or reverse-turn propensity do not. (2) Clustered rather than evenly distributed patterns tend to be preferred by the individual amino acids and this is particularly so for methionine. Finally, we consider the problem of reconciling the random nature of protein sequences with structurally meaningful periodic "patterns" that can be detected by sliding-window, autocorrelation, and Fourier analyses. Two examples, rhodopsin and bacteriorhodopsin, show that such patterns are a natural feature of random sequences.
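
The binary run test described in this abstract can be illustrated with a minimal Python sketch. It assumes the standard large-sample Wald-Wolfowitz form of the runs statistic, which may differ in detail (including sign convention) from the ro statistic the authors derive from Mood (1940); the function name `runs_z` and the toy sequences are hypothetical.

```python
import math

def runs_z(sequence, targets):
    """Runs-test z-score for the 0/1 encoding of `sequence`.

    Residues in `targets` are coded 1, all others 0. Under randomness the
    score is approximately N(0, 1). In this convention, negative values
    mean fewer runs than expected, i.e. clustering of the target residues.
    """
    bits = [1 if aa in targets else 0 for aa in sequence]
    n1 = sum(bits)
    n0 = len(bits) - n1
    n = n1 + n0
    if n1 == 0 or n0 == 0:
        raise ValueError("need both target and non-target residues")
    # A new run starts at every position where the symbol changes.
    runs = 1 + sum(1 for a, b in zip(bits, bits[1:]) if a != b)
    mean = 2.0 * n1 * n0 / n + 1.0
    var = 2.0 * n1 * n0 * (2.0 * n1 * n0 - n) / (n * n * (n - 1.0))
    if var <= 0:
        raise ValueError("sequence too short for a runs test")
    return (runs - mean) / math.sqrt(var)

# Toy sequences (not from the paper): fully clustered vs. fully alternating.
z_clustered = runs_z("AAAALLLL", {"A"})
z_alternating = runs_z("ALALALAL", {"A"})
```

Passing a set such as `{"A", "L", "M"}` as `targets` would score a group of residues together, in the spirit of the hydrophobicity, charge, volume and propensity sets mentioned above.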

2.
3.
A combination of Edman sequence analysis and mass spectrometry identified the major proteins of the young human lens as alphaA, alphaB, betaA1, betaA3, betaA4, betaB1, betaB2, betaB3, gammaS, gammaC, and gammaD-crystallins and mapped their positions on two-dimensional electrophoretic gels. The primary structures of human betaA1, betaA3, betaA4, and betaB3-crystallin subunits were predicted by determining cDNA sequences. Mass spectrometric analyses of each intact protein as well as the peptides from trypsin-digested proteins confirmed the predicted amino acid sequences and detected a partially degraded form of betaA3/A1 missing either 22 or 4 amino acid residues from its N-terminal extension. These studies were a prerequisite for future studies to determine how human lens proteins are altered during aging and cataract formation.

4.
sigma E and sigma K are sporulation-specific sigma factors of Bacillus subtilis that are synthesized as inactive proproteins. Pro-sigma E and pro-sigma K are activated by the removal of 27 and 20 amino acids, respectively, from their amino termini. To explore the properties of the precursor-specific sequences, we exchanged the coding elements for these domains in the sigma E and sigma K structural genes and determined the properties of the resulting chimeric proteins in B. subtilis. The pro-sigma E-sigma K chimera accumulated and was cleaved into active sigma K, while the pro-sigma K-sigma E fusion protein failed to accumulate and is likely unstable in B. subtilis. A fusion of the sigE "pro" sequence to an unrelated protein (bovine rhodanese) also formed a protein that was cleaved by the pro-sigma E processing apparatus. The data suggest that the sigma E pro sequence contains sufficient information for pro-sigma E processing as well as a unique quality needed for sigma E accumulation.

5.
High-resolution two-dimensional (2-D) polyacrylamide gel electrophoresis allows the separation of complex biological mixtures (e.g., several hundred proteins from a bacterial cell lysate) in a single experiment. In this report proteins from Haemophilus influenzae were separated by 2-D gels and analyzed by peptide mass fingerprinting and/or amino acid analysis. By comparing the peptide mass profiles and the amino acid composition with the Haemophilus influenzae database, 119 protein spots were identified. The combination of amino acid analysis and peptide mass fingerprinting is a powerful tool for the rapid and economical identification of a large number of proteins resolved by 2-D gels. Studies on gene regulation and changes of protein expression upon drug treatment require quick and serial analysis techniques to efficiently identify potential new drug targets.
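
As a rough illustration of peptide mass fingerprinting, the Python sketch below digests a protein sequence in silico and counts how many observed masses match the theoretical peptides. It is a sketch under stated assumptions, not the procedure used in the report: it assumes the common trypsin rule (cleave after K or R unless the next residue is P), no missed cleavages, no modifications, neutral monoisotopic masses, and a hypothetical tolerance `tol`. All function names are illustrative.

```python
import re

# Monoisotopic residue masses in daltons (standard values).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056

def tryptic_peptides(protein):
    """Split after K or R, except when the next residue is P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]

def peptide_mass(peptide):
    """Neutral monoisotopic mass: residue masses plus one water."""
    return sum(MONO[aa] for aa in peptide) + WATER

def count_matches(observed_masses, protein, tol=0.5):
    """Number of observed masses within `tol` Da of a theoretical peptide."""
    theoretical = [peptide_mass(p) for p in tryptic_peptides(protein)]
    return sum(
        1 for m in observed_masses
        if any(abs(m - t) <= tol for t in theoretical)
    )

# Toy sequence, not a real database entry.
peptides = tryptic_peptides("AKGRPLKM")
hits = count_matches([peptide_mass("AK"), 9999.0], "AKGRPLKM")
```

Ranking every database entry by `count_matches` would give a crude fingerprint search; real tools also weigh mass accuracy, missed cleavages and protein size.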

6.
Separation and identification of proteins by two-dimensional (2-D) electrophoresis can be used for protein-based gene expression analysis. In this report single protein spots, from polyvinylidene difluoride blots of micropreparative E. coli 2-D gels, were rapidly and economically identified by matching their amino acid composition, estimated pI and molecular weight against all E. coli entries in the SWISS-PROT database. Thirty proteins from an E. coli 2-D map were analyzed and identities assigned. Three of the proteins were unknown. By protein sequencing analysis, 20 of the 27 proteins were correctly identified. Importantly, correct identifications showed unambiguous "correct" score patterns, while incorrect protein identifications also showed distinctive score patterns, indicating that the protein must be identified by other means. These techniques allow large-scale screening of the protein complement of simple organisms, or tissues in normal and disease states. The computer program described here is accessible via the World Wide Web at URL address (http://expasy.hcuge.ch/).
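
The composition-matching idea can be sketched in a few lines of Python. This is a minimal illustration, not the scoring used by the program in the abstract: it assumes a plain sum of squared differences between percent compositions, ignores the pI and molecular weight filters, and ignores the fact that measured compositions usually report Asx and Glx as combined values. The database entries and names below are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq):
    """Percent composition of the 20 standard amino acids."""
    counts = Counter(seq)
    return {aa: 100.0 * counts[aa] / len(seq) for aa in AMINO_ACIDS}

def composition_distance(a, b):
    """Sum of squared percent differences; 0 means identical composition."""
    return sum((a[aa] - b[aa]) ** 2 for aa in AMINO_ACIDS)

def rank_candidates(measured, database):
    """Return (distance, name) pairs sorted from best to worst match."""
    return sorted(
        (composition_distance(measured, composition(seq)), name)
        for name, seq in database.items()
    )

# Hypothetical two-entry database and a "measured" composition.
database = {"protA": "AAAKKLLG", "protB": "WWWYYFFH"}
measured = composition("AAKKLLGA")
ranking = rank_candidates(measured, database)
```

A large gap between the best and second-best distances is one simple way to express the "score pattern" idea: a close runner-up suggests the top hit should be confirmed by other means.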

7.
MOTIVATION: Lacking structures resolved at atomic resolution, the great majority of membrane proteins have typically been depicted in a schematic two-dimensional (2D) topology consisting of putative transmembrane domains predicted from hydropathy plots. As more and more sequences of membrane proteins become available from genome projects, there is a need to automate the process of generating the schematic topology while allowing important information, such as the individual amino acid and the extent to which it is conserved in evolution, to be conveniently inspected. We addressed this need by developing a program called VHMPT. RESULTS: VHMPT (a graphical Viewer and editor for Helical Membrane Protein Topologies) can automatically generate a schematic 2D topology for a protein with transmembrane helices. Through an interactive graphical interface, VHMPT allows users to modify the layout of the generated topology, label specific amino acids or amino acid groups, and annotate with arrows and text. Given a multiple sequence alignment file, VHMPT can also color code a normalized conservation score for each amino acid on the generated topology, allowing ready visual recognition of highly conserved (or variable) topological regions. VHMPT is written in Tcl/Tk and can run on platforms that have the Tcl/Tk interpreter installed. AVAILABILITY: The source code and a user manual for VHMPT are available for download at http://www.ibms.sinica.edu.tw/mjhwang/vhmpt. CONTACT: mjhwang@mail.ibms.sinica.edu.tw

8.
The nucleotide sequence of a full-length cDNA encoding NAD(+)-malic enzyme from the parasitic nematode Ascaris suum was determined. The entire sequence of 2269 bases comprises a 5'-leader, a single open reading frame of 1851 bases, and the complete 3'-noncoding region of 340 bases. The first 12 amino acids of the translated sequence are hydrophobic, typical of mitochondrial translocation signals, and do not appear in the purified mature protein. The mature protein contains 605 amino acids and has a molecular mass of 68,478 Da. The amino acid sequences of tryptic peptides from the purified protein and also the N-terminal sequence show excellent correspondence with the translated nucleotide sequence. Comparison of the amino acid sequence of the ascarid protein with the human and rat liver NAD(+)-malic enzymes reveals highly conserved regions interrupted by long stretches of less homologous sequence. Structural motifs such as the putative nucleotide binding domains and also the malate binding site are clearly identified by alignment of the three protein sequences.

9.
Anchoring proteins to cell surface membranes by glycosylphosphatidylinositols (GPIs) is important. We have isolated a component of the putative transamidase machinery, hGaa1p (human GPI anchor attachment protein). hGAA1 cDNA is approximately 2 kb in length and codes for 621 amino acids. The amino acid sequence of hGaa1p is 25% identical and 57% homologous to that of yeast Gaa1p. Moreover, Kyte-Doolittle hydrophobicity plots of both proteins show marked similarity. The hGAA1 gene is expressed ubiquitously and mRNA levels are higher in the undifferentiated state. Overexpression of antisense hGAA1 in human K562 cells significantly reduced the production of a reporter GPI-anchored protein.
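
A Kyte-Doolittle hydrophobicity plot is a sliding-window average of per-residue hydropathy values. The Python sketch below shows the computation under simple assumptions: the standard Kyte-Doolittle scale, a fixed window (19 residues is a common choice when looking for transmembrane segments, though the abstract does not state the window used), and a hypothetical function name `hydropathy_profile`.

```python
# Kyte-Doolittle hydropathy scale (standard published values).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window along `seq`."""
    if window > len(seq):
        return []
    scores = [KD[aa] for aa in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]

# Toy sequence: a hydrophobic stretch followed by a charged stretch.
profile = hydropathy_profile("AAARRR", window=3)
```

Plotting `profile` against window position for two proteins side by side gives the kind of visual comparison the abstract refers to.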

10.
The capped RNA primers required for the initiation of influenza virus mRNA synthesis are produced by the viral polymerase itself, which consists of three proteins PB1, PB2 and PA. Production of primers is activated only when the 5'- and 3'-terminal sequences of virion RNA (vRNA) bind sequentially to the polymerase, indicating that vRNA molecules function not only as templates for mRNA synthesis but also as essential cofactors which activate catalytic functions. Using thio U-substituted RNA and UV crosslinking, we demonstrate that the 5' and 3' sequences of vRNA bind to different amino acid sequences in the same protein subunit, the PB1 protein. Mutagenesis experiments proved that these two amino acid sequences constitute the functional RNA-binding sites. The 5' sequence of vRNA binds to an amino acid sequence centered around two arginine residues at positions 571 and 572, causing an allosteric alteration which activates two new functions of the polymerase complex. In addition to the PB2 protein subunit acquiring the ability to bind 5'-capped ends of RNAs, the PB1 protein itself acquires the ability to bind the 3' sequence of vRNA, via a ribonucleoprotein 1 (RNP1)-like motif, amino acids 249-256, which contains two phenylalanine residues required for binding. Binding to this site induces a second allosteric alteration which results in the activation of the endonuclease that produces the capped RNA primers needed for mRNA synthesis. Hence, the PB1 protein plays a central role in the catalytic activity of the viral polymerase, not only in the catalysis of RNA-chain elongation but also in the activation of the enzyme activities that produce capped RNA primers.

11.
Data on the identification of proteins of Bacillus subtilis on two-dimensional (2-D) gels as well as their regulation are summarized and the identification of 56 protein spots is included. The pattern of proteins synthesized in Bacillus subtilis during exponential growth, during starvation for glucose or phosphate, or after the imposition of stresses like heat shock, salt and ethanol stress as well as oxidative stress was analyzed. N-terminal sequencing of protein spots allowed the identification of 93 proteins on 2-D gels, which are required for the synthesis of amino acids and nucleotides, the generation of ATP, for glycolysis, the pentose phosphate cycle, the citric acid cycle as well as for adaptation to a variety of stress conditions. A computer-aided analysis of the 2-D gels was used to monitor the synthesis profile of more than 130 protein spots. Proteins performing housekeeping functions during exponential growth displayed a reduced synthesis rate during stress and starvation, whereas spots induced during stress and starvation were classified as specific stress proteins induced by a single stimulus or a group of related stimuli, or as general stress proteins induced by a variety of entirely different stimuli. The analysis of mutants in global regulators was initiated in order to establish a response regulation map for B. subtilis. These investigations demonstrated that the alternative sigma factor sigma B is involved in the regulation of almost all of the general stress proteins and that the phoPR two-component system is required for the induction of a large part but not all of the proteins induced by phosphate starvation.

12.
The complete sequences of wheat yellow mosaic bymovirus (WYMV) RNA1 and RNA2 were determined. RNA1 is 7,636 nucleotides long [excluding the 3'-poly(A)], and codes for a 269 kDa polyprotein of 2,404 amino acids which contains the capsid protein (CP) at the C terminus and seven putative nonstructural proteins. RNA2 is 3,659 nucleotides long and codes for a polyprotein of 904 amino acids which contains a 28 kDa putative proteinase and a 73 kDa polypeptide. These functional proteins are arranged as in RNA1 and RNA2 of barley yellow mosaic bymovirus (BaYMV). Comparisons with the sequence reported for the 3' half of RNA1 of wheat spindle streak mosaic bymovirus (WSSMV) from Southern France show that WYMV and WSSMV have a similar genetic organization. However, WYMV and WSSMV share only 77% amino acid sequence identity in their deduced CPs in spite of their close serological relationship, and 74% nucleotide sequence identity in their 3' non-coding regions. Thus, the sequence data indicate that WYMV and WSSMV are not strains of the same virus, as has long been suggested, but are distinct virus species within the genus Bymovirus of the family Potyviridae.
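
Percent sequence identity figures such as the 77% and 74% quoted above are computed from a pairwise alignment. The Python sketch below shows one common definition, assuming the two sequences are already aligned to equal length and that columns containing a gap in either sequence are excluded from the denominator. Tools differ on this choice, and the abstract does not say which definition was used; the function name and toy alignment are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of gap-free alignment columns with identical residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (x, y) for x, y in zip(aligned_a, aligned_b)
        if x != gap and y != gap
    ]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)

# Toy alignment: five gap-free columns, four of them identical.
identity = percent_identity("MKV-LA", "MKIALA")
```

The same function works for nucleotide alignments, since it only compares characters column by column.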

13.
The combination of two-dimensional polyacrylamide gel electrophoresis (2-D PAGE), computer image analysis and several protein identification techniques allowed the Escherichia coli SWISS-2DPAGE database to be established. This is part of the ExPASy molecular biology server accessible through the WWW at the URL address http://www.expasy.ch/ch2d/ch2d-top.html. Here we report recent progress in the development of the E. coli SWISS-2DPAGE database. Proteins were separated with immobilized pH gradients in the first dimension and sodium dodecyl sulfate-polyacrylamide gel electrophoresis in the second dimension. To increase the resolution of the separation and thus the number of identified proteins, a variety of wide and narrow range immobilized pH gradients were used in the first dimension. Micropreparative gels were electroblotted onto polyvinylidene difluoride membranes and spots were visualized by amido black staining. Protein identification techniques such as amino acid composition analysis, gel comparison and microsequencing were used, as well as a recently described Edman "sequence tag" approach. Some of the above identification techniques were coupled with database searching tools. Currently 231 polypeptides are identified on the E. coli SWISS-2DPAGE map: 64 have been identified by N-terminal microsequencing, 39 by amino acid composition, and 82 by sequence tag. Of 153 proteins putatively identified by gel comparison, 65 have been confirmed. Many proteins have been identified using more than one technique. Faster progress in the E. coli proteome project will now be possible with advances in biochemical methodology and with the completion of the entire E. coli genome.

14.
Recent increases in the number of genome sequencing projects mean that the amount of protein sequence in databases is increasing at an astonishing pace. In proteome studies, this is facilitating the identification of proteins from molecularly well-defined organisms. However, in studies of proteins from the majority of organisms, proteins must be identified by comparing analytical data to sequences in databases from other species. This process is known as cross-species protein identification. Here we present a new program, MultiIdent, which uses multiple protein parameters such as amino acid composition, peptide masses, sequence tags, estimated protein pI and mass, to achieve cross-species protein identification. The program is structured so that protein amino acid composition, which is highly conserved across species boundaries, first generates a set of candidate proteins. These proteins are then queried with other protein parameters such as sequence tags and peptide masses. A final list of database entries which considers all analytical parameters is presented, ranked by an integrated score. We illustrate the power of the approach with the identification of a set of standard proteins, and the identification of proteins from dog heart separated by two-dimensional gel electrophoresis. The MultiIdent program is available on the World Wide Web at: http://www.expasy.ch/sprot/multiident.html.

15.
In this paper a new approach for the prediction of protein coding gene structures is described. The principal scheme of prediction is as follows. First, the exons with the best potential are predicted in a sequence of unknown function and a list of potential amino acid fragments coded by these exons is formed. Second, the homology between each amino acid fragment from the list and proteins from the SWISS-PROT database of amino acid sequences is tested, and the protein with the best homology is chosen out of all the homologous sequences. Third, the exon-intron structure is reconstructed on the basis of its homology with the chosen protein sequence. The method was tested on an independent control set (20 genes). The results were as follows: 21% of real exons were lost and 3% of non-real exons were found. This system can be used to refine the results of gene prediction systems, especially if highly homologous proteins are found in the amino acid sequence database.

16.
A critical overview is given of the application of amino acid composition data to establishing a protein's identity (amino acid composition vs. protein identity, the AAC-PI method). Several criteria are used to measure the differences between the amino acid compositions of various proteins. The AAC-PI method unambiguously identifies proteins which belong to families with high phylogenetic conservation of their sequences. The identification of pure proteins can be accomplished with a relatively high level of confidence. The AAC-PI method, however, sometimes needs the support of N-terminal or internal sequencing of proteins since, alone, it cannot distinguish whether the failure to find a candidate protein in protein databases is because the investigated amino acid composition corresponds to an unknown protein or its processed form, because it is a sum of at least two protein components, or because of other experimental errors. The identification of a few new proteins such as "arginine-rich protein", macrophage migration inhibitory factor (MIF) and the preformed neurotrophic factor present in the calf brain cytosol is also reported.

17.
We present two methods for designing amino acid sequences of proteins that will fold to have good hydrophobic cores. Given the coordinates of the desired target protein or polymer structure, the methods generate sequences of hydrophobic (H) and polar (P) monomers that are intended to fold to these structures. One method designs hydrophobic inside, polar outside; the other minimizes an energy function in a sequence evolution process. The sequences generated by these methods agree at the level of 60-80% of the sequence positions in 20 proteins in the Protein Data Bank. A major challenge in protein design is to create sequences that can fold uniquely, i.e. to a single conformation rather than to many. While an earlier lattice-based sequence evolution method was shown not to design unique folders, our method generates unique folders in lattice model tests. These methods may also be useful in designing other types of foldable polymer not based on amino acids.
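
The "hydrophobic inside, polar outside" idea can be sketched as follows in Python. This is only an illustration under loud assumptions, not the authors' method: burial is approximated by counting neighbouring monomers within a cutoff distance, the number of hydrophobic positions `n_hydrophobic` is supplied by the caller, and ties are broken by sequence index. The function name, cutoff and coordinates are all hypothetical.

```python
import math

def design_hp(coords, n_hydrophobic, cutoff=1.5):
    """Assign H to the most buried positions and P to the rest.

    `coords` is a list of (x, y, z) monomer positions of the target
    structure. Burial is estimated as the number of other monomers
    within `cutoff` of each position (an assumed, crude proxy).
    """
    def neighbours(i):
        return sum(
            1 for j, other in enumerate(coords)
            if j != i and math.dist(coords[i], other) <= cutoff
        )

    # Most neighbours first; earlier index wins ties.
    order = sorted(range(len(coords)), key=lambda i: (-neighbours(i), i))
    buried = set(order[:n_hydrophobic])
    return "".join("H" if i in buried else "P" for i in range(len(coords)))

# Toy structure: one central monomer, four around it, one far away.
coords = [(0, 0, 0), (1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (5, 5, 5)]
one_core = design_hp(coords, 1)
five_core = design_hp(coords, 5)
```

The second method in the abstract, energy minimization by sequence evolution, would replace the single ranking step with repeated mutation and scoring, which is beyond this sketch.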

18.
19.
The nucleotide sequences of 3 cDNA clones corresponding to the entire RNA genome of the bean common mosaic virus (BCMV) NL3 strain have been determined. The RNA is 9612 nucleotides long, excluding a 3'-terminal poly(A) tail. A putative start codon located at nucleotide positions 170-172 initiates one large open reading frame that is terminated with a UAA codon at positions 9368-9370. The predicted polyprotein has 3066 amino acids and an M(r) of 340.3 kDa. The positions of putative protein cleavage sites have been determined by analogy to consensus sequences in other potyviruses. The nucleotide sequences of the non-translated regions and the predicted amino acid sequences of BCMV NL3 were compared with those of other potyviruses. Comparison of the BCMV NL3 proteins with those of other potyviruses indicated a similar genomic organization and a high percentage of amino acid sequence identity in the cylindrical inclusion protein, nuclear inclusion 'b' protein and coat protein. BCMV NL3 displays the highest amino acid sequence identity with soybean mosaic virus.
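
The coordinates in this abstract are internally consistent, which a short Python check makes explicit. The sketch assumes 1-based inclusive positions and that the stated stop codon is the last codon of the open reading frame; the function name `orf_summary` is hypothetical.

```python
def orf_summary(start_first, stop_last, genome_length):
    """Derive ORF and non-translated region sizes from 1-based positions.

    `start_first` is the first base of the start codon and `stop_last`
    the last base of the stop codon.
    """
    orf_nt = stop_last - start_first + 1
    if orf_nt % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    codons = orf_nt // 3
    return {
        "orf_nt": orf_nt,
        "amino_acids": codons - 1,          # the stop codon is not translated
        "utr5_nt": start_first - 1,
        "utr3_nt": genome_length - stop_last,
    }

# Values taken from the abstract: start 170-172, UAA at 9368-9370, 9612 nt.
summary = orf_summary(170, 9370, 9612)
```

With these inputs the open reading frame spans 9201 nucleotides, i.e. 3067 codons including the stop, which agrees with the 3066-residue polyprotein reported above.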

20.
Chitin catabolism in Vibrio furnissii comprises several signal transducing systems and many proteins. Two of these enzymes are periplasmic and convert chitin oligosaccharides to GlcNAc and (GlcNAc)2. One of these unique enzymes, a chitodextrinase, designated EndoI, is described here. The protein, isolated from a recombinant Escherichia coli clone, exhibited (via SDS-polyacrylamide gel electrophoresis) two enzymatically active, close running bands (approximate mass of 120 kDa) with identical N-terminal sequences. The chitodextrinase rapidly cleaved chitin oligosaccharides, (GlcNAc)4 to (GlcNAc)2, and (GlcNAc)5,6 to (GlcNAc)2 and (GlcNAc)3. EndoI was substrate inhibited in the millimolar range and was inactive with chitin, glucosamine oligosaccharides, glycoproteins, and glycopeptides containing (GlcNAc)2. The sequence of the cloned gene indicates that it encodes a 112,690-Da protein (1046 amino acids). Both proteins lacked the predicted N-terminal 31 amino acids, corresponding to a consensus prokaryotic signal peptide. Thus, E. coli recognizes and processes this V. furnissii signal sequence. Although inactive with chitin, the predicted amino acid sequence of EndoI displayed similarities to many chitinases, with 8 amino acids completely conserved in 10 or more of the homologous proteins. There was, however, no "consensus" chitin-binding domain in EndoI.
