Similar articles
1.
Large-scale DNA sequencing is creating a sequence infrastructure of great benefit to protein biochemistry. Concurrent with the application of large-scale DNA sequencing to whole genome analysis, mass spectrometry has attained the capability to rapidly, and with remarkable sensitivity, determine weights and amino acid sequences of peptides. Computer algorithms have been developed to use the two different types of data generated by mass spectrometers to search sequence databases. When a protein is digested with a site-specific protease, the molecular weights of the resulting collection of peptides, the mass map or fingerprint, can be determined using mass spectrometry. The molecular weights of the set of peptides derived from the digestion of a protein can then be used to identify the protein. Several different approaches have been developed. Protein identification using peptide mass mapping is an effective technique when studying organisms with completed genomes. A second method is based on the use of data created by tandem mass spectrometers. Tandem mass spectra contain highly specific information in the fragmentation pattern as well as sequence information. This information has been used to search databases of translated protein sequences as well as nucleotide databases such as expressed sequence tag (EST) sequences. The ability to search nucleotide databases is an advantage when analyzing data obtained from organisms whose genomes are not yet completed, but a large amount of expressed gene sequence is available (e.g., human and mouse). Furthermore, a strength of using tandem mass spectra to search databases is the ability to identify proteins present in fairly complex mixtures.  相似文献   
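
The peptide mass mapping idea in the abstract above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the algorithm of any program the abstract refers to: the cleavage rule (after K or R, not before P, no missed cleavages), the hit-count scoring, and the names `tryptic_digest`, `peptide_mass` and `rank_proteins` are all hypothetical choices made for this sketch.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added per free peptide


def tryptic_digest(sequence):
    """Cleave after K or R unless the next residue is P (assumed rule)."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        is_last = i == len(sequence) - 1
        if is_last or (aa in "KR" and sequence[i + 1] != "P"):
            peptides.append(sequence[start:i + 1])
            start = i + 1
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def rank_proteins(observed_masses, database, tol=0.5):
    """Rank database entries by how many observed masses they explain."""
    scores = []
    for name, sequence in database.items():
        theoretical = [peptide_mass(p) for p in tryptic_digest(sequence)]
        hits = sum(
            1 for m in observed_masses
            if any(abs(m - t) <= tol for t in theoretical)
        )
        scores.append((name, hits))
    # Most hits first; ties broken alphabetically by entry name.
    return sorted(scores, key=lambda item: (-item[1], item[0]))
```

Calling `rank_proteins(observed, {"P1": "AKRPGKC", "P2": "GGGGK"}, tol=0.01)` with the masses of the peptides `AK` and `RPGK` as `observed` puts the made-up entry `P1` first. A real search tool would also have to handle missed cleavages, modifications and statistical scoring, none of which is attempted here.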

2.
In the search for novel nuclear binding proteins, two bands from a sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) gel were analyzed and each was found to contain a number of proteins that subsequently were identified by tandem mass spectrometry (MS/MS) on a quadrupole ion trap instrument. The bands were digested with trypsin in situ on a polyvinylidene difluoride (PVDF) membrane following electroblot transfer. Analysis of a 2.5% aliquot of each peptide mixture by matrix assisted laser desorption/ionization-mass spectrometry (MALDI-MS) followed by an initial database search with the peptide masses failed to identify the proteins. The peptides were separated by reversed-phase capillary high performance liquid chromatography (HPLC) in anticipation of subsequent Edman degradation, but mass analysis of the chromatographic fractions by MALDI-MS revealed multiple, coeluting peptides that precluded this approach. Selected fractions were analyzed by capillary HPLC-electrospray ionization-ion trap mass spectrometry. Tandem mass spectrometry provided significant fragmentation from which full or partial sequence was deduced for a number of peptides. Two stages of fragmentation (MS3) were used in one case to determine additional sequence. Database searches, each using a single peptide mass plus partial sequence, identified four proteins from a single electrophoretic band at 45 kDa, and four proteins from a second band at 60 kDa. Many of these proteins were derived from human keratin. The protein identifications were corroborated by the presence of multiple matching peptide masses in the MALDI-MS spectra. In addition, a novel sequence, not found in protein or DNA databases, was determined by interpretation of the MS/MS data. These results demonstrate the power of the quadrupole ion trap for the identification of multiple proteins in a mixture, and for de novo determination of peptide sequence. 
Reanalysis of the fragmentation data with a modified database searching algorithm showed that the same sets of proteins were identified from a limited number of fragment ion masses, in the absence of mass spectral interpretation or amino acid sequence. The implications for protein identification solely from fragment ion masses are discussed, including advantages for low signal levels, for a reduction of the necessary interpretation expertise, and for increased speed.  相似文献   

3.
The Munich Information Center for Protein Sequences (MIPS-GSF), Martinsried near Munich, Germany, develops and maintains genome oriented databases. It is commonplace that the amount of sequence data available increases rapidly, but not the capacity of qualified manual annotation at the sequence databases. Therefore, our strategy aims to cope with the data stream by the comprehensive application of analysis tools to sequences of complete genomes, the systematic classification of protein sequences and the active support of sequence analysis and functional genomics projects. This report describes the systematic and up-to-date analysis of genomes (PEDANT), a comprehensive database of the yeast genome (MYGD), a database reflecting the progress in sequencing the Arabidopsis thaliana genome (MATD), the database of assembled, annotated human EST clusters (MEST), and the collection of protein sequence data within the framework of the PIR-International Protein Sequence Database (described elsewhere in this volume). MIPS provides access through its WWW server (http://www.mips.biochem.mpg.de) to a spectrum of generic databases, including the above mentioned as well as a database of protein families (PROTFAM), the MITOP database, and the all-against-all FASTA database.  相似文献   

4.
A "single-base sequence" is a DNA sequence in which the identities and locations of bases of only one type have been determined. We present experimental procedures for single-base sequencing and describe the effective use of existing software (FASTA) in similarity comparisons of single-base sequences. We determined the theoretical and experimental minimum sequence lengths required for identification of a sequence within a large dataset and optimized the FASTA parameters for use in single-base similarity comparisons. Single-base sequences have been used to identify cDNAs occurring in a database. Single-base sequencing could be used to reduce the redundancy of "shot-gun sequencing."  相似文献   

5.
A method for the identification of proteins by their amino acid sequence at the low-femtomole to subfemtomole sensitivity level is described. It is based on an integrated system consisting of a capillary zone electrophoresis (CZE) instrument coupled to an electrospray ionization triple-quadrupole tandem mass spectrometer (ESI-MS/MS) via a microspray interface. The method consists of proteolytic fragmentation of a protein, peptide separation by CZE, analysis of separated peptides by ESI-MS/MS, and identification of the protein by correlation of the collision-induced dissociation (CID) patterns of selected peptides with the CID patterns predicted from all the isobaric peptides in a sequence database. Using standard peptides applied to a 20-µm-i.d. capillary, we demonstrate an ESI-MS limit of detection of less than 300 amol and CID spectra suitable for searching sequence databases obtained with 600 amol of sample applied to the capillary. Successful protein identification by the method was demonstrated by applying 50 and 38 fmol of a tryptic digest of the proteins beta-lactoglobulin and bovine serum albumin, respectively, to the system.

6.
We have developed an algorithm (MassDynSearch) for identifying proteins using a combination of peptide masses with small associated sequences (tags). Unlike the approach developed by Matthias Mann, 'Tag searching', in which the sequence tags are generated by gas phase fragmentation of peptides in a mass spectrometer, 'Rag Tag' searching uses peptide tags which are generated enzymatically or chemically. The protein is digested either chemically or with an endopeptidase and the resultant mixture is then subjected to partial exopeptidase degradation. The mixture is analyzed by matrix assisted laser desorption and ionization time of flight mass spectrometry and a list of intact peptide masses is generated, each associated with a set of degradation product masses which serve as unique tags. These 'tagged masses' are used as the input to an algorithm we have written, MassDynSearch, which searches protein and DNA databases for proteins which contain similar tagged motifs. The method is simple, rapid and can be fully automated. The main advantage of this approach is that the specificity of the initial digestion is unimportant since multiple peptides with tags are used to search the database. This is especially useful for proteins like membrane, cytoskeletal, and other proteins where specific endopeptidases are less efficient and lower specificity proteases such as chymotrypsin, pepsin, and elastase must be used.  相似文献   

7.
The sequence RGITVNGKTYGR has been reported as part of a de novo designed peptide system. This peptide folds as a beta-hairpin structure with three residues per strand and a two-residue turn. The side chain of Asn6, the residue in position L1 of the beta-turn, appeared to be solvent exposed, interacting only within the turn but not with the rest of the peptide. We have chosen this position as a good candidate to design mutations, based on the protein database statistical abundances, that should mainly affect the turn stability and possibly the pairing between strands. We have found that all NMR parameters, in particular the conformational shift analysis of CαH and the coupling constants, 3JHNα, correlate very well and show similar conformational features in all the turn mutant peptides. The population estimates are in reasonable agreement among the different methods used. It appears that the peptide with Asn in position L1 is the most structured peptide, followed by the one with Asp6. The next most structured peptide is the one with Gly6. The least populated peptides were those with Ala6 and Ser6. We have found a strong correlation between the hairpin population, as determined from the conformational shift of CαH, and the occurrence of the different residues at position L1 of beta-hairpins with a type I' beta-turn in the protein database. Our analysis demonstrates that this peptide system is sensitive enough to register small energy changes in the hairpin structure; therefore, it constitutes an appropriate model to quantify energy contributions, once the appropriate sheet/coil transition algorithm is developed. Comparison with the other studies indicates that the design of a specific hairpin structure must involve a sequence at the turn region favouring the desired turn type, and a sequence at the strands that avoids alternative interstrand side-chain pairings.

8.
The GenBank® sequence database incorporates DNA sequences from all available public sources, primarily through the direct submission of sequence data from individual laboratories and from large-scale sequencing projects. Most submitters use the BankIt (Web) or Sequin programs to format and send sequence data. Data exchange with the EMBL Data Library and the DNA Data Bank of Japan helps ensure comprehensive worldwide coverage. GenBank data is accessible through NCBI's integrated retrieval system, Entrez, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome and protein structure information. MEDLINE® abstracts from published articles describing the sequences are included as an additional source of biological annotation through the PubMed search system. Sequence similarity searching is offered through the BLAST series of database search programs. In addition to FTP, Email, and server/client versions of Entrez and BLAST, NCBI offers a wide range of World Wide Web retrieval and analysis services based on GenBank data. The GenBank database and related resources are freely accessible via the URL: http://www.ncbi.nlm.nih.gov

9.
Searches of the major public protein databases with core and linker chicken and human histone sequences have resulted in the compilation of an annotated set of histone protein sequences. In addition, new database searches with two distinct motif search algorithms have identified several members of the histone fold family, including human DRAP1 and yeast CSE4. Database resources include information on conflicts between similar sequence entries in different source databases, multiple sequence alignments, links to the Entrez integrated information retrieval system, structures for histone and histone fold proteins, and the ability to visualize structural data through Cn3D. The database currently contains >1000 protein sequences, which are searchable by protein type, accession number, organism name, or any other free text appearing in the definition line of the entry. All sequences and alignments in this database are available through the World Wide Web at http://www.nhgri.nih.gov/DIR/GTB/HISTONES or http://www.ncbi.nlm.nih.gov/Baxevani/HISTONES

10.
MOTIVATION: Optimal sequence alignment based on the Smith-Waterman algorithm is usually too computationally demanding to be practical for searching large sequence databases. Heuristic programs like FASTA and BLAST have been developed which run much faster, but at the expense of sensitivity. RESULTS: In an effort to approximate the sensitivity of an optimal alignment algorithm, a new algorithm has been devised for the computation of a gapped alignment of two sequences. After scanning for high-scoring words and extensions of these to form fragments of similarity, the algorithm uses dynamic programming to build an accurate alignment based on the fragments initially identified. The algorithm has been implemented in a program called SALSA and the performance has been evaluated on a set of test sequences. The sensitivity was found to be close to the Smith-Waterman algorithm, while the speed was similar to FASTA (ktup = 2). AVAILABILITY: Searches can be performed from the SALSA homepage at http://dna.uio.no/salsa/ using a wide range of databases. Source code and precompiled executables are also available. CONTACT: torbjorn.rognes@labmed.uio.no  相似文献   
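
For reference, the optimal local alignment score that heuristic programs try to approximate can be computed with a short dynamic programming routine. The sketch below is a plain Smith-Waterman score with a linear gap penalty; it is not the SALSA algorithm, and the scoring values (`match=2`, `mismatch=-1`, `gap=-2`) and the function name are assumptions chosen only for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of strings a and b (linear gap cost)."""
    prev = [0] * (len(b) + 1)  # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

The double loop visits every cell of a `len(a)` by `len(b)` matrix, which is why the exhaustive method is costly on large databases and why word-based prefilters such as the one the abstract describes are attractive.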

11.
Recent increases in the number of genome sequencing projects mean that the amount of protein sequence in databases is increasing at an astonishing pace. In proteome studies, this is facilitating the identification of proteins from molecularly well-defined organisms. However, in studies of proteins from the majority of organisms, proteins must be identified by comparing analytical data to sequences in databases from other species. This process is known as cross-species protein identification. Here we present a new program, MultiIdent, which uses multiple protein parameters such as amino acid composition, peptide masses, sequence tags, estimated protein pI and mass, to achieve cross-species protein identification. The program is structured so that protein amino acid composition, which is highly conserved across species boundaries, first generates a set of candidate proteins. These proteins are then queried with other protein parameters such as sequence tags and peptide masses. A final list of database entries which considers all analytical parameters is presented, ranked by an integrated score. We illustrate the power of the approach with the identification of a set of standard proteins, and the identification of proteins from dog heart separated by two-dimensional gel electrophoresis. The MultiIdent program is available on the World Wide Web at: http://www.expasy.ch/sprot/multiident.html.

12.
A technique for systematic peptide variation by a combination of rational and evolutionary approaches is presented. The design scheme consists of five consecutive steps: (i) identification of a "seed peptide" with a desired activity, (ii) generation of variants selected from a physicochemical space around the seed peptide, (iii) synthesis and testing of this biased library, (iv) modeling of a quantitative sequence-activity relationship by an artificial neural network, and (v) de novo design by a computer-based evolutionary search in sequence space using the trained neural network as the fitness function. This strategy was successfully applied to the identification of novel peptides that fully prevent the positive chronotropic effect of anti-beta1-adrenoreceptor autoantibodies from the serum of patients with dilated cardiomyopathy. The seed peptide, comprising 10 residues, was derived by epitope mapping from an extracellular loop of human beta1-adrenoreceptor. A set of 90 peptides was synthesized and tested to provide training data for neural network development. De novo design revealed peptides with desired activities that do not match the seed peptide sequence. These results demonstrate that computer-based evolutionary searches can generate novel peptides with substantial biological activity.  相似文献   

13.
A method to directly identify proteins contained in mixtures by microcolumn reversed-phase liquid chromatography electrospray ionization tandem mass spectrometry (LC/MS/MS) is studied. In this method, the mixture of proteins is digested with a proteolytic enzyme to produce a large collection of peptides. The complex peptide mixture is then separated on-line with a tandem mass spectrometer, acquiring large numbers of tandem mass spectra. The tandem mass spectra are then used to search a protein database to identify the proteins present. Results from standard protein mixtures show that proteins present in simple mixtures can be readily identified with a 30-fold difference in molar quantity, that the identifications are reproducible, and that proteins within the mixture can be identified at low femtomole levels. Based on these studies, methodology has been developed for direct LC/MS/MS analysis of proteins enriched by immunoaffinity precipitation, specific interaction with a protein-protein fusion product, and specific interaction with a macromolecular complex. The approach described in this article provides a rapid method for the direct identification of proteins in mixtures.  相似文献   

14.
We describe a new procedure that enables selective detection and sequencing of Ser-, Thr-, and Tyr-phosphopeptides at the low femtomole level in protein digests. Radiolabeling with 32P is not required, nor is prior chromatographic separation of the peptide mixture. One to two microliters of the unfractionated protein digest is infused at basic pH into an electrospray mass spectrometer at a flow rate of 20-40 nl/min using an ultra-low flow sprayer. A precursor-ion scan of m/z 79 (PO3−) produces a mass spectrum containing only the molecular ions of the phosphopeptides that are present in the sample. In cases where the protein sequence is known, the peptide molecular weights obtained are often sufficient to identify the specific sequences that are phosphorylated. If the protein sequence is not known, tandem MS with collision-induced dissociation of phosphopeptide precursor ions may be used to obtain the amino acid sequences including the site(s) of phosphorylation. We demonstrate that phosphopeptides may be selectively detected using as little as 3 fmol of a 10 fmol/µl solution and that sequence information for a phosphopeptide in the mixture may be obtained using as little as 3 fmol of the same solution. In addition, we show that the stoichiometry of phosphorylation at specific sites may be estimated from the ratio of the ion signals for the respective forms of the peptides observed in the normal full-scan mass spectra of the digest. These procedures are illustrated here to identify and sequence phosphopeptides from alpha-casein, a milk-derived protein possessing up to nine phosphorylation sites. Numerous MS and tandem MS experiments were carried out on a single, 250 fmol/µl loading of the phosphoprotein digest. Phosphopeptides derived from an unexpected variant of the protein were also observed.
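
The stoichiometry estimate mentioned in the abstract above amounts to a simple ratio of ion signals. The following sketch assumes, as a simplification not stated in the abstract, that the phosphorylated and unphosphorylated forms ionize with equal efficiency; the function names and the example intensities are hypothetical.

```python
# Monoisotopic mass added per phosphorylation (HPO3), in Da.
PHOSPHO_SHIFT = 79.96633


def phosphopeptide_mass(unmodified_mass, n_sites=1):
    """Expected neutral mass of a peptide carrying n_sites phosphates."""
    return unmodified_mass + n_sites * PHOSPHO_SHIFT


def phospho_occupancy(signal_phospho, signal_unmodified):
    """Fraction phosphorylated, assuming equal ionization efficiency."""
    total = signal_phospho + signal_unmodified
    if total <= 0:
        raise ValueError("at least one ion signal must be positive")
    return signal_phospho / total
```

With made-up intensities of 30 and 70 for the phosphorylated and unmodified forms, `phospho_occupancy(30, 70)` gives 0.3. In practice the two forms may respond differently in the ion source, so such a value is best read as an estimate.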

15.
Iron plays a critical role in the pathophysiology of Mycobacterium tuberculosis. To gain a better understanding of iron regulation by this organism, we have used two-dimensional (2-D) gel electrophoresis, mass spectrometry, and database searching to study protein expression in M. tuberculosis under conditions of high and low iron concentration. Proteins in cellular extracts from M. tuberculosis Erdman strain grown under low-iron (1 microM) and high-iron (70 microM) conditions were separated by 2-D polyacrylamide gel electrophoresis, which allowed high-resolution separation of several hundred proteins, as visualized by Coomassie staining. The expression of at least 15 proteins was induced, and the expression of at least 12 proteins was decreased under low-iron conditions. In-gel trypsin digestion was performed on these differentially expressed proteins, and the digestion mixtures were analyzed by matrix-assisted laser desorption ionization time-of-flight mass spectrometry to determine the molecular masses of the resulting tryptic peptides. Partial sequence data on some of the peptides were obtained by using after source decay and/or collision-induced dissociation. The fragmentation data were used to search computerized peptide mass and protein sequence databases for known proteins. Ten iron-regulated proteins were identified, including Fur and aconitase proteins, both of which are known to be regulated by iron in other bacterial systems. Our study shows that, where large protein sequence databases are available from genomic studies, the combined use of 2-D gel electrophoresis, mass spectrometry, and database searching to analyze proteins expressed under defined environmental conditions is a powerful tool for identifying expressed proteins and their physiologic relevance.  相似文献   

16.
Two homologous sequences, which have diverged beyond the point where their homology can be recognised by a simple direct comparison, can be related through a third sequence that is suitably intermediate between the two. High scores, for a sequence match between the first and third sequences and between the second and the third sequences, imply that the first and second sequences are related even though their own match score is low. We have tested the usefulness of this idea using a database that contains the sequences of 971 protein domains whose structures are known and whose residue identities with each other are some 40% or less (PDB40D). On the basis of sequence and structural information, 2143 pairs of these sequences are known to have an evolutionary relationship. FASTA, in an all-against-all comparison of the sequences in the database, detected 320 (15%) of these relationships as well as three false positives (i.e. 1% error rate). Using intermediate sequences found by FASTA matches of PDB40D sequences to those in the large non-redundant OWL database we could detect 550 evolutionary relationships with an error rate of 1%. This means the intermediate sequence procedure increases the ability to recognise the evolutionary relationships amongst the PDB40D sequences by 70%.
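
The intermediate-sequence idea above can be expressed as a search for shared high-scoring neighbours. This is a minimal sketch under assumptions: scores are supplied as a dictionary keyed by sequence pairs, a single threshold decides what counts as a match, and the names `build_links`, `infer_pairs` and `relative_gain` are invented for illustration rather than taken from the study.

```python
from itertools import combinations


def build_links(scores, threshold):
    """Map each sequence to the set of sequences it matches confidently."""
    links = {}
    for (x, y), score in scores.items():
        if score >= threshold:
            links.setdefault(x, set()).add(y)
            links.setdefault(y, set()).add(x)
    return links


def infer_pairs(queries, scores, threshold):
    """Split query pairs into directly matched and linked via a third."""
    links = build_links(scores, threshold)
    direct, indirect = set(), set()
    for a, b in combinations(sorted(queries), 2):
        near_a, near_b = links.get(a, set()), links.get(b, set())
        if b in near_a:
            direct.add((a, b))
        elif (near_a & near_b) - {a, b}:
            # Some third sequence matches both a and b.
            indirect.add((a, b))
    return direct, indirect


def relative_gain(new_count, old_count):
    """Fractional increase of new_count over old_count."""
    return (new_count - old_count) / old_count
```

Applied to the counts in the abstract, `relative_gain(550, 320)` evaluates to 0.71875, which is consistent with the reported increase of about 70%.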

17.
Dissected tissue pieces of the pituitary pars intermedia from the amphibian Xenopus laevis were directly subjected to matrix-assisted laser desorption/ionization (MALDI) mass analysis. The obtained MALDI peptide profile revealed both previously known and unexpected processing products of the proopiomelanocortin gene. Mass spectrometric peptide sequencing of a few of these neuropeptides was performed by employing MALDI combined with postsource decay (PSD) fragment ion mass analysis. The potential of MALDI-PSD for sequence analysis of peptides directly from unfractionated tissue samples was examined for the first time for the known desacetyl-alpha-MSH-NH2 and the presumed vasotocin neuropeptide. In addition, the sequence of an unknown peptide which was present in the pars intermedia tissue sample at mass 1392.7 u was determined. The MALDI-PSD mass spectrum of precursor ion 1392.7 u contained sufficient structural information to uniquely identify the sequence by searching protein sequence databases. The determined amino acid sequence corresponds to the vasotocin peptide with a C-terminal extension of Gly-Lys-Arg ("vasotocinyl-GKR"), indicating incomplete processing of the vasotocin precursor protein in the pituitary pars intermedia of X. laevis. Both vasotocin and vasotocinyl-GKR are nonlinear peptides containing a disulfide (S-S) bridge between two cysteine residues. Interpretation of the spectra of these two peptides reveals three different forms of characteristic fragment ions of the cysteine side chain: peptide-CH2-SH (regular mass of Cys-containing fragment ions), peptide-CH2-S-SH (regular mass + 32 u) and peptide=CH2 (regular mass − 34 u) due to cleavage on either side of the sulfur atoms.

18.
Analysis of peptides derived from HLA class I molecules indicates that thousands of unique peptides are bound by a single molecular type, and sequence examination of the pooled constituents yields a motif which collectively defines the peptides bound by a given class I molecule. Motifs resulting from pooled sequencing are then used to infer whether particular viral and tumor protein fragments might serve as class I-presented peptide therapeutics. Still undetermined from a pooled motif is the breadth or range of peptides in the population which are brought together to form the pooled motif, and it is therefore not yet known how representative of the population a pooled motif is. By employing hollow fiber bioreactors for large-scale production of HLA class I molecules, sufficient peptides are produced to investigate individual subsets of peptides comprising a motif. Edman sequencing and mass spectrometric analysis of peptides eluted from HLA-B*1501 reveal that many peptide sequences fail to align with either the N- or C-terminal anchors predicted for the B*1501 peptide motif through whole pool sequencing. These analyses further reveal auxiliary anchors not previously detected and peptides significantly larger and smaller than the predicted nonamer, ranging from 6 to 12 amino acids in length. These results demonstrate that constituents of the B*1501 peptide pool vary markedly in comparison with one another and therefore in comparison with previously established B*1501 motifs, and such complexity indicates that many of the peptide ligands presented to CTL cannot be predicted using class I consensus motifs as search criteria.  相似文献   

19.
Microcapillary HPLC electrospray ionization tandem mass spectrometry was used to sequence 15 peptides eluted from HLA-B7. Sequence alignment implicated four peptide positions in specific interactions with the class I molecule, and their importance was confirmed using synthetic peptides. Because no crystal structure for HLA-B7 was available, computer-assisted modeling was used to understand novel aspects of peptide binding specificity and to accurately predict the effect of defined changes in peptide structure. The results demonstrate that mass-spectrometric sequencing coupled with computer-assisted modeling can be used in the absence of a crystal structure to make accurate predictions concerning requirements for peptide binding to class I molecules. These techniques may be valuable to predict or engineer T cell epitopes.  相似文献   

20.
The set of proteins that are conserved across families of microbes contains important targets for new anti-microbial agents. We have developed a simple and efficient computational tool which determines concordances of putative gene products that show sets of proteins conserved across one set of user-specified genomes and not present in another set of user-specified genomes. The thresholds and the homology scoring criterion are selectable to allow the user to decide the stringency of the homologies. The system uses a relational database to store protein coding regions from different genomes, and to store the results of a complete comparison of all sequences against all sequences using the FASTA program. Using Web technology, the display of all the related proteins for a given sequence and calculation of multiple sequence alignments (using CLUSTALW) can be performed with the click of a button. The current database holds 97 365 sequences from 19 complete or partial genomes and 8 798 905 FASTA comparison results. An example concordance is presented which demonstrates that the target of the quinolone antibiotics could have been identified using this tool.
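
The concordance query described above reduces to set operations once the best homology score of each protein in each genome is known. The sketch below assumes that data is already available as nested dictionaries; the layout, the function name `concordance`, and the placeholder protein and genome names are hypothetical and do not reflect the tool's actual relational schema.

```python
def concordance(hits, present_in, absent_from, min_score):
    """Proteins with a homolog in every genome of present_in and in
    no genome of absent_from, at the chosen score threshold.

    hits maps protein -> {genome: best homology score}.
    """
    selected = []
    for protein, by_genome in hits.items():
        found = {g for g, score in by_genome.items() if score >= min_score}
        if present_in <= found and not (absent_from & found):
            selected.append(protein)
    return sorted(selected)


# Made-up scores for three placeholder proteins across three genomes.
example_hits = {
    "protA": {"genome1": 900, "genome2": 850},
    "protB": {"genome1": 950, "genome2": 940, "genome3": 300},
    "protC": {"genome1": 700},
}
```

Raising `min_score` changes the answer in both directions: with a threshold of 200, only `protA` is returned for `present_in={"genome1", "genome2"}` and `absent_from={"genome3"}`, while a threshold of 500 also admits `protB` because its weak `genome3` hit no longer counts. This is the stringency trade-off the abstract leaves to the user.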
