Similar literature
Found 20 similar documents (search time: 31 ms)
1.
With the increasing availability of de novo sequencing algorithms for interpreting high-mass accuracy tandem mass spectrometry (MS/MS) data, there is a growing need for programs that accurately identify proteins from de novo sequencing results. De novo sequences derived from tandem mass spectra of peptides often contain ambiguous regions where the exact amino acid order cannot be determined. One problem this poses for sequence alignment algorithms is the difficulty in distinguishing discrepancies due to de novo sequencing errors from actual genomic sequence variation and posttranslational modifications. We present a novel, mass-based approach to sequence alignment, implemented as a program called OpenSea, to resolve these problems. In this approach, de novo and database sequences are interpreted as masses of residues, and the masses, rather than the amino acid codes, are compared. To provide further flexibility, the masses can be aligned in groups, which can resolve many de novo sequencing errors. The performance of OpenSea was tested with three types of data: a mixture of known proteins, a mixture of unknown proteins that commonly contain sequence variations, and a mixture of posttranslationally modified known proteins. In all three cases, we demonstrate that OpenSea can identify more peptides and proteins than commonly used database-searching programs (SEQUEST and ProteinLynx) while accurately locating sequence variation sites and unanticipated posttranslational modifications in a high-throughput environment.
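The idea of comparing residue masses in groups, rather than amino acid codes, can be illustrated with a minimal Python sketch. This is not the OpenSea implementation: the function names, the 0.01 Da tolerance, and the example peptides are assumptions made for illustration. The sketch walks two sequences by cumulative mass and reports the consecutive groups whose masses agree, so a de novo reading of "GG" lines up with a database "N", and a swapped "TL" lines up with "LT".

```python
# Monoisotopic residue masses in Da (standard values).
MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def prefix_masses(seq):
    """Cumulative residue masses, starting at 0.0 for the empty prefix."""
    out = [0.0]
    for aa in seq:
        out.append(out[-1] + MASS[aa])
    return out


def aligned_groups(de_novo, database, tol=0.01):
    """Split two sequences into consecutive groups of equal mass.

    Hypothetical helper: returns (groups, complete), where groups is a
    list of (de_novo_segment, database_segment) pairs and complete says
    whether both sequences were consumed entirely.
    """
    pa, pb = prefix_masses(de_novo), prefix_masses(database)
    i = j = 1
    start_i = start_j = 0
    groups = []
    while i < len(pa) and j < len(pb):
        diff = pa[i] - pb[j]
        if abs(diff) <= tol:
            # Cumulative masses agree: close the current group.
            groups.append((de_novo[start_i:i], database[start_j:j]))
            start_i, start_j = i, j
            i += 1
            j += 1
        elif diff < 0:
            i += 1  # de novo side is lighter, extend its group
        else:
            j += 1  # database side is lighter, extend its group
    complete = start_i == len(de_novo) and start_j == len(database)
    return groups, complete


groups, complete = aligned_groups("PEGGTLK", "PENLTK")
```

Here `groups` pairs "GG" with "N" and "TL" with "LT" because each pair has the same summed mass. A real aligner would also need scoring and gap handling, which this sketch leaves out.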

2.
Here we propose a novel method for rapidly identifying proteins in complex mixtures. A list of candidate proteins (including provision for posttranslational modifications) is obtained by database searching, within a specified mass range about the accurately measured mass (e.g., ±0.1 Da at 10 kDa) of the intact protein, by capillary liquid chromatography electrospray ionization Fourier transform ion cyclotron resonance mass spectrometry (LC ESI FT-ICR MS). On alternate scans, LC ESI infrared multiphoton dissociation (IRMPD) FT-ICR MS yields mostly b and y fragment ions for each protein, from which the correct candidate is identified as the one with the highest "hit" score (i.e., most b and y fragments matching the candidate database protein amino acid sequence masses) and sequence "tag" score (based on a series of fragment sequences differing in mass by 1 or 2 amino acids). The method succeeds in uniquely identifying each of a mixture of five proteins treated as unknowns (melittin, ubiquitin, GroES, myoglobin, carbonic anhydrase II), from more than 1000 possible database candidates within a ±500 Da mass window. We are also able to identify posttranslational modifications of two of the proteins (melittin and GroES). The method is simple, rapid, and definitive and is extendable to a mixture of affinity-selected proteins, to identify proteins with a common biological function.
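The "hit" score described above (counting b and y fragments that match a candidate sequence) can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' software: it assumes singly protonated fragments, a 0.02 Da tolerance, and short made-up peptides in place of intact proteins, and it omits the sequence "tag" score. The names `by_ions` and `hit_score` are hypothetical.

```python
# Monoisotopic residue masses in Da (standard values).
MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276  # mass of a proton, Da
WATER = 18.010565  # mass of H2O, Da


def by_ions(seq):
    """Singly charged b and y ion m/z values for a sequence (b first)."""
    masses = [MASS[aa] for aa in seq]
    b, total = [], 0.0
    for m in masses[:-1]:  # b1 .. b(n-1), read from the N-terminus
        total += m
        b.append(total + PROTON)
    y, total = [], 0.0
    for m in reversed(masses[1:]):  # y1 .. y(n-1), read from the C-terminus
        total += m
        y.append(total + WATER + PROTON)
    return b + y


def hit_score(seq, peaks, tol=0.02):
    """Number of theoretical b/y ions with an observed peak within tol."""
    return sum(
        1 for t in by_ions(seq) if any(abs(t - p) <= tol for p in peaks)
    )


# Made-up "observed" peaks: the first four b ions of PEPTIDE.
peaks = by_ions("PEPTIDE")[:4]
candidates = ["PEPSIDE", "PEPTIDE"]
best = max(candidates, key=lambda s: hit_score(s, peaks))
```

In this toy case the candidate with the most matching fragments is selected as `best`. Real intact-protein spectra contain multiply charged fragments, so a practical version would first deconvolute the spectrum to neutral masses.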

3.
We present an MS/MS database search algorithm with the following novel features: (1) a novel protein database structure containing extensive preindexing and (2) zone modification searching, which enables the rapid discovery of protein modifications of known (i.e., user-specified) and unanticipated delta masses. All of these features are implemented in Interrogator, the search engine that runs behind the Pro ID, Pro ICAT, and Pro QUANT software products. Speed benchmarks demonstrate that our modification-tolerant database search algorithm is 100-fold faster than traditional database search algorithms when used for comprehensive searches for a broad variety of modification species. The ability to rapidly search for a large variety of known as well as unanticipated modifications allows a significantly greater percentage of MS/MS scans to be identified. We demonstrate this with an example in which, out of a total of 473 identified MS/MS scans, 315 of these scans correspond to unmodified peptides, while 158 scans correspond to a wide variety of modified peptides. In addition, we provide specific examples where the ability to search for unanticipated modifications allows the scientist to discover: unexpected modifications that have biological significance; amino acid mutations; salt-adducted peptides in a sample that has nominally been desalted; peptides arising from nontryptic cleavage in a sample that has nominally been digested using trypsin; other unintended consequences of sample handling procedures.

4.
A powerful technique for peptide and protein identification is tandem mass spectrometry followed by database search using a program such as SEQUEST or Mascot. These programs, however, become slow and lose sensitivity when allowing nonspecific cleavages or peptide modifications. De novo sequencing and hybrid methods such as sequence tagging offer speed and robustness for wider searches, yet these approaches require better spectra with more complete and consecutive fragmentation and, hence, are less sensitive to low-abundance peptides. Here we describe a new hybrid method that retains the sensitivity of pure database search. The method uses a small amount of de novo analysis to identify likely b- and y-ion peaks ("lookup peaks") that can then be used to extract candidate peptides from the database, with the number of candidates tunable to fit a computing budget. We describe a program called ByOnic that implements this method, and we benchmark ByOnic on several data sets, including one of mouse blood plasma spiked with low concentrations of recombinant human proteins. We demonstrate that ByOnic is more sensitive than sequence tagging and, indeed, more sensitive than the three most popular pure database search tools (SEQUEST, Mascot, and X!Tandem) on both the peptide and protein levels. On the mouse plasma samples, ByOnic consistently found spiked proteins missed by the other tools.

5.
Size- and shape-dependent optical properties of gold nanorods allow monitoring their growth using a novel fast single-particle spectroscopy (fastSPS) method. FastSPS uses a spatially addressable electronic shutter based on a liquid crystal device to investigate particles randomly deposited on a substrate, orders of magnitude faster than other techniques. We use fastSPS to observe nanoparticle growth in situ on a single-particle level and extract quantitative data on nanoparticle growth.

6.
Tandem mass spectrometry is the prevailing approach for large-scale peptide sequencing in high-throughput proteomic profiling studies. Effective database search engines have been developed to identify peptide sequences from MS/MS fragmentation spectra. Since proteins are polymorphic and subject to post-translational modifications (PTM), however, computational methods for detecting unanticipated variants are also needed to achieve true proteome-wide coverage. Different from existing "unrestrictive" search tools, we present a novel algorithm, termed SIMS (for Sequential Motif Interval Search), that interprets pairs of product ion peaks, representing potential amino acid residues or "intervals", as a means of mapping PTMs or substitutions in a blind database search mode. An effective heuristic software program was likewise developed to evaluate, rank, and filter optimal combinations of relevant intervals to identify candidate sequences, and any associated PTM or polymorphism, from large collections of MS/MS spectra. The prediction performance of SIMS was benchmarked extensively against annotated reference spectral data sets and compared favorably with, and was complementary to, current state-of-the-art methods. An exhaustive discovery screen using SIMS also revealed thousands of previously overlooked putative PTMs in a compendium of yeast protein complexes and in a proteome-wide map of adult mouse cardiomyocytes. We demonstrate that SIMS, freely accessible for academic research use, addresses gaps in current proteomic data interpretation pipelines, improving overall detection coverage, and facilitating comprehensive investigations of the fundamental multiplicity of the expressed proteome.

7.
Database-searching algorithms compatible with shotgun proteomics match a peptide tandem mass spectrum to a predicted mass spectrum for an amino acid sequence within a database. SEQUEST is one of the most common software algorithms used for the analysis of peptide tandem mass spectra; it uses a cross-correlation (XCorr) scoring routine to match tandem mass spectra to model spectra derived from peptide sequences. To assess a match, SEQUEST uses the difference between the first- and second-ranked sequences (ΔCn). This value is dependent on the database size, search parameters, and sequence homologies. In this report, we demonstrate the use of a scoring routine (SEQUEST-NORM) that normalizes XCorr values to be independent of peptide size and the database used to perform the search. This new scoring routine is used to objectively calculate the percent confidence of protein identifications and posttranslational modifications based solely on the XCorr value.
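As a small illustration of the ΔCn quantity mentioned above, the following Python sketch computes the commonly quoted form, the difference between the top two XCorr values divided by the top value. The function name and the example scores are assumptions, and the sketch does not attempt the SEQUEST-NORM normalization, whose formula is not given in the abstract.

```python
def delta_cn(xcorrs):
    """Relative gap between the best and second-best XCorr values.

    Assumes the commonly quoted definition
    (XCorr_1 - XCorr_2) / XCorr_1; input order does not matter.
    """
    ranked = sorted(xcorrs, reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two candidate scores")
    if ranked[0] <= 0:
        raise ValueError("top XCorr must be positive")
    return (ranked[0] - ranked[1]) / ranked[0]


# Made-up XCorr values for three candidate peptides of one spectrum.
gap = delta_cn([2.0, 4.0, 3.0])
```

Because the second-ranked score depends on what else is in the database, the same spectrum can yield different ΔCn values under different search settings, which is the dependence the abstract points out.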

8.
Integration of mass spectrometry in analytical biotechnology.   Cited by: 9 (self-citations: 0, citations by others: 9)
Mass spectrometry (MS) has become an indispensable tool for peptide and protein structure analysis because of three unique capabilities that enable it to be used to solve structural problems not easily handled by conventional techniques. First, MS is able to provide accurate molecular weight information on low-picomole amounts of peptides and proteins independent of covalent modifications that may be present. Second, this information is obtainable for peptides present in complex mixtures such as those that result from a proteolytic digest of a protein. Third, by using tandem MS, partial to complete sequence information may be obtained for peptides containing up to 25 amino acid residues, even if the peptides are present in mixtures. Sensitivity and speed of the MS-based approaches now equal (and in some cases exceed) that of Edman-based sequence analysis. In this perspective we discuss how MS, tandem high-performance MS, and on-line liquid chromatography/MS using fast atom bombardment or electrospray ionization have been integrated with more conventional techniques in order to increase the accuracy and speed of peptide and protein structure characterization. The expanding role of matrix-assisted laser desorption MS in protein analysis is also described. The unique niche that MS occupies for locating and structurally characterizing posttranslational modifications of proteins is emphasized. Examples chosen from the authors' laboratory illustrate how MS is used to sequence blocked proteins, define N- and C-terminal sequence heterogeneity, locate and correct errors in DNA- and cDNA-deduced protein sequences, identify sites of deamidation, isoaspartyl formation, phosphorylation, oxidation, disulfide bond formation, and glycosylation, and define the structural class of carbohydrate at specific attachment sites in glycoproteins.

9.
Identifying proteins and their modification states with known levels of confidence remains a significant challenge for proteomics. Random or decoy peptide databases are increasingly being used to estimate the false discovery rate (FDR), e.g., from liquid chromatography-tandem mass spectrometry (LC-MS/MS) analyses of tryptic digests. We show that this approach can significantly underestimate the FDR and describe an approach for more confident protein identifications that uses unique partial sequences derived from a combination of database searching and amino acid residue sequencing using high-accuracy MS/MS data. Applied to a Saccharomyces cerevisiae tryptic digest, the approach provided 3,132 confident peptide identifications (approximately 5% modified in some fashion), covering 575 proteins with an estimated zero FDR. The conventional approach provided 3,359 peptide identifications and 656 proteins with 0.3% FDR based upon a decoy database analysis. However, the present approach revealed approximately 5% of the 3,359 identifications to be incorrect and many more as potentially ambiguous (e.g., due to not considering certain amino acid substitutions and modifications). In addition, 677 peptides and 39 proteins were identified that had been missed by conventional analysis, including nontryptic peptides, peptides with a variety of expected/unexpected chemical modifications, known/unknown post-translational modifications, single nucleotide polymorphisms or gene encoding errors, and multiple modifications of individual peptides.
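For readers unfamiliar with decoy-based FDR estimation, the conventional calculation that the abstract critiques can be sketched as follows. This is a generic illustration under stated assumptions, not the authors' procedure: it assumes separate target and decoy hits scored on the same scale, estimates FDR as the ratio of decoy hits to target hits above a score threshold, and uses made-up scores. The name `decoy_fdr` is hypothetical.

```python
def decoy_fdr(psms, threshold):
    """Estimate FDR as (decoy hits) / (target hits) at or above threshold.

    psms is an iterable of (score, is_decoy) pairs; this simple ratio is
    one common convention among several.
    """
    targets = sum(1 for score, is_decoy in psms
                  if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms
                 if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0


# Made-up peptide-spectrum matches: (score, is_decoy).
psms = [
    (5.0, False), (4.5, False), (4.0, True),
    (3.5, False), (3.0, False), (2.0, True),
]
loose = decoy_fdr(psms, threshold=3.0)   # 1 decoy / 4 targets
strict = decoy_fdr(psms, threshold=4.2)  # 0 decoys / 2 targets
```

The estimate only counts errors that the decoy set can mimic. A wrong target match caused by an unconsidered modification or substitution has no decoy counterpart, which is one way such an estimate can come out too low.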

10.
There are several computer programs that can match peptide tandem mass spectrometry data to their exactly corresponding database sequences, and in most protein identification projects, these programs are utilized in the early stages of data interpretation. However, situations frequently arise where tandem mass spectral data cannot be correlated with any database sequences. In these cases, the unmatched data could be due to peptides derived from novel proteins, allelic or species-derived variants of known proteins, or posttranslational or chemical modifications. Two additional problems are frequently encountered in high-throughput protein identification. First, it is difficult to quickly sift through large amounts of data to identify those spectra that, due to poor signal or contaminants, can be ignored. Second, it is important to find incorrect database matches (false positives). We have chosen to address these difficulties by performing automatic de novo sequencing using a computer program called Lutefisk. Sequence candidates obtained are used as input in a homology-based database search program called CIDentify to identify variants of known proteins. Comparison of database-derived sequences with de novo sequences allows for electronic validation of database matches even if the latter are not completely correct. Modifications to the original Lutefisk program have been implemented to handle data obtained from triple quadrupole, ion trap, and quadrupole/time-of-flight hybrid (Qtof) mass spectrometers. For example, the linearity of mass errors due to temperature-dependent expansion of the flight tube in a Qtof was exploited such that isobaric amino acids (glutamine/lysine and oxidized methionine/phenylalanine) can be differentiated without careful attention to mass calibration.

11.
We report here the novel use of electrochemistry to generate covalent oxidative labels on intact proteins in both non-native and physiologically relevant solutions as a surface mapping probe of higher order protein structure. Two different working electrode types were tested across a range of experimental parameters including voltage, flow rate, and solution electrolyte composition to affect the extent of oxidation on intact proteins, as measured both on-line and off-line with mass spectrometry. Oxidized proteins were collected off-line for proteolytic digestion followed by LC-MS/MS analysis. Peptide MS/MS data were searched with the InsPecT scoring algorithm for 46 oxidative mass shifts previously reported in the literature. Preliminary data showed reasonable agreement between amino acid solvent accessibility and the resulting oxidation status of these residues in aqueous buffer, while more buried residues were found to be oxidized in non-native solution. Our results indicate that electrochemical oxidation using a boron-doped diamond electrode has the potential to become a useful and easily accessible tool for conducting oxidative surface mapping experiments.

12.
Tabb DL, Saraf A, Yates JR. Analytical Chemistry, 2003, 75(23): 6415-6421
Shotgun proteomics is a powerful tool for identifying the protein content of complex mixtures via liquid chromatography and tandem mass spectrometry. The most widely used class of algorithms for analyzing mass spectra of peptides has been database search software such as SEQUEST. A new sequence tag database search algorithm, called GutenTag, makes it possible to identify peptides with unknown posttranslational modifications or sequence variations. This software automates the process of inferring partial sequence "tags" directly from the spectrum and efficiently examines a sequence database for peptides that match these tags. When multiple candidate sequences result from the database search, the software evaluates which is the best match by a rapid examination of spectral fragment ions. We compare GutenTag's accuracy to that of SEQUEST on a defined protein mixture, showing that both modified and unmodified peptides can be successfully identified by this approach. GutenTag analyzed 33,000 spectra from a human lens sample, identifying peptides that were missed in prior SEQUEST analysis due to sequence polymorphisms and posttranslational modifications. The software is available under license; visit http://fields.scripps.edu for information.

13.
14.
We present a Web-based application that uses whole-protein masses determined by mass spectrometry to identify putative co- and posttranslational proteolytic cleavages and chemical modifications. The protein cleavage and modification engine (PROCLAME) requires as input an intact mass measurement and a precursor identification based on peptide mass fingerprinting or tandem mass spectrometry. This approach predicts mass-modifying events using a depth-first tree search, bounded by a set of rules controlled by a custom-built fuzzy logic engine, to explore a large number of possible combinations of modifications accounting for the experimental mass. Candidates are saved during a search if they are within a user-specified instrument mass accuracy; the total number of possible candidates searched is based on a specified fuzzy cutoff score. Candidates are scored and ranked using a simple probabilistic model. There is generally not enough information in an intact mass measurement to determine a single unique protein characterization; however, the program provides utility by expediting the identification of sets of putative events consistent with the mass data and ranking them for further investigation. This approach uses a simple, intuitive rule base and lends itself to discovery of unannotated posttranslational events. We have assessed the program with both in silico-generated test data and with published data from an analysis of large ribosomal subunit proteins, both from the yeast S. cerevisiae. Results indicate a high degree of sensitivity and specificity in characterizing proteins whose masses resulted from reasonable proteolysis and covalent modification scenarios. The application is available on the web at http://proclame.unc.edu.
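The depth-first search over combinations of mass-modifying events can be sketched in Python as below. This is a simplified illustration, not PROCLAME itself: a fixed cap on the number of events (`max_mods`) stands in for the fuzzy-logic rule base, the small modification table is an assumption chosen for the example, and the scoring and ranking step is omitted. The function name `explain_mass` is hypothetical.

```python
# Monoisotopic mass shifts in Da for a few well-known events.
# The selection is illustrative, not PROCLAME's rule base.
MODS = {
    "acetyl": 42.010565,     # acetylation
    "phospho": 79.966331,    # phosphorylation
    "methyl": 14.015650,     # methylation
    "Met-loss": -131.040485, # removal of the initiator methionine
}


def explain_mass(delta, tol=0.01, max_mods=3):
    """Depth-first search for event combinations whose shifts sum to delta.

    delta is (observed intact mass - unmodified sequence mass). Events may
    repeat; a real rule base would forbid implausible repeats such as
    losing the initiator methionine twice.
    """
    names = sorted(MODS)
    found = []

    def dfs(start, chosen, total):
        if chosen and abs(total - delta) <= tol:
            found.append(tuple(chosen))
        if len(chosen) == max_mods:
            return  # depth bound replaces the fuzzy cutoff score
        for k in range(start, len(names)):
            chosen.append(names[k])
            # Passing k (not k + 1) allows the same event to repeat.
            dfs(k, chosen, total + MODS[names[k]])
            chosen.pop()

    dfs(0, [], 0.0)
    return found


# A protein observed 89.030 Da lighter than its database sequence.
events = explain_mass(-131.040485 + 42.010565)
```

With the default tolerance, `events` contains the single combination of methionine removal plus acetylation. Loosening the tolerance makes more combinations consistent with the same mass (for example, three methylations versus one acetylation at 0.05 Da), which is why the abstract stresses ranking candidate sets instead of expecting a unique answer.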

15.
Currently available mass spectrometric (MS) techniques lack specificity in identifying protein modifications because molecular mass is the only parameter used to characterize these changes. Consequently, the suspected modified peptides are subjected to tandem MS/MS sequencing that may demand more time and sample. We report the use of stable isotope-enriched amino acids as residue-specific "mass signatures" for the rapid and sensitive detection of protein modifications directly from the peptide mass map (PMM) without enrichment of the modified peptides. These mass signatures are easily recognized through their characteristic spectral patterns and provide fingerprints for peptides containing the same content of specific amino acid residue(s) in a PMM. Without the need for tandem MS/MS sequencing, a peptide and its modified form(s) can readily be identified through their identical fingerprints, regardless of the nature of modifications. In this report, we demonstrate this strategy for the detection of methionine oxidation and protein phosphorylation. More interestingly, the phosphorylation of a histone protein, H2A.X, obtained from human skin fibroblast cells, was effectively identified in response to low-dose radiation. In general, this strategy of residue-specific mass tagging should be applicable to other posttranslational modifications.

16.
Protein phosphorylation is one of the most important known posttranslational modifications. Tandem mass spectrometry has become an important tool for mapping out the phosphorylation sites. However, when a peptide generated from the enzymatic or chemical digestion of a phosphoprotein is highly phosphorylated or contains many potential phosphorylation residues, phosphorylation site assignment becomes difficult. Separation and enrichment of phosphopeptides from a digest mixture is desirable and often a critical step for MS/MS-based site determination. In this work, we present a novel open tubular immobilized metal ion affinity chromatography (OT-IMAC) method, which is found to be more effective and reproducible for phosphopeptide enrichment, compared to a commonly used commercial product, Ziptip from Millipore. A strategy based on a combination of OT-IMAC, sequential dual-enzyme digestion, and matrix-assisted laser desorption/ionization (MALDI) quadrupole time-of-flight tandem mass spectrometry for phosphoprotein characterization is presented. It is shown that MALDI MS/MS with collision-induced dissociation can be very effective in generating fragment ion spectra containing rich structural information, which enables the identification of phosphorylation sites even from highly phosphorylated peptides. The applicability of this method for real world applications is demonstrated in the characterization and identification of phosphorylation sites of a Na(+)/H(+) exchanger fusion protein, His182, which was phosphorylated in vitro using the kinase Erk2.

17.
Multistage mass spectrometry (MS(n)) generating so-called spectral trees is a powerful tool in the annotation and structural elucidation of metabolites and is increasingly used in the area of accurate mass LC/MS-based metabolomics to identify unknown, but biologically relevant, compounds. As a consequence, there is a growing need for computational tools specifically designed for the processing and interpretation of MS(n) data. Here, we present a novel approach to represent and calculate the similarity between high-resolution mass spectral fragmentation trees. This approach can be used to query multiple-stage mass spectra in MS spectral libraries. Additionally the method can be used to calculate structure-spectrum correlations and potentially deduce substructures from spectra of unknown compounds. The approach was tested using two different spectral libraries composed of either human or plant metabolites which currently contain 872 MS(n) spectra acquired from 549 metabolites using Orbitrap FTMS(n). For validation purposes, for 282 of these 549 metabolites, 765 additional replicate MS(n) spectra acquired with the same instrument were used. Both the dereplication and de novo identification functionalities of the comparison approach are discussed. This novel MS(n) spectral processing and comparison approach increases the probability to assign the correct identity to an experimentally obtained fragmentation tree. Ultimately, this tool may pave the way for constructing and populating large MS(n) spectral libraries that can be used for searching and matching experimental MS(n) spectra for annotation and structural elucidation of unknown metabolites detected in untargeted metabolomics studies.

18.
Mass spectral analysis is an increasingly common method used to characterize glycoproteins. When more than one glycosylation site is present on a protein, obtaining MS data of glycopeptides is a highly effective way of obtaining glycosylation information because this approach can be used to identify not only what the carbohydrates are but also at which glycosylation site they are attached. Unfortunately, this is not yet a routine analytical approach, in part because data analysis can be quite challenging. We are developing strategies to simplify this analysis. Presented herein is a novel mass spectrometry technique that identifies the peptide moiety of either sulfated, sialylated, or both sialylated and sulfated glycopeptides. This technique correlates product ions in collision-induced dissociation (CID) experiments of suspected glycopeptides to a peptide composition using a newly developed web-based tool, GlycoPep ID. After identifying the peptide portion of glycopeptides with GlycoPep ID, the process of assigning the rest of the glycopeptide composition to the MS data is greatly facilitated because the "unknown" portion of the mass assignment that remains can be directly attributed to the carbohydrate component. Several examples of the utility and reliability of this method are presented herein.

19.
Posttranslational modifications are major mechanisms of regulating protein activity and function in vertebrate cells. It is essential to obtain qualitative information about posttranslational modification patterns of proteins to understand signal transduction mechanisms in greater detail. However, it is equally important to measure the dynamics of posttranslational modifications such as phosphorylation to approach signaling networks from a systems biology perspective. Despite a number of advances, methods to quantitate posttranslational modifications remain difficult to implement due to a number of factors including lack of a generic method, elaborate chemical steps, and requirement for large amounts of sample. We have previously shown that stable isotope-containing amino acids in cell culture (SILAC) can be used to differentially label growing cell populations for quantitation of protein levels. In this report, we extend the use of SILAC as a novel proteomic approach for the relative quantitation of posttranslational modifications such as phosphorylation. We have used SILAC to quantitate the extent of known phosphorylation sites as well as to identify and quantitate novel phosphorylation sites.

20.
Detection and identification of pathogenic bacteria and their protein toxins play a crucial role in a proper response to natural or terrorist-caused outbreaks of infectious diseases. The recent availability of whole genome sequences of priority bacterial pathogens opens new diagnostic possibilities for identification of bacteria by retrieving their genomic or proteomic information. We describe a method for identification of bacteria based on tandem mass spectrometric (MS/MS) analysis of peptides derived from bacterial proteins. This method involves bacterial cell protein extraction, trypsin digestion, liquid chromatography MS/MS analysis of the resulting peptides, and a statistical scoring algorithm to rank MS/MS spectral matching results for bacterial identification. To facilitate spectral data searching, a proteome database was constructed by translating genomes of bacteria of interest with fully or partially determined sequences. In this work, a prototype database was constructed by the automated analysis of 87 publicly available, fully sequenced bacterial genomes with the GLIMMER gene finding software. MS/MS peptide spectral matching for peptide sequence assignment against this proteome database was done by SEQUEST. To gauge the relative significance of the SEQUEST-generated matching parameters for correct peptide assignment, discriminant function (DF) analysis of these parameters was applied and DF scores were used to calculate probabilities of correct MS/MS spectra assignment to peptide sequences in the database. The peptides with DF scores exceeding a threshold value determined by the probability of correct peptide assignment were accepted and matched to the bacterial proteomes represented in the database. Sequence filtering or removal of degenerate peptides matched with multiple bacteria was then performed to further improve identification. It is demonstrated that using a preset criterion with known distributions of discriminant function scores and probabilities of correct peptide sequence assignments, a test bacterium within the 87 database microorganisms can be unambiguously identified.
