Similar articles
 20 similar articles found
1.
Database search identification algorithms, such as Sequest and Mascot, constitute powerful enablers for proteomic tandem mass spectrometry. We introduce DBDigger, an algorithm that reorganizes the database identification process to remove a problematic bottleneck. Typically such algorithms determine which candidate sequences can be compared to each spectrum. Instead, DBDigger determines which spectra can be compared to each candidate sequence, enabling the software to generate candidate sequences only once for each HPLC separation rather than for each spectrum. This reorganization also reduces the number of times a spectrum must be predicted for a particular candidate sequence and charge state. As a result, DBDigger can accelerate some database searches by more than an order of magnitude. In addition, the software offers features to reduce the performance degradation introduced by posttranslational modification (PTM) searching. DBDigger allows researchers to specify the sequence context in which each PTM is possible. In the case of CNBr digests, for example, modified methionine residues can be limited to occur only at the C-termini of peptides. Use of "context-dependent" PTM searching reduces the performance penalty relative to traditional PTM searching. We characterize the performance possible with DBDigger, showcasing MASPIC, a new statistical scorer. We describe the implementation of these innovations in the hope that other researchers will employ them for rapid and highly flexible proteomic database search.
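The reorganization described in this abstract (spectra are looked up for each candidate sequence, rather than candidates for each spectrum) can be illustrated with a minimal sketch. This is not DBDigger's code: the function name, the symmetric mass tolerance, and the use of a sorted precursor-mass list with binary search are all assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right


def spectra_for_candidates(candidate_masses, precursor_masses, tolerance):
    """Map each candidate index to the spectra it can be compared with.

    A spectrum qualifies when its precursor mass lies within +/- tolerance
    of the candidate mass (hypothetical matching rule, for illustration).
    """
    # Sort the spectra once by precursor mass, remembering original indices.
    order = sorted(range(len(precursor_masses)), key=lambda i: precursor_masses[i])
    sorted_masses = [precursor_masses[i] for i in order]

    matches = {}
    for c, mass in enumerate(candidate_masses):
        # Binary search for the window of spectra around this candidate mass.
        lo = bisect_left(sorted_masses, mass - tolerance)
        hi = bisect_right(sorted_masses, mass + tolerance)
        matches[c] = sorted(order[lo:hi])
    return matches
```

Each candidate is generated and visited once, and the per-candidate lookup costs a binary search instead of a scan over all spectra.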

2.
For analysis of intact proteins by mass spectrometry (MS), a new twist to a two-dimensional approach to proteome fractionation employs an acid-labile detergent instead of sodium dodecyl sulfate during continuous-elution gel electrophoresis. Use of this acid-labile surfactant (ALS) facilitates subsequent reversed-phase liquid chromatography (RPLC) for a net two-dimensional fractionation illustrated by transforming thousands of intact proteins from Saccharomyces cerevisiae to mixtures of 5-20 components (all within approximately 5 kDa of one another) for presentation via electrospray ionization (ESI) to a Fourier transform MS (FTMS). Between 3 and 13 proteins have been detected directly using ESI-FTMS (or MALDI-TOF), and the fractionation showed a peak capacity of approximately 400 between 0 and 70 kDa. A probability-based identification was made automatically from raw MS/MS data (obtained using a quadrupole-FTMS hybrid instrument) for one protein that differed from that predicted in a yeast database of approximately 19,000 protein forms. This ALS-PAGE/RPLC approach to proteome processing ameliorates the "front end" problem that accompanies direct analysis of whole proteins and assists the future realization of protein identification with 100% sequence coverage in a high-throughput format.

3.
A comprehensive approach to protein identification and determination of sites of posttranslational modifications (PTMs) in heavily modified proteins was tested. In this approach, termed "reconstructed molecular mass analysis" (REMMA), the molecular mass distribution of the intact protein is measured first, which reveals the extent and heterogeneity of modifications. Then the protein is digested with one or several enzymes, with peptides separated by reversed-phase HPLC, and analyzed by Fourier transform mass spectrometry (FTMS). Vibrational excitation (collisional or infrared) or electron capture dissociation (ECD) of peptide ions provides protein identification. When a measured peptide molecular mass indicates the possibility of a PTM, vibrational excitation is applied to determine via characteristic losses the type and eventually the structure of the modification, while ECD determines the PTM site. Chromatographic peak analysis continues until full sequence coverage is obtained, after which the molecular mass is reconstructed and compared with the measured value. An agreement indicates that the PTM characterization was complete. This procedure applied to the bovine milk PP3 protein containing 25% modifications by weight yielded all known modifications (five phosphorylations, two O- and one N-glycosylation) as well as the previously unreported NeuNAc-Hex-[NeuNAc]HexNAc group O-linked to Ser60. With the FTMS performance improved, REMMA can serve as the basis for high-throughput, high-sensitivity PTM characterization of biologically important proteins, which should speed up proteomics research.

4.
Tandem mass spectrometry is the prevailing approach for large-scale peptide sequencing in high-throughput proteomic profiling studies. Effective database search engines have been developed to identify peptide sequences from MS/MS fragmentation spectra. Since proteins are polymorphic and subject to post-translational modifications (PTM), however, computational methods for detecting unanticipated variants are also needed to achieve true proteome-wide coverage. Unlike existing "unrestrictive" search tools, we present a novel algorithm, termed SIMS (for Sequential Motif Interval Search), that interprets pairs of product ion peaks, representing potential amino acid residues or "intervals", as a means of mapping PTMs or substitutions in a blind database search mode. An effective heuristic software program was likewise developed to evaluate, rank, and filter optimal combinations of relevant intervals to identify candidate sequences, and any associated PTM or polymorphism, from large collections of MS/MS spectra. The prediction performance of SIMS was benchmarked extensively against annotated reference spectral data sets and compared favorably with, and was complementary to, current state-of-the-art methods. An exhaustive discovery screen using SIMS also revealed thousands of previously overlooked putative PTMs in a compendium of yeast protein complexes and in a proteome-wide map of adult mouse cardiomyocytes. We demonstrate that SIMS, freely accessible for academic research use, addresses gaps in current proteomic data interpretation pipelines, improving overall detection coverage, and facilitating comprehensive investigations of the fundamental multiplicity of the expressed proteome.

5.
Here we describe an algorithm for identifying peptides/proteins of known sequence and unknown peptides from partial spectra generated by an in-source decay (ISD) technique coupled with matrix-assisted laser desorption/ionization time-of-flight mass spectrometry. The identification of protein fragments is processed with a software program called CMATCH, which generates candidate subsequences for both known peptides/proteins and unknown peptides for the major product ions in the spectral range m/z 400-5000 and then matches these to known protein sequences contained in a reference database for the known peptides/proteins. CMATCH, which is compiled for MS-DOS or Windows 95/NT, has two main advantages: first, the candidate subsequences are generated automatically without the need for supplementary information concerning the distribution of either N-terminal or C-terminal ions in the spectra for both known peptides/proteins and unknown peptides; second, the highest coordinated homologous sequences are picked up automatically from the reference database as the best matches with known peptides/proteins. Examples from the ISD spectra of several test proteins demonstrate the efficacy of this protein identification software.

6.
A method for testing the significance of mass spectrometric (MS) protein identification results is presented. MS proteolytic peptide mapping and genome database searching provide a rapid, sensitive, and potentially accurate means for identifying proteins. Database search algorithms detect the matching between proteolytic peptide masses from an MS peptide map and theoretical proteolytic peptide masses of the proteins in a genome database. The number of masses that match is used to compute a score, S, for each protein, and the protein that yields the best score is taken as the identification result. There is a risk of obtaining a false result, because masses determined by MS are not unique; i.e., each mass in a peptide map can match randomly one or several proteins in a genome database. A false result is obtained when the score, S, due to random matching cannot be discerned from the score due to matching with a real protein in the sample. We therefore introduce the frequency function, f(S), for false (random) identification results as a basis for testing at what significance level, α, one can reject a null hypothesis, H0: "the result is false". The significance is tested by comparing an experimental score, S(E), with a critical score, S(C), required for a significant result at the level α. If S(E) ≥ S(C), H0 is rejected. f(S) and S(C) were obtained by simulations utilizing random tryptic peptide maps generated from a genome database. The critical score, S(C), was studied as a function of the number of masses in the peptide map, the mass accuracy, the degree of incomplete enzymatic cleavage, the protein mass range, and the size of the genome. With S(C) known for a variety of experimental constraints, significance testing can be fully automated and integrated with database searching software used for protein identification.
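The test described in this abstract can be sketched as follows: build an empirical distribution of scores from random (false) matches, then take as the critical score the smallest score whose upper-tail frequency does not exceed the chosen significance level. The simulation below is only a stand-in: the abstract's simulations use random tryptic peptide maps drawn from a genome database, whereas this sketch assumes each mass matches independently with a fixed probability `p_match`. All function names are hypothetical.

```python
import random


def simulate_random_scores(n_trials, n_masses, p_match, seed=0):
    """Scores from random matching: number of matched masses per trial.

    Assumption: each of the n_masses matches by chance with probability
    p_match, independently of the others.
    """
    rng = random.Random(seed)
    return [
        sum(rng.random() < p_match for _ in range(n_masses))
        for _ in range(n_trials)
    ]


def critical_score(random_scores, alpha):
    """Smallest integer score s with P(random score >= s) <= alpha."""
    n = len(random_scores)
    for s in range(0, max(random_scores) + 2):
        tail = sum(1 for x in random_scores if x >= s) / n
        if tail <= alpha:
            return s


def is_significant(experimental_score, random_scores, alpha):
    """Reject H0 ("the result is false") when S(E) >= S(C)."""
    return experimental_score >= critical_score(random_scores, alpha)
```

In practice the random scores would be recomputed for each combination of experimental constraints (number of masses, mass accuracy, and so on), since the abstract reports that the critical score depends on them.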

7.
For improved detection of diverse posttranslational modifications (PTMs), direct fragmentation of protein ions by top down mass spectrometry holds promise but has yet to be achieved on a large scale. Using lysate from Saccharomyces cerevisiae, 117 gene products were identified with 100% sequence coverage revealing 26 acetylations, 1 N-terminal dimethylation, 1 phosphorylation, 18 duplicate genes, and 44 proteolytic fragments. The platform for this study combined continuous-elution gel electrophoresis, reversed-phase liquid chromatography, automated nanospray coupled with a quadrupole-FT hybrid mass spectrometer, and a new search engine for querying a custom database. The proteins identified required no manual validation, ranged from 5 to 39 kDa, had codon biases from 0.93 to 0.083, and were primarily associated with glycolysis and protein synthesis. Illustrations of gene-specific identifications, PTM detection and subsequent PTM localization (using either electron capture dissociation or known PTM data stored in a database) show how larger scale proteome projects incorporating top down may proceed in the future using commercial Q-FT instruments.

8.
We investigated and compared three approaches for shotgun protein identification by combining MS and MS/MS information using LTQ-Orbitrap high mass accuracy data. In the first approach, we employed a unique mass identifier method where MS peaks matched to peptides predicted from proteins identified from an MS/MS database search are first subtracted before using the MS peaks as unique mass identifiers for protein identification. In the second method, we used an accurate mass and time tag method by building a potential mass and retention time database from previous MudPIT analyses. For the third method, we used a peptide mass fingerprinting-like approach in combination with a randomized database for protein identification. We show that we can improve protein identification sensitivity for low-abundance proteins by combining MS and MS/MS information. Furthermore, "one-hit wonders" from MS/MS database searching can be further substantiated by MS information and the approach improves the identification of low-abundance proteins. The advantages and disadvantages for the three approaches are then discussed.

9.
Top-down proteomics for rapid identification of intact microorganisms   Total citations: 2, self-citations: 0, citations by others: 2
We apply MALDI-TOF/TOF mass spectrometry for the rapid and high-confidence identification of intact Bacillus spore species. In this method, fragment ion spectra of whole (undigested) protein biomarkers are obtained without the need for biomarker prefractionation, digestion, separation, and cleanup. Laser-induced dissociation (unimolecular decay) of higher mass (>5 kDa) precursor ions in the first TOF analyzer is followed by reacceleration and subsequent high-resolution mass analysis of the resulting sequence-specific fragments in a reflectron TOF analyzer. In-house-developed software compares an experimental MS/MS spectrum with in silico-generated tandem mass spectra from all protein sequences, contained in a proteome database, with masses within a preset range around the precursor ion mass. A p-value, the probability that the observed matches between experimental and in silico-generated fragments occur by chance, is computed and used to rank the database proteins to identify the most plausible precursor protein. By inference, the source microorganism is then identified on the basis of the identification of individual, unique protein biomarker(s). As an example, intact Bacillus atrophaeus and Bacillus cereus spores, either pure or in mixtures, were unambiguously identified by this method after fragmenting and identifying individual small, acid-soluble spore proteins that are specific for each species. Factors such as experimental mass accuracy and number of detected fragment ions, precursor ion charge state, and sequence-specific fragmentation have been evaluated with the objective of extending the approach to other microorganisms. MALDI-TOF/TOF-MS in a lab setting is an efficient tool for in situ confirmation/verification of initial microorganism identification.

10.
Here we propose a novel method for rapidly identifying proteins in complex mixtures. A list of candidate proteins (including provision for posttranslational modifications) is obtained by database searching, within a specified mass range about the accurately measured mass (e.g., ±0.1 Da at 10 kDa) of the intact protein, by capillary liquid chromatography electrospray ionization Fourier transform ion cyclotron resonance mass spectrometry (LC ESI FT-ICR MS). On alternate scans, LC ESI infrared multiphoton dissociation (IRMPD) FT-ICR MS yields mostly b and y fragment ions for each protein, from which the correct candidate is identified as the one with the highest "hit" score (i.e., most b and y fragments matching the candidate database protein amino acid sequence masses) and sequence "tag" score (based on a series of fragment sequences differing in mass by 1 or 2 amino acids). The method succeeds in uniquely identifying each of a mixture of five proteins treated as unknowns (melittin, ubiquitin, GroES, myoglobin, carbonic anhydrase II), from more than 1000 possible database candidates within a ±500 Da mass window. We are also able to identify posttranslational modifications of two of the proteins (melittin and GroES). The method is simple, rapid, and definitive and is extendable to a mixture of affinity-selected proteins, to identify proteins with a common biological function.
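The "hit" score in this abstract counts the b and y fragments of a candidate sequence that match observed fragment masses. A minimal sketch of such a count is given below. It assumes singly protonated fragments, monoisotopic residue masses, and a fixed absolute tolerance; the function names are hypothetical, and the abstract's sequence "tag" score is not reproduced.

```python
# Monoisotopic residue masses in Da (standard values).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def by_ions(seq):
    """Return (b, y) lists of singly protonated fragment m/z values."""
    masses = [MONO[a] for a in seq]
    b, y = [], []
    running = 0.0
    for m in masses[:-1]:            # b1 .. b(n-1), from the N-terminus
        running += m
        b.append(running + PROTON)
    running = 0.0
    for m in reversed(masses[1:]):   # y1 .. y(n-1), from the C-terminus
        running += m
        y.append(running + WATER + PROTON)
    return b, y


def hit_score(seq, observed, tol):
    """Number of theoretical b/y fragments within tol of an observed peak."""
    b, y = by_ions(seq)
    return sum(1 for t in b + y if any(abs(t - o) <= tol for o in observed))
```

Ranking the candidates returned by the intact-mass search then amounts to computing `hit_score` for each and sorting in descending order.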

11.
An improved data analysis method is described for rapid identification of intact microorganisms from MALDI-TOF-MS data. The method makes no use of mass spectral fingerprints. Instead, a microorganism database is automatically generated that contains biomarker masses derived from ribosomal protein sequences and a model of N-terminal Met loss. We quantitatively validate the method via a blind study that seeks to identify microorganisms with known ribosomal protein sequences. We also include in the database microorganisms with incompletely known sets of ribosomal proteins to test the specificity of the method. With an optimal MALDI protocol, and at the 95% confidence level, microorganisms represented in the database with 20 or more biomarkers (i.e., those with complete or nearly completely sequenced genomes) are correctly identified from their spectra 100% of the time, with no incorrect identifications. Microorganisms with seven or fewer biomarkers (i.e., incompletely sequenced genomes) are either not identified or misidentified. Robustness with respect to variations in sample preparation protocol and mass analysis protocol is demonstrated by collecting data with two different matrixes and under two different ion-mode configurations. Statistical analysis suggests that, even without further improvement, the method described here would successfully scale up to microorganism databases with roughly 1000 microorganisms. The results demonstrate that microorganism identification based on proteome data and modeling can perform as well as methods based on mass spectral fingerprinting.

12.
The present study reports a procedure developed for the identification of SDS-polyacrylamide gel electrophoretically separated proteins using an electrospray ionization quadrupole time-of-flight mass spectrometer (Q-TOF MS) equipped with pressurized sample introduction. It is based on in-gel digestion of the proteins without previous reduction/alkylation and on the capability of the Q-TOF MS to provide data suitable for peptide mass fingerprinting database searches and for tandem mass spectrometry (MS/MS) database searches (sequence tags). Omitting the reduction/alkylation step reduces sample contamination and sample loss, resulting in increased sensitivity. Omitting this step can leave disulfide-connected peptides in the analyte that can lead to misleading or ambiguous results from the peptide mass fingerprinting database search. This uncertainty, however, is overcome by MS/MS analysis of the peptides. Furthermore, the two complementary MS approaches increase the accuracy of the assignment of the unknown protein. This procedure is thus highly sensitive, accurate, and rapid. In combination with pressurized nanospray sample introduction, it is suitable for automated sample handling. Here, we apply this approach to identify protein contaminants observed during the purification of the yeast DNA mismatch repair protein Mlh1.

13.
The purpose of this work is to develop and verify statistical models for protein identification using peptide identifications derived from the results of tandem mass spectral database searches. Recently we have presented a probabilistic model for peptide identification that uses a hypergeometric distribution to approximate fragment ion matches of database peptide sequences to experimental tandem mass spectra. Here we apply statistical models to the database search results to validate protein identifications. For this we formulate the protein identification problem in terms of two independent models, two-hypothesis binomial and multinomial models, which use the hypergeometric probabilities and cross-correlation scores, respectively. Each database search result is assumed to be a probabilistic event. The Bernoulli event has two outcomes: a protein is either identified or not. The probability of identifying a protein at each Bernoulli event is determined from the relative length of the protein in the database (the null hypothesis) or the hypergeometric probability scores of the protein's peptides (the alternative hypothesis). We then calculate the binomial probability that the protein will be observed a certain number of times (number of database matches to its peptides) given the size of the data set (number of spectra) and the probability of protein identification at each Bernoulli event. The ratio of the probabilities from these two hypotheses (maximum likelihood ratio) is used as a test statistic to discriminate between true and false identifications. The significance and confidence levels of protein identifications are calculated from the model distributions. The multinomial model combines the database search results and generates an observed frequency distribution of cross-correlation scores (grouped into bins) between experimental spectra and identified amino acid sequences. The frequency distribution is used to generate p-value probabilities of each score bin. The probabilities are then normalized with respect to score bins to generate normalized probabilities of all score bins. A protein identification probability is the multinomial probability of observing the given set of peptide scores. To reduce the effect of random matches, we employ a marginalized multinomial model for small values of cross-correlation scores. We demonstrate that the combination of the two independent methods provides a useful tool for protein identification from the results of database searches using tandem mass spectra. A receiver operating characteristic curve demonstrates the sensitivity and accuracy level of the approach. The shortcomings of the models are related to the cases when protein assignment is based on unusual peptide fragmentation patterns that dominate over the model encoded in the peptide identification process. We have implemented the approach in a program called PROT_PROBE.
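The two-hypothesis binomial model in this abstract can be sketched as a ratio of two binomial probabilities for the same observation (a protein matched k times among n spectra). In the sketch below, `p_null` stands for the per-event probability under the null hypothesis (the protein's relative length in the database) and `p_alt` for the probability under the alternative. How PROT_PROBE derives `p_alt` from the hypergeometric peptide scores is not given in the abstract, so it is simply an input here. The function names are hypothetical.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n Bernoulli events."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def likelihood_ratio(k, n, p_null, p_alt):
    """Ratio of binomial probabilities, alternative over null.

    Values well above 1 favour a true identification; the decision
    threshold is left to the caller.
    """
    return binomial_pmf(k, n, p_alt) / binomial_pmf(k, n, p_null)
```

For example, a protein that occupies 1% of the database by length but accounts for 3 of 10 spectra yields a ratio in the thousands when `p_alt` is 0.3, because three chance hits are very unlikely under the null.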

14.
Zhang W, Chait BT. Analytical Chemistry, 2000, 72(11): 2482-2489
We describe the protein search engine "ProFound", which employs a Bayesian algorithm to identify proteins from protein databases using mass spectrometric peptide mapping data. The algorithm ranks protein candidates by taking into account individual properties of each protein in the database as well as other information relevant to the peptide mapping experiment. The program consistently identifies the correct protein(s) even when the data quality is relatively low or when the sample consists of a simple mixture of proteins. Illustrative examples of protein identifications are provided.

15.
A method for rapid identification of microorganisms is presented, which exploits the wealth of information contained in prokaryotic genome and protein sequence databases. The method is based on determining the masses of a set of ions by MALDI TOF mass spectrometry of intact or treated cells. Subsequent correlation of each ion in the set to a protein, along with the organismic source of the protein, is performed by searching an Internet-accessible protein database. Convoluting the lists for all ions and ranking the organisms corresponding to matched ions results in the identification of the microorganism. The method has been successfully demonstrated on B. subtilis and E. coli, two organisms with completely sequenced genomes. The method has also been tested for identification from mass spectra of mixtures of microorganisms, from spectra of an organism at different growth stages, and from spectra originating at other laboratories. The effects of experimental factors such as MALDI matrix preparation, spectral reproducibility, contaminants, mass range, and measurement accuracy on the database search procedure are also addressed. The proposed method has several advantages over other MS methods for microorganism identification.
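The ranking step in this abstract (match each observed ion to database proteins, then rank organisms by how many ions they explain) can be sketched in a few lines. The flat list of (organism, protein mass) pairs, the absolute mass tolerance, and the rule that an organism is counted at most once per ion are assumptions made for illustration, not details taken from the abstract.

```python
from collections import Counter


def rank_organisms(observed_masses, database, tolerance):
    """Rank organisms by the number of observed ions they can explain.

    database: list of (organism, protein_mass) pairs.
    Returns (organism, matched_ion_count) pairs, best match first;
    ties are broken alphabetically so the output is deterministic.
    """
    counts = Counter()
    for obs in observed_masses:
        # Organisms with at least one protein within tolerance of this ion.
        hits = {org for org, mass in database if abs(mass - obs) <= tolerance}
        counts.update(hits)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
```

A raw count like this favours organisms with many database entries, which is one reason the abstract's discussion of mass range and measurement accuracy matters for real searches.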

16.
MALDI-TOF mass spectrometry has been coupled with Internet-based proteome database search algorithms in an approach for direct microorganism identification. This approach is applied here to characterize intact H. pylori (strain 26695) Gram-negative bacteria, the most ubiquitous human pathogen. A procedure for including a specific and common posttranslational modification, N-terminal Met cleavage, in the search algorithm is described. Accounting for posttranslational modifications in putative protein biomarkers improves the identification reliability by at least an order of magnitude. The influence of other factors, such as number of detected biomarker peaks, proteome size, spectral calibration, and mass accuracy, on the microorganism identification success rate is illustrated as well.

17.
We present a new probability-based method for protein identification using tandem mass spectra and protein databases. The method employs a hypergeometric distribution to model frequencies of matches between fragment ions predicted for peptide sequences with a specific (M + H)⁺ value (at some mass tolerance) in a protein sequence database and an experimental tandem mass spectrum. The hypergeometric distribution constitutes the null hypothesis: all peptide matches to a tandem mass spectrum are random. It is used to generate a score characterizing the randomness of a database sequence match to an experimental tandem mass spectrum and to determine the level of significance of the null hypothesis. For each tandem mass spectrum and database search, a peptide is identified that has the least probability of being a random match to the spectrum and the corresponding level of significance of the null hypothesis is determined. To check the validity of the hypergeometric model in describing fragment ion matches, we used a χ² test. The distribution of frequencies and corresponding hypergeometric probabilities are generated for each tandem mass spectrum. No proteolytic cleavage specificity is used to create the peptide sequences from the database. We do not use any empirical probabilities in this method. The scores generated by the hypergeometric model do not have a significant molecular weight bias and are reasonably independent of database size. The approach has been implemented in a database search algorithm, PEP_PROBE. By using a large set of tandem mass spectra derived from a set of peptides created by digestion of a collection of known proteins using four different proteases, a false positive rate of 5% is demonstrated.
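A hypergeometric null model of the kind described in this abstract can be illustrated with the standard upper-tail probability. The mapping of the parameters to spectral quantities below is an assumption, since the abstract does not give PEP_PROBE's parametrization: `N` is the total number of predicted fragment ions over all candidate peptides, `K` is how many of those match peaks in the spectrum, `n` is the number of fragments predicted for one peptide, and `k` is how many of that peptide's fragments matched.

```python
from math import comb


def hypergeom_pmf(k, N, K, n):
    """P(X = k) when drawing n items from N, of which K are 'matches'."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def hypergeom_tail(k, N, K, n):
    """P(X >= k): chance of at least k matches under random matching."""
    return sum(hypergeom_pmf(i, N, K, n) for i in range(k, min(K, n) + 1))
```

Under this reading, the peptide with the smallest tail probability is the one least likely to be a random match, which corresponds to the selection rule stated in the abstract.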

18.
Arising from spontaneous aspartic acid (Asp) isomerization or asparagine (Asn) deamidation, isoaspartic acid (isoAsp, isoD, or beta-Asp) is a ubiquitous nonenzymatic modification of proteins and peptides. Because there is no mass difference between isoaspartyl and aspartyl species, sensitive and specific detection of isoAsp, particularly in complex samples, remains challenging. Here we report a novel assay for Asp isomerization by isotopic labeling with ¹⁸O via a two-step process: the isoAsp peptide is first specifically methylated by protein isoaspartate methyltransferase (PIMT, EC 2.1.1.77) to the corresponding methyl ester, which is subsequently hydrolyzed in ¹⁸O-water to regenerate isoAsp. The specific replacement of ¹⁶O with ¹⁸O at isoAsp leads to a mass shift of 2 Da, which can be automatically and unambiguously recognized using standard mass spectrometry, such as collision-induced dissociation (CID), and data analysis algorithms. Detection and site identification of several isoAsp peptides in a monoclonal antibody and the β-delta sleep-inducing peptide (DSIP) are demonstrated.

19.
We present a Web-based application that uses whole-protein masses determined by mass spectrometry to identify putative co- and posttranslational proteolytic cleavages and chemical modifications. The protein cleavage and modification engine (PROCLAME) requires as input an intact mass measurement and a precursor identification based on peptide mass fingerprinting or tandem mass spectrometry. This approach predicts mass-modifying events using a depth-first tree search, bounded by a set of rules controlled by a custom-built fuzzy logic engine, to explore a large number of possible combinations of modifications accounting for the experimental mass. Candidates are saved during a search if they are within a user-specified instrument mass accuracy; the total number of possible candidates searched is based on a specified fuzzy cutoff score. Candidates are scored and ranked using a simple probabilistic model. There is generally not enough information in an intact mass measurement to determine a single unique protein characterization; however, the program provides utility by expediting the identification of sets of putative events consistent with the mass data and ranking them for further investigation. This approach uses a simple, intuitive rule base and lends itself to discovery of unannotated posttranslational events. We have assessed the program with both in silico-generated test data and with published data from an analysis of large ribosomal subunit proteins, both from the yeast S. cerevisiae. Results indicate a high degree of sensitivity and specificity in characterizing proteins whose masses resulted from reasonable proteolysis and covalent modification scenarios. The application is available on the web at http://proclame.unc.edu.
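The depth-first search over combinations of mass-modifying events described in this abstract can be sketched as below. This is not PROCLAME's implementation: the fuzzy-logic rule base that bounds the real search is replaced here by a plain cap on the number of events (`max_mods`), candidates are not scored, and the function name and modification labels are hypothetical.

```python
def find_modification_sets(unmodified_mass, observed_mass, mods, tolerance, max_mods=3):
    """Depth-first search for modification combinations explaining a mass.

    mods: dict mapping a modification name to its mass shift in Da.
    Returns a list of tuples of modification names (repeats allowed) whose
    summed shifts bring unmodified_mass within tolerance of observed_mass.
    """
    items = sorted(mods.items())
    results = []

    def dfs(start, chosen, mass):
        # Save the candidate if it falls within the instrument tolerance.
        if abs(mass - observed_mass) <= tolerance:
            results.append(tuple(chosen))
        # Stand-in for the rule-based bound: stop at max_mods events.
        if len(chosen) == max_mods:
            return
        for i in range(start, len(items)):
            name, delta = items[i]
            chosen.append(name)
            dfs(i, chosen, mass + delta)  # i, not i + 1: an event may repeat
            chosen.pop()

    dfs(0, [], unmodified_mass)
    return results
```

Passing `start` down the recursion keeps each combination in one canonical order, so the same set of events is never reported twice.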

20.
Online liquid chromatography-mass spectrometric (LC-MS) analysis of intact proteins (i.e., top-down proteomics) is a growing area of research in the mass spectrometry community. A major advantage of top-down MS characterization of proteins is that the information of the intact protein is retained over the vastly more common bottom-up approach that uses protease-generated peptides to search genomic databases for protein identification. Concurrent to the emergence of top-down MS characterization of proteins has been the development and implementation of the stable isotope labeling of amino acids in cell culture (SILAC) method for relative quantification of proteins by LC-MS. Herein we describe the qualitative and quantitative top-down characterization of proteins derived from SILAC-labeled Aspergillus flavus using nanoflow reversed-phase liquid chromatography directly coupled to a linear ion trap Fourier transform ion cyclotron resonance mass spectrometer (nLC-LTQ-FTICR-MS). A. flavus is a toxic filamentous fungus that significantly impacts the agricultural economy and human health. SILAC labeling improved the confidence of protein identification, and we observed 1318 unique protein masses corresponding to 659 SILAC pairs, of which 22 were confidently identified. However, we have observed some limiting issues with regard to protein quantification using top-down MS/MS analyses of SILAC-labeled proteins. The role of SILAC labeling in the presence of competing endogenously produced amino acid residues and its impact on quantification of intact species are discussed in detail.
