Similar articles (20 found)
1.
We present a statistical model to estimate the accuracy of peptide assignments to tandem mass (MS/MS) spectra made by database search applications such as SEQUEST. Employing the expectation maximization algorithm, the analysis learns to distinguish correct from incorrect database search results, computing probabilities that peptide assignments to spectra are correct based upon database search scores and the number of tryptic termini of peptides. Using SEQUEST search results for spectra generated from a sample of known protein components, we demonstrate that the computed probabilities are accurate and have high power to discriminate between correctly and incorrectly assigned peptides. This analysis makes it possible to filter large volumes of MS/MS database search results with predictable false identification error rates and can serve as a common standard by which the results of different research groups are compared.
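The expectation maximization step described in this abstract can be illustrated with a minimal sketch. It assumes a single one-dimensional search score and models both the correct and the incorrect populations as Gaussians purely for illustration; the abstract does not specify the distributions, and the tryptic-termini term is left out. The function names `fit_two_component_em` and `posterior_correct` are hypothetical.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def posterior_correct(x, params):
    """Probability that a score x belongs to the 'correct' component."""
    w, (m0, s0), (m1, s1) = params
    p1 = w * normal_pdf(x, m1, s1)
    p0 = (1.0 - w) * normal_pdf(x, m0, s0)
    total = p0 + p1
    return p1 / total if total > 0.0 else 0.5

def fit_two_component_em(scores, n_iter=200):
    """Fit a two-Gaussian mixture to scores with EM (illustrative assumption)."""
    lo, hi = min(scores), max(scores)
    m0, m1 = lo, hi                      # start the components at the extremes
    s0 = s1 = (hi - lo) / 4.0 or 1.0     # crude initial spread
    w = 0.5                              # initial fraction of correct assignments
    for _ in range(n_iter):
        # E-step: responsibility of the 'correct' component for each score
        resp = [posterior_correct(x, (w, (m0, s0), (m1, s1))) for x in scores]
        # M-step: re-estimate weights, means, and standard deviations
        n1 = max(sum(resp), 1e-12)
        n0 = max(len(scores) - sum(resp), 1e-12)
        m1 = sum(r * x for r, x in zip(resp, scores)) / n1
        m0 = sum((1.0 - r) * x for r, x in zip(resp, scores)) / n0
        v1 = sum(r * (x - m1) ** 2 for r, x in zip(resp, scores)) / n1
        v0 = sum((1.0 - r) * (x - m0) ** 2 for r, x in zip(resp, scores)) / n0
        s1 = max(math.sqrt(v1), 1e-3)    # floor avoids a collapsed component
        s0 = max(math.sqrt(v0), 1e-3)
        w = n1 / len(scores)
    return w, (m0, s0), (m1, s1)
```

Calling `posterior_correct(score, fit_two_component_em(scores))` then gives a per-assignment probability that can be thresholded to filter search results.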

2.
Recent technological advances have made multidimensional peptide separation techniques coupled with tandem mass spectrometry the method of choice for high-throughput identification of proteins. Due to these advances, the development of software tools for large-scale, fully automated, unambiguous peptide identification is highly necessary. In this work, we have used as a model the nuclear proteome from Jurkat cells and present a processing algorithm that allows accurate predictions of random matching distributions, based on the two SEQUEST scores Xcorr and DeltaCn. Our method permits a very simple and precise calculation of the probabilities associated with individual peptide assignments, as well as of the false discovery rate among the peptides identified in any experiment. A further mathematical analysis demonstrates that the score distributions are highly dependent on database size and precursor mass window and suggests that the probability associated with SEQUEST scores depends on the number of candidate peptide sequences available for the search. Our results highlight the importance of adjusting the filtering criteria to discriminate between correct and incorrect peptide sequences according to the circumstances of each particular experiment.
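As a rough illustration of how per-peptide probabilities relate to a false discovery rate, the sketch below averages the error probabilities of the accepted assignments. This is a generic estimate, not the Xcorr/DeltaCn-based procedure of the abstract; the function name `expected_fdr` and the threshold are assumptions.

```python
def expected_fdr(probabilities, threshold):
    """Expected fraction of wrong assignments among those accepted.

    probabilities: probability that each peptide assignment is correct.
    threshold: minimum probability required to accept an assignment.
    """
    accepted = [p for p in probabilities if p >= threshold]
    if not accepted:
        return 0.0
    # Each accepted assignment contributes its probability of being wrong.
    return sum(1.0 - p for p in accepted) / len(accepted)
```

Raising the threshold trades fewer accepted peptides for a lower expected error rate, which is the kind of experiment-specific filtering adjustment the abstract argues for.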

3.
The purpose of this work is to develop and verify statistical models for protein identification using peptide identifications derived from the results of tandem mass spectral database searches. Recently we have presented a probabilistic model for peptide identification that uses hypergeometric distribution to approximate fragment ion matches of database peptide sequences to experimental tandem mass spectra. Here we apply statistical models to the database search results to validate protein identifications. For this we formulate the protein identification problem in terms of two independent models, two-hypothesis binomial and multinomial models, which use the hypergeometric probabilities and cross-correlation scores, respectively. Each database search result is assumed to be a probabilistic event. The Bernoulli event has two outcomes: a protein is either identified or not. The probability of identifying a protein at each Bernoulli event is determined from relative length of the protein in the database (the null hypothesis) or the hypergeometric probability scores of the protein's peptides (the alternative hypothesis). We then calculate the binomial probability that the protein will be observed a certain number of times (number of database matches to its peptides) given the size of the data set (number of spectra) and the probability of protein identification at each Bernoulli event. The ratio of the probabilities from these two hypotheses (maximum likelihood ratio) is used as a test statistic to discriminate between true and false identifications. The significance and confidence levels of protein identifications are calculated from the model distributions. The multinomial model combines the database search results and generates an observed frequency distribution of cross-correlation scores (grouped into bins) between experimental spectra and identified amino acid sequences. The frequency distribution is used to generate p-value probabilities of each score bin. 
The probabilities are then normalized with respect to score bins to generate normalized probabilities of all score bins. A protein identification probability is the multinomial probability of observing the given set of peptide scores. To reduce the effect of random matches, we employ a marginalized multinomial model for small values of cross-correlation scores. We demonstrate that the combination of the two independent methods provides a useful tool for protein identification from results of database search using tandem mass spectra. A receiver operating characteristic curve demonstrates the sensitivity and accuracy level of the approach. The shortcomings of the models are related to the cases when protein assignment is based on unusual peptide fragmentation patterns that dominate over the model encoded in the peptide identification process. We have implemented the approach in a program called PROT_PROBE.
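The two-hypothesis binomial test in this abstract can be sketched as a log-likelihood ratio between two per-event identification probabilities. This is a minimal sketch, not PROT_PROBE itself: the names `binomial_log_pmf` and `log_likelihood_ratio`, and the way `p_null` and `p_alt` are supplied, are assumptions. In the abstract, the null probability comes from the relative length of the protein in the database and the alternative from hypergeometric peptide scores.

```python
import math

def binomial_log_pmf(k, n, p):
    """Log probability of k successes in n Bernoulli events, 0 < p < 1."""
    return (math.log(math.comb(n, k))
            + k * math.log(p)
            + (n - k) * math.log(1.0 - p))

def log_likelihood_ratio(k, n, p_null, p_alt):
    """Log ratio of alternative to null binomial probabilities.

    k: number of database matches to the protein's peptides.
    n: number of spectra in the data set.
    Positive values favor the alternative (true identification).
    """
    return binomial_log_pmf(k, n, p_alt) - binomial_log_pmf(k, n, p_null)
```

For example, with 1000 spectra, a null probability of 0.001 and an alternative of 0.005, five matches give a positive ratio while a single match gives a negative one.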

4.
Pan S, Gu S, Bradbury EM, Chen X. Analytical Chemistry, 2003, 75(6): 1316-1324
Identification of proteins with low sequence coverage using mass spectrometry (MS) requires tandem MS/MS peptide sequencing. It is very challenging to obtain a complete tandem MS/MS spectrum, or to interpret an incomplete one, from fragmentation of a weak peptide ion signal for sequence assignment. Here, we have developed an effective and high-throughput MALDI-TOF-based method for the identification of membrane and other low-abundance proteins with a simple, one-dimensional separation step. In this approach, several stable isotope-labeled amino acid precursors were selected to mass-tag, in parallel, the proteome of human skin fibroblast cells in a residue-specific manner during in vivo cell culturing. These labeled residues can be recognized by their characteristic isotope patterns in MALDI-TOF MS spectra. The isotope pattern of particular peptides induced by the different labeled precursors provides information about their amino acid compositions. The specificity of peptide signals in a peptide mass mapping is thus greatly enhanced, resolving a high degree of mass degeneracy of proteolytic peptides derived from the complex human proteome. Further, false positive matches in database searching can be eliminated. More importantly, proteins can be accurately identified through a single peptide with its m/z value and partial amino acid composition. With the increased solubility of hydrophobic proteins in SDS, we have demonstrated that our approach is effective for the identification of membrane and low-abundance proteins with low sequence coverage and weak signal intensity, for which it is often difficult to obtain informative fragment patterns in tandem MS/MS peptide sequencing analysis.

5.
6.
Wang Z, Dunlop K, Long SR, Li L. Analytical Chemistry, 2002, 74(13): 3174-3182
The availability of a suitable database is critical in a proteomic approach for bacterial identification by mass spectrometry (MS). The major limitation of the present public proteome database is the lack of extensive low-mass bacterial protein entries with masses experimentally verified for most bacteria. Here, we present a method based on mass spectrometry to create protein mass tables specifically tailored for bacterial identification. Several issues related to the detection of bacterial proteins for the purpose of database creation are addressed. Three species of bacteria, namely, Escherichia coli, Bacillus megaterium, and Citrobacter freundii, which can be found in the ambient environment, were chosen for this study. Direct matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) MS analysis of each bacterial extract reveals 20-29 protein components in the mass range from 2000 to 20,000 Da. HPLC fractionation of bacterial extracts followed by off-line MALDI-TOF analysis of individual fractions detects 156-423 components. Analysis of the extracts by HPLC/electrospray ionization MS shows the number of detectable proteins in the range of 46-59. Although a number of components were common to the three detection schemes employed, some unique components were found using each of these techniques. In addition, for E. coli where a large proteome database exists in the public domain, a number of masses detected by the mass spectrometric methods do not match with the proteome database. Compared to the public proteome database, the mass tables generated in this work are demonstrated to be more useful for bacterial identification in an application where the bacteria of interest have limited protein entries in the public database. The implication of this work for future development of a comprehensive mass database is discussed.

7.
Hu A, Tsai PJ, Ho YP. Analytical Chemistry, 2005, 77(5): 1488-1495
In this paper, we propose a new strategy for identifying specific bacteria in bacterial mixtures by using CE-selective MS/MS of peptide marker ions associated with the bacteria of interest. We searched the CE-MS/MS spectra acquired from the proteolytic digests of pure bacterial cell extracts against protein databases. The identified peptides that match the protein associated with the corresponding species were selected as marker ions for bacterial identification. Specific peptide marker ions were obtained for each of the following three pathogens: Pseudomonas aeruginosa, Staphylococcus aureus, and Staphylococcus epidermidis. To identify a bacterial species in a sample, we performed CE-MS/MS analysis of the selected marker ions in the proteolytic digest of the cell extract and then performed protein database searches. The selected peptides that we identified correctly from Xcorr values ranking at the top of the search results allowed us to identify the corresponding bacterial species present in the sample. We have applied this method successfully to the identification of various mixtures of the three pathogens. Even minor bacterial species present at a concentration of 1% can be identified with great confidence. This method for CE-MS/MS analysis of bacteria-specific marker peptides provides excellent selectivity and high accuracy when identifying bacterial species in complex systems. In addition, we have used this approach to identify P. aeruginosa in a saliva sample spiked with E. coli and P. aeruginosa.

8.
Gu S, Pan S, Bradbury EM, Chen X. Analytical Chemistry, 2002, 74(22): 5774-5785
Here, we describe a method for protein identification and de novo peptide sequencing. Through in vivo cell culturing, the deuterium-labeled lysine residue (Lys-d4) introduces a 4-Da mass tag at the carboxyl terminus of proteolytic peptides when cleaved by certain proteases. The 4-Da mass difference between the unlabeled and the deuterated lysine assigns a mass signature to all lysine-containing peptides in any pool of proteolytic peptides for protein identification directly through peptide mass mapping. Furthermore, it was used to distinguish between N- and C-terminal fragments for accurate assignments of daughter ions in tandem MS/MS spectra for sequence assignment. This technique simplifies the labeling scheme and the interpretation of the MS/MS spectra by assigning different series of fragment ions correctly and easily and is very useful in de novo peptide sequencing. We have also successfully applied this approach to the analysis of protein mixtures derived from the human proteome.
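The Lys-d4 mass signature can be turned into an expected m/z spacing with a few lines of arithmetic. The sketch below assumes monoisotopic masses for protium and deuterium, so the nominal 4-Da tag becomes about 4.025 Da; the function name `labeled_mz_shift` is hypothetical.

```python
PROTIUM_MASS = 1.00782503    # monoisotopic mass of 1H, in Da
DEUTERIUM_MASS = 2.01410178  # monoisotopic mass of 2H, in Da

# Four hydrogens replaced by deuterium in each labeled lysine.
LYS_D4_SHIFT = 4 * (DEUTERIUM_MASS - PROTIUM_MASS)

def labeled_mz_shift(n_lysines, charge):
    """Expected m/z spacing between unlabeled and Lys-d4-labeled ions."""
    return n_lysines * LYS_D4_SHIFT / charge
```

A singly charged peptide with one lysine therefore appears as a pair of peaks roughly 4.03 m/z apart, and a doubly charged one roughly 2.01 m/z apart. A fragment ion that shows the pair contains the labeled lysine, which is the basis for telling C-terminal from N-terminal fragments.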

9.
We have developed a pattern recognition algorithm called SALSA (scoring algorithm for spectral analysis) for the detection of specific features in tandem MS (MS-MS) spectra. Application of the SALSA algorithm to the detection of peptide MS-MS ion series enables identification of MS-MS spectra displaying characteristics of specific peptide sequences. SALSA analysis scores MS-MS spectra based on correspondence between theoretical ion series for peptide sequence motifs and actual MS-MS product ion series, regardless of their absolute positions on the m/z axis. Analyses of tryptic digests of bovine serum albumin (BSA) by LC-MS-MS followed by SALSA analysis detected MS-MS spectra for both unmodified and multiple modified forms of several BSA tryptic peptides. SALSA analysis of MS-MS data from mixtures of BSA and human serum albumin (HSA) tryptic digests indicated that ion series searches with BSA peptide sequence motifs identified MS-MS spectra for both BSA and closely related HSA peptides. Optimal discrimination between MS-MS spectra of variant peptide forms is achieved when the SALSA search criteria are optimized to the target peptide. Application of SALSA to LC-MS-MS proteome analysis will facilitate the characterization of modified and sequence variant proteins.

10.
Current algorithms for quantifying peptide identification confidence in the accurate mass and time (AMT) tag approach assume that the AMT tags themselves have been correctly identified. However, there is uncertainty in the identification of AMT tags, because this is based on matching LC-MS/MS fragmentation spectra to peptide sequences. In this paper, we incorporate confidence measures for the AMT tag identifications into the calculation of probabilities for correct matches to an AMT tag database, resulting in a more accurate overall measure of identification confidence for the AMT tag approach. The method is referred to as Statistical Tools for AMT Tag Confidence (STAC). STAC additionally provides a uniqueness probability (UP) to help distinguish between multiple matches to an AMT tag and a method to calculate an overall false discovery rate (FDR). STAC is freely available for download, as both a command line and a Windows graphical application.

11.
Correct identification of a peptide sequence from MS/MS data is still a challenging research problem, particularly in proteomic analyses of higher eukaryotes where protein databases are large. The scoring methods of search programs often generate cases where incorrect peptide sequences score higher than correct peptide sequences (referred to as distraction). Because smaller databases yield less distraction and better discrimination between correct and incorrect assignments, we developed a method for editing a peptide-centric database (PC-DB) to remove unlikely sequences and strategies for enabling search programs to utilize this peptide database. Rules for unlikely missed cleavage and nontryptic proteolysis products were identified by data mining 11 849 high-confidence peptide assignments. We also evaluated ion exchange chromatographic behavior as an editing criterion to generate subset databases. When used to search a well-annotated test data set of MS/MS spectra, we found no loss of critical information using PC-DBs, validating the methods for generating and searching against the databases. On the other hand, improved confidence in peptide assignments was achieved for tryptic peptides, measured by changes in DeltaCN and RSP. Decreased distraction was also achieved, consistent with the 3-9-fold decrease in database size. Data mining identified a major class of common nonspecific proteolytic products corresponding to leucine aminopeptidase (LAP) cleavages. Large improvements in identifying LAP products were achieved using the PC-DB approach when compared with conventional searches against protein databases. These results demonstrate that peptide properties can be used to reduce database size, yielding improved accuracy and information capture due to reduced distraction, but with little loss of information compared to conventional protein database searches.

12.
Hu A, Chen CT, Tsai PJ, Ho YP. Analytical Chemistry, 2006, 78(14): 5124-5133
Analysis of microbial mixtures in complex systems, such as clinical samples, using mass spectrometry can be challenging because the specimens may contain mixtures of several pathogens or both pathogens and nonpathogens. We have successfully applied capillary electrophoresis-selective MS/MS of unique peptide marker ions to the identification of common pathogens in clinical diagnosis. We searched the CE-MS/MS spectra acquired from the proteolytic digests of pure bacterial cell extracts against protein databases. The identified peptides that matched a protein associated with a particular pathogen were selected as marker ions to identify that bacterium in clinical specimens. Thirty-four clinical specimens, obtained from pus, wound, sputum, and urine samples, were analyzed using both biochemical and selective MS/MS methods. The bacteria in these clinical samples were cultivated directly, without prior isolation of a pure colony, before performing the selective MS/MS analyses. The bacteria analyzed included both Gram-positive and -negative strains. The match with respect to the pathogens identified was good between the biochemical and the selective MS/MS methods; the matching rate was 91%. The rate was as high as 97% when not considering two specimens for which the bacteria were not grown successfully. Two of the specimens that we identified using the biochemical method as containing two bacterial species were confirmed also through selective tandem MS analysis.

13.
The analysis of mass spectrometry data is still largely based on identification of single MS/MS spectra and does not attempt to make use of the extra information available in multiple MS/MS spectra from partially or completely overlapping peptides. Analysis of MS/MS spectra from multiple overlapping peptides opens up the possibility of assembling MS/MS spectra into entire proteins, similarly to the assembly of overlapping DNA reads into entire genomes. In this paper, we present for the first time a way to detect, score, and interpret overlaps between uninterpreted MS/MS spectra in an attempt to sequence entire proteins rather than individual peptides. We show that this approach not only extends the length of reconstructed amino acid sequences but also dramatically improves the quality of de novo peptide sequencing, even for low mass accuracy MS/MS data.

14.
A statistical model for identifying proteins by tandem mass spectrometry   Cited by: 51 (self-citations: 0; citations by others: 51)
A statistical model is presented for computing probabilities that proteins are present in a sample on the basis of peptides assigned to tandem mass (MS/MS) spectra acquired from a proteolytic digest of the sample. Peptides that correspond to more than a single protein in the sequence database are apportioned among all corresponding proteins, and a minimal protein list sufficient to account for the observed peptide assignments is derived using the expectation-maximization algorithm. Using peptide assignments to spectra generated from a sample of 18 purified proteins, as well as complex H. influenzae and Halobacterium samples, the model is shown to produce probabilities that are accurate and have high power to discriminate correct from incorrect protein identifications. This method allows filtering of large-scale proteomics data sets with predictable sensitivity and false positive identification error rates. Fast, consistent, and transparent, it provides a standard for publishing large-scale protein identification data sets in the literature and for comparing the results obtained from different experiments.

15.
Tandem mass spectrometry (MS/MS) plays an important role in the unambiguous identification and structural elucidation of biomolecules. In contrast to conventional MS/MS approaches for protein identification where an individual polypeptide is sequentially selected and dissociated, a multiplexed-MS/MS approach increases throughput by selecting several peptides for simultaneous dissociation using either infrared multiphoton dissociation (IRMPD) or multiple frequency sustained off-resonance irradiation (SORI) collisionally induced dissociation (CID). The high mass measurement accuracy and resolution of FTICR combined with knowledge of peptide dissociation pathways allows the fragments arising from several different parent ions to be assigned. Herein we report the application of multiplexed-MS/MS coupled with on-line separations for the identification of peptides present in complex mixtures (i.e., whole cell lysate digests). Software was developed to enable "on-the-fly" data-dependent peak selection of a subset of polypeptides from each FTICR MS acquisition. In the subsequent MS/MS acquisitions, several coeluting peptides were fragmented simultaneously using either IRMPD or SORI-CID techniques. The utility of this approach has been demonstrated using a bovine serum albumin tryptic digest separated by capillary LC where multiple peptides were readily identified in single MS/MS acquisitions. We also present initial results from multiplexed-MS/MS analysis of a D. radiodurans whole cell digest to illustrate the utility of this approach for high-throughput analysis of a bacterial proteome.

16.
Due to the complexity of proteome samples, only a portion of peptides and thus proteins can be identified in a single LC-MS/MS analysis in current shotgun proteomics methodologies. It has been shown that replicate runs can be used to improve the comprehensiveness of the proteome analysis; however, high-intensity peptides tend to be analyzed repeatedly in different runs, thus reducing the chance of identifying low-intensity peptides. In contrast to commonly used online ESI-MS, offline MALDI decouples the separation from MS acquisition, thus allowing in-depth selection for specific precursor ions. Accordingly, we extended a strategy for offline LC-MALDI MS/MS analysis using a precursor ion exclusion list consisting of all identified peptides in preceding runs. The exclusion list eliminated redundant MS/MS acquisitions in subsequent runs, thus reducing MALDI sample depletion and yielding a larger number of peptide identifications in the cumulative dataset. In the analysis of the digest of an Escherichia coli lysate, the exclusion list strategy resulted in a 25% increase in the number of unique peptide identifications in the second run, in contrast to simply pooling MS/MS data from two replicate runs. To reduce the increased LC analysis time for repeat runs, a four-column multiplexed LC system was developed to carry out separation simultaneously. The multiplexed LC-MALDI MS provides a high-throughput platform to utilize the exclusion list strategy in proteome analysis.
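The precursor ion exclusion list can be pictured as a simple filter applied before MS/MS acquisition in a repeat run. The sketch below matches on m/z alone within a ppm tolerance; the abstract does not state the matching criteria, so the tolerance, the absence of a retention-time or fraction check, and the name `filter_precursors` are all assumptions.

```python
def filter_precursors(candidates, exclusion_list, tol_ppm=50.0):
    """Drop candidate precursor m/z values already identified in earlier runs.

    candidates: m/z values picked from the current MS scan.
    exclusion_list: m/z values of peptides identified in preceding runs.
    tol_ppm: matching tolerance in parts per million (illustrative default).
    """
    kept = []
    for mz in candidates:
        excluded = any(abs(mz - ex) / ex * 1e6 <= tol_ppm
                       for ex in exclusion_list)
        if not excluded:
            kept.append(mz)
    return kept
```

After each run, the newly identified precursors would be appended to `exclusion_list`, so later runs spend their acquisitions on peptides not yet identified.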

17.
The goal of this study was the development of N-terminal tags to improve peptide identification using high-throughput MALDI-TOF/TOF MS. Part 1 of the study was focused on the influence of derivatization on the intensities of MALDI-TOF MS signals of peptides. In part 2, various derivatization approaches for the improvement of peptide fragmentation efficiency in MALDI-TOF/TOF MS are explored. We demonstrate that permanent cation tags, while significantly improving signal intensity in the MS mode, lead to severe suppression of MS/MS fragmentation, making these tags unsuitable for high-throughput MALDI-TOF/TOF MS analysis. In the present work, it was found that labeling with Alexa Fluor 350, a coumarin tag containing a sulfo group, along with guanidation of epsilon-amino groups of Lys, could enhance unimolecular fragmentation of peptides with the formation of a high-intensity y-ion series, while the peptide intensities in the MS mode were not severely affected. LC-MALDI-TOF/TOF MS analysis of tryptic peptides from the SCX fractions of an E. coli lysate revealed improved peptide scores, a doubling of the total number of peptides, and a 30% increase in the number of proteins identified, as a result of labeling. Furthermore, by combining the data from native and labeled samples, confidence in correct identification was increased, as many proteins were identified by different peptides in the native and labeled data sets. Additionally, derivatization was found not to impair chromatographic behavior of peptides. All these factors suggest that labeling with Alexa Fluor 350 is a promising approach to the high-throughput LC-MALDI-TOF/TOF MS analysis of proteomic samples.

18.
Peptide identification based on tandem mass spectrometry and database searching algorithms has become one of the central technologies in proteomics. At the heart of this technology is the ability to reproducibly acquire high-quality tandem mass spectra for database interrogation. The variability in tandem mass spectra generation is often assumed to be minimal, and peptide identifications are typically based on a single tandem mass spectrum. In this paper, we characterize the variance of scores derived from replicate tandem mass spectra using several database search algorithms and demonstrate the effects of spectral variability on the correct identification of peptides. We show that the variance associated with the collection of tandem mass spectra can be substantial, leading to sizable errors in search algorithm scores (approximately 5-25% RSD) and ultimately incorrect assignments. Processing strategies are discussed to minimize the impact of tandem mass spectra variability on peptide identification.
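The relative standard deviation (RSD) quoted in this abstract is the sample standard deviation of replicate scores expressed as a percentage of their mean. A minimal sketch, with the hypothetical function name `relative_sd_percent` and made-up replicate scores in the example:

```python
import statistics

def relative_sd_percent(scores):
    """Relative standard deviation of replicate search scores, in percent.

    Uses the sample standard deviation (n - 1 denominator); needs at least
    two scores and a nonzero mean.
    """
    mean = statistics.mean(scores)
    return 100.0 * statistics.stdev(scores) / mean
```

For three hypothetical replicate scores of 2.0, 2.5, and 3.0, the mean is 2.5 and the sample standard deviation is 0.5, giving an RSD of 20%, which falls inside the 5-25% range the abstract reports.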

19.
The present study reports a procedure developed for the identification of SDS-polyacrylamide gel electrophoretically separated proteins using an electrospray ionization quadrupole time-of-flight mass spectrometer (Q-TOF MS) equipped with pressurized sample introduction. It is based on in-gel digestion of the proteins without previous reduction/alkylation and on the capability of the Q-TOF MS to provide data suitable for peptide mass fingerprinting database searches and for tandem mass spectrometry (MS/MS) database searches (sequence tags). Omitting the reduction/alkylation step reduces sample contamination and sample loss, resulting in increased sensitivity. Omitting this step can leave disulfide-connected peptides in the analyte that can lead to misleading or ambiguous results from the peptide mass fingerprinting database search. This uncertainty, however, is overcome by MS/MS analysis of the peptides. Furthermore, the two complementary MS approaches increase the accuracy of the assignment of the unknown protein. This procedure is thus highly sensitive, accurate, and rapid. In combination with pressurized nanospray sample introduction, it is suitable for automated sample handling. Here, we apply this approach to identify protein contaminants observed during the purification of the yeast DNA mismatch repair protein Mlh1.

20.
De novo sequencing of peptides poses one of the most challenging tasks in data analysis for proteome research. In this paper, a generative hidden Markov model (HMM) of mass spectra for de novo peptide sequencing which constitutes a novel view on how to solve this problem in a Bayesian framework is proposed. Further extensions of the model structure to a graphical model and a factorial HMM to substantially improve the peptide identification results are demonstrated. Inference with the graphical model for de novo peptide sequencing estimates posterior probabilities for amino acids rather than scores for single symbols in the sequence. Our model outperforms state-of-the-art methods for de novo peptide sequencing on a large test set of spectra.
