Similar literature
 20 similar articles found.
1.
Database-searching algorithms compatible with shotgun proteomics match a peptide tandem mass spectrum to a predicted mass spectrum for an amino acid sequence within a database. SEQUEST, one of the most common software algorithms for analyzing peptide tandem mass spectra, uses a cross-correlation (XCorr) scoring routine to match tandem mass spectra to model spectra derived from peptide sequences. To assess a match, SEQUEST uses the difference between the scores of the first- and second-ranked sequences (ΔCn). This value is dependent on the database size, search parameters, and sequence homologies. In this report, we demonstrate the use of a scoring routine (SEQUEST-NORM) that normalizes XCorr values to be independent of peptide size and the database used to perform the search. This new scoring routine is used to objectively calculate the percent confidence of protein identifications and posttranslational modifications based solely on the XCorr value.
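As a rough illustration of the score gap between the two top-ranked candidates (often written ΔCn), the sketch below computes it from a list of XCorr values. Dividing by the top score is a common convention and an assumption here, since the abstract only speaks of a "difference"; the function name `delta_cn` is hypothetical and not part of SEQUEST.

```python
def delta_cn(xcorrs):
    """Return the normalized gap between the best and second-best XCorr.

    Assumption: gap = (top - second) / top. The abstract does not give
    the exact formula, so treat this normalization as illustrative.
    """
    ranked = sorted(xcorrs, reverse=True)
    # With fewer than two candidates, or a non-positive top score,
    # the gap is not meaningful; report 0.0.
    if len(ranked) < 2 or ranked[0] <= 0:
        return 0.0
    return (ranked[0] - ranked[1]) / ranked[0]
```

Because the second-ranked score depends on which other sequences are present, the same spectrum can yield a different gap against a different database, which is the dependence the abstract sets out to remove.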

2.
Reliable identification of posttranslational modifications is key to understanding various cellular regulatory processes. We describe a tool, InsPecT, to identify posttranslational modifications using tandem mass spectrometry data. InsPecT constructs database filters that proved to be very successful in genomics searches. Given an MS/MS spectrum S and a database D, a database filter selects a small fraction of database D that is guaranteed (with high probability) to contain a peptide that produced S. InsPecT uses peptide sequence tags as efficient filters that reduce the size of the database by a few orders of magnitude while retaining the correct peptide with very high probability. In addition to filtering, InsPecT also uses novel algorithms for scoring and validating in the presence of modifications, without explicit enumeration of all variants. InsPecT identifies modified peptides with better or equivalent accuracy than other database search tools while being 2 orders of magnitude faster than SEQUEST, and substantially faster than X!Tandem on complex mixtures. The tool was used to identify a number of novel modifications in different data sets, including many phosphopeptides in data provided by the Alliance for Cellular Signaling that were missed by other tools.
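The idea of a sequence tag acting as a database filter can be sketched as follows. This is a minimal toy, not InsPecT's algorithm: it keeps a peptide only if the tag occurs in it and the residues before the tag sum to the expected prefix mass. The function name `tag_filter`, the tolerance, and the reduced residue table are assumptions made for illustration, and the sketch ignores modifications, which InsPecT handles.

```python
# Monoisotopic residue masses (Da) for a handful of amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "K": 128.09496,
    "E": 129.04259, "R": 156.10111,
}

def tag_filter(peptides, tag, prefix_mass, tol=0.5):
    """Keep peptides containing `tag` at a position whose prefix mass
    matches `prefix_mass` within `tol` Da (hypothetical helper)."""
    kept = []
    for pep in peptides:
        start = pep.find(tag)
        while start != -1:
            mass = sum(RESIDUE_MASS[aa] for aa in pep[:start])
            if abs(mass - prefix_mass) <= tol:
                kept.append(pep)
                break
            # Try the next occurrence of the tag, if any.
            start = pep.find(tag, start + 1)
    return kept
```

Only the survivors of such a filter need to be scored against the spectrum, which is where the speed-up described in the abstract would come from.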

3.
A powerful technique for peptide and protein identification is tandem mass spectrometry followed by database search using a program such as SEQUEST or Mascot. These programs, however, become slow and lose sensitivity when allowing nonspecific cleavages or peptide modifications. De novo sequencing and hybrid methods such as sequence tagging offer speed and robustness for wider searches, yet these approaches require better spectra with more complete and consecutive fragmentation and, hence, are less sensitive to low-abundance peptides. Here we describe a new hybrid method that retains the sensitivity of pure database search. The method uses a small amount of de novo analysis to identify likely b- and y-ion peaks ("lookup peaks") that can then be used to extract candidate peptides from the database, with the number of candidates tunable to fit a computing budget. We describe a program called ByOnic that implements this method, and we benchmark ByOnic on several data sets, including one of mouse blood plasma spiked with low concentrations of recombinant human proteins. We demonstrate that ByOnic is more sensitive than sequence tagging and, indeed, more sensitive than the three most popular pure database search tools (SEQUEST, Mascot, and X!Tandem) on both the peptide and protein levels. On the mouse plasma samples, ByOnic consistently found spiked proteins missed by the other tools.

4.
We present a statistical model to estimate the accuracy of peptide assignments to tandem mass (MS/MS) spectra made by database search applications such as SEQUEST. Employing the expectation maximization algorithm, the analysis learns to distinguish correct from incorrect database search results, computing probabilities that peptide assignments to spectra are correct based upon database search scores and the number of tryptic termini of peptides. Using SEQUEST search results for spectra generated from a sample of known protein components, we demonstrate that the computed probabilities are accurate and have high power to discriminate between correctly and incorrectly assigned peptides. This analysis makes it possible to filter large volumes of MS/MS database search results with predictable false identification error rates and can serve as a common standard by which the results of different research groups are compared.

5.
With the increasing availability of de novo sequencing algorithms for interpreting high-mass-accuracy tandem mass spectrometry (MS/MS) data, there is a growing need for programs that accurately identify proteins from de novo sequencing results. De novo sequences derived from tandem mass spectra of peptides often contain ambiguous regions where the exact amino acid order cannot be determined. One problem this poses for sequence alignment algorithms is the difficulty in distinguishing discrepancies due to de novo sequencing errors from actual genomic sequence variation and posttranslational modifications. We present a novel, mass-based approach to sequence alignment, implemented as a program called OpenSea, to resolve these problems. In this approach, de novo and database sequences are interpreted as masses of residues, and the masses, rather than the amino acid codes, are compared. To provide further flexibility, the masses can be aligned in groups, which can resolve many de novo sequencing errors. The performance of OpenSea was tested with three types of data: a mixture of known proteins, a mixture of unknown proteins that commonly contain sequence variations, and a mixture of posttranslationally modified known proteins. In all three cases, we demonstrate that OpenSea can identify more peptides and proteins than commonly used database-searching programs (SEQUEST and ProteinLynx) while accurately locating sequence variation sites and unanticipated posttranslational modifications in a high-throughput environment.

6.
For the identification and characterization of proteins harboring posttranslational modifications (PTMs), a "top down" strategy using mass spectrometry has been put forward recently but languishes without tailored software widely available. We describe a Web-based software and database suite called ProSight PTM constructed for large-scale proteome projects involving direct fragmentation of intact protein ions. Four main components of ProSight PTM are a database retrieval algorithm (Retriever), MySQL protein databases, a file/data manager, and a project tracker. Retriever performs probability-based identifications from absolute fragment ion masses, automatically compiled sequence tags, or a combination of the two, with graphical rendering and browsing of the results. The database structure allows known and putative protein forms to be searched, with prior or predicted PTM knowledge used during each search. Initial functionality is illustrated with a 36-kDa yeast protein identified from a processed cell extract after automated data acquisition using a quadrupole-FT hybrid mass spectrometer. A +142-Da Δm on glyceraldehyde-3-phosphate dehydrogenase was automatically localized between Asp90 and Asp192, consistent with its two cysteine residues (149 and 153) alkylated by acrylamide (+71 Da each) during the gel-based sample preparation. ProSight PTM is the first search engine and Web environment for identification of intact proteins (https://prosightptm.scs.uiuc.edu/).

7.
A widespread proteomics procedure for characterizing a complex mixture of proteins combines tandem mass spectrometry and database search software to yield mass spectra with identified peptide sequences. The same peptides are often detected in multiple experiments, and once they have been identified, the respective spectra can be used for future identifications. We present a method for collecting previously identified tandem mass spectra into a reference library that is used to identify new spectra. Query spectra are compared to references in the library to find the ones that are most similar. A dot product metric is used to measure the degree of similarity. With our largest library, the search of a query set finds 91% of the spectrum identifications and 93.7% of the protein identifications that could be made with a SEQUEST database search. A second experiment demonstrates that queries acquired on an LCQ ion trap mass spectrometer can be identified with a library of references acquired on an LTQ ion trap mass spectrometer. The dot product similarity score provides good separation of correct and incorrect identifications.
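A dot-product comparison of a query spectrum against a reference library can be sketched as below. This is a minimal illustration under assumptions the abstract does not confirm: peaks are binned at unit m/z width, the dot product is normalized to a cosine, and the names `bin_spectrum`, `dot_product`, and `best_match` are hypothetical.

```python
import math

def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins (assumed width)."""
    binned = {}
    for mz, intensity in peaks:
        key = int(mz / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned

def dot_product(a, b):
    """Normalized dot product (cosine similarity) of two binned spectra."""
    num = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return num / (norm_a * norm_b) if norm_a and norm_b else 0.0

def best_match(query_peaks, library):
    """Return the library key whose reference spectrum is most similar."""
    q = bin_spectrum(query_peaks)
    return max(library, key=lambda name: dot_product(q, bin_spectrum(library[name])))
```

Here `library` maps a peptide identifier to its list of `(m/z, intensity)` peaks; a real library search would also restrict candidates by precursor mass before scoring.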

8.
Proteolytic peptide mass mapping as measured by mass spectrometry provides a major approach for the identification of proteins. A protein is usually identified by the best match between the measured and calculated m/z values of the proteolytic peptides. A unique identification is, however, heavily dependent upon the mass accuracy and sequence coverage of the fragment ions generated by peptide ionization. Without ultrahigh instrumental accuracy, it is possible to increase the specificity of the assignments of particular proteolytic peptides by the incorporation of selected amino acid residue(s) enriched with stable isotope(s) into the protein sequence. Here we report this novel method of generating residue-specific mass-tagged proteolytic peptides for accurate and efficient protein identification. Selected amino acids are labeled with 13C/15N/2H and incorporated into proteins in a sequence-specific manner during cell culturing. Each of these labeled amino acids carries a defined mass change encoded in its monoisotopic distribution pattern. Through their characteristic patterns, the peptides with mass tags can then be readily distinguished from other peptides in mass spectra. This method of identifying unique proteins can also be extended to protein complexes and will significantly increase data search specificity, efficiency, and accuracy for protein identifications.

9.
Here we propose a novel method for rapidly identifying proteins in complex mixtures. A list of candidate proteins (including provision for posttranslational modifications) is obtained by database searching, within a specified mass range about the accurately measured mass (e.g., ±0.1 Da at 10 kDa) of the intact protein, by capillary liquid chromatography electrospray ionization Fourier transform ion cyclotron resonance mass spectrometry (LC ESI FT-ICR MS). On alternate scans, LC ESI infrared multiphoton dissociation (IRMPD) FT-ICR MS yields mostly b and y fragment ions for each protein, from which the correct candidate is identified as the one with the highest "hit" score (i.e., most b and y fragments matching the candidate database protein amino acid sequence masses) and sequence "tag" score (based on a series of fragment sequences differing in mass by 1 or 2 amino acids). The method succeeds in uniquely identifying each of a mixture of five proteins treated as unknowns (melittin, ubiquitin, GroES, myoglobin, carbonic anhydrase II), from more than 1000 possible database candidates within a ±500 Da mass window. We are also able to identify posttranslational modifications of two of the proteins (melittin and GroES). The method is simple, rapid, and definitive and is extendable to a mixture of affinity-selected proteins, to identify proteins with a common biological function.
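The two steps described above, selecting candidates within a mass window and then counting matching fragments, can be sketched as follows. This is a minimal illustration with toy values; the function names, the fragment tolerance, and the data layout are assumptions, not the authors' implementation.

```python
def candidates_in_window(protein_masses, measured_mass, tol_da):
    """Return names of database proteins within +/- tol_da of the measured mass."""
    return sorted(
        name for name, mass in protein_masses.items()
        if abs(mass - measured_mass) <= tol_da
    )

def hit_score(observed_fragments, predicted_fragments, tol_da=0.05):
    """Count observed fragment masses that match any predicted b/y mass.

    The 0.05 Da tolerance is an assumed default for illustration.
    """
    return sum(
        any(abs(obs - pred) <= tol_da for pred in predicted_fragments)
        for obs in observed_fragments
    )
```

In use, each candidate returned by `candidates_in_window` would have its predicted b and y masses passed to `hit_score`, and the candidate with the highest count would be reported; the sequence "tag" score from the abstract is not modeled here.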

10.
Electron capture dissociation (ECD) has previously been shown by other research groups to result in greater peptide sequence coverage than other ion dissociation techniques and to localize labile posttranslational modifications. Here, ECD has been achieved for 10-13-mer peptides microelectrosprayed from 10 nM (10 fmol/μL) solutions and for tryptic peptides from a 50 nM unfractionated digest of a 28-kDa protein. Tandem Fourier transform ion cyclotron resonance (FTICR) mass spectra contain fragment ions corresponding to cleavages at all possible peptide backbone amine bonds, except on the N-terminal side of proline, for substance P and neurotensin. For luteinizing hormone-releasing hormone, all but two expected backbone amine bond cleavages are observed. The tandem FTICR mass spectra of the tryptic peptides contain fragment ions corresponding to cleavages at 6 of 12 (1545.7-Da peptide) and 8 of 21 (2944.5-Da peptide) expected backbone amine bonds. The present sensitivity is 200-2000 times higher than previously reported. These results show promise for ECD as a tool to produce sequence tags for identification of peptides in complex mixtures available only in limited amounts, as in proteomics.

11.
Here we describe an algorithm for identifying peptides/proteins of known sequence and unknown peptides from partial spectra generated by an in-source decay (ISD) technique coupled with matrix-assisted laser desorption/ionization time-of-flight mass spectrometry. The identification of protein fragments is processed with a software program called CMATCH, which generates candidate subsequences for both known peptides/proteins and unknown peptides for the major product ions in the spectral range m/z 400-5000 and then matches these to known protein sequences contained in a reference database for the known peptides/proteins. CMATCH, which is compiled for MS-DOS or Windows 95/NT, has two main advantages: first, the candidate subsequences are generated automatically without the need for supplementary information concerning the distribution of either N-terminal or C-terminal ions in the spectra for both known peptides/proteins and unknown peptides; second, the highest coordinated homologous sequences are picked up automatically from the reference database as the best matches with known peptides/proteins. Examples from the ISD spectra of several test proteins demonstrate the efficacy of this protein identification software.

12.
Database search identification algorithms, such as SEQUEST and Mascot, constitute powerful enablers for proteomic tandem mass spectrometry. We introduce DBDigger, an algorithm that reorganizes the database identification process to remove a problematic bottleneck. Typically such algorithms determine which candidate sequences can be compared to each spectrum. Instead, DBDigger determines which spectra can be compared to each candidate sequence, enabling the software to generate candidate sequences only once for each HPLC separation rather than for each spectrum. This reorganization also reduces the number of times a spectrum must be predicted for a particular candidate sequence and charge state. As a result, DBDigger can accelerate some database searches by more than an order of magnitude. In addition, the software offers features to reduce the performance degradation introduced by posttranslational modification (PTM) searching. DBDigger allows researchers to specify the sequence context in which each PTM is possible. In the case of CNBr digests, for example, modified methionine residues can be limited to occur only at the C-termini of peptides. Use of "context-dependent" PTM searching reduces the performance penalty relative to traditional PTM searching. We characterize the performance possible with DBDigger, showcasing MASPIC, a new statistical scorer. We describe the implementation of these innovations in the hope that other researchers will employ them for rapid and highly flexible proteomic database search.
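The loop inversion described above (finding the spectra that match each candidate, rather than the candidates that match each spectrum) can be sketched with a sorted list of precursor masses and a binary search. This is only an illustration of the idea; the function name, the tolerance, and the data layout are assumptions, not DBDigger's actual code.

```python
from bisect import bisect_left, bisect_right

def spectra_for_candidate(sorted_precursors, candidate_mass, tol=1.0):
    """Return indices of spectra whose precursor mass lies within
    +/- tol Da of the candidate peptide mass.

    `sorted_precursors` must be sorted ascending, so each candidate
    generated from the database is matched with two binary searches.
    """
    lo = bisect_left(sorted_precursors, candidate_mass - tol)
    hi = bisect_right(sorted_precursors, candidate_mass + tol)
    return list(range(lo, hi))
```

With this arrangement, each candidate sequence is generated once per run and then dispatched to every spectrum it could explain.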

13.
There are several computer programs that can match peptide tandem mass spectrometry data to their exactly corresponding database sequences, and in most protein identification projects, these programs are utilized in the early stages of data interpretation. However, situations frequently arise where tandem mass spectral data cannot be correlated with any database sequences. In these cases, the unmatched data could be due to peptides derived from novel proteins, allelic or species-derived variants of known proteins, or posttranslational or chemical modifications. Two additional problems are frequently encountered in high-throughput protein identification. First, it is difficult to quickly sift through large amounts of data to identify those spectra that, due to poor signal or contaminants, can be ignored. Second, it is important to find incorrect database matches (false positives). We have chosen to address these difficulties by performing automatic de novo sequencing using a computer program called Lutefisk. Sequence candidates obtained are used as input in a homology-based database search program called CIDentify to identify variants of known proteins. Comparison of database-derived sequences with de novo sequences allows for electronic validation of database matches even if the latter are not completely correct. Modifications to the original Lutefisk program have been implemented to handle data obtained from triple quadrupole, ion trap, and quadrupole/time-of-flight hybrid (Qtof) mass spectrometers. For example, the linearity of mass errors due to temperature-dependent expansion of the flight tube in a Qtof was exploited such that isobaric amino acids (glutamine/lysine and oxidized methionine/phenylalanine) can be differentiated without careful attention to mass calibration.

14.
We present a new probability-based method for protein identification using tandem mass spectra and protein databases. The method employs a hypergeometric distribution to model frequencies of matches between fragment ions predicted for peptide sequences with a specific (M + H)+ value (at some mass tolerance) in a protein sequence database and an experimental tandem mass spectrum. The hypergeometric distribution constitutes the null hypothesis: all peptide matches to a tandem mass spectrum are random. It is used to generate a score characterizing the randomness of a database sequence match to an experimental tandem mass spectrum and to determine the level of significance of the null hypothesis. For each tandem mass spectrum and database search, a peptide is identified that has the least probability of being a random match to the spectrum and the corresponding level of significance of the null hypothesis is determined. To check the validity of the hypergeometric model in describing fragment ion matches, we used a χ² test. The distribution of frequencies and corresponding hypergeometric probabilities are generated for each tandem mass spectrum. No proteolytic cleavage specificity is used to create the peptide sequences from the database. We do not use any empirical probabilities in this method. The scores generated by the hypergeometric model do not have a significant molecular weight bias and are reasonably independent of database size. The approach has been implemented in a database search algorithm, PEP_PROBE. By using a large set of tandem mass spectra derived from a set of peptides created by digestion of a collection of known proteins using four different proteases, a false positive rate of 5% is demonstrated.
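To make the hypergeometric null model concrete, the sketch below computes the probability of seeing at least a given number of fragment matches by chance. The mapping of parameters to the problem (population, successes, draws) is a generic textbook parameterization and an assumption here; the abstract does not spell out how PEP_PROBE defines these quantities, and the function name is hypothetical.

```python
from math import comb

def hypergeom_tail(population, successes, draws, matches):
    """P(X >= matches) for X ~ Hypergeometric(population, successes, draws).

    Illustrative reading (an assumption): `population` is the total
    number of predicted fragments, `successes` those belonging to one
    candidate peptide, `draws` the fragments matched by the spectrum,
    and `matches` the matched fragments that belong to the candidate.
    """
    total = comb(population, draws)
    upper = min(successes, draws)
    favorable = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(matches, upper + 1)
    )
    return favorable / total
```

A small tail probability means the observed number of matches is unlikely under the random-match null hypothesis, so the peptide with the smallest value would be the preferred identification.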

15.
The characterization of proteomes by mass spectrometry is largely limited to organisms with sequenced genomes. To identify proteins from organisms with unsequenced genomes, database sequences from related species must be employed for sequence-similarity protein identifications. Peptide sequence tags (Mann, 1994) have been used successfully for the identification of proteins in sequence databases using partially interpreted tandem mass spectra of tryptic peptides. We have extended the ability of sequence tag searching to the identification of proteins whose sequences are yet unknown but are homologous to known database entries. The MultiTag method presented here assigns statistical significance to matches of multiple error-tolerant sequence tags to a database entry and ranks alignments by their significance. The MultiTag approach has the distinct advantage over other sequence-similarity approaches of being able to perform sequence-similarity identifications using only very short (2-4) amino acid residue stretches of peptide sequences, rather than complete peptide sequences deduced by de novo interpretation of tandem mass spectra. This feature facilitates the identification of low abundance proteins, since noisy and low-intensity tandem mass spectra can be utilized.

16.
Currently available mass spectrometric (MS) techniques lack specificity in identifying protein modifications because molecular mass is the only parameter used to characterize these changes. Consequently, the suspected modified peptides are subjected to tandem MS/MS sequencing that may demand more time and sample. We report the use of stable isotope-enriched amino acids as residue-specific "mass signatures" for the rapid and sensitive detection of protein modifications directly from the peptide mass map (PMM) without enrichment of the modified peptides. These mass signatures are easily recognized through their characteristic spectral patterns and provide fingerprints for peptides containing the same content of specific amino acid residue(s) in a PMM. Without the need for tandem MS/MS sequencing, a peptide and its modified form(s) can readily be identified through their identical fingerprints, regardless of the nature of modifications. In this report, we demonstrate this strategy for the detection of methionine oxidation and protein phosphorylation. More interestingly, the phosphorylation of a histone protein, H2A.X, obtained from human skin fibroblast cells, was effectively identified in response to low-dose radiation. In general, this strategy of residue-specific mass tagging should be applicable to other posttranslational modifications.

17.
Wan Y, Yang A, Chen T. Analytical Chemistry, 2006, 78(2): 432-437
An accurate scoring function for database search is crucial for peptide identification using tandem mass spectrometry. Although many mathematical models have been proposed to score peptides against tandem mass spectra, our method (called PepHMM, http://msms.cmb.usc.edu) is unique in that it combines information on machine accuracy, mass peak intensity, and correlation among ions into a hidden Markov model (HMM). In addition, we develop a method to calculate the statistical significance of the HMM scores. We implement the method and test it on two sets of experimental data generated by two different types of mass spectrometers and compare the results with MASCOT and SEQUEST under the same conditions. One set of experimental results shows that PepHMM has a much higher accuracy (with a 6.5% error rate) than MASCOT (with a 17.4% error rate), and the other shows that PepHMM identifies 43 and 31% more correct spectra than SEQUEST and MASCOT, respectively.

18.
TwinPeaks, a close variant of the SEQUEST protein identification algorithm, is capable of unrestricted, large-scale identification of post-translational modifications (PTMs). TwinPeaks is applied to a sample of 100,441 tandem mass spectra from the HUPO Plasma Proteome Project data set, with the full non-redundant human database as the reference protein database. With a 3.5% error rate, TwinPeaks identifies a collection of 539 spectra that were not identified by the usual PTM-restricted identification algorithm. At this error rate, TwinPeaks increases the rate of spectra identifications by at least 17.6%, making unrestricted PTM identification an integral part of proteomics.

19.
Lu B, Ruse C, Xu T, Park SK, Yates J. Analytical Chemistry, 2007, 79(4): 1301-1310
We developed and compared two approaches for automated validation of phosphopeptide tandem mass spectra identified using database searching algorithms. Phosphopeptide identifications were obtained through SEQUEST searches of a protein database appended with its decoy (reversed sequences). Statistical evaluation and iterative searches were employed to create a high-quality data set of phosphopeptides. Automation of postsearch validation was approached by two different strategies. By using statistical multiple testing, we calculate a p value for each tentative peptide phosphorylation. In a second method, we use a support vector machine (SVM; a machine learning algorithm) binary classifier to predict whether a tentative peptide phosphorylation is true. We show good agreement (85%) between postsearch validation of phosphopeptide/spectrum matches by multiple testing and that from support vector machines. Automatic methods conform very well with manual expert validation in a blinded test. Additionally, the algorithms were tested on the identification of synthetic phosphopeptides. We show that phosphate neutral losses in tandem mass spectra can be used to assess the correctness of phosphopeptide/spectrum matches. An SVM classifier with a radial basis function provided classification accuracy from 95.7% to 96.8% of the positive data set, depending on search algorithm used. Establishing the efficacy of an identification is a necessary step for further postsearch interrogation of the spectra for complete localization of phosphorylation sites. Our current implementation performs validation of phosphoserine/phosphothreonine-containing peptides having one or two phosphorylation sites from data gathered on an ion trap mass spectrometer. The SVM-based algorithm has been implemented in the software package DeBunker. We illustrate the application of the SVM-based software DeBunker on a large phosphorylation data set.
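Searching a database appended with reversed (decoy) sequences allows a simple error estimate: decoy hits above a score threshold approximate the number of false target hits. The sketch below shows one common form of that estimate. It is an assumption that this is the form used, since the abstract does not give the formula, and the function name and data layout are hypothetical.

```python
def decoy_fdr(psms, threshold):
    """Estimate the false discovery rate among target matches.

    `psms` is a list of (score, is_decoy) pairs. The estimate is the
    number of decoy matches divided by the number of target matches at
    or above `threshold` (one common convention among several).
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0
```

Raising the threshold until the estimate falls below a chosen level is a typical way to assemble a high-confidence set of matches before further validation.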

20.
Protein-protein interactions are key to the function and regulation of many biological pathways. To facilitate characterization of protein-protein interactions using mass spectrometry, a new data acquisition/analysis pipeline was designed. The goal for this pipeline was to provide a generic strategy for identifying cross-linked peptides from single LC/MS/MS data sets, without using specialized cross-linkers or custom-written software. To achieve this, each peptide in the pair of cross-linked peptides was considered to be "post-translationally" modified with an unknown mass at an unknown amino acid. This allowed use of an open-modification search engine, Popitam, to interpret the tandem mass spectra of cross-linked peptides. False positives were reduced and database selectivity increased by acquiring precursors and fragments at high mass accuracy. Additionally, a high-charge-state-driven data acquisition scheme was utilized to enrich data sets for cross-linked peptides. This open-modification-search-based pipeline was shown to be useful for characterizing both chemical and native cross-links in proteins. The pipeline was validated by characterizing the known interactions in the chemically cross-linked CYP2E1-b5 complex. Utility of this method in identifying native cross-links was demonstrated by mapping disulfide bridges in RcsF, an outer membrane lipoprotein involved in Rcs phosphorelay.
