Similar literature
1.
Here we describe an algorithm for identifying peptides/proteins of known sequence and unknown peptides from partial spectra generated by an in-source decay (ISD) technique coupled with matrix-assisted laser desorption/ionization time-of-flight mass spectrometry. Protein fragments are identified with a software program called CMATCH, which generates candidate subsequences for both known peptides/proteins and unknown peptides from the major product ions in the spectral range m/z 400-5000 and then, for the known peptides/proteins, matches these to known protein sequences contained in a reference database. CMATCH, which is compiled for MS-DOS or Windows 95/NT, has two main advantages: first, the candidate subsequences are generated automatically without the need for supplementary information concerning the distribution of either N-terminal or C-terminal ions in the spectra for both known peptides/proteins and unknown peptides; second, the highest coordinated homologous sequences are picked up automatically from the reference database as the best matches with known peptides/proteins. Examples from the ISD spectra of several test proteins demonstrate the efficacy of this protein identification software.

2.
The purpose of this work is to develop and verify statistical models for protein identification using peptide identifications derived from the results of tandem mass spectral database searches. Recently we have presented a probabilistic model for peptide identification that uses a hypergeometric distribution to approximate fragment ion matches of database peptide sequences to experimental tandem mass spectra. Here we apply statistical models to the database search results to validate protein identifications. For this we formulate the protein identification problem in terms of two independent models, two-hypothesis binomial and multinomial models, which use the hypergeometric probabilities and cross-correlation scores, respectively. Each database search result is assumed to be a probabilistic event. The Bernoulli event has two outcomes: a protein is either identified or not. The probability of identifying a protein at each Bernoulli event is determined from the relative length of the protein in the database (the null hypothesis) or the hypergeometric probability scores of the protein's peptides (the alternative hypothesis). We then calculate the binomial probability that the protein will be observed a certain number of times (number of database matches to its peptides) given the size of the data set (number of spectra) and the probability of protein identification at each Bernoulli event. The ratio of the probabilities from these two hypotheses (maximum likelihood ratio) is used as a test statistic to discriminate between true and false identifications. The significance and confidence levels of protein identifications are calculated from the model distributions. The multinomial model combines the database search results and generates an observed frequency distribution of cross-correlation scores (grouped into bins) between experimental spectra and identified amino acid sequences. The frequency distribution is used to generate p-value probabilities of each score bin. The probabilities are then normalized with respect to score bins to generate normalized probabilities of all score bins. A protein identification probability is the multinomial probability of observing the given set of peptide scores. To reduce the effect of random matches, we employ a marginalized multinomial model for small values of cross-correlation scores. We demonstrate that the combination of the two independent methods provides a useful tool for protein identification from results of database search using tandem mass spectra. A receiver operating characteristic curve demonstrates the sensitivity and accuracy level of the approach. The shortcomings of the models are related to the cases when protein assignment is based on unusual peptide fragmentation patterns that dominate over the model encoded in the peptide identification process. We have implemented the approach in a program called PROT_PROBE.
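
The two-hypothesis binomial test described in this abstract can be illustrated with a minimal Python sketch. It is not the PROT_PROBE implementation: the function names are hypothetical, and the per-event probabilities `p_null` (relative protein length) and `p_alt` (derived from peptide scores) are assumed to be supplied by the caller, because the abstract does not give formulas for them.

```python
from math import comb, log


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent Bernoulli events."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


def log_likelihood_ratio(k: int, n: int, p_null: float, p_alt: float) -> float:
    """Log of the ratio P(k | alternative) / P(k | null).

    k      -- number of database matches to the protein's peptides
    n      -- number of spectra in the data set
    p_null -- assumed per-event probability under the null hypothesis
    p_alt  -- assumed per-event probability under the alternative hypothesis
    """
    return log(binomial_pmf(k, n, p_alt)) - log(binomial_pmf(k, n, p_null))
```

A positive value favors the alternative hypothesis. For example, with hypothetical inputs `k=5`, `n=1000`, `p_null=0.001`, and `p_alt=0.005`, the observed count sits much closer to the alternative's expectation, so the log ratio is positive.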

3.
The characterization of proteomes by mass spectrometry is largely limited to organisms with sequenced genomes. To identify proteins from organisms with unsequenced genomes, database sequences from related species must be employed for sequence-similarity protein identifications. Peptide sequence tags (Mann, 1994) have been used successfully for the identification of proteins in sequence databases using partially interpreted tandem mass spectra of tryptic peptides. We have extended the ability of sequence tag searching to the identification of proteins whose sequences are yet unknown but are homologous to known database entries. The MultiTag method presented here assigns statistical significance to matches of multiple error-tolerant sequence tags to a database entry and ranks alignments by their significance. The MultiTag approach has the distinct advantage over other sequence-similarity approaches of being able to perform sequence-similarity identifications using only very short stretches of peptide sequence (2-4 amino acid residues), rather than complete peptide sequences deduced by de novo interpretation of tandem mass spectra. This feature facilitates the identification of low abundance proteins, since noisy and low-intensity tandem mass spectra can be utilized.

4.
Detection and identification of pathogenic bacteria and their protein toxins play a crucial role in a proper response to natural or terrorist-caused outbreaks of infectious diseases. The recent availability of whole genome sequences of priority bacterial pathogens opens new diagnostic possibilities for identification of bacteria by retrieving their genomic or proteomic information. We describe a method for identification of bacteria based on tandem mass spectrometric (MS/MS) analysis of peptides derived from bacterial proteins. This method involves bacterial cell protein extraction, trypsin digestion, liquid chromatography MS/MS analysis of the resulting peptides, and a statistical scoring algorithm to rank MS/MS spectral matching results for bacterial identification. To facilitate spectral data searching, a proteome database was constructed by translating genomes of bacteria of interest with fully or partially determined sequences. In this work, a prototype database was constructed by the automated analysis of 87 publicly available, fully sequenced bacterial genomes with the GLIMMER gene finding software. MS/MS peptide spectral matching for peptide sequence assignment against this proteome database was done by SEQUEST. To gauge the relative significance of the SEQUEST-generated matching parameters for correct peptide assignment, discriminant function (DF) analysis of these parameters was applied and DF scores were used to calculate probabilities of correct MS/MS spectra assignment to peptide sequences in the database. The peptides with DF scores exceeding a threshold value determined by the probability of correct peptide assignment were accepted and matched to the bacterial proteomes represented in the database. Sequence filtering or removal of degenerate peptides matched with multiple bacteria was then performed to further improve identification. It is demonstrated that using a preset criterion with known distributions of discriminant function scores and probabilities of correct peptide sequence assignments, a test bacterium within the 87 database microorganisms can be unambiguously identified.

5.
We present a new probability-based method for protein identification using tandem mass spectra and protein databases. The method employs a hypergeometric distribution to model frequencies of matches between fragment ions predicted for peptide sequences with a specific (M + H)+ value (at some mass tolerance) in a protein sequence database and an experimental tandem mass spectrum. The hypergeometric distribution constitutes the null hypothesis: all peptide matches to a tandem mass spectrum are random. It is used to generate a score characterizing the randomness of a database sequence match to an experimental tandem mass spectrum and to determine the level of significance of the null hypothesis. For each tandem mass spectrum and database search, a peptide is identified that has the least probability of being a random match to the spectrum and the corresponding level of significance of the null hypothesis is determined. To check the validity of the hypergeometric model in describing fragment ion matches, we used a chi-squared test. The distribution of frequencies and corresponding hypergeometric probabilities are generated for each tandem mass spectrum. No proteolytic cleavage specificity is used to create the peptide sequences from the database. We do not use any empirical probabilities in this method. The scores generated by the hypergeometric model do not have a significant molecular weight bias and are reasonably independent of database size. The approach has been implemented in a database search algorithm, PEP_PROBE. By using a large set of tandem mass spectra derived from a set of peptides created by digestion of a collection of known proteins using four different proteases, a false positive rate of 5% is demonstrated.
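
As a rough illustration of the hypergeometric null model in this abstract, the sketch below computes the probability that a peptide matches a spectrum by chance. It is an assumption-laden sketch, not PEP_PROBE itself: the parameter meanings (`N` total predicted fragment ions for all candidate peptides, `K` of those that match the spectrum, `n` fragment ions predicted for one peptide, `k` of its fragments that match) and the use of the upper tail as a randomness score are illustrative choices, since the abstract does not spell them out.

```python
from math import comb


def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(exactly k matches) when drawing n items from N, of which K are matches."""
    if k < max(0, n - (N - K)) or k > min(K, n):
        return 0.0  # outside the support of the distribution
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def random_match_probability(k: int, N: int, K: int, n: int) -> float:
    """Upper tail P(X >= k): chance of at least k matches under the null."""
    return sum(hypergeom_pmf(i, N, K, n) for i in range(k, min(K, n) + 1))
```

Under this reading, the peptide with the smallest tail probability for a given spectrum would be reported as the best candidate.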

6.
Database-searching algorithms compatible with shotgun proteomics match a peptide tandem mass spectrum to a predicted mass spectrum for an amino acid sequence within a database. SEQUEST is one of the most common software algorithms used for the analysis of peptide tandem mass spectra; it uses a cross-correlation (XCorr) scoring routine to match tandem mass spectra to model spectra derived from peptide sequences. To assess a match, SEQUEST uses the difference between the first- and second-ranked sequences (ΔCn). This value is dependent on the database size, search parameters, and sequence homologies. In this report, we demonstrate the use of a scoring routine (SEQUEST-NORM) that normalizes XCorr values to be independent of peptide size and the database used to perform the search. This new scoring routine is used to objectively calculate the percent confidence of protein identifications and posttranslational modifications based solely on the XCorr value.

7.
Database search identification algorithms, such as Sequest and Mascot, constitute powerful enablers for proteomic tandem mass spectrometry. We introduce DBDigger, an algorithm that reorganizes the database identification process to remove a problematic bottleneck. Typically such algorithms determine which candidate sequences can be compared to each spectrum. Instead, DBDigger determines which spectra can be compared to each candidate sequence, enabling the software to generate candidate sequences only once for each HPLC separation rather than for each spectrum. This reorganization also reduces the number of times a spectrum must be predicted for a particular candidate sequence and charge state. As a result, DBDigger can accelerate some database searches by more than an order of magnitude. In addition, the software offers features to reduce the performance degradation introduced by posttranslational modification (PTM) searching. DBDigger allows researchers to specify the sequence context in which each PTM is possible. In the case of CNBr digests, for example, modified methionine residues can be limited to occur only at the C-termini of peptides. Use of "context-dependent" PTM searching reduces the performance penalty relative to traditional PTM searching. We characterize the performance possible with DBDigger, showcasing MASPIC, a new statistical scorer. We describe the implementation of these innovations in the hope that other researchers will employ them for rapid and highly flexible proteomic database search.
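
The reorganization described here, looping over candidate sequences and looking up the spectra each one could explain, can be sketched as below. This is a simplified illustration and not DBDigger's code: the function name, the tuple layouts, and the assumption that candidate masses are precomputed are all hypothetical. Spectra are sorted once by precursor mass, and each candidate is then matched by binary search.

```python
from bisect import bisect_left, bisect_right


def spectra_for_candidates(spectra, candidates, tol):
    """Map each candidate sequence to the spectra whose precursor mass fits it.

    spectra    -- list of (spectrum_id, precursor_mass) tuples
    candidates -- list of (sequence, peptide_mass) tuples, generated once
    tol        -- absolute mass tolerance, in the same units as the masses
    """
    ordered = sorted(spectra, key=lambda s: s[1])  # sort once per run
    masses = [mass for _, mass in ordered]
    matches = {}
    for sequence, mass in candidates:
        lo = bisect_left(masses, mass - tol)
        hi = bisect_right(masses, mass + tol)
        if lo < hi:  # at least one spectrum falls inside the mass window
            matches[sequence] = [sid for sid, _ in ordered[lo:hi]]
    return matches
```

Each candidate is visited exactly once, so its theoretical spectrum would need to be predicted only once before being scored against every spectrum in its window.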

8.
In this paper, we present an intensity-based probability function to identify peptides from tandem mass spectra and amino acid sequence databases. The function is an approximation based on the central limit theorem, and it explicitly depends on the cumulative product ion intensities, number of product ions of a peptide, and expectation value of the cumulative intensity. We compare the results of database searches using the new scoring function and scoring functions from earlier algorithms, which implement hypergeometric probability, Poisson's model, and cross-correlation scores. For a standard protein mixture (tandem mass spectra generated from the mixture of five known proteins), we generate receiver operating characteristic (ROC) curves with all scoring schemes. The ROC curves show that the shared peaks count-based probability methods (like Poisson and hypergeometric models) are the most specific for matching high-quality tandem mass spectra. The intensity-based (central limit model) and intensity-modeled (cross-correlation) methods are more sensitive when matching low-quality tandem mass spectra, where the number of shared peaks is insufficient to correctly identify a peptide. Cross-correlation methods show a small advantage over the intensity-based probability method.

9.
We have developed a pattern recognition algorithm called SALSA (scoring algorithm for spectral analysis) for the detection of specific features in tandem MS (MS-MS) spectra. Application of the SALSA algorithm to the detection of peptide MS-MS ion series enables identification of MS-MS spectra displaying characteristics of specific peptide sequences. SALSA analysis scores MS-MS spectra based on correspondence between theoretical ion series for peptide sequence motifs and actual MS-MS product ion series, regardless of their absolute positions on the m/z axis. Analyses of tryptic digests of bovine serum albumin (BSA) by LC-MS-MS followed by SALSA analysis detected MS-MS spectra for both unmodified and multiple modified forms of several BSA tryptic peptides. SALSA analysis of MS-MS data from mixtures of BSA and human serum albumin (HSA) tryptic digests indicated that ion series searches with BSA peptide sequence motifs identified MS-MS spectra for both BSA and closely related HSA peptides. Optimal discrimination between MS-MS spectra of variant peptide forms is achieved when the SALSA search criteria are optimized to the target peptide. Application of SALSA to LC-MS-MS proteome analysis will facilitate the characterization of modified and sequence variant proteins.

10.
A method for rapid and unambiguous identification of proteins by sequence database searching using the accurate mass of a single peptide and specific sequence constraints is described. Peptide masses were measured using electrospray ionization-Fourier transform ion cyclotron resonance mass spectrometry to an accuracy of 1 ppm. The presence of a cysteine residue within a peptide sequence was used as a database searching constraint to reduce the number of potential database hits. Cysteine-containing peptides were detected within a mixture of peptides by incorporating chlorine into a general alkylating reagent specific for cysteine residues. Secondary search constraints included the specificity of the protease used for protein digestion and the molecular mass of the protein estimated by gel electrophoresis. The natural isotopic distribution of chlorine encoded the cysteine-containing peptide with a distinctive isotopic pattern that allowed automatic screening of mass spectra. The method is demonstrated for a peptide standard and unknown proteins from a yeast lysate using all 6118 possible yeast open reading frames as a database. As judged by calculation of codon bias, low-abundance proteins were identified from the yeast lysate using this new method but not by traditional methods such as tandem mass spectrometry via data-dependent acquisition or mass mapping.
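
To make the combination of a 1 ppm mass window and a cysteine constraint concrete, here is a small Python sketch. The function names and the example peptide masses are hypothetical placeholders rather than values from the paper. At a mass of 1000 Da, a 1 ppm tolerance corresponds to only 0.001 Da.

```python
def within_ppm(measured: float, theoretical: float, ppm: float = 1.0) -> bool:
    """True if the measured mass lies within +/- ppm of the theoretical mass."""
    return abs(measured - theoretical) <= theoretical * ppm * 1e-6


def candidate_hits(measured_mass, peptides, ppm=1.0, required_residue="C"):
    """Filter database peptides by accurate mass and a required residue.

    peptides -- list of (sequence, theoretical_mass) tuples
    """
    return [
        sequence
        for sequence, mass in peptides
        if required_residue in sequence and within_ppm(measured_mass, mass, ppm)
    ]
```

The residue filter discards every candidate that lacks cysteine before the narrow mass window is applied, which is how a single peptide mass can shrink the hit list to very few entries.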

11.
Pan S, Gu S, Bradbury EM, Chen X. Analytical Chemistry, 2003, 75(6): 1316-1324
Identification of proteins with low sequence coverage using mass spectrometry (MS) requires tandem MS/MS peptide sequencing. It is very challenging to obtain a complete tandem MS/MS spectrum, or to interpret an incomplete one, from fragmentation of a weak peptide ion signal for sequence assignment. Here, we have developed an effective and high-throughput MALDI-TOF-based method for the identification of membrane and other low-abundance proteins with a simple, one-dimensional separation step. In this approach, several stable isotope-labeled amino acid precursors were selected to mass-tag, in parallel, the proteome of human skin fibroblast cells in a residue-specific manner during in vivo cell culturing. These labeled residues can be recognized by their characteristic isotope patterns in MALDI-TOF MS spectra. The isotope pattern of particular peptides induced by the different labeled precursors provides information about their amino acid compositions. The specificity of peptide signals in peptide mass mapping is thus greatly enhanced, resolving a high degree of mass degeneracy of proteolytic peptides derived from the complex human proteome. Further, false positive matches in database searching can be eliminated. More importantly, proteins can be accurately identified through a single peptide with its m/z value and partial amino acid composition. With the increased solubility of hydrophobic proteins in SDS, we have demonstrated that our approach is effective for the identification of membrane and low-abundance proteins with low sequence coverage and weak signal intensity, which often fail to yield informative fragment patterns in tandem MS/MS peptide sequencing analysis.

12.
Peptide identification based on tandem mass spectrometry and database searching algorithms has become one of the central technologies in proteomics. At the heart of this technology is the ability to reproducibly acquire high-quality tandem mass spectra for database interrogation. The variability in tandem mass spectra generation is often assumed to be minimal, and peptide identifications are typically based on a single tandem mass spectrum. In this paper, we characterize the variance of scores derived from replicate tandem mass spectra using several database search algorithms and demonstrate the effects of spectral variability on the correct identification of peptides. We show that the variance associated with the collection of tandem mass spectra can be substantial, leading to sizable errors in search algorithm scores (approximately 5-25% RSD) and ultimately incorrect assignments. Processing strategies are discussed to minimize the impact of tandem mass spectra variability on peptide identification.
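
The relative standard deviation (RSD) quoted in this abstract is the sample standard deviation expressed as a percentage of the mean. A minimal sketch follows, assuming the scores come from replicate spectra of the same peptide; the function name and the example scores are illustrative only.

```python
from statistics import mean, stdev


def relative_std_dev(scores):
    """Percent RSD of replicate search scores: 100 * sample std dev / mean."""
    return 100.0 * stdev(scores) / mean(scores)
```

For hypothetical replicate scores of 9.0, 10.0, and 11.0, the mean is 10.0 and the sample standard deviation is 1.0, giving an RSD of 10%, which falls inside the 5-25% range reported above.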

13.
A protein mixture derived from a whole cell lysate fraction of Saccharomyces cerevisiae, which contains roughly 19 proteins, has been analyzed to identify an a priori unknown modified protein using a quadrupole ion trap tandem mass spectrometer. Collection of the experimental data was facilitated by collision-induced dissociation and ion/ion proton-transfer reactions in multistage mass spectrometry procedures. Ion/ion reactions were used to manipulate charge states of both parent ions and product ions for the purpose of concentrating charge into the parent ion of interest and to reduce the product ion charge states for determination of product ion mass and abundance. The identification of the protein was achieved by matching the uninterpreted product ion spectrum against protein sequence databases with varying degrees of annotation, coupled with a scoring scheme weighted for the relative abundances of the experimentally observed product ions and the frequency of fragmentations occurring at preferential sites. The protein was identified to be an acetylated yeast heat shock protein, HS12_Yeast (11.6 kDa), with the initiating methionine residue removed. This constitutes the first example of the identification of an a priori unknown protein that is not present in an annotated protein database using a "top-down" approach with a quadrupole ion trap. This example illustrates the utility of relatively low cost instrumentation with modest mass analysis characteristics for the identification of modified proteins without recourse to enzymatic digestion. It also illustrates how experimental data can be used interactively with protein databases when the modified protein of interest is not initially present in the database.

14.
Proteolytic peptide mass mapping as measured by mass spectrometry provides a major approach for the identification of proteins. A protein is usually identified by the best match between the measured and calculated m/z values of the proteolytic peptides. A unique identification is, however, heavily dependent upon the mass accuracy and sequence coverage of the fragment ions generated by peptide ionization. Without ultrahigh instrumental accuracy, it is possible to increase the specificity of the assignments of particular proteolytic peptides by the incorporation of selected amino acid residue(s) enriched with stable isotope(s) into the protein sequence. Here we report this novel method of generating residue-specific mass-tagged proteolytic peptides for accurate and efficient protein identification. Selected amino acids are labeled with 13C/15N/2H and incorporated into proteins in a sequence-specific manner during cell culturing. Each of these labeled amino acids carries a defined mass change encoded in its monoisotopic distribution pattern. Through their characteristic patterns, the peptides with mass tags can then be readily distinguished from other peptides in mass spectra. This method of identifying unique proteins can also be extended to protein complexes and will significantly increase data search specificity, efficiency, and accuracy for protein identifications.

15.
A MALDI QqTOF mass spectrometer has been used to identify proteins separated by one-dimensional or two-dimensional gel electrophoresis at the femtomole level. The high mass resolution and the high mass accuracy of this instrument in both MS and MS/MS modes allow identification of a protein either by peptide mass fingerprinting of the protein digest or from tandem mass spectra acquired by collision-induced dissociation of individual peptide precursors. A peptide mass map of the digest and tandem mass spectra of multiple peptide precursor ions can be acquired from the same sample in the course of a single experiment. Database searching and acquisition of MS and MS/MS spectra can be combined in an interactive fashion, increasing the information value of the analytical data. The approach has demonstrated its usefulness in the comprehensive characterization of protein in-gel digests, in the dissection of complex protein mixtures, and in sequencing of a low molecular weight integral membrane protein. Proteins can be identified in all types of sequence databases, including an EST database. Thus, MALDI QqTOF mass spectrometry promises to have remarkable potential for advancing proteomic research.

16.
Kim S, Koo I, Jeong J, Wu S, Shi X, Zhang X. Analytical Chemistry, 2012, 84(15): 6477-6487
Compound identification is a key component of data analysis in the applications of gas chromatography-mass spectrometry (GC-MS). Currently, the most widely used compound identification method is mass spectrum matching, in which the dot product and its composite version are employed as spectral similarity measures. Several forms of transformations for fragment ion intensities have also been proposed to increase the accuracy of compound identification. In this study, we introduced partial and semipartial correlations as mass spectral similarity measures and applied them to identify compounds along with different transformations of peak intensity. The mixture versions of the proposed method were also developed to further improve the accuracy of compound identification. To demonstrate the performance of the proposed spectral similarity measures, the National Institute of Standards and Technology (NIST) mass spectral library and replicate spectral library were used as the reference library and the query spectra, respectively. Identification results showed that the mixture partial and semipartial correlations always outperform both the dot product and its composite measure. The mixture similarity with semipartial correlation has the highest accuracy of 84.6% in compound identification with a transformation of (0.53,1.3) for fragment ion intensity and m/z value, respectively.
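
For orientation, the baseline dot-product similarity that this abstract compares against can be sketched as a cosine similarity over transformed peaks. This is not the partial or semipartial correlation method the authors propose. The weighting form, intensity raised to `a` times m/z raised to `b`, is an assumed reading of the "(0.53,1.3)" transformation, and the function names and dict-based spectrum layout are hypothetical.

```python
from math import sqrt


def transform(spectrum, a=0.53, b=1.3):
    """Weight each peak as intensity**a * mz**b (assumed transformation)."""
    return {mz: (intensity**a) * (mz**b) for mz, intensity in spectrum.items()}


def dot_product_similarity(query, reference, a=0.53, b=1.3):
    """Cosine similarity between two spectra given as {mz: intensity} dicts."""
    q = transform(query, a, b)
    r = transform(reference, a, b)
    shared = q.keys() & r.keys()  # only peaks at the same m/z contribute
    dot = sum(q[mz] * r[mz] for mz in shared)
    norm = sqrt(sum(v * v for v in q.values())) * sqrt(sum(v * v for v in r.values()))
    return dot / norm if norm else 0.0
```

Identical spectra score 1.0 and spectra with no shared m/z values score 0.0; a library search would rank reference spectra by this value.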

17.
Intact protein biomarkers from Bacillus cereus T spores have been analyzed by high-resolution tandem Fourier transform ion cyclotron resonance mass spectrometry. Two techniques have been applied for excitation of the isolated multiply charged precursor ion species: sustained off-resonance irradiation/collisionally activated dissociation and electron capture dissociation. Fragmentation-derived sequence tags and BLAST sequence similarity proteome database searches allow unequivocal identification of the major biomarker protein with unprecedented specificity. Sequence-specific fragmentation patterns further confirm protein identification. Moreover, methodology combining accurate mass measurements of intact proteins with additional information contained in a proteome database permits tentative assignment of several other protein biomarkers isolated from the B. cereus T spores. We argue that approaches involving tandem MS of protein biomarkers, combined with bioinformatics, can drastically improve the specificity of individual microorganism identification, particularly in complex environments.

18.
Identification of individual proteins in complex protein mixtures by high-resolution (HR), high-mass-accuracy matrix-assisted laser desorption/ionization (MALDI) time-of-flight mass spectrometry (TOF-MS) is demonstrated for synthetic protein mixtures. Instead of chemical denaturation, thermal denaturation followed by in-solution trypsin digestion is used to achieve uniform digestion of the constituents of the protein mixture. Protein identification is carried out using protein database searches with search scoring systems, which appears to be more effective than conventional peptide mass mapping without a scoring system. Identification of individual proteins by MALDI HR-TOF-MS peptide mass mapping dramatically reduces data acquisition/analysis time and does not require special equipment for sample preparation/transfer prior to mass spectral analysis.

19.
A simple and reliable method is described here for the identification and relative quantification of proteins in complex mixtures using two-dimensional liquid chromatography/tandem mass spectrometry. The method is based on the classical proteomic analysis where proteins are digested with trypsin and the resulting peptides are separated by multidimensional liquid chromatography. The separated peptides are analyzed by tandem mass spectrometry and identified via a database search algorithm such as SEQUEST. The peak areas (integrated ion counts over the peptide elution time) of all identified peptides are calculated, and the relative concentration of each protein is determined by comparing the peak areas of all peptides from that protein in one sample versus those from the other. Using this strategy, we compared the relative level of protein expression of A431 cells (an epidermal cell line) grown in the presence or absence of epidermal growth factor (EGF). Our results are consistent with the published observations of the transient effects of EGF. In addition, the difference in the concentrations of several phosphopeptides determined in our studies suggests the possibility of several new targets involved in the EGF cell-signaling pathway. This global protein identification and quantification technology should prove to be a valuable means for comparing proteomes in biological samples subjected to differential treatments.
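
The peak-area comparison described here can be sketched as a ratio of summed peptide areas between two samples. How the authors combine peptides is not stated in the abstract, so summing over the peptides identified in both samples is an assumption made for this sketch, and the function name and data layout are hypothetical.

```python
def protein_ratio(areas_a, areas_b):
    """Relative protein level of sample A versus sample B.

    areas_a, areas_b -- {peptide_sequence: integrated_peak_area} for one protein
    Only peptides identified in both samples are compared (an assumption).
    """
    shared = areas_a.keys() & areas_b.keys()
    if not shared:
        raise ValueError("no peptides in common between the two samples")
    total_a = sum(areas_a[p] for p in shared)
    total_b = sum(areas_b[p] for p in shared)
    return total_a / total_b
```

With hypothetical areas `{"P1": 200.0, "P2": 100.0, "P3": 50.0}` for one sample and `{"P1": 100.0, "P2": 50.0}` for the other, the shared peptides P1 and P2 give a ratio of 2.0.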

20.
Wan Y, Yang A, Chen T. Analytical Chemistry, 2006, 78(2): 432-437
An accurate scoring function for database search is crucial for peptide identification using tandem mass spectrometry. Although many mathematical models have been proposed to score peptides against tandem mass spectra, our method (called PepHMM, http://msms.cmb.usc.edu) is unique in that it combines information on machine accuracy, mass peak intensity, and correlation among ions into a hidden Markov model (HMM). In addition, we develop a method to calculate the statistical significance of the HMM scores. We implement the method, test it on two sets of experimental data generated by two different types of mass spectrometers, and compare the results with MASCOT and SEQUEST under the same conditions. One set of experimental results shows that PepHMM has a much higher accuracy (6.5% error rate) than MASCOT (17.4% error rate), and the other shows that PepHMM identifies 43% and 31% more correct spectra than SEQUEST and MASCOT, respectively.
