Similar literature
Found 20 similar documents; search time 15 ms
1.
There are several computer programs that can match peptide tandem mass spectrometry data to their exactly corresponding database sequences, and in most protein identification projects, these programs are utilized in the early stages of data interpretation. However, situations frequently arise where tandem mass spectral data cannot be correlated with any database sequences. In these cases, the unmatched data could be due to peptides derived from novel proteins, allelic or species-derived variants of known proteins, or posttranslational or chemical modifications. Two additional problems are frequently encountered in high-throughput protein identification. First, it is difficult to quickly sift through large amounts of data to identify those spectra that, due to poor signal or contaminants, can be ignored. Second, it is important to find incorrect database matches (false positives). We have chosen to address these difficulties by performing automatic de novo sequencing using a computer program called Lutefisk. The sequence candidates obtained are used as input to a homology-based database search program called CIDentify to identify variants of known proteins. Comparison of database-derived sequences with de novo sequences allows for electronic validation of database matches even if the latter are not completely correct. Modifications to the original Lutefisk program have been implemented to handle data obtained from triple quadrupole, ion trap, and quadrupole/time-of-flight hybrid (Qtof) mass spectrometers. For example, the linearity of mass errors due to temperature-dependent expansion of the flight tube in a Qtof was exploited such that isobaric amino acids (glutamine/lysine and oxidized methionine/phenylalanine) can be differentiated without careful attention to mass calibration.
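
To illustrate why mass accuracy matters for the isobaric pairs named above, here is a minimal sketch that lists which residues are compatible with an observed mass gap between adjacent fragment ions. The residue masses are standard monoisotopic values; the function name and the tolerances are assumptions chosen for illustration, and the sketch does not reproduce Lutefisk's handling of Qtof mass errors.

```python
# Standard monoisotopic residue masses in Da.
RESIDUE_MASS = {
    "Q": 128.05858,      # glutamine
    "K": 128.09496,      # lysine
    "F": 147.06841,      # phenylalanine
    "M(ox)": 147.03540,  # oxidized methionine (131.04049 + 15.99491)
}


def candidates(observed_gap, tolerance_da):
    """Return residues whose mass is within tolerance_da of the observed gap.

    Hypothetical helper: observed_gap is the m/z difference between two
    adjacent singly charged fragment ions of the same series.
    """
    return sorted(
        name
        for name, mass in RESIDUE_MASS.items()
        if abs(mass - observed_gap) <= tolerance_da
    )


# At low accuracy both members of a pair remain candidates;
# at high accuracy only one survives.
print(candidates(128.095, 0.5))
print(candidates(128.095, 0.01))
```

The two members of each pair differ by only about 0.03 to 0.04 Da, so a tolerance well below that is needed to separate them.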

2.
Mo L, Dutta D, Wan Y, Chen T. Analytical Chemistry, 2007, 79(13): 4870-4878
Tandem mass spectrometry (MS/MS) has become the experimental method of choice for high-throughput proteomics-based biological discovery. The two primary ways of analyzing MS/MS data are database search and de novo sequencing. In this paper, we present a new approach to peptide de novo sequencing, called MSNovo, which has the following advanced features. (1) It works on data generated from both LCQ and LTQ mass spectrometers and interprets singly, doubly, and triply charged ions. (2) It integrates a new probabilistic scoring function with a mass array-based dynamic programming algorithm. The simplicity of the scoring function, with only 6-10 parameters to be trained, avoids the problem of overfitting and allows MSNovo to be adopted for other machines and data sets easily. The mass array data structure explicitly encodes all possible peptides and allows the dynamic programming algorithm to find the best peptide. (3) Compared to existing programs, MSNovo predicts peptides as well as sequence tags with higher accuracy, which is important for those applications that search protein databases using the de novo sequencing results. More specifically, we show that MSNovo outperforms other programs on various ESI ion trap data. We also show that for high-resolution data the performance of MSNovo improves significantly. Supporting Information, executable files, and data sets can be found at http://msms.usc.edu/supplementary/msnovo.
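
The idea of dynamic programming over a mass array can be sketched as follows. This is an illustrative toy under strong assumptions, not MSNovo's algorithm: masses are nominal integers, only five residues are allowed, water and proton offsets are ignored, and `peak_score` is a hypothetical mapping from a prefix mass to the evidence for a peak at that mass rather than the paper's probabilistic scoring function.

```python
# Nominal (integer) residue masses for a small illustrative alphabet.
NOMINAL = {"G": 57, "A": 71, "S": 87, "P": 97, "V": 99}


def best_peptide(total_mass, peak_score):
    """Return the highest-scoring residue string whose masses sum to total_mass.

    best[m] holds the best score of any residue string with total mass m;
    back[m] remembers the last residue of that string for backtracking.
    """
    neg_inf = float("-inf")
    best = [neg_inf] * (total_mass + 1)
    back = [None] * (total_mass + 1)
    best[0] = 0.0
    for m in range(1, total_mass + 1):
        for residue, weight in NOMINAL.items():
            if weight <= m and best[m - weight] > neg_inf:
                score = best[m - weight] + peak_score.get(m, 0.0)
                if score > best[m]:
                    best[m] = score
                    back[m] = residue
    if best[total_mass] == neg_inf:
        return None  # no combination of residues reaches this mass
    residues = []
    m = total_mass
    while m > 0:
        residues.append(back[m])
        m -= NOMINAL[back[m]]
    return "".join(reversed(residues))


# Peaks at prefix masses 57 (G) and 128 (G+A) favor the ordering G, A, G.
print(best_peptide(185, {57: 1.0, 128: 1.0}))
```

Because every reachable mass is a cell in the array, all candidate peptides are represented implicitly and the best one is recovered by backtracking from the final cell.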

3.
A method for rapid identification of microorganisms is presented, which exploits the wealth of information contained in prokaryotic genome and protein sequence databases. The method is based on determining the masses of a set of ions by MALDI TOF mass spectrometry of intact or treated cells. Subsequent correlation of each ion in the set to a protein, along with the organismic source of the protein, is performed by searching an Internet-accessible protein database. Convoluting the lists for all ions and ranking the organisms corresponding to matched ions results in the identification of the microorganism. The method has been successfully demonstrated on B. subtilis and E. coli, two organisms with completely sequenced genomes. The method has also been tested for identification from mass spectra of mixtures of microorganisms, from spectra of an organism at different growth stages, and from spectra originating at other laboratories. The effects of experimental factors such as MALDI matrix preparation, spectral reproducibility, contaminants, mass range, and measurement accuracy on the database search procedure are also addressed. The proposed method has several advantages over other MS methods for microorganism identification.
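
The matching and ranking step described above can be sketched as follows. This is a minimal illustration assuming the database is a flat list of (organism, protein mass) pairs and that organisms are ranked by the number of observed ions that match at least one of their proteins; the function name, data layout, and tolerance are hypothetical, and the abstract does not specify the actual ranking rule.

```python
from collections import Counter


def rank_organisms(observed_masses, database, tolerance_da):
    """Rank organisms by how many observed ion masses match one of their proteins.

    database is a list of (organism, protein_mass) tuples (assumed layout).
    Each observed ion counts at most once per organism.
    """
    hits = Counter()
    for mass in observed_masses:
        matched = {
            organism
            for organism, protein_mass in database
            if abs(protein_mass - mass) <= tolerance_da
        }
        hits.update(matched)
    return hits.most_common()


toy_database = [
    ("organism A", 5001.0),
    ("organism A", 5000.5),
    ("organism A", 7299.0),
    ("organism B", 9100.5),
]
print(rank_organisms([5000.0, 7300.0, 9100.0], toy_database, 2.0))
```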

4.
We derive and validate a simple statistical model that predicts the distribution of false matches between peaks in matrix-assisted laser desorption/ionization mass spectrometry data and proteins in proteome databases. The model allows us to calculate the significance of previously reported microorganism identification results. In particular, for Δm = ±1.5 Da, we find that the computed significance levels are sufficient to demonstrate the ability to identify microorganisms, provided the number of candidate microorganisms is limited to roughly three Escherichia coli-like or roughly 10 Bacillus subtilis-like microorganisms (in the sense of having roughly the same number of proteins per unit-mass interval). We conclude that, given the cluttered and incomplete nature of the data, it is likely that neither simple ranking nor simple hypothesis testing will be sufficient for truly robust microorganism identification over a large number of candidate microorganisms.
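
One simple way to reason about false matches is sketched below. It is not the authors' model, which the abstract does not spell out; it assumes protein masses are locally uniform with a given density per Da (so the chance that a random peak falls within ±Δm of some protein is 1 - exp(-2ρΔm)) and that peaks match independently (so the number of chance matches is binomial). All function and parameter names are hypothetical.

```python
from math import comb, exp


def chance_match_probability(density_per_da, delta_m):
    """Probability that one peak matches at least one protein by chance.

    Assumes protein masses follow a Poisson process with the given density
    (proteins per Da) and a matching window of +/- delta_m.
    """
    return 1.0 - exp(-2.0 * density_per_da * delta_m)


def p_value(num_peaks, num_matched, p):
    """Probability of at least num_matched chance matches among num_peaks peaks."""
    return sum(
        comb(num_peaks, j) * p**j * (1.0 - p) ** (num_peaks - j)
        for j in range(num_matched, num_peaks + 1)
    )


# Example with made-up numbers: 0.1 proteins per Da, window of +/- 1.5 Da.
p = chance_match_probability(0.1, 1.5)
print(p, p_value(20, 12, p))
```

Under these assumptions, a denser proteome raises the per-peak chance of a false match, which is consistent with the abstract's point that significance depends on the number of proteins per unit-mass interval.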

5.
A novel methodology for the automated de novo identification of peptides via integer linear optimization (also referred to as integer linear programming or ILP) and tandem mass spectrometry is presented in this article. The various features of the mathematical model are presented and examples are used to illustrate the key concepts of the proposed approach. A variety of challenging peptide identification problems, accompanied by a comparative study with five state-of-the-art methods, are examined to illustrate the proposed method's ability to address (a) residue-dependent fragmentation properties that result in missing ion peaks and (b) the variability of resolution in different mass analyzers. A preprocessing algorithm is utilized to identify important m/z values in the tandem mass spectrum. Missing peaks, due to residue-dependent fragmentation characteristics, are dealt with using a two-stage algorithmic framework. A cross-correlation approach is used to resolve missing amino acid assignments and to select the most probable peptide by comparing the theoretical spectra of the candidate sequences that were generated from the ILP sequencing stages with the experimental tandem mass spectrum. The proposed de novo method, denoted PILOT, is compared to existing popular methods such as Lutefisk, PEAKS, PepNovo, EigenMS, and NovoHMM for a set of spectra resulting from QTOF and ion trap instruments.

6.
Gu S, Pan S, Bradbury EM, Chen X. Analytical Chemistry, 2002, 74(22): 5774-5785
Here, we describe a method for protein identification and de novo peptide sequencing. Through in vivo cell culturing, the deuterium-labeled lysine residue (Lys-d4) introduces a 4-Da mass tag at the carboxyl terminus of proteolytic peptides when cleaved by certain proteases. The 4-Da mass difference between the unlabeled and the deuterated lysine assigns a mass signature to all lysine-containing peptides in any pool of proteolytic peptides for protein identification directly through peptide mass mapping. Furthermore, it was used to distinguish between N- and C-terminal fragments for accurate assignment of daughter ions in MS/MS spectra for sequence assignment. This technique simplifies the labeling scheme and the interpretation of the MS/MS spectra by assigning different series of fragment ions correctly and easily, and it is very useful in de novo peptide sequencing. We have also successfully applied this approach to the analysis of protein mixtures derived from the human proteome.
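
The use of the mass tag to separate fragment series can be sketched as follows. The sketch assumes singly charged fragments, a single labeled lysine at the C-terminus, and a shift of about 4.0251 Da (four deuteriums replacing four hydrogens); the function name and tolerance are hypothetical and this is not the authors' software.

```python
# Mass shift for four H -> D substitutions: 4 * (2.014102 - 1.007825) Da.
D4_SHIFT = 4.0251


def classify_fragments(unlabeled_peaks, labeled_peaks, tol=0.2):
    """Label each unlabeled fragment as N-terminal, C-terminal, or unassigned.

    A fragment that reappears shifted by D4_SHIFT in the labeled spectrum
    contains the C-terminal lysine; one that reappears unshifted does not.
    """

    def has_peak(peaks, target):
        return any(abs(peak - target) <= tol for peak in peaks)

    labels = {}
    for mz in unlabeled_peaks:
        if has_peak(labeled_peaks, mz + D4_SHIFT):
            labels[mz] = "C-terminal"
        elif has_peak(labeled_peaks, mz):
            labels[mz] = "N-terminal"
        else:
            labels[mz] = "unassigned"
    return labels


print(classify_fragments([147.11, 201.12, 500.0], [151.14, 201.12]))
```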

7.
A novel concept of two-dimensional fragment correlation mass spectrometry and its application to peptide sequencing is described. The daughter ion (MS2) spectrum of a peptide contains the sequence information of the peptide. However, deciphering the MS2 spectrum, and thus deriving the peptide sequence, is complex because of the difficulty in distinguishing the N-terminal fragments (e.g., b series) from the C-terminal fragments (e.g., y series). By taking a granddaughter ion (MS3) spectrum of a particular daughter ion, all fragment ions of the opposite terminus are eliminated in the MS3 spectrum. However, some internal fragments of the peptide will appear in the MS3 spectrum. Because internal fragments are rarely present in the MS2 spectrum, the intersection (a spectrum containing peaks that are present in both spectra) of the MS2 and MS3 spectra should contain only fragments of the same terminal type. A two-dimensional plot of the MS2 spectrum versus the intersection spectra (2-D fragment correlation mass spectrum) often gives enough information to derive the complete sequence of a peptide. This paper describes this novel technique and its application in sequencing cytochrome c and apomyoglobin. For a tryptic digest of cytochrome c, approximately 78% of the protein sequence was determined. For the Glu-C/tryptic digest of apomyoglobin, approximately 66% of the protein sequence was determined.
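
The intersection spectrum defined above can be sketched in a few lines. This is a minimal illustration assuming spectra are plain lists of m/z values and that two peaks count as the same when they differ by no more than a fixed tolerance; the function name and the 0.5 tolerance are assumptions, not values from the paper.

```python
def intersect_spectra(ms2_peaks, ms3_peaks, tol=0.5):
    """Return the MS2 peaks that also appear in the MS3 spectrum within tol."""
    return [
        mz
        for mz in ms2_peaks
        if any(abs(mz - other) <= tol for other in ms3_peaks)
    ]


# Only the peak near m/z 100 is present in both toy spectra.
print(intersect_spectra([100.0, 200.0, 300.0], [100.2, 250.0, 300.6]))
```

Repeating this for the MS3 spectrum of each selected daughter ion gives one intersection spectrum per daughter ion, which is the second dimension of the plot the abstract describes.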

8.
The analysis of mass spectrometry data is still largely based on identification of single MS/MS spectra and does not attempt to make use of the extra information available in multiple MS/MS spectra from partially or completely overlapping peptides. Analysis of MS/MS spectra from multiple overlapping peptides opens up the possibility of assembling MS/MS spectra into entire proteins, similarly to the assembly of overlapping DNA reads into entire genomes. In this paper, we present for the first time a way to detect, score, and interpret overlaps between uninterpreted MS/MS spectra in an attempt to sequence entire proteins rather than individual peptides. We show that this approach not only extends the length of reconstructed amino acid sequences but also dramatically improves the quality of de novo peptide sequencing, even for low mass accuracy MS/MS data.

9.
Wan Y, Yang A, Chen T. Analytical Chemistry, 2006, 78(2): 432-437
An accurate scoring function for database search is crucial for peptide identification using tandem mass spectrometry. Although many mathematical models have been proposed to score peptides against tandem mass spectra, our method (called PepHMM, http://msms.cmb.usc.edu) is unique in that it combines information on machine accuracy, mass peak intensity, and correlation among ions into a hidden Markov model (HMM). In addition, we develop a method to calculate the statistical significance of the HMM scores. We implement the method, test it on two sets of experimental data generated by two different types of mass spectrometers, and compare the results with MASCOT and SEQUEST under the same conditions. One set of experimental results shows that PepHMM has a much higher accuracy (6.5% error rate) than MASCOT (17.4% error rate), and the other shows that PepHMM identifies 43% and 31% more correct spectra than SEQUEST and MASCOT, respectively.

10.
Hemoglobin-based oxygen therapeutics are prepared by reaction of hemoglobin with cross-linking molecules and are utilized as blood substitutes. They can be used as doping agents to increase the oxygen-carrying capacity of hemoglobin. We have compared a glutaraldehyde-polymerized bovine hemoglobin (Oxyglobin, Biopure Corp.) with natural bovine hemoglobin by mass spectrometry in order to detect specific fragment ions of the cross-linked protein for further potential applications in doping control of human blood samples. Acid hydrolysis (6 N HCl) was performed in parallel on both proteins. Hydrolysates were then analyzed by direct infusion electrospray mass spectrometry (ESIMS) using a triple quadrupole mass spectrometer. Confirmation and precision were obtained by LC-ESIMS(n) experiments performed on an ion trap mass spectrometer. Chromatographic and mass spectrometry data allowed detection of two potential Oxyglobin-specific ions (m/z 299 and 399) that were shown to lose a 159 u neutral fragment under collision-induced dissociation conditions. Thus, monitoring of the constant neutral loss of 159 u on acid hydrolysates of human serum samples spiked with different amounts of Oxyglobin has proved to be an efficient screening method to specifically detect and identify Oxyglobin. LC-MS of the spiked serum sample hydrolysates enabled detection of Oxyglobin at a detection limit of 4 g/L.
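
The logic of a constant neutral loss screen can be sketched in software terms as follows. On a triple quadrupole the screen is done in hardware by scanning both analyzers with a fixed offset; the sketch below applies the same criterion to already acquired product-ion data. It assumes singly charged ions and a hypothetical data layout mapping each precursor m/z to its product-ion m/z values; the function name and tolerance are illustrative.

```python
def neutral_loss_precursors(scans, loss=159.0, tol=0.5):
    """Return precursor m/z values that show a product ion at (precursor - loss).

    scans maps precursor m/z -> list of product-ion m/z (assumed layout,
    singly charged ions).
    """
    return sorted(
        precursor
        for precursor, products in scans.items()
        if any(abs((precursor - product) - loss) <= tol for product in products)
    )


toy_scans = {
    299.0: [140.0, 281.0],  # 299 - 140 = 159
    399.0: [240.2, 100.0],  # 399 - 240.2 = 158.8
    350.0: [200.0],         # no 159 u loss
}
print(neutral_loss_precursors(toy_scans))
```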

11.
We report a new tandem mass spectrometric approach for the improved identification of polypeptides from mixtures (e.g., using genomic databases). The approach involves the dissociation of several species simultaneously in a single experiment and provides both increased speed and sensitivity. The data analysis makes use of the known fragmentation pathways for polypeptides and highly accurate mass measurements for both the set of parent polypeptides and their fragments. The accurate mass information makes it possible to attribute most fragments to a specific parent species. We provide an initial demonstration of this multiplexed tandem MS approach using an FTICR mass spectrometer with a mixture of seven polypeptides dissociated using infrared irradiation from a CO2 laser. The peptides were added to, and then successfully identified from, the largest genomic database yet available (C. elegans), which is equivalent in complexity to that for a specific differentiated mammalian cell type. Additionally, since only a few enzymatic fragments are necessary to unambiguously identify a protein from an appropriate database, it is anticipated that the multiplexed MS/MS method will allow the more rapid identification of complex protein mixtures with on-line separation of their enzymatically produced polypeptides.

12.
A novel peptide derivatization strategy based on guanidination and amidination is presented. Mass-coded labels help distinguish N- and C-terminal fragment ions produced by collision-induced dissociation and are of general utility since peptide N-termini are coded. The amidine labels also promote specific fragmentation pathways that elucidate N-terminal residues and provide valuable internal calibrants. This strategy is demonstrated with the tryptic peptides of several model proteins, including two that are phosphorylated. Additionally, interpreted peptide sequences are matched against a database of over 80,000 proteins to assess the selectivity of this sequencing approach.

13.
MALDI-TOF mass spectrometry has been coupled with Internet-based proteome database search algorithms in an approach for direct microorganism identification. This approach is applied here to characterize intact H. pylori (strain 26695) Gram-negative bacteria, the most ubiquitous human pathogen. A procedure for including a specific and common posttranslational modification, N-terminal Met cleavage, in the search algorithm is described. Accounting for posttranslational modifications in putative protein biomarkers improves the identification reliability by at least an order of magnitude. The influence of other factors, such as number of detected biomarker peaks, proteome size, spectral calibration, and mass accuracy, on the microorganism identification success rate is illustrated as well.
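
One simple way to account for N-terminal Met cleavage in a mass-based search is sketched below: for every database protein whose sequence starts with methionine, the search considers both the intact mass and the mass with the Met residue removed. This is an illustrative assumption about how such a rule could be encoded, not the procedure from the paper; the function name is hypothetical, and the sketch ignores the dependence of cleavage on the second residue.

```python
# Approximate average mass of a methionine residue in Da.
MET_RESIDUE_AVG_MASS = 131.19


def candidate_masses(sequence, intact_mass):
    """Return the masses to search for one database protein.

    Always includes the intact mass; adds the Met-cleaved mass when the
    sequence begins with methionine.
    """
    masses = [intact_mass]
    if sequence.startswith("M"):
        masses.append(intact_mass - MET_RESIDUE_AVG_MASS)
    return masses


print(candidate_masses("MKTAYIAK", 10000.0))
print(candidate_masses("AKTAYIAK", 5000.0))
```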

14.
De novo sequencing of peptides is one of the most challenging data analysis tasks in proteome research. In this paper, we propose a generative hidden Markov model (HMM) of mass spectra for de novo peptide sequencing, which offers a novel view of how to solve this problem in a Bayesian framework. We further demonstrate extensions of the model structure to a graphical model and a factorial HMM that substantially improve the peptide identification results. Inference with the graphical model for de novo peptide sequencing estimates posterior probabilities for amino acids rather than scores for single symbols in the sequence. Our model outperforms state-of-the-art methods for de novo peptide sequencing on a large test set of spectra.

15.
The traditional approach to the identification of peptides in complex biological samples relies on tandem mass spectrometry to generate a unique fragmentation pattern from which each peptide can be accurately assigned to a particular protein. In this article we describe the theoretical basis for a new paradigm for the identification of peptides and proteins. This methodology uses accurate mass and peptide isoelectric point (pI) as identification criteria, and represents a change in focus from current tandem mass spectrometry-dominated approaches. A mathematical derivation of the false positive rate associated with accurate mass and pI measurements is presented to demonstrate the utility of the technique. The equations for calculation of the experimental false positive rate allow for the determination of the validity of the data. The false positive rate issue examined in detail here is not restricted to accurate mass-based approaches, but also applies to tandem mass spectrometry. The theoretical proteomes of Escherichia coli and Rattus norvegicus are used to evaluate the efficacy of this approach. The power of the technique is demonstrated by analyzing a series of peptides with the same monoisotopic masses but with differing isoelectric points. Finally, the speed of the algorithm, when combined with the experimental peptide analysis, has the potential to greatly accelerate the protein identification process.

16.
Mass spectrometry-based metabolomics represents a new area for bioinformatics technology development. While the computational tools currently available, such as XCMS, statistically assess and rank LC-MS features, they do not provide information about their structural identity. XCMS² is an open source software package which has been developed to automatically search tandem mass spectrometry (MS/MS) data against high quality experimental MS/MS data from known metabolites contained in a reference library (METLIN). Scoring of hits is based on a "shared peak count" method that identifies masses of fragment ions shared between the analytical and reference MS/MS spectra. Another functional component of XCMS² is the capability of providing structural information for unknown metabolites that are not in the METLIN database. This "similarity search" algorithm has been developed to detect possible structural motifs in the unknown metabolite that may produce fragment ions and neutral losses characteristic of related reference compounds contained in METLIN, even if the precursor masses are not the same.
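
A shared peak count can be sketched as follows. This is a generic illustration of the idea rather than the XCMS² implementation: spectra are assumed to be plain lists of fragment m/z values, each reference peak may be matched at most once, and the function names and the 0.01 tolerance are hypothetical. The second helper shows how the same count could be applied to neutral losses, which do not depend on the precursor masses being equal.

```python
def shared_peak_count(query_peaks, reference_peaks, tol=0.01):
    """Count query peaks that match a distinct reference peak within tol."""
    used = set()
    count = 0
    for query_mz in query_peaks:
        for index, reference_mz in enumerate(reference_peaks):
            if index not in used and abs(query_mz - reference_mz) <= tol:
                used.add(index)
                count += 1
                break
    return count


def neutral_losses(precursor_mz, fragment_peaks):
    """Convert fragment m/z values into neutral losses from the precursor."""
    return [precursor_mz - fragment_mz for fragment_mz in fragment_peaks]


query = [85.03, 127.05, 145.06]
reference = [85.031, 145.20, 127.049]
print(shared_peak_count(query, reference))
```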

17.
T-1-family conotoxins belong to the T-superfamily and are composed of 10-17 amino acids. They share a common cysteine framework and disulfide connectivity and exhibit unusual posttranslational modifications, such as tryptophan bromination, glutamic acid carboxylation, and threonine glycosylation. We have isolated and characterized a novel peptide, Mo1274, containing 11 amino acids, that shows the same cysteine pattern, -CC-CC, and disulfide linkage as those of the T-1-family members. The complete sequence, GNWCCSARVCC, in which W denotes bromotryptophan, was derived from MS-based de novo sequencing. The FT-ICR MS/MS techniques of electron capture dissociation (ECD), infrared multiphoton dissociation, and collision-induced dissociation served to detect and localize the tryptophan bromination. The bromine contributes a distinctive isotopic distribution in all fragments that contain bromotryptophan. ECD fragmentation results in the loss of bromine and return to the normal isotopic distribution. Disulfide connectivity of Mo1274, between cysteine pairs 1-3 and 2-4, was determined by mass spectrometry in combination with chemical derivatization employing tris(2-carboxyethyl)phosphine, followed by differential alkylation with N-ethylmaleimide and iodoacetamide. The ECD spectra of the native and partially modified peptide reveal a loss of bromine in a process that requires the presence of a disulfide bond.

18.
Tao L, Yu X, Snyder AP, Li L. Analytical Chemistry, 2004, 76(22): 6609-6617
A protein mass mapping approach using mass spectrometry (MS) combined with an experimentally derived protein mass database is presented for rapid and effective identification of bacterial species. A prototype mass database from the protein extracts of nine bacterial species has been created by off-line high-performance liquid chromatography (HPLC) matrix-assisted laser desorption/ionization (MALDI) MS, in which the microbiological parameter of bacterial growth time is considered. A numerical method using a statistical weight factor algorithm is devised for matching the protein masses of an unknown bacterial sample against the database. The sum of these weight factors produces a corresponding summed weight factor score for each bacterial species listed in the database, and the database species producing the highest score represents the identity of the respective unknown bacterium. The applicability and reliability of this protein mass mapping approach have been tested with seven bacterial species in a single-blind study by both direct MALDI MS and HPLC electrospray ionization MS methods, and identification results with 100% accuracy were obtained. Our studies have demonstrated that the protein mass database can be rapidly established and readily adopted with relatively little dependence on experimental factors. Furthermore, it is shown that a number of proteins can be detected using a protein sample amount equivalent to an extract of fewer than 1000 cells, demonstrating that this protein mass mapping approach can potentially be highly sensitive for rapid bacterial identification.

19.
The characterization of proteomes by mass spectrometry is largely limited to organisms with sequenced genomes. To identify proteins from organisms with unsequenced genomes, database sequences from related species must be employed for sequence-similarity protein identifications. Peptide sequence tags (Mann, 1994) have been used successfully for the identification of proteins in sequence databases using partially interpreted tandem mass spectra of tryptic peptides. We have extended the ability of sequence tag searching to the identification of proteins whose sequences are as yet unknown but are homologous to known database entries. The MultiTag method presented here assigns statistical significance to matches of multiple error-tolerant sequence tags to a database entry and ranks alignments by their significance. The MultiTag approach has the distinct advantage over other sequence-similarity approaches of being able to perform sequence-similarity identifications using only very short stretches of peptide sequence (2-4 amino acid residues), rather than complete peptide sequences deduced by de novo interpretation of tandem mass spectra. This feature facilitates the identification of low abundance proteins, since noisy and low-intensity tandem mass spectra can be utilized.

20.
Detection and identification of pathogenic bacteria and their protein toxins play a crucial role in a proper response to natural or terrorist-caused outbreaks of infectious diseases. The recent availability of whole genome sequences of priority bacterial pathogens opens new diagnostic possibilities for identification of bacteria by retrieving their genomic or proteomic information. We describe a method for identification of bacteria based on tandem mass spectrometric (MS/MS) analysis of peptides derived from bacterial proteins. This method involves bacterial cell protein extraction, trypsin digestion, liquid chromatography MS/MS analysis of the resulting peptides, and a statistical scoring algorithm to rank MS/MS spectral matching results for bacterial identification. To facilitate spectral data searching, a proteome database was constructed by translating genomes of bacteria of interest with fully or partially determined sequences. In this work, a prototype database was constructed by the automated analysis of 87 publicly available, fully sequenced bacterial genomes with the GLIMMER gene finding software. MS/MS peptide spectral matching for peptide sequence assignment against this proteome database was done by SEQUEST. To gauge the relative significance of the SEQUEST-generated matching parameters for correct peptide assignment, discriminant function (DF) analysis of these parameters was applied and DF scores were used to calculate probabilities of correct MS/MS spectra assignment to peptide sequences in the database. The peptides with DF scores exceeding a threshold value determined by the probability of correct peptide assignment were accepted and matched to the bacterial proteomes represented in the database. Sequence filtering or removal of degenerate peptides matched with multiple bacteria was then performed to further improve identification. It is demonstrated that using a preset criterion with known distributions of discriminant function scores and probabilities of correct peptide sequence assignments, a test bacterium within the 87 database microorganisms can be unambiguously identified.
