Similar documents
20 similar documents found.
1.
PepNovo: de novo peptide sequencing via probabilistic network modeling (total citations: 7; self-citations: 0; citations by others: 7)
We present a novel scoring method for de novo interpretation of peptides from tandem mass spectrometry data. Our scoring method uses a probabilistic network whose structure reflects the chemical and physical rules that govern peptide fragmentation. We use a likelihood ratio hypothesis test to determine whether the peaks observed in the mass spectrum are more likely to have been produced under our fragmentation model than under a model that treats peaks as random events. We tested our de novo algorithm, PepNovo, on ion trap data and achieved results superior to those of popular de novo peptide sequencing algorithms. PepNovo can be accessed via the URL http://www-cse.ucsd.edu/groups/bioinformatics/software.html.
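The likelihood-ratio idea can be illustrated with a deliberately simplified model. The sketch below is not PepNovo's probabilistic network: it assumes, hypothetically, that each ion type is detected independently with a fixed probability if the candidate peptide is correct, and that a random peak falls near any given m/z with probability p_random. The function name llr_score and all probabilities are illustrative.

```python
import math


def llr_score(predicted, observed, p_model, p_random, tol=0.5):
    """Log-likelihood ratio of a fragmentation model versus a random-peak model.

    predicted: dict mapping ion type (e.g. "b", "y") to predicted fragment m/z values
    observed:  list of observed peak m/z values
    p_model:   dict mapping ion type to the (hypothetical) probability that the
               ion is detected if the candidate peptide is correct
    p_random:  (hypothetical) probability that a random peak falls within tol
               of an arbitrary m/z position
    """
    score = 0.0
    for ion_type, mzs in predicted.items():
        p = p_model[ion_type]
        for mz in mzs:
            detected = any(abs(mz - peak) <= tol for peak in observed)
            if detected:
                # A peak where the model expects one favors the model.
                score += math.log(p / p_random)
            else:
                # A missing expected peak counts against the model.
                score += math.log((1.0 - p) / (1.0 - p_random))
    return score


# Two predicted b ions and one y ion; the first b ion and the y ion are observed.
predicted = {"b": [98.06, 227.10], "y": [148.06]}
observed = [98.1, 148.0, 500.3]
print(llr_score(predicted, observed, {"b": 0.6, "y": 0.8}, p_random=0.1))
```

A positive total means the observed peaks are better explained by the fragmentation model than by chance. In PepNovo the probabilities come from a network whose structure encodes fragmentation rules, so ion observations are not treated as independent the way they are here.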

2.
This article presents a novel methodology for the automated de novo identification of peptides via integer linear optimization (also referred to as integer linear programming, or ILP) and tandem mass spectrometry. The various features of the mathematical model are presented, and examples are used to illustrate the key concepts of the proposed approach. A variety of challenging peptide identification problems, together with a comparative study against five state-of-the-art methods, are examined to illustrate the method's ability to address (a) residue-dependent fragmentation properties that result in missing ion peaks and (b) the variability of resolution across different mass analyzers. A preprocessing algorithm is used to identify important m/z values in the tandem mass spectrum. Missing peaks caused by residue-dependent fragmentation characteristics are handled with a two-stage algorithmic framework. A cross-correlation approach is used to resolve missing amino acid assignments and to select the most probable peptide by comparing the theoretical spectra of the candidate sequences generated in the ILP sequencing stages with the experimental tandem mass spectrum. The proposed de novo method, denoted PILOT, is compared with existing popular methods, namely Lutefisk, PEAKS, PepNovo, EigenMS, and NovoHMM, on a set of spectra from QTOF and ion trap instruments.

3.
Performance of a linear ion trap-Orbitrap hybrid for peptide analysis (total citations: 1; self-citations: 0; citations by others: 1)
Proteomic analysis of digested complex protein mixtures has become a useful strategy for identifying proteins involved in biological processes. We have evaluated a new mass spectrometer that combines a linear ion trap and an Orbitrap into a hybrid tandem mass spectrometer. A digested submandibular/sublingual saliva sample was used for the analysis. We find that the instrument is capable of mass resolution in excess of 40,000 and mass measurement errors of less than 2 ppm for complex peptide mixtures. Such high mass accuracy allowed the elimination of virtually all false-positive peptide identifications, suggesting that peptides that do not match the specificity of the protease used to digest the sample should not automatically be considered false positives. Tandem mass spectra from the linear ion trap and from the Orbitrap have very similar ion abundance ratios. We conclude that this instrument will be well suited for shotgun proteomics analyses.
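Mass accuracy in this abstract is expressed in parts per million, a relative measure. A minimal sketch of how a ppm error is computed and how a ppm tolerance translates into an absolute m/z window; the function names are hypothetical, and the 2 ppm default simply mirrors the accuracy reported above.

```python
def ppm_error(measured_mz, theoretical_mz):
    """Signed mass measurement error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def ppm_window(theoretical_mz, tol_ppm):
    """Half-width, in m/z units, of a +/- tol_ppm tolerance window."""
    return theoretical_mz * tol_ppm * 1e-6


def within_tolerance(measured_mz, theoretical_mz, tol_ppm=2.0):
    """True if the measured m/z lies inside the ppm tolerance of the theoretical m/z."""
    return abs(ppm_error(measured_mz, theoretical_mz)) <= tol_ppm


print(ppm_error(1000.002, 1000.0))   # error of a peak measured 0.002 above m/z 1000
print(ppm_window(1000.0, 2.0))       # half-width of a 2 ppm window at m/z 1000
```

Because the window scales with m/z, a fixed ppm tolerance is tighter in absolute terms for small fragments than for large precursors.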

4.
Mo L, Dutta D, Wan Y, Chen T. Analytical Chemistry, 2007, 79(13): 4870-4878
Tandem mass spectrometry (MS/MS) has become the experimental method of choice for high-throughput, proteomics-based biological discovery. The two primary ways of analyzing MS/MS data are database search and de novo sequencing. In this paper, we present a new approach to peptide de novo sequencing, called MSNovo, which has the following advanced features. (1) It works on data generated by both LCQ and LTQ mass spectrometers and interprets singly, doubly, and triply charged ions. (2) It integrates a new probabilistic scoring function with a mass array-based dynamic programming algorithm. The simplicity of the scoring function, which has only 6-10 trainable parameters, avoids overfitting and allows MSNovo to be adapted easily to other instruments and data sets. The mass array data structure explicitly encodes all possible peptides and allows the dynamic programming algorithm to find the best peptide. (3) Compared with existing programs, MSNovo predicts both peptides and sequence tags with higher accuracy, which is important for applications that search protein databases using de novo sequencing results. More specifically, we show that MSNovo outperforms other programs on various ESI ion trap data sets. We also show that the performance of MSNovo improves significantly on high-resolution data. Supporting information, executable files, and data sets can be found at http://msms.usc.edu/supplementary/msnovo.
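The abstract only summarizes MSNovo's mass-array dynamic programming. The sketch below shows the general idea under simplifying assumptions of our own: an integer array indexed by nominal prefix mass plays the role of the mass array, each prefix mass supported by an observed singly charged b ion adds 1 to the score, and there are no y-ion, charge-state, or probabilistic terms. Leu/Ile and Gln/Lys share nominal masses, so each pair is listed once. The function de_novo_dp is hypothetical.

```python
# Nominal (integer) residue masses; Leu/Ile and Gln/Lys are listed once as L and Q.
NOMINAL = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "N": 114, "D": 115, "Q": 128, "E": 129, "M": 131,
    "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}


def de_novo_dp(residue_mass_sum, b_ion_mzs):
    """Best-scoring sequence whose nominal residue masses sum to residue_mass_sum.

    b_ion_mzs: observed singly charged b-ion m/z values, rounded to integers
    (prefix residue mass + 1). Each supported prefix mass adds 1 to the score.
    Returns (sequence, score), or None if the mass cannot be reached.
    """
    supported = set(b_ion_mzs)
    best = [float("-inf")] * (residue_mass_sum + 1)  # best score per prefix mass
    back = [None] * (residue_mass_sum + 1)             # last residue on that path
    best[0] = 0.0
    for m in range(1, residue_mass_sum + 1):
        bonus = 1.0 if (m + 1) in supported else 0.0
        for aa, w in NOMINAL.items():
            if w <= m and best[m - w] + bonus > best[m]:
                best[m] = best[m - w] + bonus
                back[m] = aa
    if back[residue_mass_sum] is None:
        return None
    # Walk back through the array to recover the residues.
    seq, m = [], residue_mass_sum
    while m > 0:
        seq.append(back[m])
        m -= NOMINAL[back[m]]
    return "".join(reversed(seq)), best[residue_mass_sum]


# G + A + S = 57 + 71 + 87 = 215; observed b1 = 58 and b2 = 129.
print(de_novo_dp(215, [58, 129]))  # ('GAS', 2.0)
```

Because every reachable prefix mass is stored once, the array implicitly represents all candidate peptides without enumerating them, which is the property the abstract attributes to the mass array.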

5.
We have developed an approach that identifies the molecular weight of a peptide ion directly from its tandem mass spectrum using a cross-correlation function. We show that the monoisotopic molecular weight can be calculated for approximately 90% of tandem mass spectra identified from tryptic digests of complex protein mixtures. The accuracy of the calculated monoisotopic masses depended on the resolution and mass accuracy of the spectra analyzed but was typically <0.25 amu for linear ion trap mass spectra. The ability to calculate accurate monoisotopic molecular weights from low-resolution ion trap data should significantly improve both the speed and performance of database searches, for which mass tolerances of approximately 3 amu are typically employed. In addition, this strategy can be used to identify the precursor ion of tandem mass spectra acquired with large ion selection windows in data-independent collision-activated dissociation, and it has the potential to identify multiplexed tandem mass spectra.
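One way to see why a precursor mass can be recovered from the fragment spectrum: for a singly charged complementary pair, m(b_i) + m(y_(n-i)) = M + 2 x 1.007276, where M is the neutral monoisotopic peptide mass. The sketch below is an assumed simplification, not the authors' cross-correlation function. It collects the mass implied by every peak pair and returns the value supported by the most pairs; estimate_neutral_mass is a hypothetical name.

```python
from itertools import combinations

PROTON = 1.007276


def estimate_neutral_mass(fragment_mzs, tol=0.02):
    """Estimate the neutral monoisotopic peptide mass from complementary b/y pairs.

    Each pair of singly charged fragments implies M = mz1 + mz2 - 2 * PROTON;
    the implied mass supported by the most pairs (within tol) is returned
    together with its number of supporting pairs.
    """
    implied = [a + b - 2 * PROTON for a, b in combinations(fragment_mzs, 2)]
    best_mass, best_support = None, 0
    for center in implied:
        cluster = [m for m in implied if abs(m - center) <= tol]
        if len(cluster) > best_support:
            best_mass, best_support = sum(cluster) / len(cluster), len(cluster)
    return best_mass, best_support


# Singly charged b1-b6 and y1-y6 ions of PEPTIDE (monoisotopic).
peaks = [98.060036, 227.102626, 324.155386, 425.203066, 538.287126, 653.314066,
         148.060431, 263.087371, 376.171431, 477.219111, 574.271871, 703.314461]
print(estimate_neutral_mass(peaks))  # mass near 799.36 Da, supported by 6 pairs
```

Non-complementary pairs imply masses that differ from M by at least one residue mass, so they rarely form a competing cluster; the pairwise clustering is quadratic in the number of pairs, which is acceptable for a sketch but not for large spectra.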

6.
De novo sequencing of peptides is one of the most challenging data analysis tasks in proteome research. In this paper, we propose a generative hidden Markov model (HMM) of mass spectra for de novo peptide sequencing, which offers a novel view of how to solve this problem within a Bayesian framework. We further demonstrate extensions of the model structure to a graphical model and to a factorial HMM that substantially improve peptide identification results. Inference with the graphical model for de novo peptide sequencing estimates posterior probabilities for amino acids rather than scores for single symbols in the sequence. Our model outperforms state-of-the-art methods for de novo peptide sequencing on a large test set of spectra.

7.
Gu S, Pan S, Bradbury EM, Chen X. Analytical Chemistry, 2002, 74(22): 5774-5785
Here, we describe a method for protein identification and de novo peptide sequencing. Through in vivo labeling in cell culture, the deuterium-labeled lysine residue (Lys-d4) introduces a 4-Da mass tag at the carboxyl terminus of proteolytic peptides generated by certain proteases. The 4-Da mass difference between unlabeled and deuterated lysine gives all lysine-containing peptides in any pool of proteolytic peptides a mass signature that allows protein identification directly through peptide mass mapping. Furthermore, the tag was used to distinguish N-terminal from C-terminal fragments so that daughter ions in MS/MS spectra could be assigned accurately for sequence determination. This technique simplifies the labeling scheme and the interpretation of MS/MS spectra by assigning the different fragment ion series correctly and easily, and it is very useful in de novo peptide sequencing. We have also successfully applied this approach to the analysis of protein mixtures derived from the human proteome.
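Assuming the unlabeled and Lys-d4 forms of a peptide are co-isolated for MS/MS (an assumption; the abstract does not give acquisition details), C-terminal fragments that retain the labeled lysine appear as doublets about 4.025 Da apart, while N-terminal fragments appear as singlets. A minimal, hypothetical classifier for singly charged fragments; the m/z values in the example are made up.

```python
# Mass added by replacing four 1H atoms with 2H: 4 x (2.014102 - 1.007825) Da.
D4_SHIFT = 4 * (2.014102 - 1.007825)  # about 4.0251 Da


def classify_fragments(peaks, tol=0.01):
    """Split singly charged fragment m/z values into labeled doublets and singlets.

    Returns (c_terminal, n_terminal_candidates): the light member of each
    doublet spaced by D4_SHIFT, and peaks with no partner in either direction.
    """
    mzs = sorted(peaks)

    def has_partner(mz, offset):
        return any(abs(other - (mz + offset)) <= tol for other in mzs)

    c_terminal = [mz for mz in mzs if has_partner(mz, D4_SHIFT)]
    singlets = [mz for mz in mzs
                if not has_partner(mz, D4_SHIFT) and not has_partner(mz, -D4_SHIFT)]
    return c_terminal, singlets


print(classify_fragments([148.0604, 152.0855, 227.1026, 376.1714, 380.1965]))
# ([148.0604, 376.1714], [227.1026])
```

The heavy member of each doublet is dropped from both lists, so each C-terminal fragment is reported once.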

8.
Infrared multiphoton dissociation (IRMPD) of N-terminal sulfonated peptides improves de novo sequencing capabilities in a quadrupole ion trap mass spectrometer. IRMPD not only promotes highly efficient dissociation of N-terminal sulfonated peptides but also allows the entire series of y ions, down to the y1 fragment, to be detected, because it alleviates the low-mass cutoff problem associated with conventional collisionally activated dissociation (CAD) in a quadrupole ion trap. Commercial de novo sequencing software was applied to interpret CAD and IRMPD MS/MS spectra collected for seven unmodified peptides and the corresponding N-terminal sulfonated species. In most cases, the additional information obtained by N-terminal sulfonation in combination with IRMPD significantly improved sequence identification. The software's sequence tag results were combined with a commercial database searching algorithm to interpret sequence information from a tryptic digest of alpha-casein s1. Energy-variable CAD studies confirmed a 30-40% reduction in the critical energies of the N-terminal sulfonated peptides relative to unmodified peptides. This reduction in dissociation energy facilitates IRMPD in a quadrupole ion trap.

9.
This study demonstrates that 1,5-I-AEDANS (5-({2-[(iodoacetyl)amino]ethyl}amino)naphthalene-1-sulfonic acid) can be used as a versatile fluorescence-based peptide quantification tool and yields readily interpretable tandem mass spectra for de novo peptide sequencing. Two AEDANS-cysteinyl-peptide fractionation strategies were evaluated. The first employs immobilized metal affinity chromatography (IMAC) to recover AEDANS-labeled peptides and reduce the complexity of peptide mixtures. In an alternative solid-phase approach, 1,5-I-AEDANS was coupled to an o-nitrobenzyl-based photocleavable resin to produce a resin that can label and isolate thiols and cysteine-containing peptides with a modified AEDANS label (mAEDANS: 5-((4-amino-4-oxobutanoyl){2-[(iodoacetyl)amino]ethyl}amino)naphthalene-1-sulfonic acid). This fractionation protocol enriches cysteine-containing peptides more specifically than the IMAC strategy. Using micro-LC-ESI-MS with an on-line fluorescence detector and a Q-TOF mass spectrometer, we generated fluorescence-based elution profiles and the corresponding positive ion mass spectra of AEDANS-labeled peptides. AEDANS-peptides produce positive ion ESI-MS spectra with detection limits comparable to those of the unlabeled peptide. Collision-induced dissociation (CID) of fluorescent AEDANS-peptides yielded readily interpretable product ion spectra with the label intact. Like the AEDANS-labeled peptide, an mAEDANS-labeled thiol is fluorescent, and CID of an mAEDANS-labeled peptide also yields an interpretable product ion spectrum with the label intact.

10.
A novel concept of two-dimensional fragment correlation mass spectrometry and its application to peptide sequencing are described. The daughter ion (MS2) spectrum of a peptide contains the peptide's sequence information. However, deciphering the MS2 spectrum, and thus deriving the peptide sequence, is complicated by the difficulty of distinguishing N-terminal fragments (e.g., the b series) from C-terminal fragments (e.g., the y series). When a granddaughter ion (MS3) spectrum of a particular daughter ion is acquired, all fragment ions of the opposite terminus are eliminated from it. However, some internal fragments of the peptide will appear in the MS3 spectrum. Because internal fragments are rarely present in the MS2 spectrum, the intersection of the MS2 and MS3 spectra (a spectrum containing only the peaks present in both) should contain only fragments of the same terminal type. A two-dimensional plot of the MS2 spectrum versus the intersection spectra (a 2-D fragment correlation mass spectrum) often gives enough information to derive the complete sequence of a peptide. This paper describes the technique and its application to sequencing cytochrome c and apomyoglobin. For a tryptic digest of cytochrome c, approximately 78% of the protein sequence was determined; for a Glu-C/tryptic digest of apomyoglobin, approximately 66% was determined.
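The intersection step is easy to state in code. A minimal sketch, assuming centroided peak lists and a fixed absolute m/z tolerance (0.5 here, an illustrative value); intersect_spectra is a hypothetical name and the peak lists are made up.

```python
from bisect import bisect_left


def intersect_spectra(ms2_peaks, ms3_peaks, tol=0.5):
    """Return the MS2 peaks (m/z) that also occur in the MS3 spectrum within tol."""
    ms3_sorted = sorted(ms3_peaks)
    shared = []
    for mz in sorted(ms2_peaks):
        # First MS3 peak that is not below the tolerance window.
        i = bisect_left(ms3_sorted, mz - tol)
        if i < len(ms3_sorted) and ms3_sorted[i] <= mz + tol:
            shared.append(mz)
    return shared


ms2 = [98.1, 148.1, 227.1, 263.1, 324.2]
ms3 = [98.0, 155.0, 227.2, 324.1]  # 155.0 mimics an internal fragment absent from MS2
print(intersect_spectra(ms2, ms3))  # [98.1, 227.1, 324.2]
```

Peaks unique to the MS3 spectrum (such as internal fragments) and MS2 peaks of the opposite terminus both drop out, leaving one terminal series.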

11.
With the increasing availability of de novo sequencing algorithms for interpreting high-mass-accuracy tandem mass spectrometry (MS/MS) data, there is a growing need for programs that accurately identify proteins from de novo sequencing results. De novo sequences derived from tandem mass spectra of peptides often contain ambiguous regions in which the exact amino acid order cannot be determined. One problem this poses for sequence alignment algorithms is the difficulty of distinguishing discrepancies caused by de novo sequencing errors from genuine genomic sequence variation and posttranslational modifications. We present a novel, mass-based approach to sequence alignment, implemented in a program called OpenSea, to resolve these problems. In this approach, de novo and database sequences are interpreted as residue masses, and the masses, rather than the amino acid codes, are compared. For further flexibility, the masses can be aligned in groups, which can resolve many de novo sequencing errors. The performance of OpenSea was tested with three types of data: a mixture of known proteins, a mixture of unknown proteins that commonly contain sequence variations, and a mixture of posttranslationally modified known proteins. In all three cases, we demonstrate that OpenSea identifies more peptides and proteins than commonly used database-searching programs (SEQUEST and ProteinLynx) while accurately locating sequence variation sites and unanticipated posttranslational modifications in a high-throughput environment.
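Grouped mass comparison can be illustrated with a greedy two-pointer walk: at each step the lighter group is extended until the two groups' total masses agree within a tolerance. This is an assumed simplification, not OpenSea's alignment or scoring (which the abstract does not describe); align_by_mass is hypothetical, and with a fixed tolerance small errors could accumulate over long groups.

```python
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def align_by_mass(denovo, database, tol=0.02):
    """Greedily align two sequences as groups of equal total residue mass.

    Returns a list of (denovo_group, database_group) pairs, or None if the
    sequences cannot be partitioned into mass-equivalent groups.
    """
    i = j = 0
    groups = []
    while i < len(denovo) and j < len(database):
        a_end, b_end = i + 1, j + 1
        mass_a, mass_b = RESIDUE[denovo[i]], RESIDUE[database[j]]
        # Extend whichever group is lighter until the two masses agree.
        while abs(mass_a - mass_b) > tol:
            if mass_a < mass_b:
                if a_end == len(denovo):
                    return None
                mass_a += RESIDUE[denovo[a_end]]
                a_end += 1
            else:
                if b_end == len(database):
                    return None
                mass_b += RESIDUE[database[b_end]]
                b_end += 1
        groups.append((denovo[i:a_end], database[j:b_end]))
        i, j = a_end, b_end
    return groups if i == len(denovo) and j == len(database) else None


print(align_by_mass("GGTK", "NTK"))  # [('GG', 'N'), ('T', 'T'), ('K', 'K')]
print(align_by_mass("AG", "GA"))     # [('AG', 'GA')]
```

The two examples mirror typical de novo errors: GG has nearly the same mass as N, and AG versus GA is an ordering ambiguity that a residue-by-residue comparison would count as two mismatches.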

12.
We describe the impact of advances in mass measurement accuracy, ±10 ppm (internally calibrated), on protein identification experiments. This capability was made possible by delayed extraction techniques used in conjunction with matrix-assisted laser desorption ionization (MALDI) on a reflectron time-of-flight (TOF) mass spectrometer. This work explores the advantage of accurate mass measurement (and hence of constraining the possible elemental compositions of components in a protein digest) in strategies for searching protein, gene, and EST databases that employ (a) mass values alone, (b) fragment-ion tagging derived from MS/MS spectra, and (c) de novo interpretation of MS/MS spectra. A significant improvement in the discriminating power of database searches was found using only the molecular weight values (i.e., measured masses) of more than 10 peptides. When MALDI-TOF instruments can achieve the ±0.5-5 ppm mass accuracy needed to distinguish peptide elemental compositions, it becomes possible to match homologous proteins having >70% sequence identity to the protein being analyzed. The combination of a ±10 ppm measured parent mass for a single tryptic peptide and the near-complete amino acid (AA) composition information from immonium ions generated by MS/MS can tag a peptide in a database, because for a given AA composition only a few of the possible sequence permutations longer than 11 AAs can ever be found in a proteome. De novo interpretation of peptide MS/MS spectra can be accomplished by modifying our MS-Tag program to replace the entire database with only the sequence permutations permitted by the accurate parent mass and the immonium-ion-limited AA compositions. A hybrid strategy is employed that uses de novo MS/MS interpretation followed by text-based sequence similarity searching of a database.
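To see why a composition plus an accurate parent mass is so restrictive, compare the number of possible orderings of a composition with the number actually present in a database. The helpers below are hypothetical illustrations, not MS-Tag code, and they take the composition as given (in practice it would be derived from immonium ions constrained by the parent mass).

```python
from collections import Counter
from math import factorial, prod


def n_permutations(composition):
    """Number of distinct sequences that share one amino acid composition."""
    counts = Counter(composition)
    return factorial(sum(counts.values())) // prod(factorial(c) for c in counts.values())


def database_matches(composition, database_peptides):
    """Database peptides whose amino acid composition equals the given one."""
    target = Counter(composition)
    return [pep for pep in database_peptides if Counter(pep) == target]


print(n_permutations("PEPTIDE"))  # 1260 orderings of one 7-residue composition
print(database_matches("PEPTIDE", ["EDITPEP", "PEPTIDE", "PEPTIDES", "SAMPLER"]))
```

The number of orderings grows factorially with length, while a proteome contains only a handful of them, so a composition match is already a strong tag. Because Ile and Leu give the same immonium ion, a real implementation would have to treat I and L as a single symbol; this sketch does not.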

13.
A new matrix-assisted laser desorption/ionization (MALDI) time-of-flight/time-of-flight (TOF/TOF) high-resolution tandem mass spectrometer for sequencing peptides is described. This instrument combines the high sensitivity for peptide analysis associated with MALDI with the comprehensive fragmentation information provided by high-energy collision-induced dissociation (CID). Unlike the postsource decay technique, which is widely used with MALDI-TOF instruments and typically combines as many as 10 separate spectra covering different mass regions, this instrument allows complete fragment ion spectra to be obtained in a single acquisition at a fixed reflectron voltage. To achieve optimum resolution and focusing over the whole mass range, it may be desirable to acquire and combine three separate sections. Different combinations of MALDI matrix and collision gas determine the amount of internal energy deposited by the MALDI and CID processes, which provides control over the extent and nature of the fragment ions observed. Examples of peptide sequencing are presented that identify sequence-dependent features and demonstrate the value of modifying the ionization and collision conditions to optimize the spectral information.

14.
Interest in peptides incorporating boronic acid moieties is increasing because of their potential as therapeutics and diagnostics for a variety of diseases, such as cancer. The utility of peptide boronic acids may be expanded by access to vast libraries that can be deconvoluted rapidly and economically. Unfortunately, current detection protocols using mass spectrometry are laborious and confounded by boronic acid trimerization, which requires time-consuming analysis of dehydration products. These issues are exacerbated when the peptide sequence is unknown, as in de novo sequencing, and especially when multiple boronic acid moieties are present. A rapid, reliable, and simple method for peptide identification is therefore of utmost importance. Herein, we report the identification and sequencing of linear and branched peptide boronic acids containing up to five boronic acid groups by matrix-assisted laser desorption/ionization mass spectrometry (MALDI-MS). Protocols for preparing pinacol boronic esters were adapted for efficient MALDI analysis of peptides. Additionally, we developed a novel peptide boronic acid detection strategy in which 2,5-dihydroxybenzoic acid (DHB) serves as both matrix and derivatizing agent in a convenient, in situ, on-plate esterification. Finally, we demonstrate that DHB-modified peptide boronic acids from a single bead can be analyzed by MALDI-MS/MS, validating our approach for the identification and sequencing of branched peptide boronic acid libraries.

15.
The goal of many MS/MS de novo sequencing strategies is to generate a single product ion series that can be used to determine the precursor ion sequence. Most methods fall short of achieving such simplified spectra, and the presence of additional ion series impedes peptide identification. The present study aims to solve the problem of confounding ion series by enhancing the formation of "golden" sets of a, b, and c ions for sequencing. Exploiting the characteristic mass differences between the golden ions allows N-terminal fragments to be readily identified while other ion series are excluded. Lys-N, an alternative protease, is used to produce peptides with a lysine residue at each N-terminus, and the ε-amino group of each lysine is then imidazolinylated, generating peptides with highly basic sites localized at the N-terminus. Subsequent MS/MS analysis using 193 nm ultraviolet photodissociation (UVPD) results in enhanced formation of the diagnostic golden pairs and golden triplets, which are ideal for de novo sequencing.
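For singly charged ions from the same cleavage, the a ion is lighter than the b ion by CO (27.9949 Da) and the c ion is heavier than the b ion by NH3 (17.0265 Da); these spacings are what make golden triplets recognizable. A minimal sketch that scans a peak list for such triplets; the function names, tolerance, and peak values are illustrative assumptions.

```python
from bisect import bisect_left

CO = 27.994915   # mass difference between b and a ions
NH3 = 17.026549  # mass difference between c and b ions


def find_peak(sorted_peaks, target, tol):
    """Return the observed peak closest to target within tol, or None."""
    i = bisect_left(sorted_peaks, target - tol)
    best = None
    while i < len(sorted_peaks) and sorted_peaks[i] <= target + tol:
        if best is None or abs(sorted_peaks[i] - target) < abs(best - target):
            best = sorted_peaks[i]
        i += 1
    return best


def golden_triplets(peaks, tol=0.01):
    """Find (a, b, c) peak triplets spaced by CO and NH3 (singly charged ions)."""
    ordered = sorted(peaks)
    triplets = []
    for a in ordered:
        b = find_peak(ordered, a + CO, tol)
        if b is None:
            continue
        c = find_peak(ordered, b + NH3, tol)
        if c is not None:
            triplets.append((a, b, c))
    return triplets


print(golden_triplets([100.0, 127.995, 145.0215, 300.0]))  # [(100.0, 127.995, 145.0215)]
```

Golden pairs (a/b or b/c only) could be found the same way by dropping one of the two lookups.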

16.
Several computer programs can match peptide tandem mass spectrometry data to exactly corresponding database sequences, and in most protein identification projects these programs are used in the early stages of data interpretation. However, situations frequently arise in which tandem mass spectral data cannot be correlated with any database sequence. In these cases, the unmatched data could be due to peptides derived from novel proteins, allelic or species-derived variants of known proteins, or posttranslational or chemical modifications. Two additional problems are frequently encountered in high-throughput protein identification. First, it is difficult to sift quickly through large amounts of data to identify the spectra that can be ignored because of poor signal or contaminants. Second, it is important to find incorrect database matches (false positives). We have chosen to address these difficulties by performing automatic de novo sequencing with a computer program called Lutefisk. The resulting sequence candidates are used as input to a homology-based database search program called CIDentify to identify variants of known proteins. Comparing database-derived sequences with de novo sequences allows electronic validation of database matches even when the de novo sequences are not completely correct. Modifications to the original Lutefisk program have been implemented to handle data obtained from triple quadrupole, ion trap, and quadrupole/time-of-flight hybrid (Qtof) mass spectrometers. For example, the linearity of the mass errors caused by temperature-dependent expansion of the flight tube in a Qtof was exploited so that isobaric amino acids (glutamine/lysine and oxidized methionine/phenylalanine) can be differentiated without careful attention to mass calibration.
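If the mass error is linear in m/z, a straight-line fit to a few ions of known mass can remove it, after which residues that share a nominal mass (Gln 128.0586 versus Lys 128.0950, oxidized Met 147.0354 versus Phe 147.0684) can be told apart by the exact mass gap. A minimal sketch under that linear-drift assumption; this is not Lutefisk's actual procedure, and all names and values are illustrative.

```python
def fit_linear_error(observed, expected):
    """Least-squares fit of mass error (observed - expected) as a linear function of observed m/z."""
    n = len(observed)
    errors = [o - e for o, e in zip(observed, expected)]
    mean_x = sum(observed) / n
    mean_err = sum(errors) / n
    sxx = sum((x - mean_x) ** 2 for x in observed)
    sxy = sum((x - mean_x) * (err - mean_err) for x, err in zip(observed, errors))
    slope = sxy / sxx
    intercept = mean_err - slope * mean_x
    return slope, intercept


def recalibrate(mz, slope, intercept):
    """Subtract the fitted error at this m/z."""
    return mz - (slope * mz + intercept)


# Residue pairs that share a nominal mass but differ by about 0.03-0.04 Da.
GLN_LYS = {"Q": 128.05858, "K": 128.09496}
METOX_PHE = {"M(ox)": 147.03540, "F": 147.06841}


def assign_residue(gap, candidates):
    """Pick the candidate residue whose mass is closest to a recalibrated mass gap."""
    return min(candidates, key=lambda aa: abs(candidates[aa] - gap))


# Reference ions whose observed m/z are all 20 ppm high (a purely linear drift).
slope, intercept = fit_linear_error([200.004, 500.010, 1000.020], [200.0, 500.0, 1000.0])
print(recalibrate(1000.020, slope, intercept))
print(assign_residue(128.060, GLN_LYS), assign_residue(147.066, METOX_PHE))
```

Without recalibration, a 20 ppm drift shifts a peak at m/z 1000 by 0.02 Da, which is comparable to the Q/K difference; after the linear correction the residual error is far smaller.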

17.
An improved method for peptide de novo sequencing by MALDI mass spectrometry is presented. The method couples a charge derivatization reaction with C-terminal digestion to modify tryptic peptides. The charge derivatization attaches a fixed-charge group to the N-termini of peptides, and enzymatic digestion after the derivatization step removes C-terminal basic amino acid residues such as arginine and lysine. Fragmentation of the modified peptides under low-energy CID conditions (on a MALDI Q-TOF mass spectrometer) yields a simplified yet complete ion series for the peptide sequence. The validity of the method is demonstrated with several model protein digests, in which peptide sequences were correctly deduced either manually or with an automated sequencing program.

18.
De novo sequencing is an approach to analyzing mass spectrometry data that can reveal post-translational modifications in proteins; however, the approach is still in its infancy and is not yet widely applied in proteomics practice because of its limited reliability. In this work, we describe a de novo sequencing approach for discovering protein modifications based on identification of the proteome UStags (Shen, Y.; Tolić, N.; Hixson, K. K.; Purvine, S. O.; Pasa-Tolić, L.; Qian, W. J.; Adkins, J. N.; Moore, R. J.; Smith, R. D. Anal. Chem. 2008, 80, 1871-1882). The de novo information was obtained from Fourier-transform tandem mass spectrometry data for peptides and polypeptides from a yeast lysate, and the resulting de novo sequences were selected using filter levels designed to provide a limited but high-quality subset of UStags. The DNA-predicted database protein sequences were then compared with the UStags, and the differences observed across or within the UStags (i.e., in the UStags' prefix and suffix sequences and in the UStags themselves) were used to infer possible sequence modifications. With this de novo-UStag approach, we uncovered some unexpected variances within several yeast protein sequences due to amino acid mutations and/or multiple modifications of the predicted protein sequences. To determine false discovery rates, two random (false) databases were independently used for sequence matching, and a false discovery rate of ~3% was estimated for the de novo-UStag approach. The factors affecting the reliability (e.g., de novo sequencing noise residues and redundant sequences) and the sensitivity of the approach were investigated and are described. The combined de novo-UStag approach complements the previously reported UStag method by enabling the discovery of new protein modifications.
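One common way to turn random-database searches into a false discovery rate is to divide the number of matches that pass a score threshold against the random databases (averaged over databases) by the number that pass against the real database. The abstract does not give the exact formula used, so the sketch below is an assumption; the scores are hypothetical.

```python
def count_hits(scores, threshold):
    """Number of matches at or above the score threshold."""
    return sum(1 for s in scores if s >= threshold)


def estimate_fdr(real_scores, random_db_scores, threshold):
    """Estimate FDR as mean random-database hits divided by real-database hits.

    real_scores:      match scores against the real (DNA-predicted) database
    random_db_scores: one list of match scores per random (false) database
    """
    real_hits = count_hits(real_scores, threshold)
    if real_hits == 0:
        return 0.0
    random_hits = sum(count_hits(s, threshold) for s in random_db_scores) / len(random_db_scores)
    return random_hits / real_hits


# Hypothetical scores: 100 real matches pass; each random database yields 3 passing matches.
real = [12.0] * 100 + [2.0] * 50
random_dbs = [[11.0] * 3 + [1.0] * 97, [10.5] * 3 + [0.5] * 97]
print(estimate_fdr(real, random_dbs, threshold=5.0))
```

Averaging over two independent random databases, as the abstract describes, reduces the variance of the false-match estimate compared with using a single one.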

19.
A rectilinear ion trap (RIT) mass analyzer was incorporated into a mass spectrometer fitted with an electrospray ionization source and an atmospheric pressure interface. The RIT mass spectrometer, which was assembled in two different configurations, was used to study biological compounds, and performance data are given. A variety of techniques, including a balanced rf, elevated background gas pressure, automatic gain control, and resonance ejection waveforms with dynamically adjusted amplitude, were applied to enhance performance. The capabilities of the instrument were characterized using proteins, peptides, and pharmaceutical drugs. Unit resolution and an accuracy better than m/z 0.2 were achieved for mass-to-charge (m/z) ratios up to 2000 Th at a scan rate of approximately 3000 amu/(charge·s), while reduced scan rates gave greater resolution, with peak widths of less than m/z 0.5 over the same range. The mass discrimination in trapping externally generated ions was characterized over the range m/z 190-2000, and an optimized low-mass cutoff of m/z 120-140 was found to give equal trapping efficiencies over the entire range. The radial detection efficiency was measured as a function of m/z and found to rise from 35% at low m/z values to more than 90% for ions of m/z 1800. The dependence of the ion trapping capacity on the dc trapping potential was investigated by measuring the mass shift due to space charge effects, and low trapping potentials were shown to minimize space charge effects by increasing the useful volume of the device. The collision-induced dissociation (CID) capabilities of the RIT instrument were evaluated by measuring isolation efficiency as a function of mass resolution and by measuring peptide CID efficiencies. Overall CID efficiencies of more than 60% were easily reached, and an ion at m/z 524 was isolated with unit resolution and high rejection (>95%) of the adjacent ions.
The overall analytical capabilities of the ESI-RIT instrument were demonstrated by analyzing a mixture of pharmaceutical compounds using multiple-stage mass spectrometry.

20.
A widely used proteomics procedure for characterizing a complex mixture of proteins combines tandem mass spectrometry and database search software to yield mass spectra with identified peptide sequences. The same peptides are often detected in multiple experiments, and once they have been identified, their spectra can be used for future identifications. We present a method for collecting previously identified tandem mass spectra into a reference library that is used to identify new spectra. Query spectra are compared with the references in the library to find the most similar ones, using a dot product metric to measure the degree of similarity. With our largest library, searching a query set finds 91% of the spectrum identifications and 93.7% of the protein identifications that could be made with a SEQUEST database search. A second experiment demonstrates that queries acquired on an LCQ ion trap mass spectrometer can be identified with a library of references acquired on an LTQ ion trap mass spectrometer. The dot product similarity score provides good separation of correct and incorrect identifications.
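A normalized dot product between binned spectra is the core of this kind of library search. In the sketch below, the 1 Da bins and square-root intensity scaling are assumptions (the abstract says only that a dot product is used), and the function names and spectra are hypothetical.

```python
import math


def binned_vector(peaks, bin_width=1.0):
    """Map (m/z, intensity) peaks to a sparse vector of sqrt-scaled intensities per bin."""
    vec = {}
    for mz, intensity in peaks:
        key = int(mz // bin_width)
        vec[key] = vec.get(key, 0.0) + math.sqrt(intensity)
    return vec


def dot_product(query, reference, bin_width=1.0):
    """Normalized dot product (cosine similarity) between two binned spectra."""
    q = binned_vector(query, bin_width)
    r = binned_vector(reference, bin_width)
    numerator = sum(v * r.get(k, 0.0) for k, v in q.items())
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    norm_r = math.sqrt(sum(v * v for v in r.values()))
    if norm_q == 0.0 or norm_r == 0.0:
        return 0.0
    return numerator / (norm_q * norm_r)


def search_library(query, library):
    """Return (peptide, score) for the most similar reference spectrum."""
    return max(((pep, dot_product(query, ref)) for pep, ref in library.items()),
               key=lambda item: item[1])


library = {
    "PEPTIDE": [(227.1, 40.0), (324.2, 100.0), (477.2, 60.0)],
    "EDITPEP": [(245.1, 80.0), (342.2, 30.0), (443.2, 90.0)],
}
query = [(227.3, 35.0), (324.1, 110.0), (477.4, 50.0), (600.0, 5.0)]
print(search_library(query, library))
```

The score ranges from 0 (no shared bins) to 1 (identical relative intensities). Fixed bin boundaries can split a peak and its match across adjacent bins, which real implementations typically mitigate by spreading intensity into neighboring bins.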
