Similar articles
 20 similar articles found (search time: 31 ms)
1.
The theory together with an algorithm for uncorrelated linear discriminant analysis (ULDA) is introduced and applied to explore metabolomics data. ULDA is a supervised method for feature extraction (FE), discriminant analysis (DA) and biomarker screening based on the Fisher criterion function. While principal component analysis (PCA) searches for directions of maximum variance in the data, ULDA seeks linearly combined variables called uncorrelated discriminant vectors (UDVs). The UDVs maximize the separation among different classes in terms of the Fisher criterion. The performance of ULDA is evaluated and compared with PCA, partial least squares discriminant analysis (PLS-DA) and target projection discriminant analysis (TP-DA) for two datasets, one simulated and one real from a metabolomic study. ULDA showed better discriminatory ability than PCA, PLS-DA and TP-DA. The shortcomings of PCA, PLS-DA and TP-DA are attributed to interference from linear correlations in data. PLS-DA and TP-DA performed successfully for the simulated data, but PLS-DA was slightly inferior to ULDA for the real data. ULDA successfully extracted optimal features for discriminant analysis and revealed potential biomarkers. Furthermore, by means of cross-validation, the classification model obtained by ULDA showed better predictive ability than PCA, PLS-DA and TP-DA. In conclusion, ULDA is a powerful tool for revealing discriminatory information in metabolomics data.

2.
Lutz U, Lutz RW, Lutz WK. Analytical Chemistry, 2006, 78(13): 4564-4571
Mass spectrometry (MS) is increasingly being used for metabolic profiling, but detection modes such as constant neutral loss or multiple reaction monitoring have not often been reported. These modes allow focusing on structurally related compounds, which could be advantageous for situations in which the trait under investigation is associated with a particular class of metabolites. In this study, we analyzed endogenous glucuronides excreted in human urine by monitoring characteristic transitions of putative steroid glucuronides by LC-MS/MS for discrimination of females from males. Two methods for data extraction were used: (i) a manual procedure based on visual inspection of the chromatograms and selection of 23 peaks and (ii) a software-supported method (MarkerView) set to extract 100 peaks. Data from 10 female and 10 male students were analyzed by principal component analysis (PCA) and partial least-squares discriminant analysis (PLS-DA) using the SIMCA software. With PCA, only the manual peak selection resulted in clustering of males and females. With PLS-DA, the manual method provided full separation on the basis of a single discriminant; the software-supported approach required a two-component model for complete separation. Loading plots were analyzed for their ability to reveal peaks with high discriminating power, that is, potential biomarkers. The PLS-DA models were validated with urine samples collected from five new females and five new males. Gender was correctly assigned for all. Our results indicate that inclusion of biological criteria for variable selection coupled to class-specific MS analysis and data extraction by appropriate software may constitute a valuable addition to the methods available for metabolomics.

3.
Currently, no standard metrics are used to quantify cluster separation in PCA or PLS-DA scores plots for metabonomics studies or to determine if cluster separation is statistically significant. Lack of such measures makes it virtually impossible to compare independent or inter-laboratory studies and can lead to confusion in the metabonomics literature when authors putatively identify metabolites distinguishing classes of samples based on visual and qualitative inspection of scores plots that exhibit marginal separation. While previous papers have addressed quantification of cluster separation in PCA scores plots, none have advocated routine use of a quantitative measure of separation that is supported by a standard and rigorous assessment of whether or not the cluster separation is statistically significant. Here quantification and statistical significance of separation of group centroids in PCA and PLS-DA scores plots are considered. The Mahalanobis distance is used to quantify the distance between group centroids, and the two-sample Hotelling's T² statistic is computed for the data and related to an F-statistic; an F-test is then applied to determine if the cluster separation is statistically significant. We demonstrate the value of this approach using four datasets containing various degrees of separation, ranging from groups that had no apparent visual cluster separation to groups that had no visual cluster overlap. Widespread adoption of such concrete metrics to quantify and evaluate the statistical significance of PCA and PLS-DA cluster separation would help standardize reporting of metabonomics data.
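The chain described in this abstract (Mahalanobis distance between centroids, two-sample Hotelling's T², conversion to an F-statistic) can be sketched as follows. This is a minimal illustration, assuming two-dimensional scores (p = 2), a pooled covariance matrix, and the textbook relation F = T² (n1 + n2 - p - 1) / (p (n1 + n2 - 2)); the function names are hypothetical and are not taken from the paper.

```python
import math

def centroid(points):
    """Mean of a list of (x, y) score pairs."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def scatter(points, mean):
    """Sums of squares and cross-products about the mean: (sxx, sxy, syy)."""
    sxx = sum((p[0] - mean[0]) ** 2 for p in points)
    sxy = sum((p[0] - mean[0]) * (p[1] - mean[1]) for p in points)
    syy = sum((p[1] - mean[1]) ** 2 for p in points)
    return sxx, sxy, syy

def centroid_separation(group1, group2):
    """Mahalanobis distance, Hotelling's T^2 and F-statistic for 2-D scores."""
    p = 2  # number of score dimensions (assumption: a 2-D scores plot)
    n1, n2 = len(group1), len(group2)
    m1, m2 = centroid(group1), centroid(group2)
    a1, a2 = scatter(group1, m1), scatter(group2, m2)
    dof = n1 + n2 - 2
    # Pooled covariance matrix [[sxx, sxy], [sxy, syy]]
    sxx, sxy, syy = ((a1[i] + a2[i]) / dof for i in range(3))
    det = sxx * syy - sxy ** 2
    dx, dy = m1[0] - m2[0], m1[1] - m2[1]
    # Squared Mahalanobis distance d' S^-1 d, using the closed-form 2x2 inverse
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    t2 = n1 * n2 / (n1 + n2) * d2
    f_stat = (n1 + n2 - p - 1) / (p * dof) * t2
    return {"mahalanobis": math.sqrt(d2), "t2": t2,
            "f": f_stat, "df": (p, n1 + n2 - p - 1)}
```

The returned F value would then be compared with the critical value of the F distribution at the listed degrees of freedom, for example from statistical tables or a statistics library.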

4.
A large metabolomics study was performed on 600 plasma samples taken at four time points before and after a single intake of a high fat test meal by obese and lean subjects. All samples were analyzed by a liquid chromatography-mass spectrometry (LC-MS) lipidomic method for metabolic profiling. A pragmatic approach combining several well-established statistical methods was developed for processing this large data set in order to detect small differences in metabolic profiles in combination with a large biological variation. Such metabolomics studies require a careful analytical and statistical protocol. The strategy included data preprocessing, data analysis, and validation of statistical models. After several data preprocessing steps, partial least-squares discriminant analysis (PLS-DA) was used for finding biomarkers. To validate the found biomarkers statistically, the PLS-DA models were validated by means of a permutation test, biomarker models, and noninformative models. Univariate plots of potential biomarkers were used to obtain insight into up- or downregulation. The strategy proposed proved to be applicable for dealing with large-scale human metabolomics studies.
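The permutation test mentioned in this abstract can be illustrated with a small sketch. It assumes a single variable and uses the absolute difference of class means as a stand-in statistic; the study itself permutes class labels for PLS-DA models, which is not reproduced here. The function names, the number of permutations and the seed are hypothetical choices.

```python
import random

def mean_difference(values, labels):
    """Stand-in statistic: absolute difference between the two class means."""
    a = [v for v, lab in zip(values, labels) if lab == 1]
    b = [v for v, lab in zip(values, labels) if lab == 0]
    return abs(sum(a) / len(a) - sum(b) / len(b))

def permutation_p_value(values, labels, statistic=mean_difference,
                        n_perm=199, seed=0):
    """Fraction of label permutations scoring at least as well as the real labels."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = statistic(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the link between samples and classes
        if statistic(values, shuffled) >= observed:
            hits += 1
    # Add one to numerator and denominator so the p-value is never zero
    return (hits + 1) / (n_perm + 1)
```

A small p-value suggests that the separation obtained with the true labels is unlikely to arise from random class assignments; in practice the statistic would be a cross-validated performance measure of the fitted model.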

5.
Given the relevance of principal component analysis (PCA) to the treatment of spectrometric data, we have evaluated the potential and limitations of this statistical approach for harvesting information from large sets of X-ray photoelectron spectroscopy (XPS) spectra. Examples highlight the contribution of PCA to data treatment by comparing the results of this data analysis with those obtained by the usual XPS quantification methods. PCA was shown to improve the identification of chemical shifts of interest and to reveal correlations between peak components. First attempts to use the method led to poor results, which showed mainly the distance between series of samples analyzed at different moments. To weaken the effect of variations of minor interest, a data normalization strategy was developed and tested. A second issue was encountered with spectra suffering from an even slightly inaccurate binding energy scale correction. Indeed, minor shifts of energy channels lead to the PCA being performed on incorrect variables and consequently to misleading information. In order to improve the energy scale correction and to speed up this step of data pretreatment, a data processing method based on PCA was used. Finally, the overlap of different sources of variation was studied. Since the intensity of a given energy channel consists of electrons from several origins, having suffered inelastic collisions (background) or not (peaks), the PCA approach cannot compare them separately, which may lead to confusion or loss of information. By extracting the peaks from the background and considering them as new variables, the effect of the elemental composition could be taken into account in the case of spectra with very different backgrounds. In conclusion, PCA is a very useful diagnostic tool for the interpretation of XPS spectra, but it requires a careful and appropriate data pretreatment.

6.
We present a method for the qualitative and quantitative study of transient metabolic flux of phage infection at the molecular level. The method is based on statistical total correlation spectroscopy (STOCSY) and partial least squares discriminant analysis (PLS-DA) applied to nuclear magnetic resonance (NMR) metabonomic data sets. An algorithm for this type of study is developed and demonstrated. The method has been implemented on ¹H NMR data sets of growth media in planktonic cultures of Pseudomonas aeruginosa infected with bacteriophage pf1. Transient metabolic fluxes of various important metabolites, identified by STOCSY and PLS-DA analysis applied to the NMR data set, are estimated at various stages of growth. The opportunistic and nosocomial pathogen P. aeruginosa is one of the best-studied model organisms for bacterial biofilms. Complete information regarding the metabolic connectivity of this system cannot be obtained with conventional spectroscopic approaches. Our study presents temporal comparative ¹H NMR metabonomic analyses of filamentous phage pf1 infection in planktonic cultures of P. aeruginosa K strain (PAK). We exemplify here the potential of STOCSY and PLS-DA tools to gain mechanistic insight into subtle changes and to determine the transient flux associated with metabolites following metabolic perturbations resulting from phage infection. Our study opens new avenues for correlating existing postgenomic data with current metabonomic results in P. aeruginosa biofilm research.

7.
Application of metabonomics to nutritional sciences, also termed nutrimetabonomics, offers the possibility to measure metabolic responses associated with the consumption of specific nutrients and foods. As dietary differences generally only lead to subtle metabolic changes, measuring diet-associated metabolic phenotypes is a challenge, and also an opportunity to develop and test new chemometric strategies that can highlight metabolic information in relation to different dietary habits. While multivariate statistical techniques have long been used to analyse dietary data from diet records and questionnaires, to date no attempt has been made to link dietary patterns with metabolic profiles. Using a three-step strategy, it was possible to merge ¹H NMR plasma metabolic profile data with specific dietary patterns as assessed by Principal Component Analysis (PCA) and Partial Least Squares-Discriminant Analysis (PLS-DA). Five dietary patterns (energy intake, plant versus animal based diet, “traditional diet” versus sugar-rich diet, “traditional” versus “modern” diets, and consumption of skim versus whole dairy products) were found by applying PCA to the food frequency questionnaire data, which explained 50% of the variation. Metabolic phenotypes associated with these dietary patterns were obtained by PLS-DA and were mainly based on differences in lipid and amino acid profiles in plasma. This new approach to assess relationships between dietary intake and metabolic profiling data will allow greater steps to be made in merging nutritional epidemiology with metabonomics.

8.
Constant neutral loss (CNL) and precursor ion (PI) scans have been widely used for the in vitro screening of glutathione conjugates derived from reactive metabolites, but these two methods are only applicable to triple quadrupole or hybrid triple quadrupole mass spectrometers. Additionally, the success of CNL and PI scanning largely depends on the structure and CID fragmentation pathways of GSH conjugates. In the present study, a highly efficient methodology has been developed as an alternative approach for high-throughput screening and structural characterization of reactive metabolites using the linear ion trap mass spectrometer. In microsomal incubations, a mixture of glutathione [GSH, gamma-glutamyl-cysteinyl-glycine] and the stable-isotope labeled compound [GSX, gamma-glutamyl-cysteinyl-glycine-¹³C₂-¹⁵N] was used to trap reactive metabolites, resulting in formation of both labeled and unlabeled conjugates at a given isotopic ratio. A mass difference of 3.0 Da between the natural and labeled GSH conjugate (mass tag) at a fixed isotopic ratio constitutes a unique mass pattern that can selectively trigger the data-dependent MS² scan of both isotopic partner ions. In order to eliminate the response bias of GSH adducts in the positive and negative mode, a polarity switch is executed between the mass tag-triggered data-dependent MS² scans, and thus ESI- and ESI+ MS² spectra of both labeled and unlabeled GSH conjugates are obtained in a single LC-MS run. Unambiguous identification of glutathione adducts was readily achieved with great confidence by MS² spectra of both labeled and unlabeled conjugates. Reliability of this method was rigorously validated using several model compounds that are known to form reactive metabolites. This approach is not based on the appearance of a particular product ion such as MH⁺ - 129 and the anion at m/z 272, whose formation can be structure-dependent and sensitive to the collision energy level; therefore, the present method can be suitable for unbiased screening of any reactive metabolites, regardless of their CID fragmentation pathways. Additionally, this methodology can potentially be applied to triple quadrupole or hybrid triple quadrupole mass spectrometers.
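The mass-tag idea in this abstract (partner ions 3.0 Da apart at a fixed intensity ratio) can be illustrated offline on a peak list. This is only a sketch: the instrument applies the trigger during acquisition, and the tolerances, the expected ratio of 1.0, the assumption of singly charged ions and the function name are all hypothetical values chosen for illustration.

```python
def find_mass_tag_pairs(peaks, delta=3.0, mz_tol=0.2, ratio=1.0, ratio_tol=0.25):
    """Return (light, heavy) peak pairs matching the mass tag.

    peaks: iterable of (m/z, intensity) tuples, assumed singly charged.
    delta: expected m/z spacing between unlabeled and labeled conjugate.
    ratio: expected heavy/light intensity ratio (assumed value, not from the paper).
    """
    ordered = sorted(peaks)
    pairs = []
    for i, (mz_light, int_light) in enumerate(ordered):
        for mz_heavy, int_heavy in ordered[i + 1:]:
            spacing = mz_heavy - mz_light
            if spacing > delta + mz_tol:
                break  # peaks are sorted, so later ones are even farther away
            if abs(spacing - delta) > mz_tol:
                continue  # too close to be the labeled partner
            if int_light > 0 and abs(int_heavy / int_light - ratio) <= ratio_tol:
                pairs.append(((mz_light, int_light), (mz_heavy, int_heavy)))
    return pairs
```

Checking the intensity ratio as well as the spacing is what makes the pattern selective: an unrelated ion that happens to sit 3 Da above another peak rarely also matches the ratio set by the GSH/GSX mixture.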

9.
The feasibility of using chemometric techniques for the automatic detection of whether a rabbit kidney is pathological or not is studied. Sequential images of the kidney are acquired using Dynamic Contrast-Enhanced Magnetic Resonance Imaging with contrast agent injection. A segmentation approach based upon principal component analysis (PCA) is used to separate out the cortex from the rest of the kidney including the medulla, the renal pelvis, and the background. Two classifiers (Soft Independent Modelling of Class Analogy, SIMCA; Partial Least Squares Discriminant Analysis, PLS-DA) are tested for various types of data pre-treatment including segmentation, feature extraction, centering, autoscaling, standard normal variate transformation, Savitzky-Golay smoothing, and normalization. It is shown that (i) the renal cortex contains more discriminating information on kidney perfusion changes than the whole kidney, and (ii) the PLS-DA classifiers outperform the SIMCA classifiers. PLS-DA, preceded by an automated PCA-based segmentation of kidney anatomical regions, correctly classified all kidneys and constitutes a classification tool of the renal function that can be useful for the clinical diagnosis of renovascular diseases.

10.
Principal components analysis (PCA) is used to evaluate similarities in the trace element chemistry of groundwaters. Many of the trace elements, however, occur at concentrations below the detection limits (DL), which presents problems for statistical analyses. Since the optimal methods for dealing with the ‘…

In this study, a new approach was developed to determine the best substitution methods when dealing with the ‘DL’ values for a given data set. Monte Carlo simulation experiments, using a mixture multivariate model, were performed to test the effects of substitution of the ‘…

When ‘…


11.
Tissue engineering approaches fabricate and subsequently implant cell-seeded and unseeded scaffold biomaterials. Once in the body, these biomaterials are repopulated with somatic cells of various phenotypes whose identification upon explantation can be expensive and time-consuming. We show that imaging time-of-flight secondary ion mass spectrometry (TOF-SIMS) can be used to distinguish mammalian cell types in heterogeneous cultures. Primary rat esophageal epithelial cells (REEC) were cultured with NIH 3T3 mouse fibroblasts on tissue culture polystyrene and freeze-dried before TOF-SIMS imaging. Results show that a short etching sequence with C₆₀⁺ ions can be used to clean the sample surface and improve the TOF-SIMS image quality. Principal component analysis (PCA) and partial least-squares discriminant analysis (PLS-DA) were used to identify peaks whose contributions to the total variance in the multivariate model were due to either the two cell types or the substrate. Using PLS-DA, unknown regions of cellularity that were otherwise unidentifiable by SIMS could be classified. From the loadings in the PLS-DA model, peaks were selected that were indicative of the two cell types and TOF-SIMS images were created and overlaid that showed the ability of this method to distinguish features visually.

12.
Extracting meaningful information from complex spectroscopic data of metabolite mixtures is an area of active research in the emerging field of "metabolomics", which combines metabolism, spectroscopy, and multivariate statistical analysis (pattern recognition) methods. Chemometric analysis and comparison of ¹H NMR spectra is commonly hampered by intersample peak position and line width variation due to matrix effects (pH, ionic strength, etc.). Here a novel method for mixture analysis is presented, defined as "targeted profiling". Individual NMR resonances of interest are mathematically modeled from pure compound spectra. This database is then interrogated to identify and quantify metabolites in complex spectra of mixtures, such as biofluids. The technique is validated against a traditional "spectral binning" analysis on the basis of sensitivity to water suppression (presaturation, NOESY-presaturation, WET, and CPMG), relaxation effects, and NMR spectral acquisition times (3, 4, 5, and 6 s/scan) using PCA pattern recognition analysis. In addition, a quantitative validation is performed against various metabolites at physiological concentrations (9 μM to 8 mM). "Targeted profiling" is highly stable in PCA-based pattern recognition and insensitive to water suppression, relaxation times (within the ranges examined), and scaling factors; hence, direct comparison of data acquired under varying conditions is made possible. In particular, metabolites at low concentration and overlapping regions are well suited to this analysis. We discuss how targeted profiling can be applied for mixture analysis and examine the effect of various acquisition parameters on the accuracy of quantification.

13.
Wu Y, Noda I. Applied Spectroscopy, 2007, 61(10): 1040-1044
The present study proposes a new quadrature orthogonal signal correction (QOSC) filtering method based on principal component analysis (PCA). The external perturbation variable vector typically used in the QOSC operation is replaced with a matrix consisting of the spectral data principal components (PCs) and their quadrature counterparts obtained by using the discrete Hilbert-Noda transformation. Thus, the QOSC operation can be carried out for a dataset without explicit knowledge of the external variable information. The PCA-based QOSC filtering can be most effectively applied to two-dimensional (2D) correlation analysis. The performance of this filtering operation on a simulated spectral data set with strong random noise interference demonstrated that the PCA-based QOSC filtering not only eliminates the influence of signals that are unrelated to the final target but also preserves the out-of-phase information in the data matrix essential for asynchronous correlation analysis. The result of 2D correlation analysis has also demonstrated that essentially only one principal component is necessary for PCA-based QOSC to perform well. Although the present PCA-based QOSC filtering scheme is not as powerful as that based on explicit knowledge of the external variable vector, it can still significantly improve the quality of 2D correlation spectra and enables OSC 2D to deal with the problem of losing the quadrature (or out-of-phase) information. In particular, it opens a way to perform QOSC for a spectral dataset without external variable information. The proposed approach should have wide applications in 2D correlation analysis of spectra driven by multiplicative effects in complicated systems in biological, pharmaceutical, and agricultural fields, and so on, where the explicit nature of the external perturbation cannot always be known.
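The quadrature counterpart of a principal component score vector, as used in this abstract, can be sketched with the discrete Hilbert-Noda transformation matrix. The sketch assumes the commonly cited definition from two-dimensional correlation spectroscopy, N[j][k] = 0 for j = k and 1/(π(k - j)) otherwise; the function names are hypothetical and the QOSC filtering step itself is not shown.

```python
import math

def hilbert_noda_matrix(m):
    """m x m Hilbert-Noda matrix: zero diagonal, 1/(pi*(k-j)) elsewhere."""
    return [[0.0 if j == k else 1.0 / (math.pi * (k - j)) for k in range(m)]
            for j in range(m)]

def quadrature_counterpart(scores):
    """Apply the Hilbert-Noda matrix to a score vector ordered along the perturbation."""
    m = len(scores)
    n = hilbert_noda_matrix(m)
    return [sum(n[j][k] * scores[k] for k in range(m)) for j in range(m)]
```

Because the matrix is antisymmetric, the transformed vector carries the out-of-phase behaviour of the original scores, which is the information an asynchronous correlation analysis relies on.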

14.
15.
Static time-of-flight secondary ion mass spectrometry (TOF-SIMS) was performed on monolayers on scribed silicon (Si(scr)) derived from 1-alkenes, 1-alkynes, 1-haloalkanes, aldehydes, and acid chlorides. To rapidly determine the variation in the data without introducing user bias, a multivariate analysis was performed. First, principal components analysis (PCA) was done on data obtained from silicon scribed with homologous series of aldehydes and acid chlorides. For this study, the positive ion spectra, the negative ion spectra, and the concatenated (linked) positive and negative ion spectra were preprocessed by normalization, mean centering, and autoscaling. The mean centered data consistently showed the best correlations between the scores on PC1 and the number of carbon atoms in the adsorbate. These correlations were not as strong for the normalized and autoscaled data. After reviewing these methods, it was concluded that mean centering is the best preprocessing method for TOF-SIMS spectra of monolayers on Si(scr). A PCA analysis of all of the positive ion spectra revealed a good correlation between the number of carbon atoms in all of the adsorbates and the scores on PC1. PCA of all of the negative ion spectra and the concatenated positive and negative ion spectra showed a correlation based on the number of carbon atoms in the adsorbate and the class of the adsorbate. These results imply that the positive ion spectra are most sensitive to monolayer thickness, while the negative ion spectra are sensitive to the nature of the substrate-monolayer interface and the monolayer thickness. Loadings show an inverse relationship between (inorganic) fragments that are expected from the substrate and (organic) fragments expected from the monolayer. Multivariate peak intensity ratios were derived. It is also suggested that PCA can be used to detect outlier surfaces. Partial least squares showed a strong correlation between the number of carbon atoms in the adsorbate and the number predicted by the model.
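The three preprocessing options compared in this abstract (normalization, mean centering, autoscaling) can be written out in a few lines. This is a sketch under stated assumptions: rows are spectra, columns are peaks, normalization divides each spectrum by its total intensity, and autoscaling uses the sample standard deviation; the paper may define these steps differently.

```python
from statistics import mean, stdev

def normalize(spectra):
    """Divide each spectrum (row) by its total intensity."""
    return [[v / total for v in row] for row in spectra for total in [sum(row)]]

def mean_center(spectra):
    """Subtract each peak's (column's) mean across all spectra."""
    col_means = [mean(col) for col in zip(*spectra)]
    return [[v - m for v, m in zip(row, col_means)] for row in spectra]

def autoscale(spectra):
    """Mean center, then divide each column by its sample standard deviation.

    A column with zero variance raises ZeroDivisionError and should be
    removed beforehand.
    """
    col_sds = [stdev(col) for col in zip(*spectra)]
    return [[v / s for v, s in zip(row, col_sds)] for row in mean_center(spectra)]
```

Autoscaling gives every peak unit variance, so weak, noisy peaks count as much as intense ones; that is one plausible reason why mean centering alone can correlate better with a physical trend such as chain length.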

16.
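Detection of citrus rust by hyperspectral imaging   Total citations: 6 (self-citations: 1; citations by others: 6)
Hyperspectral imaging is a new technology for the non-destructive inspection of agricultural products, and its feasibility for inspecting the external quality of citrus fruit was investigated. With the aim of detecting citrus rust, principal component analysis was first applied to the preprocessed hyperspectral image data, and three characteristic wavelengths (571 nm, 652 nm and 741 nm) were selected to form a new image block. A second principal component analysis was then performed, and the resulting third principal component image was found to be the most suitable for detecting citrus rust. Finally, median filtering, square-root transformation, threshold segmentation and mathematical morphology operations were applied to this image to complete feature extraction. The experimental results show that the algorithm detects citrus rust with an accuracy of up to 90%. The study shows that hyperspectral imaging combined with a two-step principal component analysis algorithm is feasible for detecting citrus rust.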
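The image-processing steps that follow the second principal component analysis in this abstract (median filtering, square-root transformation, threshold segmentation) can be sketched on a small grayscale array. The sketch assumes a 3 x 3 median window clipped at the image border, clips negative values before the square root, and leaves out the morphology step; the function names and the threshold are hypothetical, since the abstract gives no parameter values.

```python
import math
from statistics import median

def median_filter(image, radius=1):
    """Median filter with a (2*radius+1)-wide window, clipped at the borders."""
    rows, cols = len(image), len(image[0])
    filtered = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            window = [image[i][j]
                      for i in range(max(0, r - radius), min(rows, r + radius + 1))
                      for j in range(max(0, c - radius), min(cols, c + radius + 1))]
            out_row.append(median(window))
        filtered.append(out_row)
    return filtered

def segment(pc_image, threshold):
    """Median filter, square-root transform, then threshold to a boolean mask."""
    smoothed = median_filter(pc_image)
    # PC scores can be negative, so clip at zero before taking the root
    stretched = [[math.sqrt(max(v, 0.0)) for v in row] for row in smoothed]
    return [[v > threshold for v in row] for row in stretched]
```

The median filter removes isolated bright pixels without blurring larger regions, which is why it is a natural step before thresholding a principal component image.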

17.
Normal-phase or reverse-phase liquid chromatography has been used in phospholipidomics for lipid separation prior to mass spectrometry analysis. However, separation using a single separation mode is often inadequate, as high-abundance phospholipids can mask large numbers of low-abundance lipids of interest. In order to detect and quantify low-abundance phospholipids, we present a novel two-dimensional (2D) approach for sensitive and quantitative global analysis of phospholipids. The methodology monitors individual glycerolipids and phospholipids through the use of a new quantitative normal-phase, solid-phase extraction procedure, followed by molecular characterization and relative quantification using an ion-trap Orbitrap equipped with a reverse-phase liquid chromatograph, with data processing by MS++ software. The CV (%) of the peak area of each lipid standard was less than 15% with this extraction method. When the method was applied to a liver sample, we could detect more phosphatidylserine (PS) than with the previous method. Finally, our method was applied to Alzheimer's disease (AD) plasma samples. Several hundred peaks were detected from a 60 μL plasma sample. A partial-least-squares discriminant analysis (PLS-DA) plot using peak area ratios gave a unique group of PLS scores that could distinguish plasma samples of AD patients from those of age-matched healthy controls.

18.
In metabolomics research a large number of metabolites are measured that reflect the cellular state under the experimental conditions studied. On many occasions the experiments are performed according to an experimental design to make sure that sufficient variation is induced in the metabolite concentrations. However, as metabolomics is a holistic approach, a large number of metabolites are also measured in which no variation is induced by the experimental design. The presence of such non-induced metabolites hampers traditional data analysis methods such as PCA in estimating the true model of the induced variation. The greediness of PCA leads to a clear overfit of the metabolomics data and can lead to a bad selection of important metabolites. In this paper we explore how, why and how severely PCA overfits data with an underlying experimental design. Recently, new data analysis methods have been introduced that can use prior information about the system to reduce the overfit. We show that incorporation of prior knowledge of the system under investigation leads to a better estimation of the true underlying structure and to less overfit. The experimental design information together with ANOVA-simultaneous component analysis (ASCA) is used to improve the analysis of metabolomics data. To show the improved model estimation property of ASCA, a thorough simulation study is used and the results are extended to a microbial metabolomics batch fermentation study. The ASCA model is much less affected by the non-induced variation and measurement error than PCA, leading to a much better model of the induced variation.

19.
An uncorrelated linear discriminant analysis (ULDA)-based heuristic feature selection (ULDA-HFS) method was proposed for sample classification and feature extraction for SELDI-TOF MS ovarian cancer data. The ULDA-HFS method includes 4 steps: (1) noise reduction and normalization; (2) selection of discriminatory bins with the CHI2 method; (3) peak detection and alignment for each selected bin; and (4) selection of several peaks as potential biomarkers by means of ULDA. As a result, 7 m/z locations were selected in this study: 245.3, 559.4, 565.6, 704.2, 717.2, 2667 and 4074.4. To evaluate the classification performance, PCA, PLS-DA and ULDA were performed for discriminant analysis, and ULDA obtained perfect separation. Finally, the 7 selected potential biomarkers were evaluated by ULDA; both sensitivity and specificity were 100%. The 7 m/z values obtained may provide clues for ovarian cancer biomarker discovery. Once the proteins at these m/z locations are identified, they could be used as specific protein markers for early detection and diagnosis of ovarian cancer.

20.
This paper proposes a new method for exploratory analysis and the interpretation of latent structures. The approach is named missing-data methods for exploratory data analysis (MEDA). The MEDA approach can be applied in combination with several models, including Principal Components Analysis (PCA), Factor Analysis (FA) and Partial Least Squares (PLS). It can be seen as a substitute for rotation methods with better properties: it is more accurate than rotation methods in the detection of relationships between pairs of variables, it is robust to overestimation of the number of PCs, and it does not depend on the normalization of the loadings. MEDA is useful to infer the structure in the data and also to interpret the contribution of each latent variable. The interpretation of PLS models with MEDA, including variable selection, may be especially valuable for the chemometrics community. The use of MEDA with PCA and PLS models is demonstrated with several simulated and real examples.
