Similar literature
Found 20 similar articles (search time: 15 ms)
1.
Data normalization plays a crucial role in metabolomics by accounting for the inevitable variation in sample concentration and in the efficiency of the sample preparation procedure. Conventional methods such as constant sum normalization (CSN) and probabilistic quotient normalization (PQN) are widely used, but both have shortcomings. In the current study, a new data normalization method called group aggregating normalization (GAN) is proposed, in which samples are normalized so that they aggregate close to their group centers in a principal component analysis (PCA) subspace. This contrasts with CSN and PQN, which rely on a constant reference for all samples. Evaluation of GAN using both simulated and experimental metabolomic data demonstrated that it produces more robust models in subsequent multivariate data analysis than either CSN or PQN. The study also demonstrated that some of the differential metabolites identified using CSN or PQN could be false positives caused by improper data normalization.
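The two conventional methods named in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the CSN pre-scaling step inside PQN and the use of the median spectrum as reference are common conventions assumed here, and GAN itself is not reproduced because the abstract does not give its algorithm.

```python
from statistics import median


def constant_sum_normalize(spectrum, total=1.0):
    """CSN: scale one spectrum so that its intensities sum to `total`."""
    s = sum(spectrum)
    return [total * v / s for v in spectrum]


def pqn_normalize(spectra):
    """PQN: divide each sample by the median quotient against a reference.

    Assumptions (not stated in the abstract): samples are first CSN-scaled,
    and the reference is the variable-wise median spectrum.
    """
    scaled = [constant_sum_normalize(s) for s in spectra]
    n_vars = len(scaled[0])
    # Reference spectrum: median of each variable across all samples.
    reference = [median(s[j] for s in scaled) for j in range(n_vars)]
    normalized = []
    for s in scaled:
        # Quotients are only defined where the reference is non-zero.
        quotients = [s[j] / reference[j] for j in range(n_vars) if reference[j] > 0]
        factor = median(quotients)  # most probable dilution factor
        normalized.append([v / factor for v in s])
    return normalized
```

Both functions use one reference for every sample, which is the property the abstract contrasts with GAN's group-specific centers.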

2.
In this study, the preprocessing of Raman spectra of different biological samples was studied, and its effect on the ability to extract robust and quantitative information was evaluated. Four data sets of Raman spectra were chosen to cover different aspects of biological Raman spectra; the samples comprised salmon oils, juice samples, salmon meat, and mixtures of fat, protein, and water. A range of frequently used preprocessing methods, as well as combinations of different methods, was evaluated. Different aspects of regression results obtained from partial least squares regression (PLSR) were used as indicators for comparing the effect of different preprocessing methods. The results, as expected, suggest that baseline correction should be performed before normalization. By performing total intensity normalization after adequate baseline correction, robust calibration models were obtained for all data sets. Combination methods such as standard normal variate (SNV), multiplicative signal correction (MSC), and extended multiplicative signal correction (EMSC) in their basic form were not able to handle the baseline features present in several of the data sets, and these methods thus provide no additional benefit over baseline correction followed by total intensity normalization. EMSC provides additional possibilities that require further investigation.
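The ordering recommended in this abstract (baseline correction first, then total intensity normalization) and the SNV transform can be sketched as below. This is an illustrative sketch only: `subtract_offset` is a hypothetical stand-in that removes a constant offset, whereas the study's actual baseline correction methods are not described in the abstract and would handle curved baselines.

```python
from statistics import mean, stdev


def subtract_offset(spectrum):
    """Placeholder baseline correction (assumption): remove a constant offset."""
    b = min(spectrum)
    return [v - b for v in spectrum]


def total_intensity_normalize(spectrum):
    """Divide by the summed absolute intensity of the spectrum."""
    total = sum(abs(v) for v in spectrum)
    return [v / total for v in spectrum]


def snv(spectrum):
    """Standard normal variate: center and scale each spectrum individually."""
    m = mean(spectrum)
    s = stdev(spectrum)  # sample standard deviation (n - 1 denominator)
    return [(v - m) / s for v in spectrum]


def baseline_then_normalize(spectrum):
    """Apply the two steps in the order the abstract recommends."""
    return total_intensity_normalize(subtract_offset(spectrum))
```

With a purely constant offset both routes give scale-independent results; the abstract's point is that basic SNV no longer suffices once the baseline has shape, which this simple placeholder does not model.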

3.
The authors recently proposed an approach to the metabonomic analysis of biofluid mixtures based on the use of the selective TOCSY experiment (Sandusky, P.; Raftery, D. Anal. Chem. 2005, 77, 2455). This method has some significant advantages over standard metabonomic analysis. However, when analyzing overlapped components, the selective TOCSY method can suffer from the relatively high likelihood of simultaneous excitation of several spin systems at once. This multiple excitation can cause problems both with the purity of the individual TOCSY peaks observed and with their assignment into specific spin systems. To address this problem, the possibility of using a more selective excitation is initially explored. Unfortunately, in most cases, greater spin system selectivity can only be gained at the expense of sensitivity. This is obviously an unacceptable tradeoff when dealing with biofluid samples. However, the application of the Pearson product moment correlation to the TOCSY peak integral intensities provides a test for individual TOCSY peak purity and allows for the assignment of the peaks into spin systems. The specific application of this two-stage "semiselective" TOCSY method to rat and human urine is presented. Significantly, it is also demonstrated that the use of semiselective TOCSY spectra as data inputs for PCA calculations provides a more sensitive and reliable method of distinguishing small differences in biofluid composition than the standard metabonomic approach using complete 1D proton NMR spectra of urine samples.  相似文献   

4.
Fourier transform infrared (FT-IR) spectroscopy has become a powerful tool for biodiagnostics and cell line classification. Typical experimental perturbations included in spectra are baseline shift and scale variation between spectra. They have to be removed by data preprocessings to allow further data analysis and classification. In this work, we addressed baseline shift corrections and normalizations in attenuated total reflection (ATR) FT-IR spectra. We compared the efficiency of several preprocessing methods with series of spectra containing typical perturbations (baseline shift, scaling factor, and noise) and a priori known definite spectral difference. Several baseline-correction and normalization possibilities were evaluated. Our results were generally sensitive, selective, and robust with respect to baseline and scaling. Full-range scaling generated more false-positive results. Use of first- and second-derivative spectra was tested. Results obtained on model spectra were confirmed with series of spectra from sensitive and multidrug-resistant leukemia K562 cells. We showed that the use of derived spectra did not provide more efficiency and required additional preprocessing such as smoothing to obtain results similar to those obtained from non-derived ones. On the other hand, results obtained with derivatives were less sensitive to scaling, a useful feature when scaling is problematic.  相似文献   

5.
Probabilistic uncertainty analysis quantifies the effect of input random variables on model outputs. It is an integral part of reliability-based design, robust design, and design for Six Sigma. The efficiency and accuracy of probabilistic uncertainty analysis is a trade-off issue in engineering applications. In this paper, an efficient and accurate mean-value first order Saddlepoint Approximation (MVFOSA) method is proposed. Similar to the mean-value first order Second Moment (MVFOSM) approach, a performance function is approximated with the first order Taylor expansion at the mean values of random input variables. Instead of simply using the first two moments of the random variables as in MVFOSM, MVFOSA estimates the probability density function and cumulative distribution function of the response by the accurate Saddlepoint Approximation. Because of the use of complete distribution information, MVFOSA is generally more accurate than MVFOSM with the same computational effort. Without the nonlinear transformation from non-normal variables to normal variables as required by the first order reliability method (FORM), MVFOSA is also more accurate than FORM in certain circumstances, especially when the transformation significantly increases the nonlinearity of a performance function. It is also more efficient than FORM because an iterative search process for the so-called Most Probable Point is not required. The features of the proposed method are demonstrated with four numerical examples.  相似文献   
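The MVFOSM baseline that this abstract compares against can be sketched as follows. This is a sketch under stated assumptions, not the proposed MVFOSA method: the saddlepoint step is not reproduced, the inputs are assumed independent, failure is assumed to mean g(x) < 0, the response is treated as normal when converting the reliability index to a probability, and the function name `mvfosm` and its finite-difference gradient are illustrative choices.

```python
from math import erf, sqrt


def mvfosm(g, means, stds, rel_step=1e-6):
    """Mean-value first-order second-moment estimate for g(x).

    Linearizes g at the mean point and propagates only the first two
    moments of independent inputs (assumption).
    """
    g0 = g(list(means))
    variance = 0.0
    for i, (m, s) in enumerate(zip(means, stds)):
        step = rel_step * max(abs(m), 1.0)
        x_plus = list(means)
        x_minus = list(means)
        x_plus[i] = m + step
        x_minus[i] = m - step
        # Central-difference estimate of the partial derivative at the mean.
        grad_i = (g(x_plus) - g(x_minus)) / (2.0 * step)
        variance += (grad_i * s) ** 2
    sigma = sqrt(variance)  # zero if g is insensitive to every input
    beta = g0 / sigma  # reliability index
    prob_failure = 0.5 * (1.0 + erf(-beta / sqrt(2.0)))  # Phi(-beta)
    return g0, sigma, beta, prob_failure
```

For example, `mvfosm(lambda x: x[0] - x[1], [5.0, 3.0], [0.3, 0.4])` linearizes a strength-minus-load function; because only two moments are used, the result ignores the distribution shape that MVFOSA is said to exploit.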

6.
Preprocessing of near-infrared spectra to remove unwanted, i.e., non-related, spectral variation and selection of informative wavelengths are considered crucial steps prior to the construction of a quantitative calibration model. The standard methodology when comparing various preprocessing techniques and selecting different wavelengths is to compare prediction statistics computed with an independent set of data not used to make the actual calibration model. When the errors in the reference values are large, no such values are available at all, or only a limited number of samples are available, other methods exist to evaluate the preprocessing method and wavelength selection. In this work we present a new indicator (SE) that only requires blank sample spectra, i.e., spectra of samples that are mixtures of the interfering constituents (everything except the analyte), a pure analyte spectrum, or alternatively, a sample spectrum where the analyte is present. The indicator is based on computing the net analyte signal of the analyte and the total error, i.e., instrumental noise and bias. By comparing the indicator values when different preprocessing techniques and wavelength selections are applied to the spectra, the optimal preprocessing technique and the optimal wavelength selection can be determined without knowledge of reference values, i.e., it minimizes the non-related spectral variation. The SE indicator is compared to two other indicators that also use net analyte signal computations. To demonstrate the feasibility of the SE indicator, two near-infrared spectral data sets from the pharmaceutical industry were used, i.e., diffuse reflectance spectra of powder samples and transmission spectra of tablets. Especially in pharmaceutical spectroscopic applications, it is expected beforehand that the non-related spectral variation is rather large and it is important to remove it. The indicator gave excellent results with respect to wavelength selection and optimal preprocessing. The SE indicator performs better than the two other indicators, and it is also applicable to other situations where the Beer-Lambert law is valid.

7.
We describe here the implementation of the statistical total correlation spectroscopy (STOCSY) analysis method for aiding the identification of potential biomarker molecules in metabonomic studies based on NMR spectroscopic data. STOCSY takes advantage of the multicollinearity of the intensity variables in a set of spectra (in this case 1H NMR spectra) to generate a pseudo-two-dimensional NMR spectrum that displays the correlation among the intensities of the various peaks across the whole sample. This method is not limited to the usual connectivities that are deducible from more standard two-dimensional NMR spectroscopic methods, such as TOCSY. Moreover, two or more molecules involved in the same pathway can also present high intermolecular correlations because of biological covariance or can even be anticorrelated. This combination of STOCSY with supervised pattern recognition and particularly orthogonal projection on latent structure-discriminant analysis (O-PLS-DA) offers a new powerful framework for analysis of metabonomic data. In a first step O-PLS-DA extracts the part of NMR spectra related to discrimination. This information is then cross-combined with the STOCSY results to help identify the molecules responsible for the metabolic variation. To illustrate the applicability of the method, it has been applied to 1H NMR spectra of urine from a metabonomic study of a model of insulin resistance based on the administration of a carbohydrate diet to three different mice strains (C57BL/6Oxjr, BALB/cOxjr, and 129S6/SvEvOxjr) in which a series of metabolites of biological importance can be conclusively assigned and identified by use of the STOCSY approach.  相似文献   
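The correlation step behind STOCSY can be illustrated with a short sketch: the intensity at one chosen "driver" variable is correlated with every other variable across the sample set. The function names and the plain-list data layout (rows are spectra, columns are spectral variables) are assumptions made for illustration; the O-PLS-DA stage and the pseudo-two-dimensional display described in the abstract are not reproduced.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def stocsy_trace(spectra, driver):
    """Correlate the driver variable with every variable across samples."""
    columns = list(zip(*spectra))  # one tuple of intensities per variable
    return [pearson(columns[driver], col) for col in columns]
```

Variables from the same molecule should correlate strongly and positively with the driver, while the anticorrelation mentioned in the abstract appears as values near -1.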

8.
Quantitative analysis of textile blends and textile fabrics is currently of particular interest in the industrial context. In this frame, this work investigates whether the use of Fourier transform (FT) near-infrared (NIR) spectroscopy and chemometrics is powerful for rapid and accurate quantitative analysis of cotton-polyester content in blend products. As samples of the same composition have many sources of variability that affect NIR spectra, indirect prediction is particularly challenging and a large sample population is required to design robust calibration models. Thus, a total of more than three-hundred cotton-polyester samples were selected covering the range from the 0% to 100% cotton and the corresponding NIR reflectance spectra were measured on raw fabrics. The data set obtained was used to develop multivariate models for quantitative prediction from reference measurements. A successful approach was found to rely on partial least squares (PLS) regression combined with genetic algorithms (GAs) for wavelength selection. It involved evaluating a set of calibration models considering different spectral regions. The results obtained considering 27.5% of the original variables yielded a prediction error (RMSEP) of 2.3 in percent cotton content. It demonstrates that FT-NIR spectroscopy has the potential to be used in the textile industry for the prediction of the composition of cotton-polyester blends. As a further consequence, it was observed that the spectral preprocessing and the complexity of the model are simplified compared to the full-spectrum approach. Also, the relevancy of the spectral intervals retained after variable selection can be discussed.  相似文献   

9.
10.
The application of the traditional methods of multivariate statistics, such as the calculation of principal components, to the analysis of NMR spectra taken on sets of biofluid samples is one of the central approaches in the field of metabonomics. While this approach has proven to be a powerful and widely applicable technique, it has an inherent weakness in that it tends to be dominated by those chemical species present at relatively higher concentrations. Using a set of commercial honey samples, a comparison of this classical metabonomics approach to one based on the use of the selective TOCSY experiment is presented. While the NMR spectrum of honey and its classical metabonomic analysis are completely dominated by a very few chemical species, specifically alpha-glucose and fructose, the statistical signal carried by minor honey components, such as amino acids, may be accessed using a selective TOCSY-based approach. This approach has the intrinsic virtue that it focuses the statistical analysis on a set of predefined chemical species, which might be chosen for their metabolic significance and could be composed of either major or minor mixture constituents. Furthermore, the selective TOCSY method allows for more certain chemical identification, acquisition times of approximately 1 min, and accurate quantification of the species contributing to the statistical discriminatory signal.

11.
This paper describes mathematical techniques to correct for analyte-irrelevant optical variability in tissue spectra by combining multiple preprocessing techniques to address variability in spectral properties of tissue overlying and within the muscle. A mathematical preprocessing method called principal component analysis (PCA) loading correction is discussed for removal of inter-subject, analyte-irrelevant variations in muscle scattering from continuous-wave diffuse reflectance near-infrared (NIR) spectra. The correction is completed by orthogonalizing spectra to a set of loading vectors of the principal components obtained from principal component analysis of spectra with the same analyte value, across different subjects in the calibration set. Once the loading vectors are obtained, no knowledge of analyte values is required for future spectral correction. The method was tested on tissue-like, three-layer phantoms using partial least squares (PLS) regression to predict the absorber concentration in the phantom muscle layer from the NIR spectra. Two other mathematical methods, short-distance correction to remove spectral interference from skin and fat layers and standard normal variate scaling, were also applied and/or combined with the proposed method prior to the PLS analysis. Each of the preprocessing methods improved model prediction and/or reduced model complexity. The combination of the three preprocessing methods provided the most accurate prediction results. We also performed a preliminary validation on in vivo human tissue spectra.  相似文献   
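The orthogonalization step described in this abstract can be sketched as a projection that removes from each spectrum the component lying in the span of the stored loading vectors. This is a minimal sketch with hypothetical function names: how the loadings are obtained (PCA of same-analyte spectra across subjects) and any mean-centering are outside the sketch, and the Gram-Schmidt pass is only a safeguard, since PCA loadings are normally already orthonormal.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def orthonormalize(vectors, tol=1e-12):
    """Gram-Schmidt: return an orthonormal basis for the span of `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = dot(w, w) ** 0.5
        if norm > tol:  # skip vectors already in the span
            basis.append([wi / norm for wi in w])
    return basis


def orthogonalize_spectrum(spectrum, loadings):
    """Remove the part of `spectrum` that lies along the loading vectors."""
    corrected = list(spectrum)
    for b in orthonormalize(loadings):
        c = dot(corrected, b)
        corrected = [x - c * bi for x, bi in zip(corrected, b)]
    return corrected
```

Once the loadings are stored, the correction needs only the new spectrum, which matches the abstract's statement that no analyte values are required for future spectral correction.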

12.
The present work uses fluorescein as the model fluorophore and points out critical steps in the use of MESF (Molecules of Equivalent Soluble Fluorophores) values for quantitative flow cytometric measurements. It has been found that emission spectrum matching between a reference solution and an analyte and normalization by the corresponding extinction coefficient are required for quantifying fluorescence signals using flow cytometers. Because of the use of fluorescein, the pH value of the medium is also critical for accurate MESF assignments. Given that the emission spectrum shapes of microbead suspensions and stained biological cells are not significantly different, the percentage of error due to spectrum mismatch is estimated. We have also found that the emission spectrum of a microbead with a seven-methylene linker between the fluorescein and the bead surface (bead7) provides the best match with the spectra from biological cells. Therefore, bead7 is potentially a better calibration standard for flow cytometers than the existing one that is commercially available and used in the present study.  相似文献   

13.
Spectral measurements of complex heterogeneous types of mixture samples are often affected by significant multiplicative effects resulting from light scattering, due to physical variations (e.g., particle size and shape, sample packing, and sample surface, etc.) inherent within the individual samples. Therefore, the separation of the spectral contributions due to variations in chemical compositions from those caused by physical variations is crucial to accurate quantitative spectroscopic analysis of heterogeneous samples. In this work, an improved strategy has been proposed to estimate the multiplicative parameters accounting for multiplicative effects in each measured spectrum and, hence, mitigate the detrimental influence of multiplicative effects on the quantitative spectroscopic analysis of heterogeneous samples. The basic assumption of the proposed method is that light scattering due to physical variations has the same effects on the spectral contributions of each of the spectroscopically active chemical components in the same sample mixture. On the basis of this underlying assumption, the proposed method realizes the efficient estimation of the multiplicative parameters by solving a simple quadratic programming problem. The performance of the proposed method has been tested on two publicly available benchmark data sets (i.e., near-infrared total diffuse transmittance spectra of four-component suspension samples and near-infrared spectral data of meat samples) and compared with some empirical approaches designed for the same purpose. It was found that the proposed method provided appreciable improvement in quantitative spectroscopic analysis of heterogeneous mixture samples. The study indicates that accurate quantitative spectroscopic analysis of heterogeneous mixture samples can be achieved through the combination of spectroscopic techniques with smart modeling methodology.  相似文献   

14.
Protein profiling with mass spectrometry is a promising approach for classification and identification of biomarkers; however, there is debate about measurement quality and reliability. Here, we present a pipeline for preprocessing, statistical data analysis and presentation. Serum samples of 16 healthy individuals are used to generate protein profiles with high-resolution MALDI-TOF after isolation of peptides with C8 magnetic beads. Analysis of variance was performed after binning, baseline correction and normalization of the mean spectra. Relative variations in the spectra are expressed as coefficient of variation, which depending on the respective preanalytical variation parameter investigated, was found to range between 0.15 and 0.67 in this study. With this novel method, the reproducibility of our protein profiling procedure could be quantified. We showed that circadian rhythm and the number of freeze-thaw cycles had relatively limited influence on serum protein profiles, whereas the period between collection and serum centrifugation had a more pronounced effect.  相似文献   

15.
Retinal image registration, which is applied in diagnosing and treating eye diseases, plays an important role in medical image analysis. Existing methods suffer from problems due to different imaging viewpoints, times, quality, modalities, and retinal disasters. In this paper, we propose an efficient retinal images registration framework that overcomes these challenges without supervision. We present a layer-wise matching method to achieve a uniform distribution of features in both image-space and scale-space. Then, a novel method called Bayesian integration is generated to accumulate more meaningful inputs. We use the results of different matches as priors, assign a score to each match, and categorize them using a dynamic threshold. Finally, in accordance with previous work, we transform the problem into a probabilistic model, with the asymmetric Gaussian mixture model representing the distribution. A robust estimation is performed on a non-rigid transformation. The experimental results demonstrate that our proposed framework is robust to kinds of retinal image degradation and produces a more stable and accurate result than state-of-the-art methods.  相似文献   

16.
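姜红, 陆润洲, 段斌, 刘峰. 《包装工程》 (Packaging Engineering), 2021, 42(21): 79-85
Objective: To explore an examination method based on spectral analysis for rapidly and accurately discriminating cigarette-pack evidence. Methods: A portable differential Raman spectrometer was used to test 39 different yellow cigarette-pack samples and to obtain the differential Raman spectral data of each sample. The samples were first classified according to the type of filler in the cigarette pack. Then, in combination with chemometrics, the measurement results were subjected to hierarchical clustering and K-Means clustering in IBM SPSS Statistics 26.0 after dimensionality reduction by principal component analysis. Results: The 39 samples could be divided into 3 classes according to filler type and, in combination with chemometrics, more accurately into 6 classes. Conclusion: The method is non-destructive to the evidence, fast, and accurate, and the spectra are not affected by fluorescence interference. Combined with chemometric methods, it can be used to classify and examine cigarette-pack samples, and it provides a basis for public security agencies examining this kind of evidence at crime scenes.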

17.
Variable (or wavelength) selection plays an important role in the quantitative analysis of near-infrared (NIR) spectra. A modified method of uninformative variable elimination (UVE) is proposed for variable selection in NIR spectral modeling, based on the principles of Monte Carlo (MC) sampling and UVE. The method first builds a large number of models with randomly selected calibration samples, and then evaluates each variable by the stability of its corresponding coefficients across these models. Variables with poor stability are regarded as uninformative and are eliminated. The performance of the proposed method is compared with UVE-PLS and conventional PLS for modeling NIR data sets of tobacco samples. The results show that the proposed method is able to select important wavelengths from the NIR spectra and makes the prediction more robust and accurate in quantitative analysis. Furthermore, if wavelet compression is combined with the method, a more parsimonious and efficient model can be obtained.

18.
It has been recently suggested [N. E. Marotta and L. A. Bottomley, Appl. Spectrosc. 64, 601-606 (2010)] that previously reported surface-enhanced Raman scattering (SERS) spectra of vegetative bacterial cells are due to residual cell growth media that were not properly removed from samples of the lab-cultured microorganism suspensions. SERS spectra of several commonly used cell growth media are similar to those of bacterial cells, as shown here and reported elsewhere. However, a multivariate data analysis approach shows that SERS spectra of different bacterial species grown in the same growth media exhibit different characteristic vibrational spectra, SERS spectra of the same organism grown in different media display the same SERS spectrum, and SERS spectra of growth media do not cluster near the SERS spectra of washed bacteria. Furthermore, a bacterial SERS spectrum grown in a minimal medium, which uses inorganics for a nitrogen source and displays virtually no SERS features, exhibits a characteristic bacterial SERS spectrum. We use multivariate analysis to show how successive water washing and centrifugation cycles remove cell growth media and result in a robust bacterial SERS spectrum in contrast to the previous study attributing bacterial SERS signals to growth media.  相似文献   

19.
YJ Bae, KM Park, MS Kim. Analytical Chemistry, 2012, 84(16): 7107-7111
Matrix-assisted laser desorption ionization of peptides was investigated using α-cyano-4-hydroxycinnamic acid as the matrix. In each experiment, a set of mass spectra was collected by repetitive irradiation of a spot on a sample. Even though shot-to-shot variation in spectral pattern was significant, it was reproducible for different spots and samples. Each spectrum was tagged with the temperature in the early plume (T(early)) estimated through kinetic analysis of the peptide ion survival probability. T(early) decreased as the shot continued because the thermal conduction got more efficient as the sample got thinner. From each spectral set collected under various experimental conditions, a spectrum tagged with a particular T(early) was selected. Then, patterns of the spectra thus selected were the same. The reaction quotient for the matrix-to-peptide proton transfer determined at a specified T(early) was independent of the sample composition, indicating quasi-thermal equilibrium for this reaction. Furthermore, the van't Hoff plots were linear, also indicating quasi-thermal equilibrium. This, together with the thermal kinetics for the fragmentation of peptide and matrix ions, is responsible for the reproducibility of the mass spectral pattern at a specified T(early).  相似文献   

20.
A theoretical method is proposed for accurate reconstruction of a spectrum from bounded sets of discrete values of the spectral intensities. The method is based on a well-known measurement theorem from optics. It was used to solve the corresponding integral equation, to eliminate instrumental distortions, and to accurately reconstruct the spectra from the appropriate discrete values. Institute of Applied Physics Problems, Armenian Academy of Sciences. Pis’ma Zh. Tekh. Fiz. 24, 5–9 (July 26, 1998)
