Similar literature
 Found 20 similar documents
1.
An uncorrelated linear discriminant analysis (ULDA)-based heuristic feature selection (ULDA-HFS) method was proposed for sample classification and feature extraction from SELDI-TOF MS ovarian cancer data. The ULDA-HFS method includes four steps: (1) noise reduction and normalization; (2) selection of discriminatory bins with the CHI2 method; (3) peak detection and alignment for each selected bin; and (4) selection of several peaks as potential biomarkers by means of ULDA. As a result, seven m/z locations were selected in this study: 245.3, 559.4, 565.6, 704.2, 717.2, 2667 and 4074.4. To evaluate classification performance, PCA, PLS-DA and ULDA were applied for discriminant analysis, and ULDA achieved perfect separation. Finally, the seven selected potential biomarkers were evaluated by ULDA; both sensitivity and specificity were 100%. The seven m/z values obtained may provide clues for ovarian cancer biomarker discovery. Once the proteins at these m/z locations are identified, they could be used as specific proteins for the early detection and diagnosis of ovarian cancer.

2.
A strategy based on independent component analysis (ICA) and uncorrelated linear discriminant analysis (ULDA) was proposed for proteomic profile analysis and potential biomarker discovery from proteomic mass spectra of cancer and control samples. The method comprises three main steps: (1) ICA decomposition of the mass spectra; (2) selection of discriminatory independent components (ICs) using the nonparametric Mann-Whitney U test; and (3) selection of specific peaks (m/z locations) as potential biomarkers by applying ULDA to a mass spectral data set reconstructed from the m/z locations collected from the selected discriminatory ICs. A colorectal cancer data set and an ovarian cancer data set were analyzed with the proposed method. As a result, 9 and 10 m/z locations were selected as potential biomarkers for the colorectal and ovarian cancer data sets, respectively. ULDA with the selected potential biomarkers classified better than Fisher discriminant analysis (FDA) and principal component analysis (PCA), and distinguished disease samples from healthy controls on the independent test sets with 100% sensitivity and specificity for the colorectal cancer data set and 100% sensitivity and 96.77% specificity for the ovarian cancer data set.
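
Step (2) of the abstract above, screening components with the Mann-Whitney U test, can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the normal approximation without tie correction, and the `z_threshold` value are all assumptions made here.

```python
import math

def mann_whitney_u(x, y):
    """Mann-Whitney U statistic of x versus y: the number of pairs with
    x_i > y_j, counting each tie as 0.5."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u

def u_test_z(x, y):
    """Normal-approximation z-score for U (no tie correction; an assumption
    that is only reasonable for moderately large samples)."""
    n1, n2 = len(x), len(y)
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    return (mann_whitney_u(x, y) - mean_u) / sd_u

def select_components(cancer_scores, control_scores, z_threshold=1.96):
    """Return indices of components whose |z| exceeds the (hypothetical) threshold.

    cancer_scores[k] and control_scores[k] hold the k-th component's scores
    for the cancer and control samples respectively."""
    selected = []
    for k, (a, b) in enumerate(zip(cancer_scores, control_scores)):
        if abs(u_test_z(a, b)) > z_threshold:
            selected.append(k)
    return selected
```

A component whose scores separate the two groups is kept, while one with identical score distributions gets z = 0 and is dropped.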

3.
Lutz U  Lutz RW  Lutz WK 《Analytical chemistry》2006,78(13):4564-4571
Mass spectrometry (MS) is increasingly being used for metabolic profiling, but detection modes such as constant neutral loss or multiple reaction monitoring have not often been reported. These modes allow focusing on structurally related compounds, which could be advantageous for situations in which the trait under investigation is associated with a particular class of metabolites. In this study, we analyzed endogenous glucuronides excreted in human urine by monitoring characteristic transitions of putative steroid glucuronides by LC-MS/MS for discrimination of females from males. Two methods for data extraction were used: (i) a manual procedure based on visual inspection of the chromatograms and selection of 23 peaks and (ii) a software-supported method (MarkerView) set to extract 100 peaks. Data from 10 female and 10 male students were analyzed by principal component analysis (PCA) and partial least-squares discriminant analysis (PLS-DA) using the SIMCA software. With PCA, only the manual peak selection resulted in clustering of males and females. With PLS-DA, the manual method provided full separation on the basis of a single discriminant; the software-supported approach required a two-component model for complete separation. Loading plots were analyzed for their ability to reveal peaks with high discriminating power, that is, potential biomarkers. The PLS-DA models were validated with urine samples collected from five new females and five new males. Gender was correctly assigned for all. Our results indicate that inclusion of biological criteria for variable selection coupled to class-specific MS analysis and data extraction by appropriate software may constitute a valuable addition to the methods available for metabolomics.

4.
A large metabolomics study was performed on 600 plasma samples taken at four time points before and after a single intake of a high-fat test meal by obese and lean subjects. All samples were analyzed by a liquid chromatography-mass spectrometry (LC-MS) lipidomic method for metabolic profiling. A pragmatic approach combining several well-established statistical methods was developed for processing this large data set in order to detect small differences in metabolic profiles in the presence of large biological variation. Such metabolomics studies require a careful analytical and statistical protocol. The strategy included data preprocessing, data analysis, and validation of statistical models. After several data preprocessing steps, partial least-squares discriminant analysis (PLS-DA) was used for finding biomarkers. To validate the biomarkers found statistically, the PLS-DA models were validated by means of a permutation test, biomarker models, and noninformative models. Univariate plots of potential biomarkers were used to gain insight into up- or downregulation. The proposed strategy proved to be applicable to large-scale human metabolomics studies.
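
The permutation test mentioned in the abstract above can be illustrated with a generic sketch: class labels are shuffled many times and the model score is recomputed to see how often chance alone matches the observed score. The function names, the toy `mean_gap` score (standing in for a PLS-DA quality measure), and the default of 200 permutations are assumptions for illustration only; the study's actual protocol is not given in the abstract.

```python
import random

def permutation_p_value(score_fn, X, y, n_permutations=200, seed=0):
    """Estimate how often a score at least as large as the observed one
    arises when the class labels are randomly shuffled."""
    rng = random.Random(seed)
    observed = score_fn(X, y)
    labels = list(y)
    count = 0
    for _ in range(n_permutations):
        rng.shuffle(labels)  # break the link between samples and classes
        if score_fn(X, labels) >= observed:
            count += 1
    # Add-one correction so the estimate is never exactly zero.
    return (count + 1) / (n_permutations + 1)

def mean_gap(X, y):
    """Toy score: absolute difference between the two class means of a
    single variable (a placeholder for a real model-quality score)."""
    a = [x for x, label in zip(X, y) if label == 1]
    b = [x for x, label in zip(X, y) if label == 0]
    return abs(sum(a) / len(a) - sum(b) / len(b))
```

In practice `score_fn` would refit the classifier on each shuffled label set, which is what makes the test a check on the whole modeling procedure rather than on a single variable.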

5.
Principal component analysis (PCA) is the most commonly used dimensionality reduction technique for detecting and diagnosing faults in chemical processes. Although PCA has certain optimality properties for fault detection and has been widely applied to fault diagnosis, it is not best suited for fault diagnosis. Discriminant partial least squares (DPLS) has been shown to improve fault diagnosis for small-scale classification problems as compared with PCA. Fisher's discriminant analysis (FDA) has advantages from a theoretical point of view. In this paper, we develop an information criterion that automatically determines the order of the dimensionality reduction for FDA and DPLS, and show that FDA and DPLS are more proficient than PCA for diagnosing faults, both theoretically and by applying these techniques to simulated data collected from the Tennessee Eastman chemical plant simulator.

6.
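To rapidly determine whether lily (Baihe) has been adulterated, excitation-emission matrix (EEM) fluorescence was used to analyze the fluorescence spectra of pure and adulterated lily samples, and characteristic fluorescence fingerprints of lily and adulterated lily were constructed. Two chemometric pattern recognition methods, principal component analysis-linear discriminant analysis (PCA-LDA) and partial least squares discriminant analysis (PLS-DA), were then used to rapidly identify and classify the types of adulterant powder in lily. The experimental results show that both classification models accurately identify adulterated lily samples from the EEM fluorescence spectra, with correct classification rates as high as 95%. A new method for rapidly detecting lily adulteration was thus established using PCA-LDA and PLS-DA, and the characteristic fluorescence fingerprint of lily was refined, which is expected to lay a foundation for a more comprehensive and accurate quality standard system for evaluating lily as a medicinal material.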

7.
Fault monitoring and diagnosis of multivariable control systems based on PCA
陈勇  梁军  陆浩 《工程设计学报》2002,9(5):257-260
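 Principal component analysis (PCA) is an effective method for monitoring and quality control of process production; it greatly reduces the dimensionality of the original data space while keeping the loss of information to a minimum. Modeling with PCA can eliminate nonlinear correlations among variables and reduce the influence of noise. Extensive tests on the control process of cooking equipment in a food factory show that the PCA fault diagnosis model can effectively monitor equipment production and can diagnose faults occurring during equipment operation fairly accurately and promptly.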
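
PCA-based monitoring of the kind described above is commonly carried out with two statistics, Hotelling's T² in the retained subspace and the squared prediction error (SPE) in the residual subspace. The abstract does not say which statistics the authors used, so the sketch below is an assumption, and the function names are hypothetical.

```python
import numpy as np

def fit_pca_monitor(X_train, n_components):
    """Fit a PCA model on normal operating data (rows are samples)."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0, ddof=1)
    Z = (X_train - mean) / std                      # autoscale each variable
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    P = Vt[:n_components].T                         # loadings, one column per component
    variances = s[:n_components] ** 2 / (Z.shape[0] - 1)
    return mean, std, P, variances

def monitoring_statistics(x, mean, std, P, variances):
    """Return (T2, SPE) for one new sample x."""
    z = (x - mean) / std
    t = z @ P                                       # scores in the PCA subspace
    t2 = float(np.sum(t ** 2 / variances))          # Hotelling's T^2
    residual = z - t @ P.T                          # part not explained by the model
    spe = float(residual @ residual)                # squared prediction error
    return t2, spe
```

A sample that breaks the correlation structure learned from normal data shows up as a large SPE even when each variable is individually within its usual range; control limits for both statistics would still have to be chosen separately.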

8.
In the field of metabonomics, 1H NMR and full scan mass spectrometry methods have usually been combined with principal component analysis (PCA) and partial least squares discriminant analysis (PLS-DA) to detect patterns in biofluids that correspond to specific effects, usually a toxic site effect of a compound. Confounders together with great interindividual variation complicate such analysis in humans, and therefore, metabonomic data are largely restricted to animals. In our study, a constant neutral loss (CNL) scan on a linear ion trap demonstrated increased sensitivity and specificity compared to a full scan approach and was performed to detect mercapturic acids (MA), a class of effect markers. The method was applied to human volunteers administered 50 and 500 mg of acetaminophen (AAP), a model compound known to form MAs. Using a new algorithm to prepare the CNL data for chemometrics, discrimination of control and postdose samples could be performed using PCA and PLS-DA. The loadings plots clearly revealed AAP-MA as a marker, even at low-dose levels. Orthogonal signal correction (OSC) was carried out to investigate background information that is not due to exposure. Surprisingly, the OSC data provided a classification of male and female subjects, demonstrating the performance of the new approach.

9.
Discriminant analysis based on maximum margin MFA
王勇  卢桂馥 《光电工程》2011,38(2):102-107
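To address the small sample size problem faced by marginal Fisher analysis (MFA), this paper proposes a maximum margin marginal Fisher analysis (MMMFA) algorithm based on the maximum margin criterion (MMC). The method uses the difference between the similarity matrix describing between-class separability and the similarity matrix describing within-class compactness as the discriminant criterion, thereby avoiding the small sample size problem encountered in MFA discriminant analysis. The paper then discusses the proposed algorithm and traditional linear dimensionality reduction algo...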

10.
Dimensionality reduction is an important technique for preprocessing of high-dimensional data. Because only one side of the original data is represented in a low-dimensional subspace, useful information may be lost. In the present study, novel dimensionality reduction methods were developed that are suitable for metabolome data, where observation varies with time. Metabolomics deals with this type of data, which are often obtained in microorganism fermentation processes. However, no dimensionality reduction method that utilizes information from the original data in a positive manner has been reported to date. The ordinary dimensionality reduction methods of principal component analysis (PCA), partial least squares (PLS), orthonormalized PLS (OPLS), and regularized Fisher discriminant analysis (RFDA) were extended by introducing differential penalties to the latent variables in each class. A nonlinear extension of this approach, using kernel methods, was also proposed in the form of kernel-smoothed PCA, PLS, OPLS, and FDA. Since all of these methods are formulated as generalized eigenvalue problems, the solutions can be computed easily. These methods were then applied to intracellular metabolite data of a xylose-fermenting yeast in ethanol fermentation. Visualization in the low-dimensional subspace suggests that smoothed PCA successfully preserves the information about the time course of observations during fermentation, and that RFDA can produce high separation among different strains.

11.
Tissue engineering approaches fabricate and subsequently implant cell-seeded and unseeded scaffold biomaterials. Once in the body, these biomaterials are repopulated with somatic cells of various phenotypes whose identification upon explantation can be expensive and time-consuming. We show that imaging time-of-flight secondary ion mass spectrometry (TOF-SIMS) can be used to distinguish mammalian cell types in heterogeneous cultures. Primary rat esophageal epithelial cells (REEC) were cultured with NIH 3T3 mouse fibroblasts on tissue culture polystyrene and freeze-dried before TOF-SIMS imaging. Results show that a short etching sequence with C(60)(+) ions can be used to clean the sample surface and improve the TOF-SIMS image quality. Principal component analysis (PCA) and partial least-squares discriminant analysis (PLS-DA) were used to identify peaks whose contributions to the total variance in the multivariate model were due to either the two cell types or the substrate. Using PLS-DA, unknown regions of cellularity that were otherwise unidentifiable by SIMS could be classified. From the loadings in the PLS-DA model, peaks were selected that were indicative of the two cell types and TOF-SIMS images were created and overlaid that showed the ability of this method to distinguish features visually.

12.
In metabolomics research a large number of metabolites are measured that reflect the cellular state under the experimental conditions studied. On many occasions the experiments are performed according to an experimental design to make sure that sufficient variation is induced in the metabolite concentrations. However, as metabolomics is a holistic approach, a large number of metabolites are also measured in which no variation is induced by the experimental design. The presence of such non-induced metabolites hampers traditional data analysis methods such as PCA in estimating the true model of the induced variation. The greediness of PCA leads to a clear overfit of the metabolomics data and can lead to a poor selection of important metabolites. In this paper we explore how, why and how severely PCA overfits data with an underlying experimental design. Recently, new data analysis methods have been introduced that can use prior information about the system to reduce the overfit. We show that incorporating prior knowledge of the system under investigation leads to a better estimation of the true underlying structure and to less overfit. The experimental design information together with ASCA is used to improve the analysis of metabolomics data. To show the improved model estimation property of ASCA, a thorough simulation study is used and the results are extended to a microbial metabolomics batch fermentation study. The ASCA model is much less affected by the non-induced variation and measurement error than PCA, leading to a much better model of the induced variation.

13.
A study of the essence of statistically uncorrelated optimal discriminant vector sets
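This paper studies the essence of statistically uncorrelated optimal discriminant vector sets. Based on the eigendecomposition of the total scatter matrix, a whitening transform is constructed so that the total scatter matrix in the transformed sample space is the identity matrix. As a result, the optimal discriminant vector sets obtained by traditional algorithms in that space are all statistically uncorrelated, which reveals the essence of the statistically uncorrelated optimal discriminant transform: a whitening transform followed by an ordinary linear discriminant transform. The main advantage of the method is that the optimal discriminant vectors obtained are both orthogonal and statistically uncorrelated. The method is generally applicable to algebraic feature extraction. Numerical experiments on the ORL face database verify its effectiveness.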
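
The "whitening transform plus ordinary linear discriminant transform" idea can be sketched with NumPy as follows. This is an illustrative reading of the abstract, not the paper's code: the function names, the `eps` cutoff for near-zero eigenvalues, and the choice to solve the discriminant step as an eigenproblem on the whitened between-class scatter are assumptions.

```python
import numpy as np

def whitening_transform(X, eps=1e-10):
    """Return W such that the centered data times W has identity total scatter."""
    Xc = X - X.mean(axis=0)
    St = Xc.T @ Xc / X.shape[0]                 # total scatter matrix
    evals, evecs = np.linalg.eigh(St)           # eigendecomposition of St
    keep = evals > eps                          # drop numerically null directions
    return evecs[:, keep] / np.sqrt(evals[keep])

def uncorrelated_discriminant_vectors(X, y, n_components):
    """Whiten, then take the leading eigenvectors of the between-class scatter.

    In the whitened space the total scatter is the identity, so the within-class
    scatter equals I - Sb and the Fisher criterion is maximized by the top
    eigenvectors of Sb. These are orthogonal there, and the vectors mapped back
    to the original space are uncorrelated with respect to St."""
    W = whitening_transform(X)
    Z = (X - X.mean(axis=0)) @ W
    Sb = np.zeros((Z.shape[1], Z.shape[1]))
    for label in np.unique(y):
        Zc = Z[y == label]
        m = Zc.mean(axis=0)                     # the global mean of Z is zero
        Sb += (Zc.shape[0] / Z.shape[0]) * np.outer(m, m)
    evals, evecs = np.linalg.eigh(Sb)
    order = np.argsort(evals)[::-1][:n_components]
    return W @ evecs[:, order]
```

With `Phi` the returned matrix and `St` the total scatter of the original data, `Phi.T @ St @ Phi` is the identity, which is the statistical uncorrelatedness the abstract refers to.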

14.
Multivariate analysis has become increasingly common in the analysis of multidimensional spectral data. We previously showed that the multivariate analysis technique principal component analysis (PCA) is an excellent method for interpreting the static time-of-flight secondary ion mass spectrometry (TOF-SIMS) spectra of adsorbed protein films. PCA is an unsupervised pattern recognition technique that loses resolution between spectra of different proteins as more proteins are added to the data set due to large within-group variation. The supervised pattern recognition techniques discriminant principal component analysis (DPCA) and linear discriminant analysis (LDA), which aim to control within-group variation while maximizing between-group separation to enhance discrimination between groups, were compared with PCA using data sets of TOF-SIMS spectra of proteins adsorbed onto mica and PTFE substrates. DPCA and LDA quantitatively improved discrimination between groups and provided different information about the data than PCA. LDA was able to classify unknown samples with a misclassification rate lower than PCA or DPCA. Both unsupervised and supervised pattern recognition techniques are useful for the interpretation and classification of static TOF-SIMS spectra of adsorbed protein films.

15.
Comparisons of prediction models from the new augmented classical least squares (ACLS) and partial least squares (PLS) multivariate spectral analysis methods were conducted using simulated data containing deviations from the idealized model. The simulated data were based on pure spectral components derived from real near-infrared spectra of multicomponent dilute aqueous solutions. Simulated uncorrelated concentration errors, uncorrelated and correlated spectral noise, and nonlinear spectral responses were included to evaluate the methods on situations representative of experimental data. The statistical significance of differences in prediction ability was evaluated using the Wilcoxon signed rank test. The prediction differences were found to be dependent on the type of noise added, the numbers of calibration samples, and the component being predicted. For analyses applied to simulated spectra with noise-free nonlinear response, PLS was shown to be statistically superior to ACLS for most of the cases. With added uncorrelated spectral noise, both methods performed comparably. Using 50 calibration samples with simulated correlated spectral noise, PLS showed an advantage in 3 out of 9 cases, but the advantage dropped to 1 out of 9 cases with 25 calibration samples. For cases with different noise distributions between calibration and validation, ACLS predictions were statistically better than PLS for two of the four components. Also, when experimentally derived correlated spectral error was added, ACLS gave better predictions that were statistically significant in 15 out of 24 cases simulated. On data sets with nonuniform noise, neither method was statistically better, although ACLS usually had smaller standard errors of prediction (SEPs). The varying results emphasize the need to use realistic simulations when making comparisons between various multivariate calibration methods. 
Even when the differences between the standard errors of prediction were statistically significant, in most cases the differences in SEP were small. This study demonstrated that unlike CLS, ACLS is competitive with PLS in modeling nonlinearities in spectra without knowledge of all the component concentrations. This competitiveness is important when maintaining and transferring models for system drift, spectrometer differences, and unmodeled components, since ACLS models can be rapidly updated during prediction when used in conjunction with the prediction augmented classical least squares (PACLS) method, while PLS requires full recalibration.

16.
《技术计量学》2013,55(4):392-403
Principal components analysis (PCA) is often used in the analysis of multivariate process data to identify important combinations of the original variables on which to focus for more detailed study. However, PCA and other related projection techniques from the standard multivariate repertoire are not explicitly designed to address or to exploit the strong autocorrelation and temporal cross-correlation structures that are often present in multivariate process data. Here we propose two alternative projection techniques that do focus on the temporal structure in such data and that therefore produce components that may have some analytical advantages over those resulting from more conventional multivariate methods. As in PCA, both of our suggested methods linearly transform the original p-variate time series into uncorrelated components; however, unlike PCA, they concentrate on deriving components with particular temporal correlation properties, rather than those with maximal variance. The first technique finds components that exhibit distinctly different autocorrelation structures via modification of a signal-noise decomposition method used in image analysis. The second method draws on ideas from common PCA to produce components that are not only uncorrelated as in PCA, but that also have approximately zero temporally lagged cross-correlations for all time lags. We present the technical details for these two methods, assess their performance through simulation studies, and illustrate their use on multivariate output measures from a fluidized catalytic cracking unit used in petrochemical production, contrasting the results obtained with those from standard PCA.

17.
Data normalization plays a crucial role in metabolomics by accounting for the inevitable variation in sample concentration and in the efficiency of the sample preparation procedure. Conventional methods such as constant sum normalization (CSN) and probabilistic quotient normalization (PQN) are widely used, but both have their own shortcomings. In the current study, a new data normalization method called group aggregating normalization (GAN) is proposed, in which the samples are normalized so that they aggregate close to their group centers in a principal component analysis (PCA) subspace. This is in contrast with CSN and PQN, which rely on a constant reference for all samples. The evaluation of the GAN method using both simulated and experimental metabolomic data demonstrated that GAN produces a more robust model in the subsequent multivariate data analysis than either CSN or PQN. The current study also demonstrated that some of the differential metabolites identified using the CSN or PQN method could be false positives due to improper data normalization.
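
For reference, the two conventional methods named above can be sketched as follows. This shows CSN and the usual PQN recipe (constant-sum scaling, a median reference profile, then division by each sample's median quotient), not the proposed GAN method, whose details are not given in the abstract. The function names are hypothetical, and the code assumes strictly positive intensities with samples in rows.

```python
import numpy as np

def constant_sum_normalize(X, total=1.0):
    """Scale each sample (row) so that its intensities sum to `total`."""
    return X * (total / X.sum(axis=1, keepdims=True))

def probabilistic_quotient_normalize(X, reference=None):
    """PQN: divide each sample by the median of its quotients against a reference."""
    Xn = constant_sum_normalize(X)
    if reference is None:
        reference = np.median(Xn, axis=0)       # median profile as the reference
    quotients = Xn / reference                  # per-variable fold changes
    dilution = np.median(quotients, axis=1, keepdims=True)
    return Xn / dilution
```

Both functions apply one fixed reference to every sample, which is the property the abstract contrasts with group-wise aggregation.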

18.
Soil has been utilized in criminal investigations for some time because of its prevalence and transferability. It is usually the physical characteristics that are studied; however, the research carried out here aims to make use of the chemical profile of soil samples. The research we are presenting in this work used sieved (2 mm) soil samples taken from the top soil layer (about 10 cm) that were then analyzed using mid-infrared spectroscopy. The spectra obtained were pretreated and then input into two chemometric classification tools: nonlinear iterative partial least squares followed by linear discriminant analysis (NIPALS-LDA) and partial least squares discriminant analysis (PLS-DA). The models produced show that it is possible to discriminate between soil samples from different land use types and both approaches are comparable in performance. NIPALS-LDA performs much better than PLS-DA in classifying samples to location.

19.
Sufficient dimension reduction (SDR) methods are popular model-free tools for preprocessing and data visualization in regression problems where the number of variables is large. Unfortunately, reduce-and-classify approaches in discriminant analysis usually cannot guarantee improvement in classification accuracy, mainly due to the different nature of the two stages. On the other hand, envelope methods construct targeted dimension reduction subspaces that achieve dimension reduction and improve parameter estimation efficiency at the same time. However, little is known about how to construct envelopes in discriminant analysis models. In this article, we introduce the notion of the envelope discriminant subspace (ENDS) as a natural inferential and estimative object in discriminant analysis that incorporates these considerations. We develop the ENDS estimators that simultaneously achieve sufficient dimension reduction and classification. Consistency and asymptotic normality of the ENDS estimators are established, where we carefully examine the asymptotic efficiency gain under the classical linear and quadratic discriminant analysis models. Simulations and real data examples show superb performance of the proposed method. Supplementary materials for this article are available online.

20.
孟庆龙  冯树南  谭涛  满婷  尚静 《包装工程》2022,43(15):114-119
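Objective: To identify a better rapid, non-destructive method for discriminating compression damage in kiwifruit. Methods: A hyperspectral imaging system was used to acquire hyperspectral images of all kiwifruit, and the spectral reflectance of damaged regions and intact regions was extracted. Multiplicative scatter correction was applied to preprocess the raw reflectance spectra, and principal component analysis was used to reduce the dimensionality of the spectral data. The performance of Fisher discriminant analysis and the simplified K nearest neighbor (SKNN) pattern recognition method in discriminating compression damage was compared and analyzed. Results: In the 710-850 nm and 960-1030 nm bands, the mean spectral reflectance of damaged regions differed clearly from that of intact regions. Principal component analysis was used to select the first 5 principal components from the 256 full-band variables as new variables, which improved the detection efficiency of the recognition models. The SKNN and Fisher models both achieved a correct recognition rate of 93.3% on the prediction set; the confusion matrix of the SKNN model shows that only 2 prediction-set samples were misclassified, and the SKNN model had a higher correct recognition rate on the calibration set than the Fisher model. Conclusion: For discriminating compression damage in kiwifruit, the SKNN recognition model performs relatively better.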
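
The multiplicative scatter correction (MSC) preprocessing step mentioned above is conventionally done by regressing each spectrum on a reference spectrum and removing the fitted offset and slope. The sketch below follows that standard recipe; the function name and the default choice of the mean spectrum as reference are assumptions, since the abstract gives no implementation details.

```python
import numpy as np

def multiplicative_scatter_correction(spectra, reference=None):
    """Apply MSC to a 2-D array of spectra (rows are samples)."""
    spectra = np.asarray(spectra, dtype=float)
    if reference is None:
        reference = spectra.mean(axis=0)        # mean spectrum as the reference
    corrected = np.empty_like(spectra)
    for i, s in enumerate(spectra):
        # Fit s ≈ slope * reference + intercept by least squares.
        slope, intercept = np.polyfit(reference, s, 1)
        corrected[i] = (s - intercept) / slope  # undo additive and multiplicative effects
    return corrected
```

The corrected spectra can then be passed to PCA, for example keeping the first five components as in the study.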
