Similar documents
1.
Prediction of sample properties from spectroscopic data with multivariate calibration is often improved by wavelength selection. This paper reports on a built-in wavelength selection method in which the estimated regression vector contains zero or near-zero coefficients for undesirable wavelengths. The method is based on Tikhonov regularization with the model 1-norm (TR1) and is applied to simulated and near-infrared (NIR) spectral data. Models are also formed from wavelength subsets determined by the standard method of stepwise regression (SWR). Harmonious (bias/variance tradeoff) and parsimonious considerations are compared with and without wavelength selection for principal component regression (PCR), ridge regression (RR), partial least squares (PLS), and multiple linear regression (MLR). Results show that TR1 models generally contain large baseline regions of near-zero coefficients, thereby achieving built-in wavelength selection. For example, wavelengths with spectral interferences and/or poor signal-to-noise ratios receive near-zero regression coefficients. TR1 models often give better results than full-wavelength PCR, RR, and PLS models. SWR subset results are similar to those of the TR1 models for the NIR data but worse for the simulated spectra. In general, wavelength selection improves prediction accuracy at the cost of a potential increase in variance, while parsimony remains nearly equivalent to that of full-wavelength models. The studies provide useful guidelines on when to use full wavelengths and when to use wavelength selection. Specifically, when a small number of large wavelength effects (good sensitivity and selectivity) exist, subset selection by SWR (with caution) and TR1 both do well. With a small to moderate number of large to moderate wavelength effects, TR1 is better. Lastly, when a large number of small effects are present, full wavelengths with PCR, RR, or PLS are best.
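To make built-in wavelength selection concrete, here is a minimal sketch, assuming TR1 amounts to minimizing 0.5*||y - Xb||^2 + lam*||b||_1 (a lasso-type objective) and solving it by iterative soft-thresholding (ISTA). The solver, the function name `tr1_ista`, the value of `lam`, and the simulated data are hypothetical choices for illustration, not the paper's algorithm.

```python
import numpy as np

def tr1_ista(X, y, lam, n_iter=5000):
    """Minimize 0.5*||y - X b||^2 + lam*||b||_1 by iterative soft-thresholding (ISTA)."""
    L = np.linalg.norm(X, 2) ** 2                 # Lipschitz constant of the gradient
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = b - X.T @ (X @ b - y) / L             # gradient step on the squared error
        b = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)   # soft-threshold (1-norm)
    return b

# Hypothetical data: 100 samples x 60 wavelengths; only wavelengths 20-24 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 60))
b_true = np.zeros(60)
b_true[20:25] = 1.0
y = X @ b_true + 0.05 * rng.normal(size=100)
b_hat = tr1_ista(X, y, lam=5.0)
print("wavelengths with nonzero coefficients:", np.flatnonzero(b_hat))
```

Soft-thresholding sets a coefficient exactly to zero whenever its gradient magnitude stays below `lam`, which is how noise-only wavelengths drop out of the model.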

2.
Estimates of model parameters (the regression coefficients forming the regression vector) for a multivariate linear model have been the subject of considerable discussion. Regression diagnostics used in chemometrics for a multivariate linear model are often based on a single number, such as the coefficient of determination, the root mean square error of cross-validation, or selectivity. Moreover, commonly applied regression diagnostics focus on model bias and do not account for variance or model complexity. This paper demonstrates that substantial information is available from a graphical study of trends in model parameters, using plots of regression diagnostics based on bias, variance, and/or model complexity measures. It also shows that harmonious graphics, which use bias and variance information simultaneously, make it possible to determine proper model parameters without cross-validation. The paper concludes with comments on the next level of regression diagnostics, including the use of color, sound, and virtual reality.
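A harmonious plot pairs a bias indicator with a variance indicator for each candidate model. The sketch below assumes one common pairing, not necessarily the paper's exact measures: the root mean square error of calibration (RMSEC) for bias and the Euclidean norm of the regression vector for variance, traced over a grid of ridge parameters. The function name `ridge_trace` and the simulated data are hypothetical.

```python
import numpy as np

def ridge_trace(X, y, lambdas):
    """For each ridge parameter, return (RMSEC, ||b||_2): a bias and a variance indicator."""
    Xc, yc = X - X.mean(axis=0), y - y.mean()      # mean-center; intercept not penalized
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    uty = U.T @ yc
    out = []
    for lam in lambdas:
        b = Vt.T @ (s / (s ** 2 + lam) * uty)       # ridge solution via SVD filter factors
        rmsec = np.sqrt(np.mean((yc - Xc @ b) ** 2))
        out.append((rmsec, np.linalg.norm(b)))
    return np.array(out)

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 50))                       # hypothetical spectra
y = X[:, :3].sum(axis=1) + 0.1 * rng.normal(size=30)
lambdas = np.logspace(-3, 3, 13)
res = ridge_trace(X, y, lambdas)
for lam, (rmsec, bnorm) in zip(lambdas, res):
    print(f"lambda={lam:9.3f}  RMSEC={rmsec:.3f}  ||b||={bnorm:.3f}")
```

As the ridge parameter grows, RMSEC can only rise and ||b|| can only fall, so plotting one against the other typically gives an L-shaped curve whose corner marks a reasonable bias/variance compromise.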

3.
In the prediction of active substance content in pharmaceutical tablets and of moisture in wheat, a very large number of wavelengths is used. Hence, a method was developed that works with a limited number of predictor variables. We introduce a novel approach that uses the discrete cosine transform (DCT) for this purpose. The data were obtained with a near-infrared (NIR) spectrometer. A limited number of the DCT coefficients were chosen as predictor variables for partial least squares (PLS) regression. The performance of combining the DCT with PLS was compared with that of a PLS model using the full spectral data and with PLS using a limited number of discrete Fourier transform (DFT) coefficients. The results showed that the PLS model using DCT coefficients produced a lower root mean square error than PLS with either the full NIR spectral data or the DFT coefficients.
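The DCT step can be sketched as follows, assuming an orthonormal DCT-II applied to each spectrum and a fixed number of leading (low-frequency) coefficients kept as PLS predictors. The abstract does not say how coefficients were chosen, so keeping the leading ones is an assumption; the helper names, `n_keep`, and the Gaussian test band are hypothetical.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis; row k is the k-th cosine basis vector."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    C[0, :] = np.sqrt(1.0 / n)
    return C

def dct_features(spectra, n_keep):
    """Return the first n_keep DCT coefficients of each spectrum (rows of `spectra`)."""
    C = dct_matrix(spectra.shape[1])
    return spectra @ C[:n_keep].T

# Hypothetical smooth NIR band sampled at 500 points.
x = np.arange(500)
spectrum = np.exp(-0.5 * ((x - 250) / 40.0) ** 2)
coeffs = dct_features(spectrum[None, :], 500)[0]
energy = np.sum(coeffs[:20] ** 2) / np.sum(coeffs ** 2)
print(f"energy in first 20 of 500 coefficients: {energy:.4f}")
```

Because the basis is orthonormal, a smooth spectrum's energy concentrates in a few coefficients, and the reduced matrix returned by `dct_features` would then serve as the PLS predictor matrix.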

4.
The focus of this study was to evaluate the applicability of chemometrics to differential scanning calorimetry (DSC) data for evaluating nimodipine polymorphs. Multivariate calibration models were built using DSC data from known mixtures of the nimodipine modifications. A linear baseline correction was applied to reduce dispersion in the thermograms. Principal component analysis (PCA) of the treated and untreated data explained 96% and 89% of the data variability, respectively. Score and loading plots correlated the variability between samples with changes in the proportions of the nimodipine modifications. The R² values for principal component regression (PCR) and partial least squares regression (PLS) were 0.91 and 0.92, respectively. For both PCR and PLS, the root mean square errors of calibration and validation were lower for the treated data than for the untreated data. When the models were applied to samples recrystallized from a cosolvent system, they indicated different proportions of the modifications than in samples kept under different storage conditions. The models predicted the nimodipine modifications within a known margin of error and can therefore serve as a quality control tool for rapidly determining the nimodipine modification in an unknown mixture.
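A minimal sketch of the preprocessing and PCA steps, assuming "linear baseline correction" means subtracting the straight line through each thermogram's first and last points (the abstract does not define it). The function names and simulated thermograms are hypothetical.

```python
import numpy as np

def linear_baseline_correct(thermograms):
    """Subtract from each thermogram (row) the straight line through its first and last points."""
    t = np.linspace(0.0, 1.0, thermograms.shape[1])
    start, end = thermograms[:, [0]], thermograms[:, [-1]]
    return thermograms - (start + (end - start) * t)

def explained_variance_ratio(X):
    """Fraction of total variance captured by each principal component."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)
    return s ** 2 / np.sum(s ** 2)

# Hypothetical thermograms: one endotherm whose size tracks the polymorph
# fraction, plus a sample-specific linear drift and offset.
rng = np.random.default_rng(2)
t = np.linspace(0.0, 1.0, 200)
peak = np.exp(-0.5 * ((t - 0.5) / 0.05) ** 2)
fractions = np.linspace(0.1, 0.9, 9)
raw = np.array([f * peak + rng.normal(0, 0.3) * t + rng.normal(0, 0.1) for f in fractions])
corrected = linear_baseline_correct(raw)
print(np.round(explained_variance_ratio(raw)[:2], 3),
      np.round(explained_variance_ratio(corrected)[:2], 3))
```

Removing the drift lets the first principal component capture the composition-related variation instead of baseline dispersion.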

5.
Two new approaches to multivariate calibration are described that, for the first time, allow information on measurement uncertainties to be included in the calibration process in a statistically meaningful way. The new methods, referred to as maximum likelihood principal components regression (MLPCR) and maximum likelihood latent root regression (MLLRR), are based on principles of maximum likelihood parameter estimation. MLPCR and MLLRR are generalizations of principal components regression (PCR), which has been widely used in chemistry, and latent root regression (LRR), which has been virtually ignored in this field. Both of the new methods are based on decomposition of the calibration data matrix by maximum likelihood principal component analysis (MLPCA), which has been recently described (Wentzell, P. D.; et al. J. Chemom., in press). By using estimates of the measurement error variance, MLPCR and MLLRR are able to extract the optimum amount of information from each measurement and, thereby, exhibit superior performance over conventional multivariate calibration methods such as PCR and partial least-squares regression (PLS) when there is a nonuniform error structure. The new techniques reduce to PCR and LRR when assumptions of uniform noise are valid. Comparisons of MLPCR, MLLRR, PCR, and PLS are carried out using simulated and experimental data sets consisting of three-component mixtures. In all cases of nonuniform errors examined, the predictive ability of the maximum likelihood methods is superior to that of PCR and PLS, with PLS performing somewhat better than PCR. MLLRR generally performed better than MLPCR, but in most cases the improvement was marginal. The differences between PCR and MLPCR are elucidated by examining the multivariate sensitivity of the two methods.
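The core of the maximum likelihood approach is that each measurement is weighted by its error variance when a sample is projected onto the model subspace. The sketch below shows only that projection step for a diagonal error covariance, using the weighted least-squares form scores = (P^T W P)^-1 P^T W x with W the inverse error covariance; it is not a full MLPCA/MLPCR implementation, and `ml_project` and the data are hypothetical. With uniform error variances it reduces to the ordinary orthogonal projection used in PCR.

```python
import numpy as np

def ml_project(x, P, err_var):
    """Project sample x onto the subspace spanned by the columns of P, weighting each
    variable by the inverse of its measurement error variance (diagonal covariance)."""
    W = np.diag(1.0 / np.asarray(err_var, float))
    scores = np.linalg.solve(P.T @ W @ P, P.T @ W @ x)
    return scores, P @ scores

rng = np.random.default_rng(3)
P, _ = np.linalg.qr(rng.normal(size=(8, 2)))        # hypothetical orthonormal loadings
x = rng.normal(size=8)
uniform_scores, _ = ml_project(x, P, np.full(8, 0.01))
print(np.allclose(uniform_scores, P.T @ x))         # uniform noise: ordinary PCA projection
```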

6.
Visible and near-infrared (Vis-NIR, 350–2500 nm) diffuse reflectance spectroscopy (DRS) models built from "as-collected" samples of solid cattle manure accurately predict concentrations of moisture and crude ash. Because different organic molecules produce different spectral signatures, variations in livestock diet composition may affect the predictive accuracy of these models. This study investigates how differences in livestock diet composition affect Vis-NIR DRS prediction of moisture and crude ash. Spectra of solid manure samples (n = 216) from eighteen groups of cattle on six different diets were used to calibrate and validate partial least squares (PLS) regression models. Seven groups of PLS models were created and validated. In the first group, two-thirds of all samples were randomly selected as the calibration set and the remaining one-third formed the validation set. In the remaining six groups, samples were grouped by livestock diet (ration): each ration in turn was held out of calibration and then used as the validation set. For crude ash, the fully random calibration model produced a root mean square deviation (RMSD) of 2.5% on a dry basis (db), a ratio of the standard deviation of the reference data to the standard error of prediction (RPD) of 3.1, a bias of 0.14% (db), and a coefficient of determination r² of 0.90. For moisture, an RMSD of 1.5% on a wet basis (wb), an RPD of 4.3, a bias of -0.09% (wb), and an r² of 0.95 were achieved. Model accuracy and precision were not impaired by excluding any single ration from calibration.
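The validation statistics and the leave-one-ration-out scheme can be sketched as below. RPD is computed here under one common convention (standard deviation of the reference values divided by the bias-corrected standard error of prediction); the function names and example values are hypothetical, and the PLS model itself is omitted.

```python
import numpy as np

def validation_stats(ref, pred):
    """RMSD, bias, RPD and r^2 of predictions against reference values."""
    ref, pred = np.asarray(ref, float), np.asarray(pred, float)
    resid = pred - ref
    rmsd = np.sqrt(np.mean(resid ** 2))
    bias = resid.mean()
    sep = resid.std(ddof=1)                       # bias-corrected standard error of prediction
    rpd = ref.std(ddof=1) / sep                   # SD of reference values / SEP
    r2 = np.corrcoef(ref, pred)[0, 1] ** 2
    return rmsd, bias, rpd, r2

def leave_one_group_out(groups):
    """Yield (calibration_idx, validation_idx), holding out one group (ration) at a time."""
    groups = np.asarray(groups)
    for g in np.unique(groups):
        yield np.flatnonzero(groups != g), np.flatnonzero(groups == g)

ref = [1.0, 2.0, 3.0, 4.0]
pred = [1.1, 1.9, 3.1, 3.9]
rmsd, bias, rpd, r2 = validation_stats(ref, pred)
print(round(rmsd, 3), round(rpd, 2), round(r2, 3))
```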

7.
This paper reports on the transfer of calibration models between Fourier transform near-infrared (FT-NIR) instruments from four different manufacturers. The piecewise direct standardization (PDS) method is compared with a new hybrid calibration method, prediction augmented classical least squares/partial least squares (PACLS/PLS). The success of a calibration transfer experiment is judged by prediction error and by the number of samples flagged as outliers that would not have been flagged if a complete recalibration had been performed. For a transfer to be deemed successful, prediction results must be acceptable and the outlier diagnostic capabilities must be preserved. Previous studies have judged calibration transfer methods only by prediction performance (e.g., the root mean square error of prediction, RMSEP); our study emphasizes the need to consider outlier detection performance as well. As our study illustrates, the RMSEP values for a calibration transfer can be within an acceptable range while statistical analysis of the spectral residuals shows that outlier detection performance varies significantly between competing transfer methods. There was no statistically significant difference in prediction error between the PDS and PACLS/PLS methods when the same subset selection method was used for both. However, PACLS/PLS was better at preserving outlier detection capabilities and was therefore judged to perform better than PDS when calibrations were transferred using a subset of samples to define the transfer function. The method of sample subset selection made a significant difference in the calibration transfer results with PDS, whereas the results were less sensitive to subset selection with PACLS/PLS.
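Piecewise direct standardization can be sketched as follows: for each channel of the primary instrument, a small least-squares model maps a window of neighboring channels from the secondary instrument onto it, using transfer samples measured on both. The window half-width, the intercept term, the function names, and the simulated gain/offset difference between instruments are assumptions for illustration; PACLS/PLS is not shown.

```python
import numpy as np

def pds_fit(S_primary, S_secondary, half_window=2):
    """For each primary channel i, regress the primary responses of the transfer samples
    on a window of secondary channels around i (plus an intercept)."""
    n_samples, n_ch = S_primary.shape
    models = []
    for i in range(n_ch):
        lo, hi = max(0, i - half_window), min(n_ch, i + half_window + 1)
        A = np.column_stack([S_secondary[:, lo:hi], np.ones(n_samples)])
        coef, *_ = np.linalg.lstsq(A, S_primary[:, i], rcond=None)
        models.append((lo, hi, coef))
    return models

def pds_apply(models, x_secondary):
    """Map one secondary-instrument spectrum into the primary instrument's response space."""
    return np.array([x_secondary[lo:hi] @ c[:-1] + c[-1] for lo, hi, c in models])

# Hypothetical transfer set: 10 samples x 30 channels; the secondary instrument
# shows a gain of 0.8 and an offset of 0.05 relative to the primary.
rng = np.random.default_rng(3)
S_pri = rng.normal(size=(10, 30))
S_sec = 0.8 * S_pri + 0.05
models = pds_fit(S_pri, S_sec)
x_pri = rng.normal(size=30)
x_std = pds_apply(models, 0.8 * x_pri + 0.05)
print(np.max(np.abs(x_std - x_pri)))
```

The choice of transfer samples determines how well each local regression is conditioned, which is one reason subset selection matters for PDS.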

8.
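Rapid determination of blood glucose in human serum by near-infrared spectroscopy
Fourier transform near-infrared transmission spectroscopy combined with partial least squares (PLS) was used for rapid quantitative analysis of glucose content in human serum. The prediction model was optimized with internal cross-validation and an automatic optimization function, and the optimal modeling parameters were determined. For the calibration set of glucose in human serum, the correlation coefficient between measured and predicted contents was r = 0.9148, and the root mean square error of cross-validation (RMSECV) was 0.487 mmol/L.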
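Choosing the number of PLS components by internal cross-validation can be sketched with leave-one-out RMSECV. The sketch assumes a textbook PLS1 (NIPALS) algorithm and hypothetical simulated data; it is not the instrument software's automatic optimization routine, whose details the abstract does not give. The function names are also hypothetical.

```python
import numpy as np

def pls1_fit(X, y, n_comp):
    """PLS1 (NIPALS) on mean-centered data; returns x_mean, y_mean and regression vector b."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_comp):
        w = E.T @ f
        w = w / np.linalg.norm(w)          # weight vector
        t = E @ w                          # scores
        tt = t @ t
        p = E.T @ t / tt                   # X loadings
        q = f @ t / tt                     # y loading
        E = E - np.outer(t, p)             # deflate X
        f = f - q * t                      # deflate y
        W.append(w); P.append(p); Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    b = W @ np.linalg.solve(P.T @ W, Q)    # b = W (P^T W)^-1 q
    return x_mean, y_mean, b

def rmsecv_loo(X, y, n_comp):
    """Leave-one-out root mean square error of cross-validation."""
    errors = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        x_mean, y_mean, b = pls1_fit(X[keep], y[keep], n_comp)
        errors.append((X[i] - x_mean) @ b + y_mean - y[i])
    return float(np.sqrt(np.mean(np.square(errors))))

# Hypothetical calibration data: 20 samples x 5 spectral variables.
rng = np.random.default_rng(4)
X = rng.normal(size=(20, 5))
beta = np.array([0.5, -1.0, 2.0, 0.0, 1.5])
y = X @ beta + 3.0
y_noisy = y + 0.3 * rng.normal(size=20)
for k in range(1, 6):
    print(k, round(rmsecv_loo(X, y_noisy, k), 3))   # choose k with the lowest RMSECV
```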

9.
The present work analyzes the feasibility of different analytical measurement procedures for predicting the age of Madeira wine. To identify and quantify the chemical compounds suited to characterizing wine evolution during ageing, chromatographic and spectroscopic analyses were carried out. Twenty-six samples, representing ten harvest years and covering an ageing period of 20 years, were analyzed for their volatile and phenolic composition and characterized by absorbance measurements in the UV and visible regions. Multivariate prediction models were then built by applying PLS regression to each chemical data set, and their ability to predict age was compared. The optimum number of PLS dimensions in each model was chosen by minimizing the root mean squared error of Monte Carlo validation. With these models, bootstrap percentile prediction intervals were also computed for each sample after successively removing it from the data set, to test the models' prediction ability. Our analysis shows that the age of Madeira wine produced from a known grape variety can be predicted with good accuracy from its volatile and phenolic composition, as well as from UV-vis absorbance measurements: the PLS models predict wine age with root mean square errors of 0.9, 1.1, and 1.4 years, respectively. The sample-specific prediction intervals also allowed differences between observed and predicted values to be analyzed, and they confirmed the promising age prediction ability of the proposed methodologies.
A trade-off between model accuracy and cost of analysis can guide the choice of methodology for a particular application: the more time-consuming and complex techniques (GC-MS and HPLC-DAD) give the most accurate results, but UV-vis also yields acceptable age predictions.
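A bootstrap percentile prediction interval can be sketched as below. Ordinary least squares with an intercept stands in for the PLS model, and a resampled residual is added to each bootstrap prediction so that the interval reflects prediction rather than only parameter uncertainty; this is one common variant, and the function name, the number of resamples, and the simulated age data are hypothetical. In a leave-one-out setting, the sample of interest would be removed from `X` and `y` before calling the function.

```python
import numpy as np

def bootstrap_percentile_interval(X, y, x_new, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap prediction interval for the response at x_new.
    Least squares with an intercept stands in for the PLS model."""
    rng = np.random.default_rng(seed)
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    a_new = np.concatenate([[1.0], x_new])
    preds = np.empty(n_boot)
    for k in range(n_boot):
        idx = rng.integers(0, n, size=n)                 # resample samples with replacement
        coef, *_ = np.linalg.lstsq(A[idx], y[idx], rcond=None)
        resid = y[idx] - A[idx] @ coef
        preds[k] = a_new @ coef + rng.choice(resid)      # add a resampled residual
    lo, hi = np.percentile(preds, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi

# Hypothetical data: 26 samples, one predictor, true age = 2 + 3 * x.
rng = np.random.default_rng(5)
X = rng.uniform(0, 10, size=(26, 1))
y = 2.0 + 3.0 * X[:, 0] + 0.5 * rng.normal(size=26)
lo, hi = bootstrap_percentile_interval(X, y, np.array([5.0]))
print(f"95% interval at x = 5: [{lo:.2f}, {hi:.2f}]")
```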

10.
One of the most pressing concerns in the consumer market is the detection of adulteration in meat products, given their high value. A rapid and accurate method for identifying lard adulteration in meat products is needed, one that consumers can trust and that can provide a definitive diagnosis. In this work, Fourier transform infrared spectroscopy (FTIR) is used to identify lard adulteration in beef, lamb, and chicken samples. A simplified extraction method was applied to obtain lipids from pure and adulterated meat. Adulterated samples were prepared by mixing lard with chicken, lamb, and beef at concentrations of 10%–50% (v/v). Principal component analysis (PCA) and partial least squares (PLS) were used to develop a calibration model over 800–3500 cm−1. Three-dimensional PCA, applied after dividing the spectrum into three regions, successfully classified lard adulteration in chicken, lamb, and beef samples. FTIR peaks corresponding to lard were observed at 1159.6, 1743.4, 2853.1, and 2922.5 cm−1, and these differentiate the chicken, lamb, and beef samples. These wavenumbers gave the highest coefficient of determination (R² = 0.846) and the lowest root mean square errors of calibration (RMSEC) and prediction (RMSEP), with an accuracy of 84.6%. Lard adulteration at levels down to 10% can be reliably detected with this methodology.

11.
The spectro-fluorescence signature (SFS) of water samples contains information that can be used to quantify dissolved organic carbon (DOC) when combined with multivariate analysis. A model was built from SFS data with partial least squares (PLS) regression, using the SFSs of 219 natural water samples from the Raritan River and Millstone River watersheds in central New Jersey and their corresponding DOC concentrations. Calibration, full cross-validation, and prediction performance of various models were compared statistically before the optimal model was selected. The final model, tested on the Passaic River watershed in northern New Jersey, gave a bias of 0.028 mg/L and a root mean squared error of prediction (RMSEP) of 0.35 mg/L. Combined with PLS, SFS can provide a reliable, cost-effective method for rapid on-line DOC measurement.

12.
A new method for determining the percentage of homopolymer component in Ziegler-Natta linear low-density polyethylene (LLDPE) was developed, using high-temperature-cell Fourier transform infrared (FT-IR) spectroscopy with partial least squares (PLS) quantitative analysis. The method is based on changes in the IR spectrum between the 730 cm⁻¹ and 720 cm⁻¹ bands at 110 °C, which is near the melting point of the polyethylene. HD% values (the percentage of high-density component, i.e., the percentage of homopolymer component) obtained by the CTREF (CRYSTAF in TREF mode) technique were used, together with the corresponding FT-IR spectra, as input data for PLS analysis to establish a calibration curve. The PLS model has a cross-validation correlation coefficient of 0.997 using four factors and a root mean square error of calibration (RMSEC) of 0.772. The HD% of an unknown can then be predicted by the PLS software from its FT-IR spectrum. A control resin was tested seven times by CTREF and by FT-IR; its HD% was 28.59 ± 0.88% by CTREF and 29.05 ± 2.37% by FT-IR. The method was found to be applicable to LLDPE of the same comonomer type within a given range of melt index and density.

13.
Common methods of building linear calibration models are principal component regression (PCR), partial least squares (PLS), and least squares (LS). Recently, cyclic subspace regression (CSR) has been presented and shown to provide PCR, PLS, LS, and other related intermediate regressions within one algorithm. When a linear model is formed with spectral data for quantitative analysis, prediction results can be adversely affected by responses that do not conform well to the proposed linear model. Wavelength selection can be used to eliminate wavelengths where such problem responses occur. It has recently been reported that CSR regression vectors can be formed by summing weighted eigenvectors, where the weights are determined from the hat matrix, singular values, and eigenvectors characterizing the sample space. Investigation of these weights shows that wavelength selection based on loading vectors can be misleading: using CSR, it is shown that a small weight for an eigenvector can annihilate a large peak in a loading vector. In this study, correlograms, CSR regression vectors, and eigenvector weights are used as wavelength selection criteria. It is demonstrated that even though an LS model for a wavelength subset produces substantially lower prediction errors than PCR and PLS, CSR weight plots show that the LS model overfits and should not be used. Simulated situations containing spectral regions with excess noise or nonlinear responses are examined to study the effectiveness of wavelength selection based on these criteria. Near-infrared spectra of gasoline samples with several known properties are also studied.
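The eigenvector-weight view of a regression vector can be sketched for its two endpoints: with the singular value decomposition X = U S V^T, the PCR vector on k components is b = sum over i of (u_i^T y / s_i) v_i, and with k equal to the rank of X this is the least-squares (LS) solution. This sketch does not implement CSR's intermediate (PLS-like) regressions or its hat-matrix weighting; the function name and data are hypothetical.

```python
import numpy as np

def pcr_from_eigenvectors(X, y, k):
    """Regression vector b = sum_{i<k} (u_i^T y / s_i) v_i and the eigenvector weights."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    weights = (U[:, :k].T @ y) / s[:k]          # weight on each right singular vector v_i
    return Vt[:k].T @ weights, weights

rng = np.random.default_rng(6)
X = rng.normal(size=(30, 6))                    # hypothetical mean-centered spectra
y = rng.normal(size=30)
b_ls, w_all = pcr_from_eigenvectors(X, y, 6)    # k = rank(X): least squares
b_pcr, w_pcr = pcr_from_eigenvectors(X, y, 2)   # k = 2: two-component PCR
print(np.round(w_all, 3))
```

Plotting the weights against i shows which eigenvectors actually contribute to b: a large peak in a loading vector whose weight is near zero has little effect on the model.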

14.
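Effect of spectral preprocessing on near-infrared quantitative models for cotton/polyester blended fabrics
Forty-six cotton/polyester blended fabric samples were studied. Near-infrared diffuse reflectance spectra were collected over 12 000–4 000 cm⁻¹, quantitative calibration models were built with partial least squares (PLS), and the models were tested by cross-validation, using the root mean square error of cross-validation (RMSECV) and the coefficient of determination (R²) as criteria for model quality. Models built with five preprocessing options (no preprocessing, first derivative, second derivative, multiplicative scatter correction, and vector normalization) were compared, and the model built on vector-normalized spectra performed best. The main sources of error in building NIR quantitative models for textile fabrics, and the feasibility of NIR spectroscopy for quantitative analysis of textile fabrics, were also analyzed.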
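Two of the preprocessing options compared above can be sketched as follows, assuming vector normalization means subtracting each spectrum's mean and dividing by its Euclidean norm, and using a plain finite difference for the derivative (instrument software often uses Savitzky-Golay smoothing derivatives instead). The function names and the test spectrum are hypothetical.

```python
import numpy as np

def vector_normalize(spectra):
    """Subtract each spectrum's mean, then scale it to unit Euclidean norm."""
    centered = spectra - spectra.mean(axis=1, keepdims=True)
    return centered / np.linalg.norm(centered, axis=1, keepdims=True)

def first_derivative(spectra, spacing=1.0):
    """Finite-difference first derivative along the spectral axis (rows are spectra)."""
    return np.gradient(spectra, spacing, axis=1)

x = np.array([[1.0, 2.0, 4.0, 3.0]])             # hypothetical 4-point spectrum
print(np.allclose(vector_normalize(3 * x + 2), vector_normalize(x)))  # removes gain and offset
```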

15.
Because of their heterogeneous structure and variability in form, individual corn (Zea mays L.) kernels present an optical challenge for nondestructive spectroscopic determination of their chemical composition. Growing demand in agricultural science for knowledge of specific kernel traits is driving the search for high-throughput methods of examination. In this study, macroscopic near-infrared (NIR) reflectance hyperspectral imaging was used to measure small sets of kernels over 950–1700 nm. Image analysis and principal component analysis (PCA) were used to distinguish germ from endosperm regions and to segment individual kernels as objects within sets of kernels. Partial least squares (PLS) analysis was used to predict oil or oleic acid concentration from germ or whole-kernel spectra. The relative precision of the minimum root mean square error of cross-validation (RMSECV) and the root mean square error of prediction (RMSEP) for oil and oleic acid concentration was compared for two sets of two hundred kernels. An optimal statistical prediction method was determined using a limited set of wavelengths selected by a genetic algorithm. With these parameters, the oil content of an individual corn kernel was predicted with an RMSEP of 0.7% and its oleic acid content with an RMSEP of 14%.
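Genetic-algorithm wavelength selection can be sketched with binary masks over wavelengths: masks are recombined and mutated, and a mask's fitness is the prediction error of a model built on its wavelengths. Everything here is a hypothetical stand-in for the study's setup: an intercept-plus-least-squares model replaces PLS, fitness is RMSEP on a fixed holdout split, and the function names, population size, generation count, and rates are illustrative.

```python
import numpy as np

def holdout_rmse(X, y, mask, n_cal):
    """Fitness of a wavelength mask: RMSEP of an intercept + least-squares model
    fitted on the first n_cal samples and tested on the rest."""
    if not mask.any():
        return np.inf
    A = np.column_stack([np.ones(len(y)), X[:, mask]])
    coef, *_ = np.linalg.lstsq(A[:n_cal], y[:n_cal], rcond=None)
    return float(np.sqrt(np.mean((A[n_cal:] @ coef - y[n_cal:]) ** 2)))

def ga_select(X, y, n_cal, pop=30, gens=40, p_init=0.1, p_mut=0.02, seed=0):
    """Return (best_mask, best_rmse) found by a simple elitist genetic algorithm."""
    rng = np.random.default_rng(seed)
    n_wl = X.shape[1]
    masks = rng.random((pop, n_wl)) < p_init            # random initial wavelength subsets
    fit = np.array([holdout_rmse(X, y, m, n_cal) for m in masks])
    for _ in range(gens):
        children = []
        for _ in range(pop):
            # Tournament selection: the fitter of two random masks becomes a parent.
            t1, t2 = rng.integers(0, pop, 2), rng.integers(0, pop, 2)
            pa, pb = masks[t1[np.argmin(fit[t1])]], masks[t2[np.argmin(fit[t2])]]
            cut = rng.integers(1, n_wl)                  # single-point crossover
            child = np.concatenate([pa[:cut], pb[cut:]])
            child ^= rng.random(n_wl) < p_mut            # bit-flip mutation
            children.append(child)
        children = np.array(children)
        child_fit = np.array([holdout_rmse(X, y, m, n_cal) for m in children])
        # Elitism: keep the best `pop` masks among parents and children.
        all_masks = np.vstack([masks, children])
        all_fit = np.concatenate([fit, child_fit])
        keep = np.argsort(all_fit)[:pop]
        masks, fit = all_masks[keep], all_fit[keep]
    return masks[0], fit[0]

# Hypothetical data: 60 samples x 50 wavelengths; only wavelengths 10 and 30 matter.
rng = np.random.default_rng(7)
X = rng.normal(size=(60, 50))
y = 2.0 * X[:, 10] - 1.5 * X[:, 30] + 0.1 * rng.normal(size=60)
mask, best_rmse = ga_select(X, y, n_cal=40)
print("selected:", np.flatnonzero(mask), "RMSEP:", round(best_rmse, 3))
```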

16.
This paper discusses a methodology for selecting the minimum number of calibration samples in principal component regression (PCR) analysis. The method uses only the instrumental responses of a large set of samples to select an optimal subset, which is then submitted to chemical analysis and calibration. The subset is chosen to give a low variance of the regression coefficients. The methodology was applied to UV-visible spectroscopy data to determine Ca²⁺ in water and to near-IR spectroscopy data to determine moisture in corn. In both cases, the regression models developed with a reduced number of samples gave accurate results. In terms of precision, the root-mean-squared error of cross-validation (RMSECV) obtained with the new methodology is similar to that of PCR models using the complete set of calibration samples. The number of analyzed samples in the calibration set can be reduced by up to 50%, which represents a considerable reduction in costs.
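One way to pick samples that keep regression-coefficient variance low is D-optimal selection on principal component scores: for a regression of y on a score matrix T, the coefficient covariance is proportional to (T^T T)^-1, so maximizing det(T^T T) over the chosen rows shrinks it. The greedy sketch below is an assumption, since the abstract does not state the exact criterion or search strategy; the function name, number of components, and small ridge term are hypothetical.

```python
import numpy as np

def select_samples_dopt(X, n_select, n_pc=3):
    """Greedily choose n_select rows of X that maximize det(T_s^T T_s), where T_s
    holds the chosen samples' PCA scores plus an intercept column."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    T = np.column_stack([np.ones(len(X)), U[:, :n_pc] * s[:n_pc]])
    ridge = 1e-9 * np.eye(T.shape[1])   # keeps the determinant informative before T_s has full rank
    chosen = []
    for _ in range(n_select):
        best, best_det = None, -np.inf
        for i in range(len(X)):
            if i in chosen:
                continue
            Ts = T[chosen + [i]]
            d = np.linalg.det(Ts.T @ Ts + ridge)
            if d > best_det:
                best, best_det = i, d
        chosen.append(best)
    return chosen

# Hypothetical spectra: 40 candidate samples x 20 wavelengths.
rng = np.random.default_rng(8)
X = rng.normal(size=(40, 20))
chosen = select_samples_dopt(X, 10)
print("samples to send for reference analysis:", chosen)
```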

17.
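Determination of total catechin content in tea polyphenols by near-infrared spectroscopy
Using high-performance liquid chromatography (HPLC) results as reference values, a near-infrared calibration model was established for rapid measurement of total catechin content in tea polyphenols. Forty-eight tea polyphenol samples formed the calibration set. Over the calibration range of 1000–2500 nm (4000–10 000 cm⁻¹) of the near-infrared diffuse reflectance spectra, the spectra were preprocessed by first derivative, second derivative, standard normal variate (SNV), and multiplicative signal correction (MSC) and then calibrated with partial least squares regression (PLS). Internal cross-validation showed that SNV-preprocessed spectra gave the best model, with a correlation coefficient of 0.997 and a root mean square error of calibration (RMSEC) of 1.71%. Comparing models built with classical least squares (CLS), PLS, and principal component regression (PCR), PLS gave the best results.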
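The SNV and MSC preprocessing steps can be sketched as follows. MSC here regresses each spectrum on the mean spectrum (or a supplied reference) and removes the fitted offset and slope, a standard formulation; the function names and test spectra are hypothetical, and derivatives are omitted.

```python
import numpy as np

def snv(spectra):
    """Standard normal variate: center and scale each spectrum by its own mean and SD."""
    mu = spectra.mean(axis=1, keepdims=True)
    sd = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mu) / sd

def msc(spectra, reference=None):
    """Multiplicative scatter correction: fit x = a + b * ref for each spectrum x
    (ref defaults to the mean spectrum) and return (x - a) / b."""
    ref = spectra.mean(axis=0) if reference is None else reference
    corrected = np.empty(spectra.shape)
    for i, x in enumerate(spectra):
        b, a = np.polyfit(ref, x, 1)       # slope, intercept
        corrected[i] = (x - a) / b
    return corrected

# Hypothetical spectra: one underlying shape with different offsets and gains.
ref = np.sin(np.linspace(0, 3, 50)) + 2.0
spectra = np.array([a + b * ref for a, b in [(0.1, 0.9), (-0.2, 1.3), (0.05, 1.1)]])
print(np.allclose(msc(spectra, reference=ref), ref))
```

Both methods remove additive and multiplicative scatter effects; SNV does so per spectrum without a reference, while MSC aligns every spectrum to a common reference.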

18.
Alternate bearing is a well-marked yield variability phenomenon that occurs in almost all tree-fruit crops. The potential benefits of alternate bearing control measures can only be realized when yield information on individual trees is available. The objective of this study was to examine the potential of airborne hyperspectral imagery for estimating fruit yield in citrus. Hyperspectral images in 72 visible and near-infrared (NIR) wavelengths (407 to 898 nm) were acquired over a citrus orchard in Japan with an Airborne Imaging Spectrometer for Applications (AISA) Eagle system. Canopy features of individual trees were characterized by pixel-based average spectral reflectance at various wavelengths in the acquired images, and these were used to develop yield prediction models. Five techniques were used: (i) several vegetation indices (VIs); (ii) key wavelengths determined by simple correlation analysis (SCA); (iii) principal components (PCs) from principal component regression (PCR); (iv) PLS factors; and (v) important wavelengths determined from the B-matrix of partial least squares (PLS) regression. The VIs used in this study were poorly correlated with fruit yield on individual trees. The key or important wavelengths determined by the two wavelength selection methods gave reasonable predictions of fruit yield, and the B-matrix method based on PLS regression was better than simple correlation analysis at identifying the wavelengths correlated with fruit yield. However, the PCs extracted from the hyperspectral data were weak predictors of citrus yield. The model based on PLS factors gave greater prediction accuracy than the models based on key or important wavelengths.
These results confirm the hypothesized correlation between canopy features and citrus yield. The proposed methods show considerable promise for estimating fruit yield on individual citrus trees. Such yield information is valuable for planning harvest schedules and for developing tree-specific alternate bearing control measures and other management practices.
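Two of the five approaches can be sketched briefly: a vegetation index (NDVI is used as a representative example; the abstract does not list which VIs were tested) and simple correlation analysis that ranks wavelengths by the absolute Pearson correlation between per-tree mean reflectance and yield. The function names, the even band spacing, and the simulated reflectance data are hypothetical; only the 407–898 nm range and the count of 72 bands come from the study.

```python
import numpy as np

def ndvi(nir, red):
    """Normalized difference vegetation index."""
    return (nir - red) / (nir + red)

def top_correlated_wavelengths(R, yields, wavelengths, k=3):
    """Rank wavelengths by |Pearson r| between per-tree mean reflectance R (trees x bands) and yield."""
    Rc = R - R.mean(axis=0)
    yc = yields - yields.mean()
    r = (Rc.T @ yc) / (np.linalg.norm(Rc, axis=0) * np.linalg.norm(yc))
    order = np.argsort(-np.abs(r))[:k]
    return wavelengths[order], r[order]

# Hypothetical data: 50 trees, 72 evenly spaced bands over 407-898 nm.
rng = np.random.default_rng(9)
wavelengths = np.linspace(407, 898, 72)
R = 0.2 + 0.1 * rng.random((50, 72))
yields = 100 + 500 * R[:, 60] + rng.normal(scale=2.0, size=50)
top_wl, top_r = top_correlated_wavelengths(R, yields, wavelengths)
print(np.round(top_wl, 1), np.round(top_r, 2))
```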

19.
A multivariate calibration model for the quantitative determination of diethylene glycol (DEG) contaminant in pharmaceutical-grade glycerin was transferred among five portable Raman spectrometers using piecewise direct standardization (PDS). The calibration set was developed using a multi-range ternary mixture design with successively reduced impurity concentration ranges. Optimal selection of calibration transfer standards with the Kennard-Stone algorithm was found to also require applying the algorithm to multiple successively reduced impurity concentration ranges. Partial least squares (PLS) calibration models were developed from the calibration set measured independently on each of the five spectrometers, and their performance was evaluated by the root mean square error of prediction (RMSEP) on independent validation samples. An F-test showed no statistically significant differences in variance between models developed on different instruments. Direct cross-instrument prediction without standardization was performed between a single primary instrument and each of the four secondary instruments to evaluate the robustness of the primary calibration model; instrument variability caused significant increases in RMSEP on the secondary instruments. PDS with the optimal calibration transfer subset gave the lowest RMSEP values for the secondary instruments. Using the optimal calibration transfer subset, an optimized calibration model was developed from a subset of the original calibration set, giving a DEG detection limit of 0.32% across all five instruments.
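The Kennard-Stone algorithm used to choose calibration transfer standards can be sketched as follows: it starts with the two samples farthest apart and then repeatedly adds the sample farthest from its nearest already-selected neighbor, so the chosen subset spans the measurement space. Euclidean distance on the raw rows of `X` is an assumption here, the function name is hypothetical, and the multi-range application described above is not shown.

```python
import numpy as np

def kennard_stone(X, n_select):
    """Return indices of n_select rows of X chosen by the Kennard-Stone algorithm."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)   # pairwise Euclidean distances
    i, j = np.unravel_index(np.argmax(D), D.shape)              # the two most distant samples
    selected = [int(i), int(j)]
    remaining = [k for k in range(len(X)) if k not in selected]
    while len(selected) < n_select and remaining:
        nearest = D[np.ix_(remaining, selected)].min(axis=1)    # distance to closest selected sample
        pick = remaining[int(np.argmax(nearest))]
        selected.append(pick)
        remaining.remove(pick)
    return selected

X = np.array([[0.0], [1.0], [2.0], [10.0]])    # hypothetical 1-D "spectra"
print(kennard_stone(X, 3))                      # [0, 3, 2]
```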

20.
Point estimation of the scale and location parameters of the extreme-value (Type I) distribution by linear functions of order statistics from Type II progressively censored samples is investigated. Four types of linear estimators are considered: the best linear unbiased (BLU) estimator, an approximation to the BLU estimator, unweighted regression, and a linearized maximum likelihood estimator. Linear transformations of the estimators are also considered for reducing mean square error. Exact bias, variance, and mean square error comparisons of the estimators are made for several censoring patterns. Since the natural logarithms of Weibull variates have extreme-value distributions, the results also apply to estimation for Weibull distributions.
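The unweighted regression estimator can be illustrated for the simplest case of a complete (uncensored) sample: the ordered observations are regressed on standardized extreme-value quantiles, and the intercept and slope estimate location and scale. This sketch assumes the smallest-extreme-value form F(x) = 1 - exp(-exp((x - mu)/sigma)) and plotting positions (i - 0.5)/n; it does not handle progressive Type II censoring, and the function name and simulated Weibull data are hypothetical.

```python
import numpy as np

def ev_unweighted_regression(x):
    """Unweighted least-squares estimates (mu, sigma) of the smallest extreme-value
    distribution F(x) = 1 - exp(-exp((x - mu) / sigma)) from a complete sample."""
    x = np.sort(np.asarray(x, float))
    n = len(x)
    p = (np.arange(1, n + 1) - 0.5) / n          # plotting positions
    z = np.log(-np.log(1.0 - p))                  # standardized quantiles
    sigma, mu = np.polyfit(z, x, 1)               # x_(i) ~ mu + sigma * z_i
    return mu, sigma

# Weibull link: if T ~ Weibull(shape k, scale lam), log T has mu = log(lam), sigma = 1/k.
rng = np.random.default_rng(10)
log_t = np.log(3.0 * rng.weibull(2.0, size=500))   # hypothetical shape 2, scale 3
mu_hat, sigma_hat = ev_unweighted_regression(log_t)
print(f"mu ~ {mu_hat:.2f} (log 3 = {np.log(3):.2f}), sigma ~ {sigma_hat:.2f} (1/2 = 0.50)")
```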


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP Registration No. 09084417