Similar Documents
20 similar documents found (search time: 31 ms)
1.
Cyclic subspace regression (CSR) is a new approach to the complex multivariate calibration problem. Its simple algorithm produces solutions for principal component regression (PCR), partial least squares (PLS), least squares (LS), and other related intermediate regressions. This paper describes further analysis of CSR and shows, using hat matrices, that CSR regression vectors are sums of weighted eigenvectors, with the weights determined by the hat matrix, the singular values, and the sample-space eigenvectors. Examination of the CSR weights for PCR and PLS further documents their differences and similarities and provides information to assist in determining the prediction rank for PCR and PLS. Redefining CSR in terms of weighted eigenvectors shows when PLS and PCR produce essentially the same results, with minor differences stemming from overfitting by PLS. Additionally, weights derived from the hat matrix show when and why PCR and PLS generate different results. Sample-space equations reveal PLS to be a method based on oblique projections, whereas PCR uses orthogonal projections. The optimal intermediate CSR model can be identified as well. A near-infrared data set is studied to illustrate the principles involved.
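The weighted-eigenvector view of PCR and LS can be made concrete with the singular value decomposition X = U S V^T: each regression vector is a sum of right singular vectors v_i with weights f_i (u_i^T y) / s_i, where f_i acts as a filter factor. The sketch below is a minimal illustration under stated assumptions: data are left uncentred for brevity, only the PCR (0/1 filter) and LS (all-ones filter) cases are shown, and the function names are hypothetical. It does not implement CSR itself or its hat-matrix weights.

```python
import numpy as np

def filtered_svd_regression(X, y, filter_factors):
    """Regression vector b = sum_i f_i * (u_i^T y / s_i) * v_i."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    f = np.asarray(filter_factors, dtype=float)
    weights = f * (U.T @ y) / s          # weight attached to each eigenvector v_i
    return Vt.T @ weights, weights

def pcr_vector(X, y, k):
    """PCR as a 0/1 filter: keep the first k singular triplets (k = rank gives LS)."""
    f = np.zeros(min(X.shape))
    f[:k] = 1.0
    return filtered_svd_regression(X, y, f)

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 6))
y = X @ np.array([1.0, -0.5, 0.0, 0.2, 0.0, 0.3]) + 0.05 * rng.normal(size=30)
b_ls, w_ls = pcr_vector(X, y, k=6)       # all components: least squares
b_pcr2, w_pcr2 = pcr_vector(X, y, k=2)   # two-component PCR
```

Other intermediate regressions correspond to other choices of filter factors, which is the sense in which these methods can be compared through their eigenvector weights.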

2.
Prediction of sample properties from spectroscopic data with multivariate calibration is often enhanced by wavelength selection. This paper reports on a built-in wavelength selection method in which the estimated regression vector contains zero to near-zero coefficients for undesirable wavelengths. The method is based on Tikhonov regularization with the model 1-norm (TR1) and is applied to simulated and near-infrared (NIR) spectral data. Models are also formed from wavelength subsets determined by the standard method of stepwise regression (SWR). Harmonious (bias/variance tradeoff) and parsimonious considerations are compared with and without wavelength selection for principal component regression (PCR), ridge regression (RR), partial least squares (PLS), and multiple linear regression (MLR). Results show that TR1 models generally contain large baseline regions of near-zero coefficients, thereby essentially achieving built-in wavelength selection. For example, wavelengths with spectral interferences and/or poor signal-to-noise ratios obtain near-zero regression coefficients. Results often improve with TR1 models compared to full-wavelength PCR, RR, and PLS models. The SWR subset results are similar to those of the TR1 models for the NIR data and worse for the simulated spectra. In general, wavelength selection improves prediction accuracy at the cost of a potential increase in variance, while parsimony remains nearly equivalent to that of full-wavelength models. New insights gained from the reported studies provide useful guidelines on when to use full wavelengths and when to use wavelength selection methods. Specifically, when a small number of large wavelength effects (good sensitivity and selectivity) exist, subset selection by SWR (with caution) and TR1 both do well. With a small to moderate number of large to moderate-sized wavelength effects, TR1 is better. Lastly, when a large number of small effects are present, full wavelengths with PCR, RR, or PLS are best.
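A 1-norm penalty on the model is what drives many coefficients exactly to zero. As a minimal sketch, assuming the objective 0.5*||y - Xb||^2 + lam*||b||_1 (the paper's exact TR1 formulation and solver may differ), cyclic coordinate descent with soft-thresholding shows this built-in selection on a synthetic problem in which only three of twenty hypothetical "wavelengths" carry signal. All names and data here are illustrative.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly zero."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def l1_regularized_regression(X, y, lam, n_iter=1000, tol=1e-10):
    """Minimise 0.5*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    r = y.astype(float).copy()                # residual y - X b (b starts at zero)
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            old = b[j]
            rho = X[:, j] @ r + col_sq[j] * old   # x_j^T (residual with b_j removed)
            b[j] = soft_threshold(rho, lam) / col_sq[j]
            if b[j] != old:
                r -= X[:, j] * (b[j] - old)
                max_step = max(max_step, abs(b[j] - old))
        if max_step < tol:
            break
    return b

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 20))
b_true = np.zeros(20)
b_true[[2, 7, 11]] = [3.0, -2.0, 1.5]
y = X @ b_true + 0.01 * rng.normal(size=50)
b_hat = l1_regularized_regression(X, y, lam=5.0)
```

Most entries of b_hat come out exactly zero, which is the "baseline region of near-zero coefficients" behaviour described above; increasing lam zeroes more wavelengths at the cost of more shrinkage bias.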

3.
The need for automated quality surveillance of liquid hydrocarbon fuels has driven the development of rapid fuel property modeling from spectroscopic sensor data. The correlation of near-infrared (NIR) and Raman spectroscopic data with jet and diesel fuel properties can be improved by deliberately selecting continuous wavelength sub-ranges. An automatic wavelength selection strategy would allow unsupervised construction of partial least squares (PLS) regression models with greater predictive utility when supervised model construction and maintenance are not feasible. Changeable size moving window partial least squares (CSMWPLS) is one of the most thorough procedures suited to this task. Unfortunately, the large number of PLS model constructions that an automated version of this procedure necessarily requires limits how far the predictive ability of the resulting models can be evaluated through full cross-validation. Presented here is a novel restricted version of the CSMWPLS algorithm in which the initial spectral range selection is accomplished through multiple interval PLS (iPLS) analyses, the analysis windows in the refinement step no longer move, and size changes are limited to a series of symmetric attenuations. The proposed algorithm is shown to provide significant PLS model improvements during a fully automated analysis of jet and diesel fuel spectra, in less time than an automated CSMWPLS algorithm.
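The two window operations in the restricted algorithm can be sketched as plain index bookkeeping: iPLS splits the spectrum into contiguous intervals, and the refinement step only shrinks a fixed window symmetrically from both ends instead of moving it. The helpers below are hypothetical and use half-open (start, end) index pairs; the step size and minimum width are assumptions, and no PLS model is fitted here.

```python
def split_intervals(n_vars, n_intervals):
    """Contiguous iPLS-style intervals as half-open (start, end) pairs; earlier ones absorb the remainder."""
    base, extra = divmod(n_vars, n_intervals)
    bounds, start = [], 0
    for i in range(n_intervals):
        width = base + (1 if i < extra else 0)
        bounds.append((start, start + width))
        start += width
    return bounds

def symmetric_attenuations(start, end, step=1, min_width=1):
    """Yield the window itself, then versions trimmed by `step` points at both ends."""
    k = 0
    while (end - k * step) - (start + k * step) >= min_width:
        yield (start + k * step, end - k * step)
        k += 1

intervals = split_intervals(10, 3)
candidates = list(symmetric_attenuations(10, 20, step=2, min_width=2))
```

In an automated pipeline, each candidate window would be scored (for example by PLS cross-validation error) and the best one kept; because the windows never move, the number of models grows only with the number of attenuation steps.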

4.
The limits of quantitative multivariate assays for extra virgin olive oil samples from various Greek sites adulterated with sunflower oil have been evaluated based on their Fourier transform (FT) Raman spectra. Different wavelength selection strategies were tested for calculating optimal partial least squares (PLS) models. Compared to the previously applied full-spectrum methods, the optimum standard error of prediction (SEP) for sunflower oil concentrations in spiked olive oil samples could be significantly reduced. One efficient approach (PMMS, pair-wise minima and maxima selection) used a special variable selection strategy based on pair-wise consideration of significant minima and maxima of PLS regression vectors, calculated for broad spectral intervals and a low number of PLS factors. PMMS provided robust calibration models with a small number of variables. On the other hand, the recently published Tabu search strategy (a search process guided by restrictions recorded in a Tabu list) achieved lower SEP values, but at the cost of extensive computing time when searching for a global minimum and of less robust calibration models. Robustness was tested by using packages of ten and twenty randomly selected samples within cross-validation to calculate independent prediction values. The best SEP values for a single year's harvest, with a total of 66 Cretan samples, were obtained with such spectral-variable-optimized PLS calibration models using leave-20-out cross-validation (values between 0.5 and 0.7% by weight). For the more complex population of olive oil samples from all over Greece (92 samples in total), results were between 0.7 and 0.9% by weight with a cross-validation package size of 20. Notably, calibration with Tabu variable selection has been shown to be a valid chemometric approach by which a single model can be applied to olive oil samples across three different harvest years with a low SEP of 1.4%.
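Cross-validation with random "packages" of samples, as used above for the robustness tests, amounts to shuffling the sample indices once and cutting them into consecutive test blocks. This is a minimal sketch with a hypothetical helper name; how the paper handled the leftover samples when the count is not a multiple of the package size is not stated, so here they simply form a smaller final package (an assumption).

```python
import numpy as np

def random_package_cv(n_samples, package_size, seed=0):
    """Yield (train, test) index arrays; each test package holds up to `package_size` random samples."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    for start in range(0, n_samples, package_size):
        test = np.sort(order[start:start + package_size])
        train = np.setdiff1d(np.arange(n_samples), test)
        yield train, test

# 66 samples with packages of 20 -> four folds of sizes 20, 20, 20 and 6
splits = list(random_package_cv(66, 20, seed=3))
```

Each sample is predicted exactly once by a model that never saw it; repeating the split with different seeds gives a rough view of how stable the SEP is.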

5.
Six popular approaches to building «NIR spectrum–property» calibration models are compared in this work using a gasoline spectral data set. These approaches are multiple linear regression (MLR), principal component regression (PCR), linear partial least squares regression (PLS), polynomial partial least squares regression (Poly-PLS), spline partial least squares regression (Spline-PLS), and artificial neural networks (ANN). The best preprocessing technique is found for each method, as are the optimal calibration parameters (number of principal components, ANN structure, etc.). The accuracy, computational complexity, and ease of application of the methods are compared for the prediction of six important gasoline properties (density and fractional composition), and the calibration error of each approach is reported. An advantage of the neural network approach for the «NIR spectrum–gasoline property» problem is illustrated, and an effective NIR-based model for predicting gasoline properties is built.

6.
Recent work has shown that ridge regression (RR) is Pareto optimal relative to partial least squares (PLS) and principal component regression (PCR) when the variance indicator, the Euclidean norm of the regression coefficients ||p||, is plotted against the bias indicator, the root mean square error of calibration (RMSEC). Simplex optimization demonstrates that RR is also Pareto optimal for several other spectral data sets when ||p|| is used together with RMSEC and the root mean square error of evaluation (RMSEE) as optimization criteria. This investigation also showed that, while RR is Pareto optimal, harmonious PLS and PCR models are nearly equivalent to harmonious RR models. Additionally, RR was found to be Pareto robust, as tested by using models formed at one temperature to predict samples at another temperature. Wavelength selection is commonly performed to improve analysis results, that is, so that bias indicators such as RMSEC, RMSEE, the root mean square error of validation, or the root mean square error of cross-validation decrease when a subset of wavelengths is used. An assessment of variance is just as critical to the analysis of selected wavelengths. Using wavelengths deemed optimal in a previous study, this paper reports on the variance/bias tradeoff. An approach that forms the Pareto model with a Pareto wavelength subset is suggested.
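The Pareto idea above can be illustrated by tracing ridge regression over a grid of regularization values, recording the variance indicator (norm of the regression vector) and the bias indicator (RMSEC) for each, and flagging the non-dominated points. This is a minimal sketch on synthetic, uncentred data; the helper names are hypothetical, and the simplex optimization and RMSEE criterion from the abstract are not reproduced.

```python
import numpy as np

def ridge_path(X, y, lambdas):
    """Return (||b||, RMSEC) of the ridge solution at each lambda, computed via the SVD of X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    uty = U.T @ y
    norms, rmsec = [], []
    for lam in lambdas:
        b = Vt.T @ (s / (s ** 2 + lam) * uty)
        norms.append(np.linalg.norm(b))
        rmsec.append(np.sqrt(np.mean((y - X @ b) ** 2)))
    return np.array(norms), np.array(rmsec)

def pareto_mask(points):
    """True for rows not dominated by any other row (<= in both criteria, < in at least one)."""
    pts = np.asarray(points, dtype=float)
    mask = np.ones(len(pts), dtype=bool)
    for i in range(len(pts)):
        dominated = np.all(pts <= pts[i], axis=1) & np.any(pts < pts[i], axis=1)
        mask[i] = not dominated.any()
    return mask

rng = np.random.default_rng(9)
X = rng.normal(size=(30, 10))
y = X @ rng.normal(size=10) + 0.1 * rng.normal(size=30)
lambdas = np.logspace(-2, 2, 9)
norms, rmsec = ridge_path(X, y, lambdas)
front = pareto_mask(np.column_stack([norms, rmsec]))
```

Along the ridge path the norm falls and RMSEC rises monotonically, so every ridge point lies on the tradeoff curve; models from other methods (PLS, PCR, or wavelength subsets) can be added as extra rows and tested against it with the same mask.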

7.
Several analytical applications of spectroscopy are based on a linear model linking laboratory values to spectral data. Among the various procedures available, three methods were used here: principal component regression (PCR), partial least squares (PLS), and latent root regression (LRR). These methods can be applied to tackle the high collinearity commonly observed in spectral data. A collection of 99 near-infrared spectra, each with 351 data points, was used to compare the three methods. The dependent variable was the specific production of pelleting. The spectral collection was divided into 49 calibration and 50 validation observations. The main elements of comparison were the minimum error observed on the validation set, the number of regressors introduced into the models, and the stability of the errors around the minimum values. The minimum errors were 3.29, 3.13, and 3.07 for PCR, PLS, and LRR, respectively. LRR required a large number of regressors to reach its minimum error. Nevertheless, it gave very stable results, and the errors did not increase markedly when an arbitrarily large number of regressors was introduced into the LRR model.

8.
Fu GH, Xu QS, Li HD, Cao DS, Liang YZ. Applied Spectroscopy, 2011, 65(4): 402-408
In this paper, a novel wavelength region selection algorithm, elastic net grouping variable selection combined with partial least squares regression (EN-PLSR), is proposed for multi-component spectral data analysis. The EN-PLSR algorithm automatically selects successive groups of strongly correlated predictor variables related to the response variable in two steps. First, a portion of the correlated predictors is selected and divided into subgroups by means of the grouping effect of elastic net estimation. Then, a recursive leave-one-group-out strategy is employed to further shrink the variable groups according to the root mean square error of cross-validation (RMSECV) criterion. Results on real near-infrared (NIR) spectroscopic data sets show that EN-PLSR is competitive with full-spectrum PLS and moving window partial least squares (MWPLS) regression and is suitable for use with strongly correlated spectroscopic data.
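The second EN-PLSR step, recursive leave-one-group-out shrinkage by RMSECV, can be sketched as greedy backward elimination over variable groups. This is an illustration under assumptions: the elastic-net grouping step is not shown, the RMSECV function is passed in as a callable, and the toy scoring function below is a hypothetical stand-in for a real PLS cross-validation.

```python
def leave_one_group_out(groups, rmsecv):
    """Greedy backward elimination over variable groups.

    groups: list of group identifiers; rmsecv: callable taking a list of groups
    and returning the cross-validated error of a model built on them.
    """
    current = list(groups)
    best = rmsecv(current)
    while len(current) > 1:
        trials = [(rmsecv([g for g in current if g != drop]), drop) for drop in current]
        score, drop = min(trials, key=lambda pair: pair[0])
        if score >= best:          # no single removal lowers RMSECV: stop
            break
        best = score
        current = [g for g in current if g != drop]
    return current, best

# Hypothetical stand-in for a PLS RMSECV: groups 0 and 2 carry signal,
# each missing useful group costs 0.5 and each irrelevant group adds 0.1.
useful = {0, 2}
def toy_rmsecv(selected):
    chosen = set(selected)
    return 1.0 + 0.5 * len(useful - chosen) + 0.1 * len(chosen - useful)

kept, err = leave_one_group_out([0, 1, 2, 3], toy_rmsecv)
```

In practice the callable would rebuild a PLS model on the columns of the remaining groups and return its RMSECV; each elimination round costs one model per remaining group.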

9.
A new wavelength interval selection procedure, moving window partial least-squares regression (MWPLSR), is proposed for multicomponent spectral analysis. The procedure builds a series of PLS models in a window that moves over the whole spectral region and then locates useful spectral intervals as those in which PLS models reach a desired error level with the least complexity. Based on a proposed theory demonstrating the necessity of wavelength selection, MWPLSR is shown to provide a viable approach for eliminating the extra variability generated by non-composition-related factors, such as perturbations in experimental conditions and physical properties of the samples. A salient advantage of MWPLSR is that the calibration model is very stable against interference from non-composition-related factors. Moreover, selecting spectral intervals by least model complexity allows the size of the calibration sample set to be reduced. Two strategies are suggested for coupling MWPLSR with PLS for multicomponent spectral analysis: one includes all selected intervals in a single PLS calibration model, and the other combines PLS models built separately in each interval. Combining multiple PLS models offers a novel potential tool for improving on the performance of the individual models. The proposed procedures are evaluated using two open-path Fourier transform infrared data sets and one near-infrared data set, each with different noise characteristics. The results show that the proposed procedures are very promising for multicomponent analyses based on vibrational spectroscopy and give much better predictions than full-spectrum PLS modeling.
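The moving-window search can be sketched as follows: slide a fixed-width window across the variables, fit PLS models of increasing rank inside each window, and record the smallest rank whose calibration error reaches a target level. This is a minimal sketch, assuming a NIPALS PLS1 implementation, in-sample RMSEC as the error measure, and a hypothetical threshold; the original procedure's exact error criterion may differ.

```python
import numpy as np

def pls1_coefficients(X, y, n_comp):
    """NIPALS PLS1 on mean-centred data; returns regression vector b and intercept b0."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_comp):
        w = E.T @ f
        w /= np.linalg.norm(w)
        t = E @ w
        tt = t @ t
        p = E.T @ t / tt
        c = f @ t / tt
        E = E - np.outer(t, p)             # deflate X block
        f = f - c * t                      # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.column_stack(W), np.column_stack(P), np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)
    return b, y_mean - x_mean @ b

def window_min_complexity(X, y, width, k_max, threshold):
    """For each window start, the smallest PLS rank with RMSEC <= threshold (0 if none)."""
    result = []
    for start in range(X.shape[1] - width + 1):
        Xw = X[:, start:start + width]
        best = 0
        for k in range(1, min(k_max, width) + 1):
            b, b0 = pls1_coefficients(Xw, y, k)
            rmsec = np.sqrt(np.mean((y - (Xw @ b + b0)) ** 2))
            if rmsec <= threshold:
                best = k
                break
        result.append(best)
    return np.array(result)

rng = np.random.default_rng(2)
X = rng.normal(size=(40, 30))
y = X[:, 10:15].sum(axis=1) + 0.01 * rng.normal(size=40)   # signal only in columns 10-14
ks = window_min_complexity(X, y, width=5, k_max=5, threshold=0.1)
```

Only the window covering columns 10-14 reaches the target error, so it is the only one with a non-zero entry in ks; on real spectra, windows with low required rank mark the informative intervals.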

10.
There are many chemometric applications, such as spectroscopy, where the objective is to explain a scalar response from a functional variable (the spectrum) whose observations are functions of wavelength rather than vectors. In this paper, PLS regression is considered for estimating the linear model when the predictor is a functional random variable. Because the predictor observations belong to an infinite-dimensional space, they are usually approximated by functions in a finite-dimensional space spanned by a basis of functions. We show that PLS regression with a functional predictor is equivalent to finite multivariate PLS regression using the basis expansion coefficients as the predictor, in the sense that the same prediction is obtained at each step of the PLS iteration. In addition, from the linear model estimated using the basis coefficients, we derive the expression of the PLS estimate of the regression coefficient function in the model with a functional predictor. The results of this functional PLS approach are compared with those of functional PCR and of discrete PLS and PCR on different sets of simulated and spectrometric data.
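In the functional linear model the response depends on the integral of x(t) * beta(t). If both x and beta are expanded in the same basis, with coefficient vectors a and c, that integral becomes the finite quadratic form a^T Psi c, where Psi is the Gram matrix of the basis. The sketch below illustrates only this reduction step, assuming a monomial basis on [0, 1] (chosen because its Gram matrix has the closed form 1/(j+k+1)) and least-squares fitting of sampled curves; it is not the paper's PLS algorithm, and the abstract does not say how Psi enters their equivalence result.

```python
import numpy as np
from numpy.polynomial import polynomial as npoly

def monomial_gram(n_basis):
    """Gram matrix Psi_jk = integral over [0, 1] of t^j * t^k dt = 1 / (j + k + 1)."""
    j = np.arange(n_basis)
    return 1.0 / (j[:, None] + j[None, :] + 1.0)

def basis_coefficients(t, curves, n_basis):
    """Least-squares coefficients of sampled curves in the basis {1, t, ..., t^(n_basis-1)}."""
    V = npoly.polyvander(t, n_basis - 1)          # shape (n_points, n_basis)
    coef, *_ = np.linalg.lstsq(V, curves.T, rcond=None)
    return coef.T                                 # shape (n_curves, n_basis)

t = np.linspace(0.0, 1.0, 50)
curves = np.vstack([1.0 + t, t ** 2])             # two sampled "spectra"
A = basis_coefficients(t, curves, n_basis=3)
beta = np.array([0.0, 1.0, 0.0])                  # coefficient function beta(t) = t
inner = A @ monomial_gram(3) @ beta               # integral of x_i(t) * beta(t) over [0, 1]
```

For these curves the exact integrals are 5/6 and 1/4, which the finite form reproduces; any multivariate method applied to the coefficient matrix A must account for Psi whenever the basis is not orthonormal.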

11.
In multivariate calibration methods such as partial least squares (PLS), especially when the spectral data consist of measurements at hundreds or even thousands of analytical channels, it is widely accepted that a well-performed variable selection before building the regression model can improve the model's predictive ability. In the present paper, the idea of variable selection is extended. In traditional variable selection methods, the deleted variables and the variables included in the regression model are essentially weighted with the discrete values 0 and 1, respectively; the strategy adopted here instead weights the variables with continuous non-negative values. A recently proposed global optimization method, the particle swarm optimization (PSO) algorithm, is used to search for variable weights that optimize both the fit to a calibration set and the prediction of an independent validation set. Since variable selection is just a special case of variable weighting, the latter is expected to be more rational and flexible. Variable weighting should reduce the negative influence of wavelengths with undesirable qualities while retaining the useful information they carry. It should also prevent the loss of the multi-channel advantage that variable selection can cause when the number of selected wavelengths is small. Two real data sets are investigated, and the results of variable-weighted PLS are compared with those of PLS to demonstrate the advantages of the proposed method.
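A basic particle swarm optimizer over bounded, non-negative weights is sketched below. It is a generic global-best PSO with commonly used inertia and acceleration constants (assumptions, not the paper's settings), and the quadratic objective is a hypothetical stand-in; in the variable-weighting setting the objective would instead be a calibration/validation error of a PLS model fitted to the column-weighted data X * w.

```python
import numpy as np

def pso_minimize(objective, dim, n_particles=20, n_iter=200,
                 inertia=0.7, c1=1.5, c2=1.5, lower=0.0, upper=1.0, seed=0):
    """Global-best PSO; positions are clipped to [lower, upper] so weights stay non-negative."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(lower, upper, size=(n_particles, dim))
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_val = np.array([objective(p) for p in pos])
    best = int(np.argmin(pbest_val))
    gbest, gbest_val = pbest[best].copy(), pbest_val[best]
    for _ in range(n_iter):
        r1 = rng.random(pos.shape)
        r2 = rng.random(pos.shape)
        # pull each particle toward its own best and the swarm's best position
        vel = inertia * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lower, upper)
        vals = np.array([objective(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved] = pos[improved]
        pbest_val[improved] = vals[improved]
        best = int(np.argmin(pbest_val))
        if pbest_val[best] < gbest_val:
            gbest, gbest_val = pbest[best].copy(), pbest_val[best]
    return gbest, gbest_val

# Hypothetical stand-in objective: distance from a known weight vector.
target = np.array([0.2, 0.8, 0.5])
weights, value = pso_minimize(lambda w: float(np.sum((w - target) ** 2)), dim=3)
```

Because weights are continuous in [0, 1], a weight driven to 0 reproduces deleting a variable, while intermediate values down-weight noisy channels without discarding them.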

12.
This paper describes an adaptation of Ergon's 2PLS approach (Compression into two-component PLS factorizations. J. Chemom. 2003; 17: 303-312.) that represents a single predictor regression model as a two-factor latent vector model. The purpose of this reduction is to aid model interpretation and diagnostics. Non-orthogonal score vectors are produced from two orthonormal loading vectors: one identical to the first PLS loading vector, and a second built from the regression vector. Using an invertible matrix, the factorization can alternatively be represented by two orthogonal score vectors, one of which is proportional to the centred predictions. An auxiliary set of loadings is also calculated; it captures a different model space but is provided because its associated residuals have useful properties. Identities connecting the two model spaces are provided. The latent vector regression coefficients are not always least-squares estimates, but they can be represented as the solution to a two-term generalized ridge regression; the consequences of this are addressed. The utility of TinyLVR is demonstrated with example models built using stepwise variate selection and ridge regression.
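The core construction, two orthonormal loading vectors whose span contains the regression vector, can be sketched in a few lines. Under stated assumptions: the first loading is taken as the normalised X^T y (the first PLS weight vector for centred data), the second is the regression vector b orthogonalised against it, and b comes from a hypothetical ridge model. Because b lies in the span of the two loadings, the model's predictions are reproduced exactly by the two (generally non-orthogonal) score vectors. This is not the paper's full TinyLVR algorithm.

```python
import numpy as np

def two_loading_factorization(X, y, b):
    """Orthonormal loadings [w1, w2] with b in their span; returns loadings, scores, coefficients."""
    w1 = X.T @ y
    w1 /= np.linalg.norm(w1)
    w2 = b - (w1 @ b) * w1                 # Gram-Schmidt: remove the w1 component of b
    w2 /= np.linalg.norm(w2)
    W = np.column_stack([w1, w2])
    T = X @ W                              # two score vectors, generally non-orthogonal
    coef, *_ = np.linalg.lstsq(T, X @ b, rcond=None)
    return W, T, coef

rng = np.random.default_rng(6)
X = rng.normal(size=(25, 8))
X -= X.mean(axis=0)
y = X @ rng.normal(size=8) + 0.1 * rng.normal(size=25)
y -= y.mean()
b = np.linalg.solve(X.T @ X + 1.0 * np.eye(8), X.T @ y)   # ridge regression vector
W, T, coef = two_loading_factorization(X, y, b)
```

The coefficients equal (w1 . b, ||b - (w1 . b) w1||), so they describe b rather than a least-squares fit of y on the scores, which is consistent with the remark that latent vector coefficients are not always least-squares estimates.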

13.
A class of multivariate calibration methods called augmented classical least squares (ACLS) has been proposed that combines an explicit linear additive model with the predictive power of inverse models such as principal component regression (PCR) and partial least squares (PLS). Because it uses an explicit linear additive model, ACLS provides an interesting framework for incorporating different sources of prior information, such as measured pure component spectra, into the model. In this study, the predictive power of ACLS models incorporating different amounts of prior information was compared to that of PCR and PLS using two examples: a designed experiment and one with biological samples. In both cases, the ACLS models showed predictive power comparable to PLS under idealized validation conditions. When a different interferent structure was present in the validation samples, the predictive power of the inverse models (PCR and PLS) decreased dramatically, with the root-mean-squared error of prediction increasing by a factor of 3.5 in the first example and by a factor of 2 in the second. Incorporating prior information in the ACLS framework considerably reduced or even completely removed these dramatic effects, especially when the pure component contributions of the interferents were taken into account.
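The effect of including interferent spectra in an explicit linear additive model can be seen with plain classical least squares (CLS): a mixture spectrum is modelled as a concentration-weighted sum of pure-component spectra, and the concentrations are solved for by least squares. This sketch uses hypothetical Gaussian bands and is plain CLS, not the full ACLS method (which additionally augments the model with spectral shapes estimated from calibration residuals).

```python
import numpy as np

def gaussian_band(x, center, width):
    return np.exp(-0.5 * ((x - center) / width) ** 2)

def cls_concentrations(K, spectrum):
    """Classical least squares: solve spectrum ~ c @ K, where rows of K are pure spectra."""
    c, *_ = np.linalg.lstsq(K.T, spectrum, rcond=None)
    return c

x = np.arange(100.0)
s1, s2 = gaussian_band(x, 30, 8), gaussian_band(x, 60, 8)
interferent = gaussian_band(x, 40, 8)                   # overlaps strongly with s1
mixture = 0.3 * s1 + 0.7 * s2 + 0.5 * interferent

c_plain = cls_concentrations(np.vstack([s1, s2]), mixture)              # interferent ignored
c_aug = cls_concentrations(np.vstack([s1, s2, interferent]), mixture)   # interferent included
```

Without the interferent in the model, its signal is absorbed into the analyte estimates and biases them; once its pure spectrum is included, the concentrations are recovered exactly in this noise-free example.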

14.
This work presents a new method for variable selection in complex spectral profiles. The method is validated by comparing samples of cerebrospinal fluid (CSF) with the same samples spiked with peptide and protein standards at different concentration levels. Partial least squares discriminant analysis (PLS-DA) attempts to separate two groups of samples by regressing on a y-vector of zeros and ones in the PLS decomposition. In most cases, several PLS components are needed to optimize the discrimination between groups, which makes the model difficult to interpret. By using the y-vector as a target, the PLS components can be transformed into a single predictive target-projected component, analogous to the predictive component in orthogonal partial least squares discriminant analysis (OPLS-DA). Calculating the ratio between the explained and residual variance of each spectral variable on the target-projected component gives a selectivity ratio plot that can be used for variable selection. Applied to whole mass spectral profiles of pure and spiked CSF, the method detects peptides in the low molecular mass range (740–9000 Da) down to at least the 400 pM level without severe problems from false biomarker candidates. Similarly, added proteins are detected down to at least the 2 nM level in the medium mass range (6000–17,500 Da). Target projection represents the optimal way to fit a latent variable decomposition to a known target, but the selectivity ratio plot can also be used with OPLS and other methods that produce a single predictive component. Comparison with some commonly used variable selection tools shows that the selectivity ratio plot performs best. This is attributed to the fact that target projection uses both the predictive ability (regression coefficients) and the explanatory ability (spectral variance/covariance matrix) in calculating the selectivity ratio.
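Target projection and the selectivity ratio follow directly from the description above: project the centred data onto the normalised regression vector to obtain one target-projected score vector, compute its loadings, and divide each variable's explained variance by its residual variance. This minimal sketch uses a hypothetical least-squares regression vector on synthetic data in which two variables carry the signal; in practice b would come from a PLS or PLS-DA model.

```python
import numpy as np

def selectivity_ratio(X, b):
    """Explained/residual variance per variable on the target-projected component."""
    Xc = X - X.mean(axis=0)
    w_tp = b / np.linalg.norm(b)
    t_tp = Xc @ w_tp                       # target-projected scores
    p_tp = Xc.T @ t_tp / (t_tp @ t_tp)     # target-projected loadings
    explained = np.outer(t_tp, p_tp)
    residual = Xc - explained
    sr = (explained ** 2).sum(axis=0) / (residual ** 2).sum(axis=0)
    return sr, t_tp, residual

rng = np.random.default_rng(4)
n = 100
y = rng.normal(size=n)
X = np.column_stack([y + 0.3 * rng.normal(size=n),     # informative variable
                     y + 0.3 * rng.normal(size=n),     # informative variable
                     rng.normal(size=(n, 3))])         # three noise variables
b = np.linalg.lstsq(X - X.mean(axis=0), y - y.mean(), rcond=None)[0]
sr, t_tp, residual = selectivity_ratio(X, b)
```

The informative variables get much larger selectivity ratios than the noise variables, because the ratio combines the regression vector (through t_tp) with each variable's own variance.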

15.
16.
Polarization eigenstates for twisted-nematic liquid-crystal displays (cited 7 times: 0 self-citations, 7 by others)
Davis JA, Moreno I, Tsai P. Applied Optics, 1998, 37(5): 937-945
We derive theoretical expressions for the eigenvalues and eigenvectors of a twisted-nematic liquid-crystal display (LCD) as a function of the twist angle and the birefringence, using the Jones-matrix formalism. These polarization eigenvectors are of particular interest for phase-only transmission because they propagate through the display unchanged. We find that the eigenvectors are elliptically polarized and that the ellipticity changes as a function of the birefringence of the LCD (which is proportional to the external voltage applied to the display). We can define an average eigenvector over a desired range of applied voltage. Using Jones matrices, we show how this average eigenvector can be generated with a quarter-wave plate and a linear polarizer at appropriate orientation angles. Using this average eigenvector, we show that superior phase-only operation can be obtained over a given operating range of the LCD compared with other approaches.
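The polarization eigenstates can be explored numerically. The paper derives closed-form expressions; this sketch instead approximates the twisted layer by many thin retarder slices whose director rotates linearly through the twist angle (an assumed, commonly used discretisation), then takes eigenvectors of the resulting 2x2 Jones matrix. The parameter beta is assumed here to be half the total retardation of the untwisted layer, and all function names are hypothetical.

```python
import numpy as np

def rotation(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, s], [-s, c]])

def twisted_nematic_jones(twist, beta, n_slices=500):
    """Approximate TN-LCD Jones matrix as a stack of thin rotated retarders.

    Each slice contributes diag(exp(-i*beta/N), exp(+i*beta/N)) in its local director
    frame; the director angle advances by twist/N from slice to slice.
    """
    W = np.diag([np.exp(-1j * beta / n_slices), np.exp(1j * beta / n_slices)])
    M = np.eye(2, dtype=complex)
    for k in range(n_slices):
        theta = (k + 0.5) * twist / n_slices
        M = rotation(-theta) @ W @ rotation(theta) @ M   # light meets slice k after slices 0..k-1
    return M

def s3_over_s0(v):
    """Normalised circular Stokes component of a Jones vector (0 for linear polarisation)."""
    return 2.0 * np.imag(np.conj(v[0]) * v[1]) / np.vdot(v, v).real

M = twisted_nematic_jones(np.pi / 2, 2.0)          # 90-degree twist
evals, evecs = np.linalg.eig(M)
ellipticity = [s3_over_s0(evecs[:, i]) for i in range(2)]
```

With zero twist the matrix reduces to a plain retarder whose eigenvectors are linear (S3 = 0); sweeping beta for a fixed twist shows how the ellipticity of the eigenvectors changes with birefringence, which is the behaviour the averaged eigenvector is designed around.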

17.
It is well known that sensitivity analysis of the eigenvectors corresponding to multiple eigenvalues is a difficult problem. The main difficulty is that, for given multiple eigenvalues, the eigenvector derivatives can be computed only for a specific eigenvector basis, the so-called adjacent eigenvector basis. These adjacent eigenvectors depend on the individual variables, which makes the eigenvector derivative calculation elaborate and computationally expensive. This research presents a method that avoids passing through the adjacent eigenvectors when calculating the partial derivatives of any prescribed eigenvector basis. Because the method fits within adjoint sensitivity analysis, it is efficient for computing the complete Jacobian matrix, since the adjoint variables are independent of each variable. The method thus clarifies and unifies existing theories on eigenvector sensitivity analysis. Moreover, it provides a highly efficient computational method with significant savings in computational cost. Additional benefits are that one does not have to solve a deficient linear system and that the method is independent of the existence of repeated eigenvalue derivatives of the multiple eigenvalues. The method also covers eigenvectors associated with a single (non-repeated) eigenvalue. Some examples are provided to validate the approach.
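For the simpler case the method also covers, a single (non-repeated) eigenvalue of a symmetric matrix, the first-order eigenvector derivative has a classical closed form: dv_i = sum over j != i of (v_j^T dA v_i) / (lambda_i - lambda_j) * v_j. The sketch below implements that textbook formula and checks it against a central finite difference; it is not the paper's adjoint method and does not handle repeated eigenvalues, where the denominators vanish.

```python
import numpy as np

def eigenvector_derivative(A, dA, i):
    """First-order change of the i-th unit eigenvector of symmetric A along symmetric dA."""
    lam, V = np.linalg.eigh(A)
    vi = V[:, i]
    dv = np.zeros_like(vi)
    for j in range(len(lam)):
        if j != i:
            # component along eigenvector j; requires lam[i] != lam[j]
            dv += (V[:, j] @ dA @ vi) / (lam[i] - lam[j]) * V[:, j]
    return dv

def finite_difference_eigvec(A, dA, i, h=1e-6):
    """Central difference, with eigenvector signs aligned to the unperturbed eigenvector."""
    ref = np.linalg.eigh(A)[1][:, i]
    def aligned(M):
        v = np.linalg.eigh(M)[1][:, i]
        return v if v @ ref >= 0 else -v
    return (aligned(A + h * dA) - aligned(A - h * dA)) / (2 * h)

rng = np.random.default_rng(5)
S = rng.normal(size=(4, 4))
A = np.diag([1.0, 2.0, 4.0, 7.0]) + 0.1 * (S + S.T)   # well-separated eigenvalues
D = rng.normal(size=(4, 4))
dA = D + D.T
analytic = eigenvector_derivative(A, dA, 1)
numeric = finite_difference_eigvec(A, dA, 1)
```

The 1/(lambda_i - lambda_j) factors show directly why multiple eigenvalues are hard: as two eigenvalues coalesce, the formula breaks down and a special (adjacent) basis or a different formulation is needed.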

18.
This paper describes mathematical techniques that correct for analyte-irrelevant optical variability in tissue spectra by combining multiple preprocessing techniques to address variability in the spectral properties of tissue overlying and within the muscle. A mathematical preprocessing method called principal component analysis (PCA) loading correction is discussed for removing inter-subject, analyte-irrelevant variations in muscle scattering from continuous-wave diffuse reflectance near-infrared (NIR) spectra. The correction is performed by orthogonalizing spectra to a set of principal component loading vectors obtained from PCA of spectra with the same analyte value across different subjects in the calibration set. Once the loading vectors are obtained, no knowledge of analyte values is required for future spectral correction. The method was tested on tissue-like, three-layer phantoms, using partial least squares (PLS) regression to predict the absorber concentration in the phantom muscle layer from the NIR spectra. Two other mathematical methods, short-distance correction to remove spectral interference from the skin and fat layers, and standard normal variate scaling, were also applied and/or combined with the proposed method prior to PLS analysis. Each preprocessing method improved model prediction and/or reduced model complexity. The combination of the three preprocessing methods provided the most accurate predictions. A preliminary validation on in vivo human tissue spectra was also performed.
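PCA loading correction and standard normal variate (SNV) scaling can be sketched as follows: loadings are computed from spectra that share one analyte value (so their variation is, by construction, analyte-irrelevant), and any spectrum is then corrected by removing its projection onto those loadings. The synthetic "subjects", the number of components, and the helper names are assumptions for illustration; the short-distance correction step is not shown.

```python
import numpy as np

def pca_loadings(spectra_same_analyte, n_components):
    """Orthonormal loadings (columns) of the variation among spectra with one analyte value."""
    centred = spectra_same_analyte - spectra_same_analyte.mean(axis=0)
    _, _, Vt = np.linalg.svd(centred, full_matrices=False)
    return Vt[:n_components].T             # shape (n_wavelengths, n_components)

def loading_correction(spectra, loadings):
    """Orthogonalize each spectrum (row) to the loading vectors."""
    return spectra - (spectra @ loadings) @ loadings.T

def snv(spectra):
    """Standard normal variate: centre and scale each spectrum individually."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std

rng = np.random.default_rng(8)
wl = np.linspace(0.0, 1.0, 60)
base = np.exp(-((wl - 0.5) / 0.1) ** 2)
# hypothetical subjects with the same analyte value, differing by offset and slope
group = (base + 0.2 * rng.normal(size=(12, 1)) + 0.3 * rng.normal(size=(12, 1)) * wl
         + 0.001 * rng.normal(size=(12, 60)))
P = pca_loadings(group, 2)
new = base + 0.5 * wl + 0.2 + 0.001 * rng.normal(size=(5, 60))
corrected = loading_correction(new, P)
Z = snv(corrected)
```

Once P is stored, new spectra are corrected without any knowledge of their analyte values, matching the property noted above.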

19.
In scenes with complicated environments, an object is hard to discriminate from a background of various colors in color vision applications. This paper presents a partial least squares (PLS) method for improving the discrimination of colored surfaces by selecting appropriate spectral intervals of the visible spectrum for illumination. First, the reflectance functions of all surfaces are calibrated using multiple standard references. Second, the spectral intervals with high variable importance in projection (VIP) scores from PLS analysis are selected for LED illumination. Then, when LEDs at the selected wavelength intervals are used in the experiment, the surfaces in the captured image can be clearly distinguished. Compared with images obtained under LED illumination at unselected wavelength intervals, discrimination is more effective for most surfaces. The experimental results demonstrate the usefulness of this method.
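VIP scores can be computed from a fitted PLS1 model's weights, scores, and y-loadings using the standard formula VIP_j = sqrt(p * sum_a SS_a (w_ja/||w_a||)^2 / sum_a SS_a), with SS_a = q_a^2 t_a^T t_a. The sketch below assumes a NIPALS PLS1 model and a single synthetic response; the paper's discrimination setup (reflectance data, multiple surfaces) is not reproduced, and the helper names are hypothetical.

```python
import numpy as np

def pls1_nipals(X, y, n_comp):
    """Return weights W, scores T and y-loadings q of a NIPALS PLS1 model on centred data."""
    E = X - X.mean(axis=0)
    f = y - y.mean()
    W, T, q = [], [], []
    for _ in range(n_comp):
        w = E.T @ f
        w /= np.linalg.norm(w)
        t = E @ w
        tt = t @ t
        p = E.T @ t / tt
        c = f @ t / tt
        E = E - np.outer(t, p)
        f = f - c * t
        W.append(w); T.append(t); q.append(c)
    return np.column_stack(W), np.column_stack(T), np.array(q)

def vip_scores(W, T, q):
    """VIP_j = sqrt(p * sum_a SS_a (w_ja/||w_a||)^2 / sum_a SS_a), SS_a = q_a^2 t_a^T t_a."""
    p = W.shape[0]
    ss = q ** 2 * (T ** 2).sum(axis=0)     # y-variance explained by each component
    wn = W / np.linalg.norm(W, axis=0)
    return np.sqrt(p * (wn ** 2 @ ss) / ss.sum())

rng = np.random.default_rng(7)
X = rng.normal(size=(50, 8))
y = 2.0 * X[:, 3] + 0.1 * rng.normal(size=50)   # only variable 3 is informative
W, T, q = pls1_nipals(X, y, 2)
vip = vip_scores(W, T, q)
```

Because the squared VIP scores always sum to the number of variables, their mean square is 1, which is why a threshold near 1 is often used to decide which wavelength intervals count as "high VIP".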

20.

Copyright © Beijing Qinyun Technology Development Co., Ltd. 京ICP备09084417号