Similar literature
Found 20 similar articles (search time: 15 ms)
1.
Although in many cases parallel factor analysis (PARAFAC) resolves trilinear data arrays into the true physical factors that form the data, i.e., a unique solution can be found, the algorithm does not always converge to chemically meaningful solutions. Kiers and Smilde [J. Chemom. 1995; 9: 179-195] rigorously proved that unique decomposition does not hold in cases with ‘rank overlap’. They showed that when PARAFAC is applied to a three-way data array that has rank overlap in one of its loading modes, the solution obtained is not unique and at best cannot be easily compared with the underlying physical factors owing to rotational ambiguity. An aspect that is significantly less documented in previous publications is the reliable detection of rotational ambiguities in multi-way methods. A few reported methods are based on bilinear models for calculating the feasible bands of three-way data. In this paper we propose a method to calculate feasible bands of the resolved component profiles in three-way methods and to visualize the rotational ambiguity in the results of these methods. Most of the discussion concerns the PARAFAC algorithm. The principle behind the algorithm is described in detail and tested on a simulated data set. Completely general and exhaustive results are presented for the two-component cases. In particular, the effect of noise is investigated and a comparison is made between feasible solutions obtained from PARAFAC and matrix-augmented with trilinearity. It is shown that the results obtained from both methods are identical.

2.
Single-stranded DNA can be adsorbed on the negatively charged surface of gold nanoparticles (AuNPs), but the rigid structure of double-stranded DNA prevents its adsorption. The signal of a tagged single-stranded DNA is quenched by the plasmon effect of the AuNP surface after adsorption. This phenomenon has been used to study the DNA hybridization and interactions of two complementary 21mer oligonucleotides, each tagged with a different fluorescent dye, in the presence of 13 nm gold nanoparticles. The DNA strands used in this study belong to the genome of HIV. The obtained rank-deficient three-way fluorescence data sets were resolved by both PARAFAC and restricted Tucker3 models. This is the first successful application of a multiway chemometric technique to analyze multidimensional nanobiological data. The restricted Tucker3 model performed better than PARAFAC in resolving the data sets. The advantages of restricted Tucker3 analysis over the unrestricted one, i.e., the limited rotational freedom (more unique results) and better interpretability of the obtained results, were experienced in this study. The resolved excitation, emission, and concentration profiles, and especially the fluorescence resonance energy transfer (FRET) profiles, obtained by restricted Tucker3 were chemically more meaningful than those obtained from PARAFAC.

3.
In chemometrics, two-way singular value decomposition (SVD), CANDECOMP-PARAFAC decomposition (PARAFAC), and Tucker decomposition (TUCKER) are the three main array decomposition methods. Each of the three methods has disadvantages. If multiway data are indeed multilinear, PARAFAC and TUCKER can provide more robust and interpretable models than two-way SVD. However, PARAFAC is sometimes numerically unstable, and TUCKER cannot guarantee the uniqueness of an approximate solution. This paper proposes a new array decomposition model with multiple bilinear structure. Then, utilizing this model, a new method, called multiple bilinear decomposition (MBD), is proposed as a generalization of two-way SVD. An algorithm is established to successively decompose an array without a full decomposition, and it is not based on alternating least squares. Theoretically, the proposed method has an advantage over PARAFAC and TUCKER in three important properties: orthonormality of loading vectors, closed-form decomposition, and successive decomposition of variation. The simulation results based on orthogonal PARAFAC models show that the proposed method outperforms PARAFAC with respect to accuracy and robustness of the loading estimates and data-fitting of the model, even though the former does not use a priori information about the multilinear structure. In particular, in the noise-free simulation, the equivalence of the loading estimates indicates that, as a successive decomposition, MBD is a superior alternative to PARAFAC.
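For orientation, the three decompositions named in this abstract are usually written element-wise as below. This is the standard textbook notation rather than anything taken from the paper itself; the symbols (loadings a, b, c, core array g, residual e, and the component counts R, P, Q, S) are assumptions introduced here for illustration.

```latex
\begin{align*}
\text{SVD (two-way):}\quad
  x_{ij}  &= \sum_{r=1}^{R} \sigma_r\, u_{ir}\, v_{jr} + e_{ij} \\
\text{PARAFAC (three-way):}\quad
  x_{ijk} &= \sum_{r=1}^{R} a_{ir}\, b_{jr}\, c_{kr} + e_{ijk} \\
\text{Tucker3 (three-way):}\quad
  x_{ijk} &= \sum_{p=1}^{P}\sum_{q=1}^{Q}\sum_{s=1}^{S}
             g_{pqs}\, a_{ip}\, b_{jq}\, c_{ks} + e_{ijk}
\end{align*}
```

PARAFAC is the special case of Tucker3 in which P = Q = S = R and the core array g is superdiagonal, which is one way to see why Tucker3 has more rotational freedom.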

4.
Appropriate closed-form expressions are known for estimating analyte sensitivities when calibrating with one-, two-, and three-way data (vectors, matrices, and three-dimensional arrays, respectively, built with data for a group of samples). In this report, sensitivities are estimated for calibration with four-way data using the quadrilinear parallel factor (PARAFAC) model, making it possible to assess important figures of merit for method comparison or optimization. The strategy is based on the computation of the uncertainty in the fitted PARAFAC parameters through the Jacobian matrix. Extensive Monte Carlo noise addition simulations in four-way data systems having widely different overlapping situations are helpful in supporting the present approach, which was also applied to two experimental analytical systems. With this proposal, the estimation of the PARAFAC sensitivity for calibration scenarios involving three- and four-way data may be considered complete.

5.
Herbal preparations represent very complex mixtures, potentially containing multiple pharmacologically active entities. Methods for global characterization of the composition of such mixtures are therefore of pertinent interest. In this work, chemometric analysis of high-performance liquid chromatography with photodiode-array detection (HPLC-PDA) data from extracts of commercial preparations of Hypericum perforatum (St. John's wort) that originate from several continents is described. The spectral HPLC profiles were aligned in the elution mode using correlation optimized warping in order to remove peak misalignment caused by retention time shifts due to matrix effects. Furthermore, the warping was assisted by HPLC-PDA-SPE-NMR-MS (SPE = solid-phase extraction) experiments that yielded 1H NMR and 13C NMR data (from 1H-detected heteronuclear correlations), as well as ESI-MS and HRMS data, which enabled the identification of all major mixture constituents. The preprocessed HPLC-PDA data were subjected to parallel factor analysis (PARAFAC), a chemometric method that is a generalization of principal component analysis (PCA) to multi-way data arrays. PCA of the peak areas obtained from the PARAFAC analysis was used to facilitate sample comparison and allowed straightforward interpretation of constituents responsible for the differences in composition between individual preparations. In addition, loadings from the PARAFAC analysis provided pure elution profiles and pure UV spectra even for coeluting peaks, thus enabling the identification of chromatographically unresolved components. In conclusion, PARAFAC analysis of the readily accessible HPLC-PDA data provides the means for unsupervised and unbiased assessment of the composition of herbal preparations, of interest for assessment of their pharmacological activity and clinical efficacy.

6.
This paper describes an improved three-way alternating least-squares multivariate curve resolution algorithm that makes use of the recently introduced multi-dimensional arrays of MATLAB®. Multi-dimensional arrays allow for a convenient way to apply chemically sound constraints, such as closure, in the third dimension. The program is designed for kinetic studies on liquid chromatography with diode array detection but can be used for other three-way data analysis. The program is tested with a large number of synthetic data sets and its flexibility is demonstrated, especially when non-trilinear data sets are fit. In this case, the algorithm finds a solution with a better fit than direct trilinear decomposition (DTD). When trilinear data are used, the optimal fit is not as good as when a direct decomposition method is used. Most real data sets, however, have some degree of non-trilinearity. This makes this method a better choice to analyze non-trilinear, three-way data than direct trilinear decomposition.
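The alternating least-squares idea behind this kind of algorithm can be sketched with a plain, unconstrained PARAFAC fit in NumPy. This is not the constrained multivariate curve resolution program described in the abstract (which is written in MATLAB and supports constraints such as closure); the function names `khatri_rao` and `parafac_als`, the random initialization, and the fixed iteration count are all assumptions made for illustration.

```python
import numpy as np


def khatri_rao(U, V):
    """Column-wise Kronecker product; rows are ordered (u, v) with v varying fastest."""
    return np.einsum("ur,vr->uvr", U, V).reshape(-1, U.shape[1])


def parafac_als(X, rank, n_iter=100, seed=0):
    """Fit x_ijk ~ sum_r a_ir * b_jr * c_kr by unconstrained alternating least squares."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    B = rng.random((J, rank))
    C = rng.random((K, rank))
    # Unfold the array once per mode; the column order matches khatri_rao above.
    X1 = X.reshape(I, J * K)
    X2 = X.transpose(1, 0, 2).reshape(J, I * K)
    X3 = X.transpose(2, 0, 1).reshape(K, I * J)
    errors = []
    for _ in range(n_iter):
        # Each update is an exact least-squares solve with the other two modes fixed.
        A = np.linalg.lstsq(khatri_rao(B, C), X1.T, rcond=None)[0].T
        B = np.linalg.lstsq(khatri_rao(A, C), X2.T, rcond=None)[0].T
        C = np.linalg.lstsq(khatri_rao(A, B), X3.T, rcond=None)[0].T
        residual = X - np.einsum("ir,jr,kr->ijk", A, B, C)
        errors.append(float(np.linalg.norm(residual)))
    return A, B, C, errors


# Usage on a synthetic, noise-free two-component trilinear array.
rng = np.random.default_rng(1)
A0, B0, C0 = rng.random((6, 2)), rng.random((5, 2)), rng.random((4, 2))
X = np.einsum("ir,jr,kr->ijk", A0, B0, C0)
A, B, C, errors = parafac_als(X, rank=2)
```

Because every step solves its subproblem exactly, the residual norm recorded in `errors` cannot increase from one sweep to the next; a constrained variant would replace each `lstsq` call with a solver that enforces non-negativity, closure, or similar conditions.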

7.
Multi-way data analysis techniques are becoming ever more widely used to extract information from data, such as 3D excitation-emission fluorescence spectra, that are structured in (hyper-) cubic arrays. Parallel Factor Analysis (PARAFAC) is very commonly applied to resolve 3D-fluorescence data and to recover the signals corresponding to the various fluorescent constituents of the sample. The choice of the appropriate number of factors to use in PARAFAC is one of the crucial steps in the analysis. When the signals in the data come from a relatively small number of easily distinguished constituents, the choice of the appropriate number of factors is usually easy, and mathematical diagnostic tools such as the core consistency in general give good results. However, when the data come from a set of natural samples, the core consistency may not be a good indicator for the choice of the appropriate number of factors. In this work, Multi-way Principal Component Analysis (MPCA) and the Durbin-Watson criterion (DW) are utilized to choose the number of factors to use in PARAFAC decomposition. This is demonstrated in a case where 3D-front-face fluorescence spectroscopy is used to monitor the evolution of naturally occurring and neo-formed fluorescent components in oils during thermal treatment.
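The Durbin-Watson statistic mentioned here is the ratio of the sum of squared successive differences to the sum of squares of a vector. A minimal Python sketch follows; the abstract does not say exactly which vectors (loadings or residuals) the criterion is applied to, so treating it as a generic smooth-versus-noisy indicator is an assumption, and the function name `durbin_watson` is illustrative.

```python
import numpy as np


def durbin_watson(v):
    """Sum of squared successive differences divided by the sum of squares."""
    v = np.asarray(v, dtype=float)
    return float(np.sum(np.diff(v) ** 2) / np.sum(v ** 2))


# A smooth, structured profile gives a value near 0.
smooth = np.sin(np.linspace(0.0, np.pi, 200))
dw_smooth = durbin_watson(smooth)

# Uncorrelated noise gives a value near 2.
noise = np.random.default_rng(0).standard_normal(2000)
dw_noise = durbin_watson(noise)
```

Under that reading, a component whose profile yields a value close to 2 looks like noise and would suggest that too many factors have been extracted.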

8.
Ni Y, Lai Y, Kokot S. Applied Spectroscopy, 2012, 66(7): 810-819
An analytical method for the classification of complex real-world samples was researched and developed with the use of excitation-emission fluorescence matrix (EEFM) spectroscopy, using the medicinal herbs Rhizoma corydalis decumbentis (RCD) and Rhizoma corydalis (RC) as example samples. The data set was obtained from various authentic RCD-A and RC-A, adulterated AD, and commercial RCD-C and RC-C samples. The spectra (range: λ(ex) = 215-395 nm and λ(em) = 290-560 nm), arranged in two- and three-way data matrix formats, were processed using principal component analysis (PCA) and parallel factor analysis (PARAFAC) to produce two-dimensional component-by-component plots for qualitative data classification. The RCD-A and RC-A object groups were clearly discriminated, but the AD, RCD-C, and RC-C samples were less well separated. PARAFAC analysis produced somewhat better discrimination, and loadings plots revealed the presence of the marker compound protopine, a strongly fluorescing substance, as well as at least two other unidentified fluorescent components. Classification performance of the common K-nearest neighbors (KNN) and linear discriminant analysis (LDA) methods was relatively poor when compared with that of the back propagation and radial basis function artificial neural network (BP-ANN and RBF-ANN) models on the basis of two- and three-way formatted data. The best results were obtained with the three-way fingerprints and the RBF-ANN model. Subsequently, the quality of the commercial samples (RCD-C and RC-C) was classified with the best optimized RBF-ANN model. Thus, EEFM spectroscopy, which provides three-way measured data, is potentially a powerful analytical technique for the analysis of complex real-world substances, provided the classification is performed by the RBF-ANN or similar ANN methods.

9.
Time-of-flight secondary ion mass spectrometry (TOF-SIMS) spectra of mineral samples are complex, comprised of large mass ranges and many peaks. Consequently, characterization and classification analysis of these systems is challenging. In this study, different chemometric and statistical data evaluation methods, based on monolayer sensitive TOF-SIMS data, have been tested for the characterization and classification of copper-iron sulfide minerals (chalcopyrite, chalcocite, bornite, and pyrite) at different flotation pulp conditions (feed, conditioned feed, and Eh modified). The complex mass spectral data sets were analyzed using the following chemometric and statistical techniques: principal component analysis (PCA); principal component-discriminant functional analysis (PC-DFA); soft independent modeling of class analogy (SIMCA); and k-Nearest Neighbor (k-NN) classification. PCA was found to be an important first step in multivariate analysis, providing insight into both the relative grouping of samples and the elemental/molecular basis for those groupings. For samples exposed to oxidative conditions (at Eh ~430 mV), each technique (PCA, PC-DFA, SIMCA, and k-NN) was found to produce excellent classification. For samples at reductive conditions (at Eh ~ -200 mV SHE), k-NN and SIMCA produced the most accurate classification. Phase identification of particles that contain the same elements but a different crystal structure in a mixed multimetal mineral system has been achieved.

10.
An efficient method is proposed for determining the chemical rank of three-way fluorescence data arrays. First, the original three-way fluorescence data arrays are preprocessed by Monte Carlo simulation to generate a new set of data arrays. The new arrays not only keep all the useful information but also have most of the common background noise removed, which improves the signal-to-noise ratio of the data and benefits the later frequency analysis. Then, singular value decomposition is performed on the new data and frequency analysis on the resulting eigenvectors, which makes it easy to distinguish the spectra from the noise. Furthermore, a new quantity, frequency localization, is introduced to quantify the frequency characteristics of the eigenvectors. With this quantity, the spectra can be selected easily and accurately from the rest of the data. The feasibility of the method is verified by determining the chemical rank of two-component mixtures with simple calculation procedures and high efficiency. Finally, the efficiency of the method is further illustrated by comparison with the core consistency diagnostic (CORCONDIA) method in the analysis of mixtures with different concentrations and different numbers of components.

11.
The aim of this work was to propose a quick and cost-effective procedure, which could help to identify the types of fat (rapeseed, a mixture of rapeseed and soybean, and lard oils) added to feed used for raising pigs. For this purpose, liver samples were examined and their near-infrared reflectance spectra served as data for the construction of classic and robust soft independent modeling of class analogy (SIMCA) models. The results showed that the near-infrared reflectance spectra contained information sufficient to build good classification models that enabled three types of fat additions to be distinguished. The best classification results were obtained from robust SIMCA, indicating its superior performance in terms of high sensitivity and specificity in comparison with classic SIMCA. Specifically, robust models had sensitivities of 100% and specificities of 96.05%, 97.73% and 100%, for rapeseed, mixture of rapeseed and soybean, and lard enriched feed, respectively.

12.
A spectrofluorimetric method has been developed for the quantitative determination of mefenamic, flufenamic, and meclofenamic acids in urine samples. The method is based on second-order data multivariate calibration (unfolded partial least squares (unfolded-PLS), multi-way PLS (N-PLS), parallel factor analysis (PARAFAC), self-weighted alternating trilinear decomposition (SWATLD), and bilinear least squares (BLLS)). The analytes were extracted from the urine samples in chloroform prior to the determination. The chloroform extraction was optimized for each analyte, studying the agitation time and the extraction pH, and the optimum values were 10 minutes and pH 3.5, respectively. The concentration ranges in chloroform solution of each of the analytes, used to construct the calibration matrix, were selected in the ranges from 0.15 to 0.8 µg mL⁻¹ for flufenamic and meclofenamic acids and from 0.25 to 3.0 µg mL⁻¹ for mefenamic acid. The combination of chloroform extraction and second-order calibration methods, using the excitation-emission matrices (EEMs) of the three analytes as analytical signals, allowed their simultaneous determination in human urine samples, in the range of approximately 80 mg L⁻¹ to 250 mg L⁻¹, with satisfactory results for all the assayed methods. Improved results over unfolded-PLS and N-PLS were found with PARAFAC, SWATLD, and BLLS, methods that exploit the second-order advantage.

13.
A new methodology for the alignment of matrix chromatographic data is proposed, based on the decomposition of a three-way array composed of a test and a reference data matrix using a suitably initialized and constrained parallel factor (PARAFAC) model. It allows one to perform matrix alignment when the test data matrix contains unexpected chemical interferences, in contrast to most of the available algorithms. A series of simulated analytical systems is studied, as well as an experimental one, all having calibrated analytes and also potential interferences in the test samples, i.e., requiring the second-order advantage for successful analyte quantitation. The results show that the newly proposed method is able to properly align the different data matrices, restoring the trilinearity that is required to process the calibration and test data with second-order multivariate calibration algorithms such as PARAFAC. Recent models including unfolded partial least-squares regression (U-PLS) and N-dimensional PLS (N-PLS), combined with residual bilinearization (RBL), are also applied to both simulated and experimental data. The latter correspond to the determination of the polycyclic aromatic hydrocarbons benzo[b]fluoranthene and benzo[k]fluoranthene in the presence of benzo[j]fluoranthene as interference. The analytical figures of merit provided by the second-order calibration models are compared and discussed.

14.
A kinetic UV–Vis spectrophotometric method, based on the reaction of diltiazem with hydroxylamine and a ferric salt, was used for the quantification of diltiazem in different pharmaceutical formulations. This method is based on the acquisition of three-way data structures [wavelength (nm) × time (s) × concentration (mg/L)] followed by chemometric analysis with an appropriate PARAFAC2 or MCR-ALS second-order calibration model. The results obtained are compared with those obtained by direct determination at the wavelength of maximum absorption and by the United States Pharmacopeia (USP) standard chromatographic method. For all the pharmaceutical formulations analysed, good quantification results were found with the PARAFAC2 and MCR-ALS second-order calibration models. For bulk drug analysis, detection limits of 6 and 2 mg/L, and for pharmaceutical formulation analysis, average detection limits of 41 and 39 mg/L, were found with PARAFAC2 and MCR-ALS, respectively.

15.
In the global market, it is vital to design products and work environments to satisfactorily meet the regional variations of human body dimensions. Few studies to date have, however, attempted to examine regional differences due to unavailability of data. Jürgens, H.W., Aune, I.A. and Peiper, U., 1990, International Data on Anthropometry, Occupational Safety and Health Series Report #65 (Geneva: International Labor Office) employed various sources to assemble anthropometric data for 20 world regions. The data provided estimates for the 5th, 50th and 95th percentiles of 19 body dimensions for both genders. In this paper, the method parallel factor analysis (PARAFAC), a natural extension of principal component analysis (PCA) to so-called multi-way arrays, is performed and favourably compared with the results from PCA. Several comparative studies are performed to justify the use of percentiles rather than individual subject body dimensions. It is found that the outcomes using either are comparable, especially if the body dimensions are mean-centred. Body dimensions related to height (such as stature, buttock–heel length, sitting height and forward reach) are the most critical in describing the international variations; hip breadth showing gender difference is the second most important. People in European regions as well as Australia are the tallest and largest, whereas people in South Indian and Latin American (Indian) regions are the shortest and smallest. The 20 world regions are grouped into four groups that are relatively homogeneous in body dimensions. Potential application of PARAFAC is discussed for the areas in which the data are three-dimensional in nature (such as body dimensions × gender/percentile × age).

16.
Metabolic profiling of natural products is used to map correlated concentration variances of known and unknown secondary metabolites in extracts. NMR-spectroscopy is in this respect regarded as a convenient and reproducible technique with the ability to detect a wide range of small organic compounds. Two-dimensional J-resolved NMR-spectra are used in this context to resolve overlapping signals by separating the effect of J-coupling from the effect of chemical shifts. Often one-dimensional projections of these data are used as input for standard multivariate statistical methods, and only the intensity variances along the chemical shift axis are taken into account. Here, we describe the use of parallel factor analysis (PARAFAC) as a tool to preprocess a set of two-dimensional J-resolved spectra with the aim of keeping the J-coupling information intact. PARAFAC is a mathematical decomposition method that fits three-way experimental data to a model whose parameters in this case reflect concentrations and individual component spectra along the chemical shift axis and corresponding profiles along the J-coupling axis. A set of saffron samples, directly extracted with methanol-d4, were used as a model system to evaluate the feasibility and merits of the method. To successfully use PARAFAC, the two-dimensional spectra (n = 96) had to be aligned and processed in narrow windows (0.04 ppm wide) along the chemical shift axis. Selection of windows and number of components for each PARAFAC-model was done automatically by evaluating amount of explained variance and core consistency values. Score plots showing the distribution of objects in relation to each other, and loading plots in the form of two-dimensional pseudospectra with the same appearance as the original J-resolved spectra but with positive and negative contributions are presented. Loadings are interpreted not only in terms of signals with different chemical shifts but also the associated J-coupling profiles.

17.
Wine tannins are fundamental to the determination of wine quality. However, the chemical and sensorial analysis of these compounds is not straightforward and a simple and rapid technique is necessary. We analyzed the mid-infrared spectra of white, red, and model wines spiked with known amounts of skin or seed tannins, collected using Fourier transform mid-infrared (FT-MIR) transmission spectroscopy (400-4000 cm⁻¹). The spectral data were classified according to their tannin source, skin or seed, and tannin concentration by means of discriminant analysis (DA) and soft independent modeling of class analogy (SIMCA) to obtain a probabilistic classification. Wines were also classified sensorially by a trained panel and compared with FT-MIR. SIMCA models gave the most accurate classification (over 97%) and prediction (over 60%) among the wine samples. The prediction was increased (over 73%) using the leave-one-out cross-validation technique. Sensory classification of the wines was less accurate than that obtained with FT-MIR and SIMCA. Overall, these results show the potential of FT-MIR spectroscopy, in combination with adequate statistical tools, to discriminate wines with different tannin levels.

18.
Derde, M.P. and Massart, D.L., 1988. Comparison of the performance of the class modelling techniques UNEQ, SIMCA, and PRIMA. Chemometrics and Intelligent Laboratory Systems, 4: 65-93
By means of a Monte Carlo study a systematic comparison of the supervised pattern recognition techniques of the class modelling type, UNEQ, SIMCA and PRIMA, is made. In particular, the success rate of the classification decisions and the influence of the sample size on it were investigated. It was concluded that better class models are obtained when a technique is used that takes the shape of the population distribution into account. If the actual distribution cannot be determined, then use should be made of techniques that make no or only weak assumptions about the shape of the distribution. However, even then it remains worthwhile to investigate whether the variables are correlated and to take this information into account. When using SIMCA and PRIMA, attention should also be paid to the way the class models are defined: an approach that makes use of certain sample parameters such as the range of the variables or the maximum distance between a training object and the class model might lead to overly broad models, especially for large training sets.

19.
For many years it has been known that PARAFAC offers a very attractive approach for modeling fluorescence excitation-emission matrices. Due to the uniqueness of the PARAFAC model and the analogy between the structure of fluorescence data and the PARAFAC model, it is apparent that PARAFAC can resolve overlapping signals into pure spectra and relative concentrations under mild conditions. There are hundreds of applications exemplifying this, but still the use of PARAFAC has not spread from chemometrics to more mainstream analytical chemistry. Many reasons can be offered to explain this, but one seems to be that in practice it is difficult for chemometric novices to make use of PARAFAC. Selection of wavelengths and handling of scatter and of outliers are all issues that must be dealt with in order to build a good PARAFAC model. In this paper, a new algorithm called EEMizer is developed that aims to automate the use of PARAFAC. Through several examples it is shown how this algorithm can provide appealing PARAFAC models of data that would otherwise be hard to model.

20.
Imaging mass spectrometry (IMS) is a promising technology which allows for detailed analysis of spatial distributions of (bio)molecules in organic samples. In many current applications, IMS relies heavily on (semi)automated exploratory data analysis procedures to decompose the data into characteristic component spectra and corresponding abundance maps, visualizing spectral and spatial structure. The most commonly used techniques are principal component analysis (PCA) and independent component analysis (ICA). Both methods operate in an unsupervised manner. However, their decomposition estimates usually feature negative counts and are not amenable to direct physical interpretation. We propose probabilistic latent semantic analysis (pLSA) for non-negative decomposition and the elucidation of interpretable component spectra and abundance maps. We compare this algorithm to PCA, ICA, and non-negative PARAFAC (parallel factors analysis) and show on simulated and real-world data that pLSA and non-negative PARAFAC are superior to PCA or ICA in terms of complementarity of the resulting components and reconstruction accuracy. We further combine pLSA decomposition with a statistical complexity estimation scheme based on the Akaike information criterion (AIC) to automatically estimate the number of components present in a tissue sample data set and show that this results in sensible complexity estimates.
