Similar documents (20 results)
1.
2.
A very important problem in industrial applications of PCA and PLS models, such as process modelling or monitoring, is the estimation of scores when the observation vector has missing measurements. The alternative of suspending the application until all measurements are available is usually unacceptable. The problem treated in this work is that of estimating scores from an existing PCA or PLS model when new observation vectors are incomplete. Building the model with incomplete observations is not treated here, although the analysis given in this paper provides considerable insight into this problem. Several methods for estimating scores from data with missing measurements are presented and analysed: a method, termed single component projection, derived from the NIPALS algorithm for model building with missing data; a method of projection to the model plane; and data replacement by the conditional mean. Expressions are developed for the error in the scores calculated by each method. The error analysis is illustrated using simulated data sets designed to highlight problem situations. A larger industrial data set is also used to compare the approaches. In general, all the methods perform reasonably well with moderate amounts of missing data (up to 20% of the measurements). However, in extreme cases where critical combinations of measurements are missing, the conditional mean replacement method is generally superior to the other approaches.
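The projection-to-the-model-plane idea can be sketched numerically: given a fitted loading matrix and a new observation with missing entries, the scores are obtained by least squares using only the observed variables. A minimal illustration on synthetic data (NumPy only, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a PCA model from complete training data (mean-centred).
X = rng.normal(size=(200, 6)) @ rng.normal(size=(6, 6))   # correlated variables
mu = X.mean(axis=0)
U, s, Vt = np.linalg.svd(X - mu, full_matrices=False)
P = Vt[:2].T                                   # loadings, shape (6, 2)

def scores_with_missing(x, mu, P):
    """Estimate PCA scores for one observation with NaNs marking missing
    measurements, by projecting onto the model plane using only the
    observed variables (least squares on the observed rows of P)."""
    obs = ~np.isnan(x)
    t, *_ = np.linalg.lstsq(P[obs], (x - mu)[obs], rcond=None)
    return t

x_new = X[0].copy()
t_full = scores_with_missing(x_new, mu, P)      # no missing values
x_new[[1, 4]] = np.nan                          # knock out two measurements
t_miss = scores_with_missing(x_new, mu, P)
print("scores (complete):", t_full)
print("scores (2 missing):", t_miss)
```

With only two of six measurements missing, the two score estimates typically stay close; as critical combinations of variables disappear, the least-squares problem becomes ill-conditioned, which is the regime the error analysis in the paper addresses.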

3.
Principal component regression (PCR) is unique in that the principal component analysis (PCA) step is explicitly involved in the central part of the method. In the present paper, the PCA part is examined in order to study, by spectral simulation, the influence of noise in spectra on PCR. The results, obtained with synthesized spectra, suggest that PCR calibration can be highly inaccurate when the number of basis factors estimated by the eigenvalue method is smaller than that estimated by cross-validation. This instability arises because minute noise is greatly amplified by the PCA calculation via the normalization of loadings. The noise enhancement by PCA is also shown to influence the estimation of the number of basis factors.
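As a reminder of the structure being probed, PCR simply regresses the property of interest on the leading PCA scores of the spectra, so the chosen number of basis factors controls how much (possibly noise-dominated) variance enters the calibration. A hedged sketch on synthetic spectra (scikit-learn; the spectral model and noise level are invented for illustration):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n, p = 60, 200
concentration = rng.uniform(0, 1, n)
pure = np.exp(-((np.arange(p) - 80) / 15.0) ** 2)          # one spectral band
spectra = np.outer(concentration, pure) + 0.01 * rng.normal(size=(n, p))  # noisy spectra

# PCR with different numbers of basis factors: the retained components
# determine how strongly measurement noise propagates into the calibration.
for k in (1, 2, 5, 10):
    pcr = make_pipeline(PCA(n_components=k), LinearRegression())
    r2 = cross_val_score(pcr, spectra, concentration, cv=5).mean()
    print(f"{k:2d} components: mean CV R^2 = {r2:.3f}")
```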

4.
Rural non-interstate crashes induce a significant number of severe injuries and fatalities. Examination of such injury patterns and the associated contributing factors is of practical importance. Taking into account the ordinal nature of injury severity levels and the hierarchical feature of crash data, this study employs a hierarchical ordered logit model to examine the significant factors in predicting driver injury severities in rural non-interstate crashes based on two-year New Mexico crash records. Bayesian inference is utilized in the model estimation procedure, and the 95% Bayesian Credible Interval (BCI) is applied to test variable significance. An ordinary ordered logit model omitting the between-crash variance effect is evaluated as well for model performance comparison. Results indicate that the model employed in this study outperforms the ordinary ordered logit model in model fit and parameter estimation. Variables regarding crash features, environment conditions, and driver and vehicle characteristics are found to have significant influence on the predictions of driver injury severities in rural non-interstate crashes. Factors such as road segments far from intersections, wet road surface conditions, collisions with animals, heavy-vehicle drivers, male drivers and seatbelt use tend to induce less severe driver injury outcomes than factors such as multiple-vehicle crashes, severe vehicle damage in a crash, motorcyclists, female drivers, senior drivers, drivers with alcohol or drug impairment, and other major collision types. Research limitations regarding crash data and model assumptions are also discussed. Overall, this research provides reasonable results and insight into developing effective road safety measures for crash injury severity reduction and prevention.
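For readers unfamiliar with the ordered logit core of the hierarchical model, the severity probabilities come from differences of logistic CDFs evaluated at estimated thresholds. A minimal self-contained illustration (the coefficients and thresholds below are hypothetical, not the paper's estimates):

```python
import numpy as np
from scipy.special import expit

def ordered_logit_probs(xb, thresholds):
    """P(y = k) for an ordered logit with linear predictor xb and
    increasing thresholds tau_1 < ... < tau_{K-1}."""
    tau = np.concatenate(([-np.inf], thresholds, [np.inf]))
    cdf = expit(tau - xb)                     # logistic CDF at every cut point
    return np.diff(cdf)                       # P(y=k) = F(tau_k - xb) - F(tau_{k-1} - xb)

# Hypothetical example: four severity levels and two crash covariates.
beta = np.array([0.8, -0.5])                  # e.g. impairment, seatbelt use (made up)
x = np.array([1.0, 1.0])                      # impaired driver wearing a seatbelt
thresholds = np.array([-0.5, 1.0, 2.5])
print(ordered_logit_probs(x @ beta, thresholds))   # four probabilities summing to 1
```

The hierarchical version of the paper adds a crash-level random effect to the linear predictor, which the ordinary ordered logit omits.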

5.
The majority of works in metabolomics employ approaches based on principal components analysis (PCA) and partial least-squares, primarily to determine whether samples fall within large groups. However, analytical chemists rarely tackle the problem of individual fingerprinting, and in order to do this effectively, it is necessary to study a large number of small groups rather than a small number of large groups, and different approaches are required, as described in this paper. Furthermore, many metabolomic studies on mammals and humans involve analyzing compounds (or peaks) that are present in only a certain portion of samples, and conventional approaches of PCA do not cope well with sparse matrices where there may be many 0s. There is, however, a large number of qualitative similarity measures available for this purpose that can be exploited via principal coordinates analysis (PCO). It can be shown that PCA scores are a specific case of PCO scores, using a quantitative similarity measure. A large-scale study of human sweat consisting of nearly 1000 gas chromatography/mass spectrometry analyses from the sweat of an isolated population of 200 individuals in Carinthia (Southern Austria), sampled once per fortnight over 10 weeks, was employed in this study and grouped into families. The first step was to produce a peak table, requiring peak detection, alignment, and integration. Peaks were reduced from 5080 to the 373 that occurred in at least 1 individual over 4 out of 5 fortnights. Both qualitative (presence/absence) and quantitative (equivalent to PCA) similarity measures can be computed. PCO and the Kolmogorov–Smirnov (KS) rank test are applied to these similarity matrices. It is shown that for this data set there is a reproducible individual fingerprint, which is best represented using the qualitative similarity measure, as assessed both by the Hotelling T2 statistic applied to PCO scores and the probabilities associated with the KS rank test.
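Principal coordinates analysis itself is a small computation: turn the pairwise similarity matrix into distances, double-centre, and take the leading eigenvectors. A sketch for a presence/absence peak table using a Jaccard (qualitative) similarity, with synthetic data standing in for the sweat peak table:

```python
import numpy as np

rng = np.random.default_rng(2)
# Synthetic presence/absence peak table: 30 samples x 100 peaks, sparse.
X = (rng.random((30, 100)) < 0.15).astype(float)

# Qualitative (Jaccard) similarity between samples.
inter = X @ X.T
union = X.sum(1)[:, None] + X.sum(1)[None, :] - inter
S = np.where(union > 0, inter / np.maximum(union, 1), 1.0)

# Principal coordinates analysis: similarities -> squared distances -> double centring.
D2 = np.maximum(S.diagonal()[:, None] + S.diagonal()[None, :] - 2 * S, 0.0)
n = len(S)
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ D2 @ J
evals, evecs = np.linalg.eigh(B)
order = np.argsort(evals)[::-1]
evals, evecs = evals[order], evecs[:, order]
coords = evecs[:, :2] * np.sqrt(np.maximum(evals[:2], 0))   # first two PCO scores
print(coords[:5])
```

Using a quantitative (covariance-like) similarity in place of the Jaccard matrix reproduces PCA scores, which is the sense in which PCA is a special case of PCO.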

6.
Many dynamic models are used for risk assessment and decision support in ecology and crop science. Such models generate time-dependent model predictions, with time either discretised or continuous. Their global sensitivity analysis is usually applied separately to each time output, but Campbell et al. (2006 [1]) advocated global sensitivity analyses on the expansion of the dynamics in a well-chosen functional basis. This paper focuses on the particular case where principal components analysis is combined with analysis of variance. In addition to the indices associated with the principal components, generalised sensitivity indices are proposed to synthesize the influence of each parameter on the whole time series output. Index definitions are given when the uncertainty on the input factors is either discrete or continuous and when the dynamic model is either discrete or functional. A general estimation algorithm is proposed, based on classical methods of global sensitivity analysis. The method is applied to a dynamic wheat crop model with 13 uncertain parameters. Three methods of global sensitivity analysis are compared: the Sobol'-Saltelli method, the extended FAST method, and the fractional factorial design of resolution 6.
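The combination of principal components with analysis of variance can be sketched as follows: run the dynamic model over a factorial design of the discrete factor levels, expand the simulated time series on the principal components of the output ensemble, compute a first-order variance-based index for each factor on each component score, and aggregate these into a generalised index weighted by the variance carried by each component. The toy dynamic model and factor ranges below are invented purely for illustration:

```python
import numpy as np
from itertools import product

# Toy "crop-like" dynamic model: logistic growth over 50 time steps,
# with two uncertain parameters (growth rate r and capacity K).
def model(r, K, T=50):
    y = np.empty(T); y[0] = 1.0
    for t in range(1, T):
        y[t] = y[t - 1] + r * y[t - 1] * (1.0 - y[t - 1] / K)
    return y

levels_r = [0.10, 0.20, 0.30, 0.40]          # discrete factor levels (full factorial)
levels_K = [50.0, 75.0, 100.0, 125.0]
design = list(product(range(len(levels_r)), range(len(levels_K))))
Y = np.array([model(levels_r[i], levels_K[j]) for i, j in design])

# PCA of the ensemble of trajectories.
Yc = Y - Y.mean(axis=0)
U, s, Vt = np.linalg.svd(Yc, full_matrices=False)
scores = U * s                               # runs x components
var_pc = s ** 2 / len(Y)                     # variance carried by each component

# First-order (main-effect) index of a factor on one PC score:
# variance of the conditional means over that factor's levels / total variance.
def first_order_index(score, factor_levels):
    cond_means = [score[factor_levels == g].mean() for g in np.unique(factor_levels)]
    return np.var(cond_means) / np.var(score)

fac_r = np.array([i for i, _ in design])
fac_K = np.array([j for _, j in design])
for name, fac in (("r", fac_r), ("K", fac_K)):
    per_pc = np.array([first_order_index(scores[:, c], fac) for c in range(3)])
    # Generalised index: PC-wise indices weighted by the variance of each PC.
    gsi = np.sum(per_pc * var_pc[:3]) / np.sum(var_pc[:3])
    print(f"factor {name}: per-PC indices {np.round(per_pc, 3)}, generalised index {gsi:.3f}")
```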

7.
The paper discusses the nested logit model for choices between a set of mutually exclusive alternatives (e.g. brand choice, strategy decisions, modes of transportation, etc.). Due to its ability to allow and account for similarities between pairs of alternatives, the nested logit model has become very popular for the empirical analysis of choice decisions. However, the fact that there are two different specifications of the nested logit model (with different outcomes) has not received adequate attention. The utility maximization nested logit (UMNL) model and the non-normalized nested logit (NNNL) model have different properties, influencing the estimation results in different ways. This paper introduces the distinct specifications of the nested logit model and indicates particularities arising from model estimation. The effects of using various software packages on the estimation results of a nested logit model are shown using simulated data sets for an artificial decision situation. Financial support by the German Research Foundation (DFG) through the research project #BO1952/1 and the SFB 649 “Economic Risk” is gratefully acknowledged. The authors would like to thank two anonymous reviewers for their helpful and constructive comments.
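To make the source of the discrepancy concrete, the choice probabilities of a utility-maximising nested logit can be computed directly from alternative utilities, nest membership, and nest dissimilarity parameters; the two specifications differ in how these parameters scale the utilities. A small sketch with made-up utilities (not tied to the simulated data in the paper):

```python
import numpy as np

def nested_logit_probs(V, nests, lam):
    """Choice probabilities of a UMNL-normalised nested logit.
    V: utilities per alternative; nests: nest index per alternative;
    lam: dissimilarity (log-sum) parameter per nest, in (0, 1]."""
    V, nests = np.asarray(V, float), np.asarray(nests)
    probs = np.empty_like(V)
    # Inclusive values (log-sums) per nest.
    iv = np.array([np.log(np.sum(np.exp(V[nests == m] / lam[m])))
                   for m in range(len(lam))])
    p_nest = np.exp(lam * iv) / np.sum(np.exp(lam * iv))      # upper level
    for m in range(len(lam)):
        in_m = nests == m
        p_within = np.exp(V[in_m] / lam[m]) / np.sum(np.exp(V[in_m] / lam[m]))
        probs[in_m] = p_nest[m] * p_within                     # lower x upper level
    return probs

# Three alternatives: car alone in one nest, bus and rail sharing a "transit" nest.
V = [0.5, 0.2, 0.1]
nests = [0, 1, 1]
print(nested_logit_probs(V, nests, lam=np.array([1.0, 0.5])))  # sums to 1
```

The NNNL specification omits the division of the utilities by the nest parameter at the lower level, so identical parameter values produce different probabilities, which is exactly why estimates from different software packages can disagree.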

8.
Recent studies in the area of highway safety have demonstrated the usefulness of logit models for modeling crash injury severities. Use of these models enables one to identify and quantify the effects of factors that contribute to certain levels of severity. Most often, these models are estimated assuming an equal probability of occurrence for each injury severity level in the data. However, traffic crash data are generally characterized by underreporting, especially when crashes result in lower injury severity. Thus, the sample used for an analysis is often outcome-based, which can result in biased estimation of model parameters. This is more of a problem when a nested logit model specification is used instead of a multinomial logit model and when the true shares of the outcomes (injury severity levels) in the population are not known (which is almost always the case). This study demonstrates an application of a recently proposed weighted conditional maximum likelihood estimator in tackling the problem of underreporting of crashes when using a nested logit model for crash severity analyses.
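The weighting idea can be illustrated for the simpler multinomial logit case: each observation's log-likelihood contribution is multiplied by the ratio of the (assumed known) population share of its observed outcome to that outcome's share in the underreported sample. The code below is a generic weighted-exogenous-sample-style sketch on invented data, not the specific weighted conditional maximum likelihood estimator of the paper:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)

def mnl_probs(X, beta):
    """Multinomial logit probabilities; beta has one column per non-base outcome."""
    util = np.column_stack([np.zeros(len(X)), X @ beta])      # outcome 0 is the base
    util -= util.max(axis=1, keepdims=True)
    expu = np.exp(util)
    return expu / expu.sum(axis=1, keepdims=True)

def weighted_negloglik(b, X, y, weights, k):
    beta = b.reshape(X.shape[1], k - 1)
    p = mnl_probs(X, beta)
    return -np.sum(weights * np.log(p[np.arange(len(y)), y] + 1e-300))

# Synthetic crash-severity-like data with 3 outcomes, then underreport outcome 0.
X = rng.normal(size=(3000, 2))
true_beta = np.array([[0.8, -0.4], [0.3, 1.0]])
y = np.array([rng.choice(3, p=p) for p in mnl_probs(X, true_beta)])
keep = (y != 0) | (rng.random(len(y)) < 0.4)                   # keep only 40% of outcome 0
Xs, ys = X[keep], y[keep]

pop_share = np.bincount(y, minlength=3) / len(y)               # assumed known true shares
smp_share = np.bincount(ys, minlength=3) / len(ys)
weights = (pop_share / smp_share)[ys]                          # one weight per observation

res = minimize(weighted_negloglik, np.zeros(4), args=(Xs, ys, weights, 3), method="BFGS")
print("weighted estimates:\n", res.x.reshape(2, 2), "\ntrue:\n", true_beta)
```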

9.
Many models of plasticity are built using multiple, simple yield surfaces. Examples include geomechanical models and crystal plasticity. This leads to numerical difficulties, most particularly during the stress update procedure, because the combined yield surface is nondifferentiable, and, when implicit time stepping is employed to solve numerical models, because the Jacobian is often poorly conditioned. A method is presented that produces a single C2 differentiable and convex yield function from a plastic model that contains multiple yield surfaces that are individually C2 differentiable and convex. C2 differentiability ensures quadratic convergence of implicit stress-update procedures; convexity ensures a unique solution to the stress update problem; and smoothness means the Jacobian is much better conditioned. The method contains just one free parameter, and the error incurred through the smoothing procedure is quantified in terms of this parameter. The method is illustrated through three different constitutive models. The method's performance is quantified in terms of the number of iterations required during stress update as a function of the smoothing parameter. Two simple finite-element models are also solved to compare this method with existing approaches. The method has been added to the open-source “MOOSE” framework, for perfect, nonperfect, associated, and nonassociated plasticity.
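As a flavour of how several smooth, convex yield functions can be merged into one smooth convex function, a standard device is a log-sum-exp "smooth maximum" controlled by a single smoothing parameter. This is not necessarily the construction used in the paper, but it shares the key properties the abstract lists (smooth, convex, error controlled by one parameter):

```python
import numpy as np

def smooth_max(values, eps):
    """Log-sum-exp smooth maximum of several yield function values.
    Converges to max(values) as eps -> 0; the over-estimation error is
    bounded by eps * log(number of surfaces)."""
    values = np.asarray(values, float)
    m = values.max()
    return m + eps * np.log(np.sum(np.exp((values - m) / eps)))

# Two individually smooth, convex yield functions of a 2D stress state
# (purely illustrative, not a real constitutive model).
def f1(s):  # a pressure-like limit
    return s[0] + 0.3 * s[1] - 1.0

def f2(s):  # a shear-like limit
    return np.hypot(s[0], s[1]) - 1.2

stress = np.array([0.9, 0.5])
for eps in (0.5, 0.1, 0.01):
    f = smooth_max([f1(stress), f2(stress)], eps)
    print(f"eps={eps:4}: smoothed yield value {f:+.4f}"
          f"  (hard max {max(f1(stress), f2(stress)):+.4f})")
```

Because the smooth function stays convex and differentiable everywhere, a Newton-type stress update has a well-defined, well-conditioned Jacobian even near the corners where the individual surfaces intersect.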

10.
The use of kernel density estimates in discriminant analysis is quite well known among scientists and engineers interested in statistical pattern recognition. Using a kernel density estimate involves properly selecting the scale of smoothing, namely the bandwidth parameter. The bandwidth that is optimum for the mean integrated square error of a class density estimator may not always be good for discriminant analysis, where the main emphasis is on the minimization of misclassification rates. On the other hand, cross-validation–based methods for bandwidth selection, which try to minimize estimated misclassification rates, may require huge computation when there are several competing populations. Besides, such methods usually allow only one bandwidth for each population density estimate, whereas in a classification problem, the optimum bandwidth for a class density estimate may vary significantly, depending on its competing class densities and their prior probabilities. Therefore, in a multiclass problem, it would be more meaningful to have different bandwidths for a class density when it is compared with different competing class densities. Moreover, a good choice of bandwidths should also depend on the specific observation to be classified. Consequently, instead of concentrating on a single optimum bandwidth for each population density estimate, it is more useful in practice to look at the results for different scales of smoothing for the kernel density estimates. This article presents such a multiscale approach along with a graphical device leading to a more informative discriminant analysis than the usual approach based on a single optimum scale of smoothing for each class density estimate. When there are more than two competing classes, this method splits the problem into a number of two-class problems, which allows the flexibility of using different bandwidths for different pairs of competing classes and at the same time reduces the computational burden that one faces for usual cross-validation–based bandwidth selection in the presence of several competing populations. We present some benchmark examples to illustrate the usefulness of the proposed methodology.

11.
Quantile regression, as a generalization of median regression, has been widely used in statistical modeling. To allow for the analysis of complex data situations, several flexible regression models have been introduced. Among these are the varying coefficient models, which differ from a classical linear regression model in that the regression coefficients are no longer constant but are functions that vary with the value taken by another variable, such as, for example, time. In this paper, we study quantile regression in varying coefficient models for longitudinal data. The quantile function is modeled as a function of the covariates, and the main task is to estimate the unknown regression coefficient functions. We approximate each coefficient function by means of P-splines. Theoretical properties of the estimators, such as the rate of convergence and the asymptotic distribution, are established. The estimation methodology requires solving an optimization problem that also involves a smoothing parameter. For a special case the optimization problem can be transformed into a linear programming problem, for which a Frisch–Newton interior point method is then used, leading to a computationally fast and efficient procedure. Several data-driven choices of the smoothing parameters are briefly discussed, and their performance is illustrated in a simulation study. A real data analysis demonstrates the use of the developed method.

12.
In this paper, a breast tissue density classification and image retrieval model is studied, and a model for data reduction is presented. The model is based on the two-directional two-dimensional principal component analysis ((2D)2PCA) technique and a support vector machine (SVM) with the radial basis function (RBF) kernel for mammographic image classification and retrieval. The model is formed based on breast density, according to the categories defined by the Breast Imaging-Reporting and Data System (BIRADS), which is a standard for the assessment of mammographic images, and is tested on the Mammographic Image Analysis Society (MIAS) database. Five-fold cross-validation has been used for parameter selection in the SVM to avoid over-fitting in the data classification. The average precision rates of the model range from 87.34% to 99.12%.
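A rough outline of the (2D)2PCA-plus-SVM pipeline: compute row- and column-direction image covariance matrices over the training images, keep the leading eigenvectors of each, project every image from both sides to get a small feature matrix, and feed the flattened features to an RBF SVM. The sketch below uses random stand-in "images" and scikit-learn; the image size, numbers of retained eigenvectors and SVM hyperparameters are placeholders, not the values tuned in the paper:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)

# Synthetic stand-ins for mammographic ROIs: 120 images of 64x64, 3 density classes.
labels = np.repeat([0, 1, 2], 40)
images = np.array([rng.normal(loc=lab, scale=1.0, size=(64, 64)) for lab in labels])

def two_directional_2dpca(imgs, n_row=8, n_col=8):
    """(2D)^2 PCA: leading eigenvectors of the column- and row-direction
    image covariance matrices, used to project each image from both sides."""
    mean = imgs.mean(axis=0)
    A = imgs - mean
    G_col = np.einsum('nij,nik->jk', A, A) / len(A)    # sum of A^T A, acts on the right
    G_row = np.einsum('nji,nki->jk', A, A) / len(A)    # sum of A A^T, acts on the left
    _, Xr = np.linalg.eigh(G_col)
    _, Zr = np.linalg.eigh(G_row)
    X = Xr[:, -n_col:]                                  # right (column) projection matrix
    Z = Zr[:, -n_row:]                                  # left (row) projection matrix
    return mean, Z, X

mean, Z, X = two_directional_2dpca(images)
features = np.array([(Z.T @ (img - mean) @ X).ravel() for img in images])  # 8x8 -> 64-dim

clf = SVC(kernel="rbf", C=10.0, gamma="scale")          # placeholder hyperparameters
print("5-fold CV accuracy:", cross_val_score(clf, features, labels, cv=5).mean())
```

The point of the two-sided projection is the data reduction the abstract mentions: a 64x64 image collapses to an 8x8 feature matrix before the SVM ever sees it.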

13.
This paper presents an approach for detecting and identifying faults in railway infrastructure components. The method is based on pattern recognition and data analysis algorithms. Principal component analysis (PCA) is employed to reduce the complexity of the data to two and three dimensions. PCA involves a mathematical procedure that transforms a number of variables, which may be correlated, into a smaller set of uncorrelated variables called ‘principal components’. In order to improve the results obtained, the signal was filtered. The filtering was carried out using a state-space system model, estimated by maximum likelihood with the help of well-known recursive algorithms such as the Kalman filter and fixed-interval smoothing. The models explored in this paper to analyse the system data lie within the so-called unobserved components class of models. Copyright © 2009 John Wiley & Sons, Ltd.

14.
This paper focuses on the relevance of alternate discrete outcome frameworks for modeling driver injury severity. The study empirically compares ordered response and unordered response models in the context of driver injury severity in traffic crashes. The alternative modeling approaches considered for the comparison exercise include, for the ordered response framework, the ordered logit (OL), generalized ordered logit (GOL) and mixed generalized ordered logit (MGOL) models, and, for the unordered response framework, the multinomial logit (MNL), nested logit (NL), ordered generalized extreme value logit (OGEV) and mixed multinomial logit (MMNL) models. A host of comparison metrics are computed to evaluate the performance of these alternative models. The study provides a comprehensive comparison of the performance of ordered and unordered response models for examining the impact of exogenous factors on driver injury severity. The research also explores the effect of potential underreporting on the alternative frameworks by artificially creating an underreported data sample from the driver injury severity sample. The empirical analysis is based on the 2010 General Estimates System (GES) database, a nationally representative sample of road crashes collected and compiled from about 60 jurisdictions across the United States. The performance of the alternative frameworks is examined in the context of model estimation and validation (at the aggregate and disaggregate levels). Further, the performance of the model frameworks in the presence of underreporting is explored, with and without corrections to the estimates. The results from these extensive analyses point toward the emergence of the GOL framework (MGOL) as a strong competitor to the MMNL model in modeling driver injury severity.

15.
Monitoring multichannel profiles has important applications in manufacturing systems improvement, but it is nontrivial to develop efficient statistical methods because profiles are high-dimensional functional data with intrinsic inner- and interchannel correlations, and because a change might affect only a few unknown features of the multichannel profiles. To tackle these challenges, we propose a novel thresholded multivariate principal component analysis (PCA) method for multichannel profile monitoring. Our proposed method consists of two steps of dimension reduction: it first applies functional PCA to extract a reasonably large number of features under the in-control state, and then uses soft-thresholding techniques to further select significant features capturing profile information under the out-of-control state. The choice of tuning parameter for soft-thresholding is provided based on asymptotic analysis, and extensive numerical studies are conducted to illustrate the efficacy of our proposed thresholded PCA methodology.
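The second dimension-reduction step amounts to soft-thresholding the PCA scores of a new profile before they enter a monitoring statistic, shrinking small (likely noise-only) features to zero and keeping only the few that react to a change. A schematic sketch on synthetic data (the tuning constant here is an arbitrary placeholder, not the asymptotically derived choice in the paper):

```python
import numpy as np

rng = np.random.default_rng(5)

# In-control "profiles": 200 observations of a 50-dimensional (e.g. stacked
# multichannel) profile vector; fit PCA on these.
X0 = rng.normal(size=(200, 50))
mu = X0.mean(axis=0)
U, s, Vt = np.linalg.svd(X0 - mu, full_matrices=False)
P = Vt[:20].T                                    # keep a reasonably large number of PCs
sigma = ((X0 - mu) @ P).std(axis=0)              # in-control spread of each score

def soft_threshold(t, lam):
    return np.sign(t) * np.maximum(np.abs(t) - lam, 0.0)

def monitoring_stat(x, lam=2.0):
    """Standardise the scores of a new profile, soft-threshold them, and
    sum the surviving (squared) features into one charting statistic."""
    t = ((x - mu) @ P) / sigma
    return np.sum(soft_threshold(t, lam) ** 2)

x_ic = rng.normal(size=50)                       # in-control profile
x_oc = rng.normal(size=50)
x_oc[:10] += 3.0                                 # change affecting a few channels
print("in-control statistic    :", round(monitoring_stat(x_ic), 2))
print("out-of-control statistic:", round(monitoring_stat(x_oc), 2))
```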

16.
The theory, together with an algorithm, for uncorrelated linear discriminant analysis (ULDA) is introduced and applied to explore metabolomics data. ULDA is a supervised method for feature extraction (FE), discriminant analysis (DA) and biomarker screening based on the Fisher criterion function. While principal component analysis (PCA) searches for directions of maximum variance in the data, ULDA seeks linearly combined variables called uncorrelated discriminant vectors (UDVs). The UDVs maximize the separation among different classes in terms of the Fisher criterion. The performance of ULDA is evaluated and compared with PCA, partial least squares discriminant analysis (PLS-DA) and target projection discriminant analysis (TP-DA) for two datasets, one simulated and one real from a metabolomic study. ULDA showed better discriminatory ability than PCA, PLS-DA and TP-DA. The shortcomings of PCA, PLS-DA and TP-DA are attributed to interference from linear correlations in the data. PLS-DA and TP-DA performed successfully for the simulated data, but PLS-DA was slightly inferior to ULDA for the real data. ULDA successfully extracted optimal features for discriminant analysis and revealed potential biomarkers. Furthermore, by means of cross-validation, the classification model obtained by ULDA showed better predictive ability than those obtained by PCA, PLS-DA and TP-DA. In conclusion, ULDA is a powerful tool for revealing discriminatory information in metabolomics data.
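The Fisher criterion at the heart of ULDA can be sketched as a generalised eigenvalue problem between the between-class and within-class scatter matrices; ULDA additionally constrains the extracted discriminant vectors so that the resulting features are uncorrelated, which this plain-LDA sketch does not enforce. Illustrative only, on synthetic "metabolite" profiles:

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(6)

# Three classes of 10-dimensional profiles with different mean patterns.
means = rng.normal(scale=2.0, size=(3, 10))
X = np.vstack([rng.normal(means[c], 1.0, size=(40, 10)) for c in range(3)])
y = np.repeat([0, 1, 2], 40)

# Between-class (Sb) and within-class (Sw) scatter matrices.
grand = X.mean(axis=0)
Sb = np.zeros((10, 10)); Sw = np.zeros((10, 10))
for c in range(3):
    Xc = X[y == c]
    d = (Xc.mean(axis=0) - grand)[:, None]
    Sb += len(Xc) * d @ d.T
    Sw += (Xc - Xc.mean(axis=0)).T @ (Xc - Xc.mean(axis=0))

# Fisher criterion: directions w maximising w'Sb w / w'Sw w
# (generalised eigenproblem; at most n_classes - 1 useful directions).
evals, evecs = eigh(Sb, Sw)
W = evecs[:, ::-1][:, :2]            # two leading discriminant vectors
scores = X @ W
print("class means in discriminant space:\n",
      np.array([scores[y == c].mean(axis=0) for c in range(3)]).round(2))
```

The loadings in W indicate which variables drive the class separation, which is the same reading used for biomarker screening with the UDVs.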

17.
Modal parameter identification of time-varying structures by joint estimation of forward and backward time series models
To improve the accuracy and noise robustness of modal parameter identification for time-varying structures, a modal parameter identification method based on joint estimation of forward and backward functional-series vector time-varying autoregressive moving average (FS-VTARMA) time series models is proposed. First, a cost function in mean-square-error form is established for the joint estimation of the forward and backward FS-VTARMA models; next, the approximate conjugate relationship between the forward-model and backward-model coefficient estimates for non-stationary signals is introduced; then, the time-varying model coefficients are obtained by two-stage least squares (2SLS); finally, the characteristic equation of the time-varying model is converted into a generalized eigenvalue problem from which the modal parameters are extracted. The method is validated with non-stationary vibration signals from a system with time-varying stiffness. The results show that it effectively overcomes the drawbacks of forward-model estimation (a one-step delay in the modal parameters and inability to obtain them accurately at the initial time) and of backward-model estimation (a one-step advance and inability to obtain them accurately at the final time), and achieves higher modal parameter identification accuracy and stronger noise robustness.

18.
Robust parameter design with computer experiments is becoming increasingly important for product design. Existing methodologies for this problem are mostly for finding optimal control factor settings. However, in some cases, the objective of the experimenter may be to understand how the noise and control factors contribute to variation in the response. The functional analysis of variance (ANOVA) and variance decompositions of the response, in addition to the mean and variance models, help achieve this objective. Estimation of these quantities is not easy, and few methods are able to quantify the estimation uncertainty. In this article, we show that the use of an orthonormal polynomial model of the simulator leads to simple formulas for the functional ANOVA and variance decompositions and for the mean and variance models. We show that estimation uncertainty can be taken into account in a simple way by first fitting a Gaussian process model to the experiment data and then approximating it with the orthonormal polynomial model. This leads to a joint normal distribution for the polynomial coefficients that quantifies the estimation uncertainty. Supplementary materials for this article are available online.
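The convenience of an orthonormal polynomial representation is that, for inputs uniform on [-1, 1], the mean is the constant coefficient and the output variance decomposes into sums of squared coefficients, grouped by which factor (or interaction) each basis term involves. A sketch using normalised Legendre polynomials on an invented simulator with one control and one noise factor (illustrative only; the Gaussian process surrogate step of the paper is not reproduced here):

```python
import numpy as np
from numpy.polynomial import legendre

# Toy simulator with one control factor x1 and one noise factor x2, both on [-1, 1].
def simulator(x1, x2):
    return 1.0 + 2.0 * x1 + 0.5 * x2 + 1.5 * x1 * x2

rng = np.random.default_rng(7)
n = 2000
x1, x2 = rng.uniform(-1, 1, n), rng.uniform(-1, 1, n)
y = simulator(x1, x2)

# Orthonormal Legendre basis on [-1, 1] under the uniform density:
# phi_k(x) = sqrt(2k + 1) * P_k(x), so E[phi_j(X) phi_k(X)] = delta_jk.
def phi(x, k):
    return np.sqrt(2 * k + 1) * legendre.legval(x, np.eye(3)[k])

# Tensor-product design matrix up to degree 2 in each factor.
terms = [(i, j) for i in range(3) for j in range(3)]
B = np.column_stack([phi(x1, i) * phi(x2, j) for i, j in terms])
coef, *_ = np.linalg.lstsq(B, y, rcond=None)

# Variance decomposition: every non-constant term contributes its coefficient squared.
total_var = sum(c ** 2 for (i, j), c in zip(terms, coef) if (i, j) != (0, 0))
main_x1 = sum(c ** 2 for (i, j), c in zip(terms, coef) if i > 0 and j == 0)
main_x2 = sum(c ** 2 for (i, j), c in zip(terms, coef) if i == 0 and j > 0)
inter = total_var - main_x1 - main_x2
print(f"variance shares  x1: {main_x1/total_var:.2f}  x2: {main_x2/total_var:.2f}  "
      f"x1:x2 interaction: {inter/total_var:.2f}")
```

If the coefficients carry a joint normal distribution, as in the paper, these variance shares inherit an uncertainty distribution by simply propagating samples of the coefficients through the same sums of squares.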

19.
In some attribute gauge studies, there is a continuous variable (reference value) behind the attribute-type decision. In the recent literature, a fixed effect logit model is used for the gauge study. In this paper, the random effect concept is applied to the problem. Two different alternatives are studied: the random intercept and the random intercept–random slope model. The random effect concept enables us to characterise the operators in general and to estimate the conditional probabilities of misclassification. Different estimation methods are proposed and compared through simulation. The theoretically less correct but computationally much simpler estimation method using a fixed effect model proved to be only slightly less effective than estimation using a mixed effect model. Copyright © 2011 John Wiley & Sons, Ltd.
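The random-intercept logit at the core of the comparison can be fitted by maximising a marginal likelihood in which the operator effect is integrated out, for instance with Gauss-Hermite quadrature. A compact sketch on simulated gauge data (a generic mixed-logit fit, not the estimation procedures compared in the paper):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

rng = np.random.default_rng(8)

# Simulated attribute gauge study: 10 operators, 60 parts each, one reference value x.
n_ops, n_parts = 10, 60
x = rng.normal(size=(n_ops, n_parts))
b_op = rng.normal(scale=0.8, size=n_ops)                 # true operator random intercepts
y = rng.random((n_ops, n_parts)) < expit(-0.3 + 1.5 * x + b_op[:, None])

nodes, wts = np.polynomial.hermite.hermgauss(20)         # Gauss-Hermite rule

def neg_marginal_loglik(theta):
    """Random-intercept logit: integrate the operator effect b ~ N(0, sigma^2)
    out of each operator's likelihood with Gauss-Hermite quadrature."""
    a, beta, log_sigma = theta
    sigma = np.exp(log_sigma)
    total = 0.0
    for i in range(n_ops):
        # b = sqrt(2)*sigma*node maps the rule onto an N(0, sigma^2) integral.
        eta = a + beta * x[i][:, None] + np.sqrt(2.0) * sigma * nodes[None, :]
        p = expit(eta)
        lik_given_b = np.prod(np.where(y[i][:, None], p, 1.0 - p), axis=0)
        total += np.log(np.sum(wts * lik_given_b) / np.sqrt(np.pi) + 1e-300)
    return -total

res = minimize(neg_marginal_loglik, x0=np.zeros(3), method="Nelder-Mead")
a_hat, beta_hat, sigma_hat = res.x[0], res.x[1], np.exp(res.x[2])
print(f"intercept {a_hat:.2f}, slope {beta_hat:.2f}, operator SD {sigma_hat:.2f}")
```

With the fitted slope and operator distribution, the conditional probabilities of misclassification at a given reference value follow directly from the same logistic expression.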

20.
In this work, a technique for simultaneous untangling and smoothing of meshes is presented. It is based on an extension of an earlier mesh smoothing strategy developed to solve the computational mesh dynamics stage in fluid–structure interaction problems. In moving grid problems, mesh untangling is necessary when element inversion occurs as a result of a moving domain boundary. The smoothing strategy, formerly published by the authors, is defined in terms of the minimization of a functional associated with the mesh distortion by using a geometric indicator of element quality. This functional becomes discontinuous when an element has null volume, making it impossible to obtain a valid mesh from an invalid one. To circumvent this drawback, the proposed functional is transformed in order to guarantee its continuity for the whole space of nodal coordinates, thus achieving the untangling technique. This regularization depends on one parameter, making it possible to recover the original functional as this parameter tends to 0. This feature is important: the regularized functional is needed to make the mesh valid, and it is then advisable to return to the original functional so that the smoothing is optimal. Finally, the simultaneous untangling and smoothing technique is applied to several test cases, including 2D and 3D meshes with simplicial elements. As an additional example, the application of this technique to a mesh generation case is presented. Copyright © 2008 John Wiley & Sons, Ltd.
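The flavour of the regularisation can be conveyed with a commonly used smooth substitute for the element area (or Jacobian determinant) that appears in the denominator of distortion-type quality measures: replace the signed volume sigma by h(sigma) = 0.5 * (sigma + sqrt(sigma^2 + 4*delta^2)), which is positive and smooth everywhere and tends back to the original value as delta tends to 0. This is the kind of one-parameter modification the abstract describes; the exact functional of the paper is not reproduced here:

```python
import numpy as np

def h(sigma, delta):
    """Smooth, strictly positive substitute for the element area/volume sigma:
    h -> sigma for sigma >> delta and h -> 0+ for sigma << -delta, so the
    original distortion functional is recovered as delta -> 0."""
    return 0.5 * (sigma + np.sqrt(sigma ** 2 + 4.0 * delta ** 2))

def signed_area(p0, p1, p2):
    """Signed triangle area: negative for an inverted (tangled) element."""
    a, b = p1 - p0, p2 - p0
    return 0.5 * (a[0] * b[1] - a[1] * b[0])

def regularised_distortion(p0, p1, p2, delta=1e-3):
    """Shape-distortion measure ~ (sum of squared edge lengths) / area, with
    the area regularised so the measure stays finite and smooth even for
    zero-area or inverted elements."""
    edges = (p1 - p0, p2 - p1, p0 - p2)
    return sum(np.dot(e, e) for e in edges) / h(signed_area(p0, p1, p2), delta)

good     = [np.array(p, float) for p in ([0, 0], [1, 0], [0.5, 0.9])]
inverted = [np.array(p, float) for p in ([0, 0], [1, 0], [0.5, -0.9])]  # node pushed through
print("valid element    :", round(regularised_distortion(*good), 1))
print("inverted element :", round(regularised_distortion(*inverted), 1))
```

An inverted element now yields a very large but finite and differentiable penalty, so a gradient-based minimiser can pull the mesh back to validity before the ordinary smoothing takes over.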

