Similar Documents
A total of 20 similar documents were found.
1.
Typically, the fundamental assumption in non-linear regression models is normality of the errors. Although such a model offers great flexibility, it suffers from the same lack of robustness against departures from distributional assumptions as other statistical models based on the Gaussian distribution. It is therefore of practical interest to study non-linear models that are less sensitive to departures from normality and related assumptions; the methods currently proposed for linear regression models thus need to be extended to non-linear regression models. This paper discusses non-linear regression models for longitudinal data with errors that follow a skew-elliptical distribution. Additionally, we discuss Bayesian statistical methods for the classification of observations into two or more groups based on skew models for non-linear longitudinal profiles. Parameter estimation for a discriminant model that classifies individuals into distinct predefined groups or populations uses appropriate posterior simulation schemes. The methods are illustrated with data from a study involving 173 pregnant women, whose main objective is to predict normal versus abnormal pregnancy outcomes from beta human chorionic gonadotropin data available at early stages of pregnancy.

2.
Bounded-influence estimation is a well developed and useful theory. It provides fairly efficient estimators which are robust to outliers and local model departures. However, its use has been limited thus far, mainly because of computational difficulties. A careful implementation in modern statistical software can effectively overcome the numerical problems of bounded-influence estimators. The proposed approach is based on general methods for solving estimating equations, together with suitable methods developed in the statistical literature, such as the delta algorithm and nested iterations. The focus is on Mallows estimation in generalized linear models and on optimal bias-robust estimation in models for independent data, such as regression models with asymmetrically distributed errors.
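Not from the paper itself, but as a minimal sketch of the bounded-influence idea: a Huber-type psi function caps the contribution of any single residual to the estimating equations. The function name and tuning constant below are illustrative, not the paper's Mallows construction:

```python
def huber_psi(r, k=1.345):
    """Huber's psi function: identity for small residuals, clipped at
    +/- k. Replacing the OLS residual r by psi(r) in the estimating
    equations bounds the influence of any single outlying observation.
    k = 1.345 is a conventional tuning constant, used here for illustration."""
    return max(-k, min(k, r))

# large residuals are clipped; moderate ones pass through unchanged
print([huber_psi(r) for r in [-3.0, -0.5, 0.2, 4.0]])  # [-1.345, -0.5, 0.2, 1.345]
```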

3.
Computers & Geosciences, 2006, 32(8): 1040-1051
Conventional statistical methods are often ineffective for evaluating spatial regression models. One reason is that spatial regression models usually have more parameters or smaller sample sizes than a simple model, so their degrees of freedom are reduced; evaluating them with traditional tests is therefore often infeasible. Another, more fundamental reason is that statistical criteria depend crucially on assumptions such as normality, independence, and homogeneity, which may themselves fail to hold. In view of these problems, this paper proposes an alternative empirical evaluation method. To illustrate the idea, several hedonic regression models for a house and land price data set are evaluated: a simple ordinary linear regression model and three spatial models. Their performance is examined in terms of how well the price of house and land can be predicted. With a cross-validation technique, the price at each sample point is predicted with a model estimated from all samples excluding that point. Empirical criteria are then established whereby the predicted prices are compared with the observed prices. The proposed method provides objective guidance for selecting a suitable model specification for a data set, and can also be seen as an alternative way to test the significance of the spatial relationships in spatial regression models.
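The cross-validation scheme described above can be sketched as follows. The univariate OLS fit and the toy price data are hypothetical stand-ins for the hedonic models and the house and land price data set:

```python
def loocv_predictions(xs, ys, fit, predict):
    """Leave-one-out cross-validation: predict each point
    from a model estimated on all remaining points."""
    preds = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        model = fit(train_x, train_y)
        preds.append(predict(model, xs[i]))
    return preds

def fit_ols(xs, ys):
    # simple univariate least squares y = a + b*x
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

def predict_ols(model, x):
    a, b = model
    return a + b * x

# hypothetical data: floor area vs observed price
xs = [50.0, 60.0, 70.0, 80.0, 90.0, 100.0]
ys = [105.0, 118.0, 142.0, 159.0, 181.0, 198.0]
preds = loocv_predictions(xs, ys, fit_ols, predict_ols)
# an empirical criterion: RMSE of held-out predictions vs observed prices
rmse = (sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)) ** 0.5
```

Competing model specifications can then be ranked by their held-out RMSE, which is the empirical comparison the paper advocates.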

4.
Application of neural networks for predicting program faults
Accurately predicting the number of faults in program modules is a major problem in the quality control of large software development efforts. Some software complexity metrics are closely related to the distribution of faults across program modules. Using these relationships, software engineers develop models that provide early estimates of quality metrics that do not become available until late in the development cycle. By considering these early estimates, software engineers can take action to avoid or prepare for emerging quality problems. Most often, the predictive models are based on multiple regression analysis, but measures of software quality and complexity exhibit systematic departures from the assumptions of these analyses. Under extreme violations of these assumptions, multiple regression models become unstable and lose most of their predictive quality. Since neural network models impose no distributional assumptions on the data, they may be more appropriate than regression models for modeling software faults. In this paper, we explore a neural network methodology for developing models that predict the number of faults in program modules, and apply it to data collected during the development of two commercial software systems. We then apply multiple linear regression to develop regression models on the same data. For the data sets considered, the neural network methodology produced better predictive models in terms of both quality of fit and predictive quality.

5.
A new method, in two variations, for identifying the most relevant covariates in linear models with homoscedastic errors is proposed. In contrast to many known selection criteria, the method is based on an interpretable scaled quantity: the maximal relative error one makes by selecting covariates from the set of all available covariates. The proposed model selection procedures rely on asymptotic normality of the test statistics, so normality of the errors in the regression model is not required. In a simulation study, the performance of the suggested methods is examined alongside the standard model selection criteria AIC, BIC, Lasso and relaxed Lasso. The simulation study illustrates the favorable performance of the proposed method compared with these reference criteria, especially when the regression effects span several orders of magnitude. The accuracy of the normal approximation to the test statistics is also investigated; it was already satisfactory for sample sizes of 50 and 100. As an illustration, US college spending data from 1994 are analyzed.

6.
An extension of the Shapiro-Wilk test to verify the hypothesis of normality in the presence of nuisance regression and scale has been considered previously. Such a test is typically based on the pair of the maximum likelihood and BLUE estimators of the standard deviation in the linear regression model. It has been shown that the asymptotic null distribution of the test criterion, extended to the regression model, is equivalent to that of the original Shapiro-Wilk test for the location-scale model. A simulation study shows that the two criteria are close under the normality hypothesis for moderate as well as for large data sets. The power of the test against various alternative distributions of the model errors is illustrated. Furthermore, it is shown that the probabilities of errors of both the first and second kind do not depend on the design matrix or on the parameters of the linear model.

7.
Introduction: Several statistical methods for assessing seasonal variation are available. Brookhart and Rothman [3] proposed a second-order moment-based estimator built on the geometrical model derived by Edwards [1], and reported that this estimator is superior to Edwards' estimator for estimating the peak-to-trough ratio of seasonal variation with respect to bias and mean squared error. Alternatively, seasonal variation may be modelled using a Poisson regression model, which provides flexibility in modelling the pattern of seasonal variation and allows adjustment for covariates. Method: In a Monte Carlo simulation study, three estimators, one based on the geometrical model and two based on log-linear Poisson regression models, were evaluated with regard to bias and standard deviation (SD). We evaluated the estimators on data simulated under schemes varying in seasonal variation and in the presence of a secular trend. All methods and analyses in this paper are available in the R package Peak2Trough [13]. Results: The Poisson regression models yielded lower absolute bias and SD for data simulated according to the corresponding model assumptions, and also lower bias and SD than the geometrical model for data simulated to deviate from those assumptions. Conclusion: This simulation study encourages the use of Poisson regression models, as opposed to the geometrical model, for estimating the peak-to-trough ratio of seasonal variation.
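As a hedged illustration of how a log-linear harmonic Poisson model yields a peak-to-trough ratio: if log mu_t = b0 + b1*cos(wt) + b2*sin(wt), the seasonal amplitude is sqrt(b1^2 + b2^2) and the peak-to-trough ratio is exp(2*amplitude). The coefficients below are hypothetical, and the model-fitting step (e.g. by IRLS) is omitted:

```python
import math

def peak_to_trough_ratio(b_cos, b_sin):
    """Peak-to-trough ratio implied by a log-linear harmonic model
    log mu_t = b0 + b_cos*cos(wt) + b_sin*sin(wt):
    amplitude A = sqrt(b_cos^2 + b_sin^2), so peak/trough = exp(2A)."""
    amplitude = math.hypot(b_cos, b_sin)
    return math.exp(2 * amplitude)

# hypothetical fitted harmonic coefficients
print(round(peak_to_trough_ratio(0.3, 0.4), 3))  # exp(2*0.5) = e ~ 2.718
```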

8.
The objective in constructing models of software quality is to use measures that can be obtained relatively early in the software development life cycle to provide reasonable initial estimates of the quality of an evolving software system. Measures of software quality and software complexity used in this modeling process exhibit systematic departures from the normality assumptions of regression modeling. Two new estimation procedures are introduced, and their performance in modeling software quality from software complexity, in terms of predictive quality and quality of fit, is compared with that of the more traditional least squares and least absolute value estimation techniques. The two new estimation techniques produced regression models with better quality of fit and predictive quality when applied to data obtained from two software development projects.

9.
Normality is one of the most common assumptions made in the development of statistical models such as the fixed effects model and the random effects model. White and MacDonald [1980. Some large-sample tests for normality in the linear regression model. JASA 75, 16-18] and Bonett and Woodward [1990. Testing residual normality in the ANOVA model. J. Appl. Statist. 17, 383-387] showed that many tests of normality perform well when applied to the residuals of a fixed effects model. In random effects models, however, the elements of the error vector are not independent, and standard tests of normality cannot be expected to perform properly when applied to the residuals. In this paper, we propose a transformation method that converts the correlated error vector into an uncorrelated vector; under the normality assumption, the uncorrelated vector is moreover an independent vector, so all the existing methods can then be applied. Monte Carlo simulations are used to evaluate the feasibility of the transformation. Results show that the transformation method preserves the Type I error rate and provides greater power under most alternatives.
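One standard way to realize such a decorrelating transformation, sketched here under the assumption that the error covariance matrix V is known and positive definite (the paper's construction for random effects models may differ in detail), is to whiten with the Cholesky factor: if V = L L^T, then z = L^{-1} e has identity covariance:

```python
def cholesky(A):
    """Cholesky factorization of a symmetric positive definite matrix:
    returns lower-triangular L with A = L L^T."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = (A[i][i] - s) ** 0.5
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def forward_solve(L, b):
    """Solve L y = b for lower-triangular L, i.e. compute L^{-1} b."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

# toy 2x2 error covariance: equicorrelated errors
V = [[2.0, 1.0], [1.0, 2.0]]
L = cholesky(V)
z = forward_solve(L, [1.0, 1.0])  # whitened error vector
```

Normality tests designed for independent errors can then be applied to z, since uncorrelated Gaussian components are independent.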

10.
Inference on the association between a primary endpoint and features of longitudinal profiles of a continuous response is of central interest in medical and public health research. Joint models that represent the association through shared dependence of the primary and longitudinal data on random effects are increasingly popular; however, existing inferential methods may be inefficient or sensitive to assumptions on the random effects distribution. We consider a semiparametric joint model that makes only mild assumptions on this distribution and develop likelihood-based inference on the association and the distribution, which offers improved performance relative to existing methods and is insensitive to the true random effects distribution. Moreover, the estimated distribution can reveal interesting population features, as we demonstrate for a study of the association between longitudinal hormone levels and bone status in peri-menopausal women.

11.
Predicting tunnel boring machine (TBM) performance is a crucial issue in the accomplishment of a mechanized tunneling project excavated by a full-face tunneling machine. Many models and equations have previously been introduced to estimate TBM performance from properties of both the rock and the machine, employing various statistical analysis techniques. Given the nature of the problem, however, it is relatively difficult to estimate TBM performance with linear prediction models. Artificial neural networks (ANNs) and non-linear multiple regression models have great potential for establishing such prediction models. The purpose of the present study is the construction of non-linear multivariable prediction models that estimate TBM performance as a function of rock properties. For this purpose, rock properties and machine data were collected from a recently completed TBM tunnel project in New York City, USA, and a database was established to develop performance prediction models using ANN and non-linear multiple regression methods. This paper presents the results of the study, showing that the non-linear prediction approaches provide acceptably precise performance estimates.

12.
Traditional clustering methods assume that there is no measurement error, or uncertainty, associated with data. Often, however, real world applications require treatment of data that have such errors. In the presence of measurement errors, well-known clustering methods like k-means and hierarchical clustering may not produce satisfactory results. In this article, we develop a statistical model and algorithms for clustering data in the presence of errors. We assume that the errors associated with data follow a multivariate Gaussian distribution and are independent between data points. The model uses the maximum likelihood principle and provides us with a new metric for clustering. This metric is used to develop two algorithms for error-based clustering, hError and kError, that are generalizations of Ward's hierarchical and k-means clustering algorithms, respectively. We discuss types of clustering problems where error information associated with the data to be clustered is readily available and where error-based clustering is likely to be superior to clustering methods that ignore error. We focus on clustering derived data (typically parameter estimates) obtained by fitting statistical models to the observed data. We show that, for Gaussian distributed observed data, the optimal error-based clusters of derived data are the same as the maximum likelihood clusters of the observed data. We also report briefly on two applications with real-world data and a series of simulation studies using four statistical models: (1) sample averaging, (2) multiple linear regression, (3) ARIMA models for time-series, and (4) Markov chains, where error-based clustering performed significantly better than traditional clustering methods.
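A minimal sketch of the kind of error-weighted metric such a likelihood-based model suggests, shown here for the special case of diagonal error covariances (the exact metric in the article may differ): each coordinate's squared difference is down-weighted by the combined error variance of the two points, so noisy coordinates contribute less to the distance.

```python
def error_distance(x, y, var_x, var_y):
    """Squared distance between two observations with independent
    Gaussian errors having diagonal covariances var_x, var_y:
    sum over coordinates of (x_i - y_i)^2 / (var_x_i + var_y_i)."""
    return sum((xi - yi) ** 2 / (vx + vy)
               for xi, yi, vx, vy in zip(x, y, var_x, var_y))

# two derived data points (e.g. parameter estimates) with error variances
d = error_distance([1.0, 2.0], [2.0, 4.0], [0.5, 1.0], [0.5, 1.0])
print(d)  # 1/1 + 4/2 = 3.0
```

With var set to a constant this reduces to (scaled) squared Euclidean distance, recovering the standard k-means metric as a special case.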

14.
This paper presents an assessment of several published statistical regression models that relate software development effort to software size measured in function points. The principal concern with the published models is the number of observations upon which they were based and inattention to the assumptions inherent in regression analysis. The research describes appropriate statistical procedures in the context of a case study based on function point data for 104 software development projects, and discusses limitations of the resulting model in estimating development effort. The paper also examines a problem with the current method for measuring function points that constrains their effective use in regression models, and suggests a modification to the approach that should enhance the accuracy of prediction models based on function points in the future.

15.
Remote sensing often involves the estimation of in situ quantities from remote measurements. Linear regression, in which there are no non-linear combinations of regressors, is a common approach to this prediction problem in the remote sensing community. A review of recent remote sensing articles using univariate linear regression indicates that in the majority of cases ordinary least squares (OLS) linear regression has been applied, with approximately half the articles using the in situ observations as regressors and the other half using the inverse regression with remote measurements as regressors. OLS implicitly assumes an underlying normal structural data model to arrive at unbiased estimates of the response, and can be a biased predictor in the presence of measurement errors when the regression problem is based on a functional rather than a structural data model. Parametric (Modified Least Squares) and non-parametric (Theil-Sen) consistent predictors are given for linear regression in the presence of measurement errors, together with analytical approximations of their prediction confidence intervals. Three case studies involving estimation of leaf area index from nadir reflectance estimates are used to compare these unbiased estimators with OLS linear regression. A comparison to Geometric Mean regression, a standardized version of Reduced Major Axis regression, is also performed. The Theil-Sen approach is suggested as a potential replacement for OLS in remote sensing linear regression applications. It offers simplicity of computation, analytical estimates of confidence intervals, robustness to outliers, testable assumptions regarding residuals, and requires limited a priori information about measurement errors.
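The Theil-Sen estimator mentioned above is simple enough to sketch in full: the slope is the median of all pairwise slopes, and the intercept the median of the resulting offsets, which makes the fit robust to outlying observations. The data below are synthetic:

```python
import statistics

def theil_sen(xs, ys):
    """Theil-Sen regression: slope = median of all pairwise slopes,
    intercept = median of y - slope*x over all points."""
    slopes = [(ys[j] - ys[i]) / (xs[j] - xs[i])
              for i in range(len(xs))
              for j in range(i + 1, len(xs))
              if xs[j] != xs[i]]
    slope = statistics.median(slopes)
    intercept = statistics.median(y - slope * x for x, y in zip(xs, ys))
    return slope, intercept

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.0, 30.0]  # last point is an outlier
slope, intercept = theil_sen(xs, ys)
# the slope stays near 2 despite the outlier; OLS would be pulled far above it
```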

16.
The problems associated with testing a dynamical model using a data record of finite length are insufficiency of the data for statistically meaningful decisions; coupling of mean-, covariance-, and correlation-related errors; difficulty of detecting midcourse model departures; and inadequacy of traditional techniques for computing test power for given model alternatives. This paper attempts to provide a comprehensive analysis of nonstationary models via significance tests, specifically addressing these problems. Data records from single and from multiple system operations are analyzed, and the models considered may vary both with respect to time and with respect to operations. Quadratic form distributions prove effective in the statistical analysis.

17.
This paper reviews the probability distribution models used in the statistical modeling of synthetic aperture radar (SAR) imagery, dividing them, according to their origin, into two classes: statistical models based on prior assumptions and empirical distribution models. A new method for estimating model parameters, the method of log-cumulants, is introduced, and the log-cumulant expressions used for parameter estimation are derived for the statistical models. An evaluation criterion for the accuracy of statistical modeling is developed and applied in experiments on statistical modeling of SAR image data. The experimental results indicate the terrain types in SAR imagery for which each statistical model is best suited.

18.
This paper studies the asymptotic properties (strong consistency, convergence rate, asymptotic normality) of a generalized weighted nonlinear least-squares estimator under weak noise assumptions. Both deterministic and stochastic weighting are handled, and the presence of model errors is considered. For particular models, estimators, and noise assumptions, the general framework reduces to known time- and frequency-domain estimators.

19.
In this paper, radial basis function (RBF) networks are used to model general non-linear discrete-time systems. In particular, reciprocal multiquadric functions are used as activation functions for the RBF networks. A stepwise regression algorithm based on orthogonalization and a series of statistical tests is employed for designing and training the network. The identification method yields non-linear models that are stable and linear in the model parameters. The advantages of the proposed method over other radial basis function methods and backpropagation neural networks are described. Finally, the effectiveness of the identification method is demonstrated by the identification of two non-linear chemical processes: a simulated continuous stirred tank reactor and an experimental pH neutralization process.
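A minimal sketch of such a network's forward pass with the reciprocal multiquadric activation phi(r) = 1/sqrt(r^2 + c^2). The centers, weights and shape parameter below are illustrative, and the stepwise-regression design and training procedure from the paper is not reproduced; note the output is linear in the weights, which is what makes least-squares training possible once the centers are fixed:

```python
import math

def reciprocal_multiquadric(r, c=1.0):
    """Reciprocal multiquadric activation: phi(r) = 1 / sqrt(r^2 + c^2),
    where r is the distance to a center and c is a shape parameter."""
    return 1.0 / math.sqrt(r * r + c * c)

def rbf_output(x, centers, weights, bias=0.0, c=1.0):
    """Single-output RBF network: a weighted sum of basis functions
    evaluated at the distances from the input to each center."""
    return bias + sum(w * reciprocal_multiquadric(math.dist(x, ci), c)
                      for w, ci in zip(weights, centers))

# illustrative two-center network in two input dimensions
centers = [[0.0, 0.0], [1.0, 1.0]]
weights = [2.0, -1.0]
y = rbf_output([0.0, 0.0], centers, weights)  # 2*phi(0) - 1*phi(sqrt(2))
```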

20.
The zero-inflated negative binomial model is used to account for overdispersion detected in data initially analyzed under the zero-inflated Poisson model. A frequentist analysis, a jackknife estimator and a non-parametric bootstrap are considered for parameter estimation of zero-inflated negative binomial regression models, and an EM-type algorithm is developed for performing maximum likelihood estimation. The appropriate matrices for assessing local influence on the parameter estimates under different perturbation schemes are then derived, along with ways to perform a global influence analysis. To study departures from the error assumption as well as the presence of outliers, residual analysis based on the standardized Pearson residuals is discussed. The relevance of the approach is illustrated with a real data set, where it is shown that zero-inflated negative binomial regression models seem to fit the data better than the Poisson counterpart.
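For reference, a hedged sketch of the zero-inflated negative binomial probability mass function: a point mass at zero with weight pi, mixed with an NB(r, p) component. The (r, p) parameterization below is one common choice, not necessarily the one used in the paper:

```python
import math

def nb_pmf(k, r, p):
    """Negative binomial pmf: probability of k failures before the r-th
    success, success probability p; r may be non-integer via lgamma."""
    log_coef = math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
    return math.exp(log_coef + r * math.log(p) + k * math.log(1.0 - p))

def zinb_pmf(k, pi, r, p):
    """Zero-inflated NB: extra point mass pi at zero on top of (1-pi)
    times the NB component."""
    base = nb_pmf(k, r, p)
    return pi + (1.0 - pi) * base if k == 0 else (1.0 - pi) * base

# with pi = 0.3 the zero probability is inflated above the NB baseline
print(zinb_pmf(0, 0.3, 2.0, 0.5))  # 0.3 + 0.7 * 0.25 = 0.475
```

The extra mass at zero is what lets the model capture excess zeros, while the NB component (unlike the Poisson) absorbs the overdispersion.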
