Similar articles
20 similar articles found.
1.
A regression model whose regression function is the sum of a linear and a nonparametric component is presented. The design is random and the response and explanatory variables satisfy mixing conditions. A new local polynomial type estimator for the nonparametric component of the model is proposed and its asymptotic normality is obtained. Specifically, this estimator works on a prewhitening transformation of the dependent variable, and the results show that it is asymptotically more efficient than the conventional estimator (which works on the original dependent variable) when the errors of the model are autocorrelated. A simulation study and an application to a real data set give promising results.

2.
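Local polynomial regression is an important tool in nonparametric statistics, but the algorithms studied so far are essentially non-recursive. This paper studies recursive local linear regression estimation and its applications. A recursive algorithm is first derived, giving nonparametric estimates of the regression function and its derivative. The strong consistency of the algorithm is proved under certain conditions. Simulation examples are then used to study its application to estimating the regression function of nonlinear conditionally heteroscedastic models and to identifying nonlinear ARX (nonlinear autoregressive system with exogenous inputs, NARX) systems.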
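
The abstract does not spell out the recursion. The sketch below shows one simple way such an estimator can be made recursive, assuming a Gaussian kernel and a hypothetical bandwidth sequence h_t = c·t^(−1/5): at a fixed point x0, five running sums are updated once per new observation, and the intercept and slope of the resulting weighted least-squares line estimate the regression function and its derivative. It is an illustration, not the paper's algorithm; the class and parameter names are made up.

```python
import math
from dataclasses import dataclass


def gaussian_kernel(u: float) -> float:
    """Standard normal density used as the smoothing kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


@dataclass
class RecursiveLocalLinear:
    """Recursive local linear estimate of m(x0) and m'(x0) at a fixed point x0.

    Hypothetical sketch: the bandwidth h_t = c * t**(-1/5) shrinks as data arrive,
    and each observation's kernel weight is computed once and never revisited.
    """

    x0: float
    c: float = 1.0
    t: int = 0
    s0: float = 0.0  # running sum of w
    s1: float = 0.0  # running sum of w * d
    s2: float = 0.0  # running sum of w * d^2
    r0: float = 0.0  # running sum of w * y
    r1: float = 0.0  # running sum of w * d * y

    def update(self, x: float, y: float) -> None:
        """Fold one new observation (x, y) into the running sums in O(1) time."""
        self.t += 1
        h = self.c * self.t ** (-0.2)
        d = x - self.x0
        w = gaussian_kernel(d / h) / h
        self.s0 += w
        self.s1 += w * d
        self.s2 += w * d * d
        self.r0 += w * y
        self.r1 += w * d * y

    def estimate(self) -> tuple:
        """Solve the 2x2 weighted least-squares normal equations for (m, m')."""
        det = self.s0 * self.s2 - self.s1 ** 2
        m = (self.s2 * self.r0 - self.s1 * self.r1) / det
        dm = (self.s0 * self.r1 - self.s1 * self.r0) / det
        return m, dm


est = RecursiveLocalLinear(x0=0.5)
for i in range(1, 201):
    x = i / 200
    est.update(x, 2.0 + 3.0 * x)  # noiseless line, so the fit should recover it
m_hat, slope_hat = est.estimate()
```

Because the data are exactly linear, any positive weights reproduce the line, so m_hat and slope_hat recover 3.5 and 3.0 up to rounding; with noisy data the estimates depend on the bandwidth sequence.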

3.
The purpose of this paper is to propose a nonparametric circular–linear multivariate regression model using a kernel-weighted local linear method. The case of several linear regressors and one circular regressor is considered. We extend results on the asymptotic bias and variance from the linear multivariate case to the circular–linear multivariate case. The rule-of-thumb selector is used to establish the optimal bandwidths for the nonparametric model, and the suitability of the model is judged by the coefficient of determination. One simulation experiment and one real problem concerning wind energy are used to study the power performance of the nonparametric model.
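
As a rough illustration of a kernel-weighted local linear fit with one circular and several linear regressors, the sketch below uses a von Mises kernel for the angle and a Gaussian product kernel for the linear variables, with sin(θ − θ0) serving as the local linear term for the circular regressor. The smoothing parameters kappa and h are fixed, hypothetical values rather than the rule-of-thumb selector the abstract mentions, and the function name is illustrative.

```python
import numpy as np


def von_mises_kernel(delta, kappa):
    """Unnormalised von Mises kernel in the angular difference delta (radians)."""
    return np.exp(kappa * (np.cos(delta) - 1.0))


def circ_lin_local_linear(theta, X, y, theta0, x0, kappa=4.0, h=0.5):
    """Local linear estimate of m(theta0, x0) with one circular and p linear regressors.

    theta: (n,) angles; X: (n, p) linear regressors; y: (n,) responses; x0: (p,).
    kappa and h are fixed illustrative smoothing parameters, not a bandwidth selector.
    """
    D = X - x0                                   # linear offsets from the target point
    delta = theta - theta0                       # angular offsets
    w = von_mises_kernel(delta, kappa) * np.exp(-0.5 * np.sum((D / h) ** 2, axis=1))
    # sin(delta) plays the role of the local linear term for the circular regressor
    design = np.column_stack([np.ones_like(theta), np.sin(delta), D])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
    return coef[0]                               # the intercept estimates m(theta0, x0)


rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, 300)
X = rng.normal(size=(300, 2))
y = 1.0 + np.sin(theta) + 0.5 * X[:, 0] + rng.normal(scale=0.1, size=300)
m_hat = circ_lin_local_linear(theta, X, y, theta0=np.pi / 2, x0=np.zeros(2))
```

In this synthetic example the true regression value at (θ0, x0) = (π/2, 0) is 2; since the fit is local, the estimate carries a smoothing bias that generally shrinks as kappa grows and h falls.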

4.
Quantile regression has emerged as one of the standard tools for regression analysis that enables a proper assessment of the complete conditional distribution of responses even in the presence of heteroscedastic errors. Quantile regression estimates are obtained by minimising an asymmetrically weighted sum of absolute deviations from the regression line, a decision theoretic formulation of the estimation problem that avoids a full specification of the error term distribution. Recent advances in mean regression have concentrated on making the regression structure more flexible by including nonlinear effects of continuous covariates, random effects or spatial effects. These extensions often rely on penalised least squares or penalised likelihood estimation with quadratic penalties and may therefore be difficult to combine with the linear programming approaches often considered in quantile regression. As a consequence, geoadditive expectile regression based on minimising an asymmetrically weighted sum of squared residuals is introduced. Different estimation procedures are presented including least asymmetrically weighted squares, boosting and restricted expectile regression. The properties of these procedures are investigated in a simulation study and an analysis on rental fees in Munich is provided where the geoadditive specification allows for an analysis of nonlinear effects of the size of flats or the year of construction and the spatial distribution of rents simultaneously.
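
Least asymmetrically weighted squares (LAWS), one of the procedures listed above, can be sketched as iteratively reweighted least squares for a purely linear predictor: observations above the current fit get weight τ, those at or below it get 1 − τ, and the weighted fit is recomputed until it stops changing. The geoadditive terms (splines, spatial effects) and their penalties are omitted here; the function name and stopping rule are illustrative assumptions.

```python
import numpy as np


def laws_expectile(x, y, tau=0.5, max_iter=100, tol=1e-10):
    """Linear expectile regression by least asymmetrically weighted squares (LAWS).

    Iteratively reweighted least squares: residuals above the current fit get
    weight tau, residuals at or below it get weight 1 - tau.
    Returns (intercept, slope, ...).
    """
    X = np.column_stack([np.ones(len(y)), x])       # add an intercept column
    beta = np.linalg.lstsq(X, y, rcond=None)[0]     # OLS start (the tau = 0.5 solution)
    for _ in range(max_iter):
        w = np.where(y - X @ beta > 0, tau, 1.0 - tau)
        XtW = X.T * w                               # X^T diag(w) without forming diag(w)
        beta_new = np.linalg.solve(XtW @ X, XtW @ y)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta


rng = np.random.default_rng(1)
x = rng.uniform(0.0, 10.0, 200)
y = 1.0 + 0.5 * x + rng.normal(size=200)
beta_50 = laws_expectile(x, y, tau=0.5)  # coincides with ordinary least squares
beta_90 = laws_expectile(x, y, tau=0.9)  # line shifted towards the upper tail
```

With τ = 0.5 all weights are equal, so the first weighted fit already reproduces ordinary least squares; larger τ moves the fitted line toward the upper part of the conditional distribution.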

5.
Variable selection for Poisson regression when the response variable is potentially underreported is considered. A logistic regression model is used to model the latent underreporting probabilities. An efficient MCMC sampling scheme is designed, incorporating uncertainty about which explanatory variables affect the dependent variable and which affect the underreporting probabilities. Validation data is required in order to identify and estimate all parameters. A simulation study illustrates favorable results both in terms of variable selection and parameter estimation. Finally, the procedure is applied to a real data example concerning deaths from cervical cancer.
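
The need for validation data follows from the data model: if the true count is Poisson(λ) and each event is reported independently with probability p, the reported count is Poisson(λp), so λ and p are identified only through their product. The sketch below evaluates that observed-data log-likelihood, assuming a log-linear λ and a logistic p; it is the kind of likelihood an MCMC sampler would target, not the paper's variable-selection sampler, and the function and argument names are hypothetical.

```python
import math


def underreported_poisson_loglik(y, X, Z, beta, gamma):
    """Observed-data log-likelihood of a Poisson model with logistic underreporting.

    True count ~ Poisson(lam_i), with log(lam_i) = x_i . beta;
    each event is reported with probability p_i, logit(p_i) = z_i . gamma;
    by Poisson thinning the reported count y_i ~ Poisson(lam_i * p_i).
    """
    total = 0.0
    for yi, xi, zi in zip(y, X, Z):
        lam = math.exp(sum(a * b for a, b in zip(xi, beta)))
        p = 1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(zi, gamma))))
        mu = lam * p
        total += yi * math.log(mu) - mu - math.lgamma(yi + 1)
    return total


y = [2, 0, 5]
X = [[1.0], [1.0], [1.0]]  # intercept-only count model
Z = [[1.0], [1.0], [1.0]]  # intercept-only reporting model
ll_a = underreported_poisson_loglik(y, X, Z, [math.log(2.0)], [0.0])              # lam = 2, p = 0.5
ll_b = underreported_poisson_loglik(y, X, Z, [math.log(4.0)], [math.log(1 / 3)])  # lam = 4, p = 0.25
# identical values: the reported counts alone cannot separate lam from p
```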

6.
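Data-driven fault diagnosis methods have been widely studied and applied in recent years, but they mainly target fault detection; locating the root cause of a fault has not yet been adequately solved. This paper proposes a causal-analysis fault-location method based on principal component analysis (PCA) and random forest regression (PFR), called PCA-PFR. The method builds a random forest regression model that takes the variables in an offline fault data segment as inputs and the corresponding monitoring statistic as the output. The model's variable importance measure then gives causal coefficients of the process variables with respect to the statistic; variables with larger values are considered more likely to be the faulty variables that caused the fault. Finally, a numerical case and a simulation experiment on the Tennessee Eastman (TE) process demonstrate the effectiveness of the method.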
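
The first ingredient of such a method is a PCA monitoring statistic. The sketch below assumes Hotelling's T² (the abstract does not name the statistic): a PCA model is fitted on fault-free data and T² is computed for each sample of a fault segment. In the approach described, those T² values would then be regressed on the process variables with a random forest, for example scikit-learn's `RandomForestRegressor`, whose `feature_importances_` attribute ranks the variables; that step is left out to keep the example dependency-light. The function names and the simulated fault are illustrative.

```python
import numpy as np


def fit_pca(X_normal, n_components):
    """Fit a PCA monitoring model on fault-free data (variables autoscaled first)."""
    mean = X_normal.mean(axis=0)
    std = X_normal.std(axis=0, ddof=1)
    Zs = (X_normal - mean) / std
    eigval, eigvec = np.linalg.eigh(np.cov(Zs, rowvar=False))
    keep = np.argsort(eigval)[::-1][:n_components]  # largest eigenvalues first
    return mean, std, eigvec[:, keep], eigval[keep]


def hotelling_t2(X, mean, std, P, lam):
    """Hotelling's T^2 per sample: sum over retained components of score^2 / eigenvalue."""
    scores = ((X - mean) / std) @ P
    return np.sum(scores ** 2 / lam, axis=1)


rng = np.random.default_rng(2)
mixing = rng.normal(size=(4, 4))
X_normal = rng.normal(size=(500, 4)) @ mixing              # correlated fault-free data
model = fit_pca(X_normal, n_components=2)
X_fault = X_normal[:50] + np.array([0.0, 3.0, 0.0, 0.0])   # hypothetical step fault on variable 1
t2_fault = hotelling_t2(X_fault, *model)                   # regression target for the forest step
```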

7.
The problem of building a regression tree is considered when the response variable is a probability density function. Splitting criteria well suited to measuring the dissimilarity between densities are proposed using Csiszár's f-divergence. The performance of trees constructed with the various criteria is compared through numerical simulations. A tree is then constructed to predict the size distribution of a zooplankton community from a set of explanatory environmental variables. Functional PCA is used to interpret the main modes of variation of the size spectra around the predicted density in each terminal node. Finally, a bagging procedure is used to increase the accuracy of the tree-based model.
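
For discrete distributions P and Q, Csiszár's f-divergence is D_f(P‖Q) = Σ_i q_i f(p_i/q_i) for a convex f with f(1) = 0; f(t) = t log t gives the Kullback–Leibler divergence and f(t) = (√t − 1)² gives Σ_i (√p_i − √q_i)². The sketch below applies this to densities discretized on a common grid with strictly positive bins, and adds a hypothetical impurity-based split score (mean divergence of each density from the node's mean density). The paper's actual splitting criteria may differ, and all names are illustrative.

```python
import math


def f_divergence(p, q, f):
    """Csiszar f-divergence sum_i q_i * f(p_i / q_i); assumes every q_i > 0."""
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))


def kl(t):
    """f(t) = t log t, which yields the Kullback-Leibler divergence KL(P || Q)."""
    return t * math.log(t) if t > 0 else 0.0


def hellinger(t):
    """f(t) = (sqrt(t) - 1)^2, which yields sum_i (sqrt(p_i) - sqrt(q_i))^2."""
    return (math.sqrt(t) - 1.0) ** 2


def node_impurity(densities, f):
    """Hypothetical impurity: mean divergence of each density from the node's mean density."""
    n, k = len(densities), len(densities[0])
    centre = [sum(d[j] for d in densities) / n for j in range(k)]
    return sum(f_divergence(d, centre, f) for d in densities) / n


def split_gain(left, right, f):
    """Impurity decrease when a node is split into two children (size-weighted)."""
    n_l, n_r = len(left), len(right)
    parent = node_impurity(left + right, f)
    children = (n_l * node_impurity(left, f) + n_r * node_impurity(right, f)) / (n_l + n_r)
    return parent - children


# size spectra discretised on three bins (hypothetical values)
left = [[0.7, 0.2, 0.1]] * 3
right = [[0.1, 0.2, 0.7]] * 2
gain = split_gain(left, right, kl)
```

A candidate split that separates dissimilar spectra, as here, scores a positive gain, while each child on its own has (near) zero impurity.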

8.
This study focuses on one of the most critical issues in modeling under severe uncertainty: determining the relative importance (weight) of the explanatory variables. The ability to determine the relative importance of explanatory variables, and the reliability of that outcome, are of utmost importance to decision makers who use such models as components of decision support or decision making. We compare the reliability of the traditional method, multiple linear regression, with that of fuzzy logic-based soft regression. A case study (a cross-national model of background factors facilitating economic growth) illustrates the performance of both methods. We conclude that soft regression is clearly a more reliable and consistent tool for determining the relative importance of explanatory variables.

9.
The goal of this paper is to handle large variation in fuzzy data by constructing a variable-spread multivariate adaptive regression splines (MARS) fuzzy regression model with crisp parameter estimates and fuzzy error terms. It deals with imprecise measurement of the response variable and crisp measurement of the explanatory variables. The proposed method is a two-phase procedure that applies the MARS technique in phase one and solves an optimization problem in phase two to estimate the center and fuzziness of the response variable. The method therefore handles two problems simultaneously: large variation and variable spreads in fuzzy observations. A realistic application is also presented in which suspended load is modeled from discharge in a hydrological engineering problem. Empirical results demonstrate that the proposed approach is more efficient and more realistic than some well-known least-squares fuzzy regression models.

10.
The problem of automatic bandwidth selection in nonparametric regression is considered when a local linear estimator is used to estimate the unknown regression function. A plug-in method for choosing the smoothing parameter, based on neural networks, is presented. The method applies to dependent data-generating processes with a nonlinear autoregressive time series representation. The consistency of the method is proved, and a simulation study is carried out to assess the empirical performance of the procedure.

11.
In this paper we introduce a class of fuzzy clusterwise regression models, with an LR fuzzy response variable and numeric explanatory variables, that embeds fuzzy clustering into a fuzzy regression framework. The model avoids the heterogeneity problem that can arise in fuzzy regression by subdividing the dataset into homogeneous clusters and performing a separate fuzzy regression on each cluster. Integrating the clustering model into the regression framework allows the regression parameters and the membership degree of each observation to each cluster to be estimated simultaneously by optimizing a single objective function. The proposed class includes, as special cases, the fuzzy clusterwise linear regression model and the fuzzy clusterwise polynomial regression model. We also introduce a set of goodness-of-fit indices to evaluate the fit of the regression model within each cluster as well as on the whole dataset. Finally, we consider some cluster validity criteria that are useful for identifying the “optimal” number of clusters. Several applications are provided to illustrate the approach.

12.
In physiological system modelling for control or decision support, model validation is a critical element. A nonparametric approach for assessing the validity of deterministic dynamic models against empirical data is developed, based on kernel regression and kernel density estimation, yielding visual graphical assessment tools as well as numerical metrics of compatibility between the model and the data. Nonparametric regression has been suggested for assessing a parametric statistical model by constructing a confidence band for the proposed model and then checking whether the nonparametric regression curve lies within the band. However, no confidence band can be constructed for a deterministic model. A reversal of roles is therefore suggested: construct a probability band for the nonparametric regression curve and check whether the proposed model lies within the band. This approach extends the utility of nonparametric regression for model assessment to deterministic models. Weighted kernel density estimation is incorporated to derive a density profile for the regression curve, creating a local graphical validation tool. In addition, the density profile is used to define and compute two numerical measures, average normalized density (AND) and relative average normalized density (RAND), which serve as global statistical validity measures. These tools are demonstrated using a biomedical system model for agitation-sedation and sedation management control.

13.
Simultaneous confidence intervals are used in Scheffé (1953) to assess any contrasts of several normal means. In this paper, the problem of assessing any contrasts of several simple linear regression models by using simultaneous confidence bands is considered. Using numerical integration, Spurrier (1999) constructed exact simultaneous confidence bands for all the contrasts of several regression lines over the whole range (−∞, ∞) of the explanatory variable when the design matrices of the regression lines are all equal. In this paper, a simulation-based method is proposed to construct simultaneous confidence bands for all the contrasts of the regression lines when the explanatory variable is restricted to an interval and the design matrices of the regression lines may be different. The critical value calculated by this method can be made as close to the exact critical value as required by choosing a sufficiently large number of simulation replications. The methodology is illustrated with a real problem in which sizes of the left atrium of infants in three diagnostic groups (severely impaired, mildly impaired and normal) are compared.

14.
Projection pursuit regression model for car following (total citations: 1; self-citations: 0; citations by others: 1)
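The car-following model is a basic model in microscopic traffic simulation. Car-following models based on nonparametric regression resolve the typical problems of earlier models well, but as the sample dimension increases they are prone to the “curse of dimensionality”. A car-following model based on projection pursuit regression (PPR) is proposed, which addresses the curse of dimensionality as well as the non-normality and nonlinearity of high-dimensional data. PPR modeling requires no assumptions about the data structure and builds the model solely by directly examining and analyzing the data; the method can therefore fully exploit the information in the data, and the resulting model agrees with reality and has high accuracy. Validation with field-measured data shows that the algorithm is feasible for car-following model research.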

15.
Fuzzy regression models have been applied in operational research (OR) applications such as forecasting. Some previous studies on fuzzy regression analysis obtain crisp regression coefficients to eliminate the problem of spreads of the estimated fuzzy responses growing as the magnitude of the independent variable increases; however, they still cannot cope with decreasing or variable spreads. This paper proposes a three-phase method that constructs a fuzzy regression model with variable spreads to resolve this problem. In the first phase, based on the extension principle, the membership functions of the least-squares estimates of the regression coefficients are constructed so that the fuzziness of the observations is completely preserved. In the second phase, these membership functions are defuzzified by the center-of-gravity method to obtain crisp regression coefficients. In the third phase, the error terms of the proposed model are determined by setting each estimated spread equal to its corresponding observed spread. Furthermore, the Mamdani fuzzy inference system is adopted to improve forecast accuracy. Compared with previous studies, the results from five examples and an application to Japanese house prices show that the proposed fuzzy linear regression model has higher explanatory power and forecasting performance.
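
The center-of-gravity step in the second phase maps a membership function μ to the crisp value ∫xμ(x)dx / ∫μ(x)dx. The sketch below approximates both integrals with the midpoint rule on a grid over the support; a triangular fuzzy number (a, b, c), whose centroid is (a + b + c)/3, serves as a check. The membership functions that the extension principle produces in the paper need not be triangular, and the helper names are illustrative.

```python
def centroid(mu, lo, hi, n=10_000):
    """Centre-of-gravity defuzzification: integral of x*mu(x) over integral of mu(x).

    Both integrals use the midpoint rule on n cells of [lo, hi]; the cell width cancels.
    """
    dx = (hi - lo) / n
    xs = [lo + (i + 0.5) * dx for i in range(n)]
    num = sum(x * mu(x) for x in xs)
    den = sum(mu(x) for x in xs)
    return num / den


def triangular(a, b, c):
    """Membership function of a triangular fuzzy number with support [a, c] and peak at b."""
    def mu(x):
        if a < x <= b:
            return (x - a) / (b - a)
        if b < x < c:
            return (c - x) / (c - b)
        return 0.0
    return mu


crisp = centroid(triangular(1.0, 2.0, 6.0), 1.0, 6.0)  # closed form: (1 + 2 + 6) / 3 = 3
```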

16.
In this paper, a class of denoised nonlinear regression estimators is proposed for a nonlinear measurement error model in which the error-prone variables are observed together with an auxiliary variable. The programming involved in this denoised nonlinear regression estimation is relatively simple, and it can be implemented with little effort by modifying existing programs for nonlinear regression estimation. We establish the consistency and asymptotic normality of these denoised estimators based on least squares and M-methods. A simulation study is carried out to illustrate the performance of the estimators. An empirical application of the model to production models in economics further demonstrates the potential of the proposed modeling procedures.

17.
This study aims to compare the logistic regression model with the classification tree method for determining the socio-demographic risk factors that affect the depression status of 1447 women in different postpartum periods. Data from a prevalence study of postpartum depression were used to determine the risk factors, and the cut-off value for the calculated postpartum depression scores was taken as 13. Socio-demographic risk factors were identified with the help of the classification tree and the logistic regression model. The optimal classification tree identified six risk factors in total, whereas the logistic regression model found significant effects for only three of them. In addition, while the tree structure allowed the relations among risk factors to be evaluated, the logistic regression model yielded adjusted main effects for the risk factors. Although the classification success of the maximal tree was better than that of both the optimal tree and the logistic regression model, this tree structure is very difficult to use in practice. The logistic regression model and the optimal tree had lower sensitivity, possibly because the two groups contained unequal numbers of individuals and clinical risk factors were not considered in this study. By evaluating many risk factors together, the classification tree method gives more detailed diagnostic information than the logistic regression model. However, choosing correctly among the constructed tree structures is very important for improving the results and for obtaining information that supports appropriate explanations.

18.
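Because feature importance is unevenly distributed, a random forest may randomly draw subsets of weak features and thus grow “weak decision trees”, which slows the model's convergence and degrades its performance. To address this, a random forest model incorporating factor analysis is proposed. The main innovation is to construct feature groups by factor analysis and then draw features at random, in proportion to the number of features in each group, to form the candidate subset at each split node. Using the accuracy and running time of classification prediction, regression fitting and feature importance analysis as evaluation metrics, nine UCI datasets were selected to assess the overall performance of the model, which was compared with a decision tree and a standard random forest. The results show that the factor-analysis random forest essentially eliminates low-accuracy decision trees, improves the model's accuracy and convergence speed, generalizes better, is better suited to high-dimensional big data, and is feasible and effective.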
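
The sampling step described above can be sketched as follows, assuming the factor-analysis step has already assigned each feature to one group (for example, to the factor on which it loads most heavily; that assignment rule is an assumption). At each split node the m candidate features are divided among the groups in proportion to group size, using largest-remainder rounding so the counts sum to m, and are then drawn without replacement within each group. All names are illustrative.

```python
import random


def allocate(group_sizes, m):
    """Split m draws across groups in proportion to group size (largest-remainder rounding)."""
    p = sum(group_sizes)
    quotas = [m * s / p for s in group_sizes]
    counts = [int(q) for q in quotas]
    remaining = m - sum(counts)
    # hand the leftover draws to the groups with the largest fractional parts
    order = sorted(range(len(quotas)), key=lambda g: quotas[g] - counts[g], reverse=True)
    for g in order[:remaining]:
        counts[g] += 1
    return counts


def sample_candidates(groups, m, rng=random):
    """Draw m candidate feature indices for one split node, group by group (requires m <= #features)."""
    counts = allocate([len(g) for g in groups], m)
    chosen = []
    for g, k in zip(groups, counts):
        chosen.extend(rng.sample(g, k))  # without replacement inside the group
    return chosen


groups = [[0, 1, 2, 3, 4, 5], [6, 7, 8], [9]]  # hypothetical factor-based feature groups
rng = random.Random(0)
candidates = sample_candidates(groups, m=5, rng=rng)
```

Compared with drawing m features uniformly from all of them, this guarantees that every sizeable group is represented at each node, which is the mechanism the abstract credits with avoiding weak trees.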

19.
In this paper a robust linear regression method with variable selection is proposed for predicting desirable end-of-line quality variables in complex industrial processes. Developing such prediction models is challenging because there is usually a large pool of candidate explanatory variables, limited sample data, and multicollinearity among the explanatory variables. The proposed method is called enumerative partial least squares-based nonnegative garrote regression. It applies partial least squares regression in an enumerative manner to generate initial model coefficients and then uses the nonnegative garrote to shrink these coefficients so that irrelevant variables are eliminated implicitly. The advantages of the proposed method over existing state-of-the-art model construction methods are analyzed. Two simulation examples and an industrial application in a local semiconductor factory are used to validate the method. These examples show substantial improvements in accuracy and in the robustness of variable selection compared with existing methods. Specifically, in the industrial case the reduction in root mean squared error is up to 24.3% compared with previous work.
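
Breiman's nonnegative garrote multiplies each initial coefficient β_j by a factor c_j ≥ 0 chosen to minimize ‖y − Zc‖² + λΣc_j, where column j of Z is x_j·β_j; factors driven to zero remove variables. The sketch below solves that problem by cyclic coordinate descent, using the closed-form update c_j = max(0, (z_jᵀr − λ/2) / z_jᵀz_j) with r the residual excluding variable j. As a stand-in for the enumerative PLS step it uses ordinary least squares for the initial coefficients; the function name and λ values are illustrative.

```python
import numpy as np


def nonnegative_garrote(X, y, lam, beta_init=None, n_sweeps=500):
    """Shrink initial coefficients by factors c_j >= 0 (nonnegative garrote).

    Minimises ||y - Z c||^2 + lam * sum(c) subject to c >= 0, where Z = X * beta_init,
    by cyclic coordinate descent. Returns the shrunken coefficients c * beta_init.
    """
    y = np.asarray(y, dtype=float)
    if beta_init is None:
        beta_init = np.linalg.lstsq(X, y, rcond=None)[0]  # OLS stand-in for the initial estimate
    Z = X * beta_init                                      # column j scaled by beta_j
    c = np.ones(X.shape[1])
    resid = y - Z @ c
    for _ in range(n_sweeps):
        for j in range(Z.shape[1]):
            zj = Z[:, j]
            denom = zj @ zj
            if denom == 0.0:
                continue
            resid += zj * c[j]                             # remove variable j from the fit
            c[j] = max(0.0, (zj @ resid - lam / 2.0) / denom)
            resid -= zj * c[j]                             # add it back with the new factor
    return c * beta_init


rng = np.random.default_rng(3)
X = rng.normal(size=(100, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.5 * rng.normal(size=100)  # only two relevant variables
b_small = nonnegative_garrote(X, y, lam=1.0)
```

Setting lam = 0 leaves the initial (here OLS) coefficients unchanged, while a very large lam shrinks every coefficient to exactly zero; variable selection happens in between, typically with lam chosen by cross-validation.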

20.
Susceptibility or hazard models are often built with logistic regression to describe the effect of a group of explanatory variables on the probability of a dichotomous (binary) response. Since the available variables do not always satisfy the logit-linearity assumption of logistic regression, a modified approach is proposed. First, a favorability function based on conditional probability measures is introduced for each explanatory variable. Next, a simple transformation based on the empirical probability function is suggested for non-continuous variables, while nonparametric kernel estimation is used for continuous ones. The favorability-based transformations yield new explanatory variables for the logistic regression model. The performance of the method is evaluated on simulated data. In addition, a real case study is presented in which a GIS-based landslide susceptibility model is built.
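
One simple reading of this transformation is sketched below, assuming the favorability of a value x is the empirical conditional probability P(Y = 1 | X = x); the paper's exact definition may differ. Categories are replaced by their observed proportion of positive responses, and a continuous variable by a Nadaraya–Watson kernel estimate of that probability. The bandwidth h, the example variables and the function names are hypothetical; the transformed columns would then enter an ordinary logistic regression.

```python
import math
from collections import defaultdict


def categorical_favorability(x, y):
    """Map each category to the empirical P(Y = 1 | X = category)."""
    hits, totals = defaultdict(int), defaultdict(int)
    for xi, yi in zip(x, y):
        totals[xi] += 1
        hits[xi] += yi
    return {k: hits[k] / totals[k] for k in totals}


def kernel_favorability(x_train, y_train, x_new, h):
    """Nadaraya-Watson estimate of P(Y = 1 | X = x_new) with a Gaussian kernel of bandwidth h."""
    w = [math.exp(-0.5 * ((x_new - xi) / h) ** 2) for xi in x_train]
    return sum(wi * yi for wi, yi in zip(w, y_train)) / sum(w)


lithology = ["clay", "sand", "clay", "rock", "sand", "clay"]  # hypothetical categorical variable
landslide = [1, 0, 1, 0, 1, 0]                                # binary response
fav = categorical_favorability(lithology, landslide)
slope = [5.0, 12.0, 30.0, 35.0, 8.0, 28.0]                    # hypothetical continuous variable
fav_slope = [kernel_favorability(slope, landslide, s, h=5.0) for s in slope]
```

Because each transformed value is a probability estimate, both transforms map the raw variables onto [0, 1].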


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP No. 09084417