Similar Literature
20 similar documents found (search time: 0 ms)
1.
Multicollinearity can seriously affect least-squares parameter estimates. Many methods have been suggested to determine the parameters most involved. This paper, beginning with the contributions of Belsley, Kuh, and Welsch (1980) and Belsley (1991), forges a new direction. A decomposition of the variable space allows the near dependencies to be isolated in one subspace. This, in turn, allows a corresponding decomposition of the main statistics, as well as of a new one proposed here, to provide better information on the structure of the collinear relations.
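The Belsley-style diagnostics this abstract builds on can be computed from a singular value decomposition of the column-scaled design matrix. The sketch below is a minimal illustration of condition indices and variance-decomposition proportions, not the paper's new decomposition; the data are invented for the example.

```python
import numpy as np

def collinearity_diagnostics(X):
    """Condition indices and variance-decomposition proportions
    (Belsley-style) from the SVD of the column-scaled design matrix."""
    Xs = X / np.linalg.norm(X, axis=0)            # scale columns to unit length
    _, s, Vt = np.linalg.svd(Xs, full_matrices=False)
    cond_idx = s.max() / s                        # one index per singular value
    phi = (Vt.T ** 2) / s ** 2                    # phi[j, k] = v_jk^2 / s_k^2
    props = phi / phi.sum(axis=1, keepdims=True)  # rows: variables, cols: indices
    return cond_idx, props

# Hypothetical data: columns 0 and 1 are nearly linearly dependent.
rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
X = np.column_stack([x1, x1 + 1e-4 * rng.normal(size=100),
                     rng.normal(size=100)])
ci, props = collinearity_diagnostics(X)
```

A condition index above roughly 30 is the conventional signal of a harmful near dependency; the variables with large proportions on that index are the ones involved in it.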

2.
Stochastic adaptive estimation and control algorithms involving recursive prediction estimates have guaranteed convergence rates when the noise is not ‘too’ coloured, as when a positive-real condition on the noise model is satisfied. Moreover, the whiter the noise environment, the more robust the algorithms. This paper shows that for linear regression signal models, the suitable introduction of white noise into the estimation algorithm can make it more robust without compromising convergence rates. Indeed, there are guaranteed attractive convergence rates independent of the process noise colour. No positive-real condition is imposed on the noise model.

3.
Constrained linear regression models for symbolic interval-valued variables
This paper introduces an approach to fitting a constrained linear regression model to interval-valued data. Each example of the learning set is described by a feature vector in which each feature value is an interval. The new approach fits a constrained linear regression model on the midpoints and ranges of the interval values assumed by the variables in the learning set. The prediction of the lower and upper boundaries of the interval value of the dependent variable is accomplished from its midpoint and range, which are estimated from the fitted linear regression models applied to the midpoint and range of each interval value of the independent variables. This new method shows the importance of range information for prediction performance, as well as the use of inequality constraints to ensure mathematical coherence between the predicted values of the lower and upper boundaries of the interval. The authors also propose an expression for a goodness-of-fit measure called the determination coefficient. The assessment of the proposed prediction method is based on the estimation of the average behavior of the root-mean-square error and the square of the correlation coefficient in the framework of a Monte Carlo experiment with different data set configurations. Among other aspects, the synthetic data sets take into account the dependence, or lack thereof, between the midpoint and range of the intervals. The bias produced by the use of inequality constraints on the vector of parameters is also examined in terms of the mean-square error of the parameter estimates. Finally, the approaches proposed in this paper are applied to a real data set and their performances are compared.
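A minimal sketch of the center-and-range idea described above: fit one OLS model on the midpoints and one on the ranges, then rebuild the interval bounds from the two predictions. The paper's inequality constraints on the parameter vector are approximated here by simply clipping a negative predicted range at zero, and the synthetic data are invented for the example.

```python
import numpy as np

def fit_center_range(X_lo, X_hi, y_lo, y_hi):
    """Fit two OLS models: one on interval midpoints, one on ranges."""
    Xm, Xr = (X_lo + X_hi) / 2.0, X_hi - X_lo
    ym, yr = (y_lo + y_hi) / 2.0, y_hi - y_lo
    A_m = np.column_stack([np.ones(len(Xm)), Xm])
    A_r = np.column_stack([np.ones(len(Xr)), Xr])
    beta_m, *_ = np.linalg.lstsq(A_m, ym, rcond=None)
    beta_r, *_ = np.linalg.lstsq(A_r, yr, rcond=None)
    return beta_m, beta_r

def predict_interval(beta_m, beta_r, x_lo, x_hi):
    m = beta_m[0] + beta_m[1] * (x_lo + x_hi) / 2.0
    r = max(beta_r[0] + beta_r[1] * (x_hi - x_lo), 0.0)  # keep range >= 0
    return m - r / 2.0, m + r / 2.0                      # so lo <= hi always

# Invented interval data: y has midpoint 2c + 1 and half-width h / 2.
rng = np.random.default_rng(1)
c, h = rng.normal(size=60), np.abs(rng.normal(size=60))
X_lo, X_hi = c - h, c + h
y_lo, y_hi = 2 * c + 1 - 0.5 * h, 2 * c + 1 + 0.5 * h
bm, br = fit_center_range(X_lo, X_hi, y_lo, y_hi)
lo, hi = predict_interval(bm, br, 0.0, 1.0)
```

The clipping step is what guarantees the predicted lower boundary never exceeds the upper one, the coherence requirement the abstract mentions.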

4.
Ribeiro F, Opper M. Neural Computation, 2011, 23(4): 1047-1069
We discuss the expectation propagation (EP) algorithm for approximate Bayesian inference using a factorizing posterior approximation. For neural network models, we use a central limit theorem argument to make EP tractable when the number of parameters is large. For two types of models, we show that EP can achieve optimal generalization performance when data are drawn from a simple distribution.

5.
Searching for an effective dimension reduction space is an important problem in regression, especially for high-dimensional data such as microarray data. A major characteristic of microarray data is the small number of observations n and the very large number of genes p. This “large p, small n” paradigm makes discriminant analysis for classification difficult. One way to offset this dimensionality problem is to reduce the dimension. Supervised classification is understood as a regression problem with a small number of observations and a large number of covariates. A new approach for dimension reduction is proposed, based on a semi-parametric approach which uses local likelihood estimates for single-index generalized linear models. The asymptotic properties of this procedure are considered and its asymptotic performance is illustrated by simulations. Applications of the method to binary and multiclass classification of three real data sets (Colon, Leukemia and SRBCT) are presented.

6.
A robust likelihood approach is proposed for inference about regression parameters in partially-linear models. More specifically, normality is adopted as the working model and is properly corrected to accomplish the objective. Knowledge about the true underlying random mechanism is not required for the proposed method. Simulations and illustrative examples demonstrate the usefulness of the proposed robust likelihood method, even in irregular situations caused by the components of the nonparametric smooth function in partially-linear models.

7.
A regression model whose regression function is the sum of a linear and a nonparametric component is presented. The design is random and the response and explanatory variables satisfy mixing conditions. A new local polynomial type estimator for the nonparametric component of the model is proposed and its asymptotic normality is obtained. Specifically, this estimator works on a prewhitening transformation of the dependent variable, and the results show that it is asymptotically more efficient than the conventional estimator (which works on the original dependent variable) when the errors of the model are autocorrelated. A simulation study and an application to a real data set give promising results.

8.
We propose a Bayesian framework for regression problems, which covers areas usually dealt with by function approximation. An online learning algorithm is derived which solves regression problems with a Kalman filter. Its solution always improves with increasing model complexity, without the risk of over-fitting. In the infinite dimension limit it approaches the true Bayesian posterior. The issues of prior selection and over-fitting are also discussed, showing that some of the commonly held beliefs are misleading. The practical implementation is summarised. Simulations using 13 popular publicly available data sets are used to demonstrate the method and highlight important issues concerning the choice of priors.
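The online update this abstract describes can be sketched as recursive Bayesian (Kalman-filter) linear regression: each observation refines a Gaussian posterior over the weights. This is a generic sketch under assumed prior and noise variances, not the paper's full framework.

```python
import numpy as np

def kalman_regression(X, y, prior_var=10.0, noise_var=1.0):
    """Online Bayesian linear regression: each observation updates a
    Gaussian posterior over the weights via a scalar Kalman step."""
    d = X.shape[1]
    w = np.zeros(d)                      # posterior mean
    P = prior_var * np.eye(d)            # posterior covariance
    for x, t in zip(X, y):
        Px = P @ x
        k = Px / (x @ Px + noise_var)    # Kalman gain
        w = w + k * (t - x @ w)          # correct by the innovation
        P = P - np.outer(k, Px)          # shrink the covariance
    return w, P

# Invented data; the posterior mean should approach the true weights.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=200)
w_hat, P = kalman_regression(X, y)
```

Because the posterior covariance only shrinks as data arrive, extra model complexity is absorbed by the prior rather than fit to noise, which is the over-fitting point the abstract makes.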

9.
Consider the semi-parametric linear regression model Y=βX+ε, where ε has an unknown distribution F0. The semi-parametric MLE of β under this set-up is called the generalized semi-parametric MLE (GSMLE). Although GSML estimation of the linear regression model is statistically appealing, it had never been attempted, owing to difficulties in obtaining the GSML estimates of β and F, until recent work on linear regression for complete data and for right-censored data by Yu and Wong [2003a. Asymptotic properties of the generalized semi-parametric MLE in linear regression. Statistica Sinica 13, 311-326; 2003b. Semi-parametric MLE in simple linear regression analysis with interval-censored data. Commun. Statist. Simulation Comput. 32, 147-164; 2003c. The semi-parametric MLE in linear regression with right censored data. J. Statist. Comput. Simul. 73, 833-848]. However, after obtaining all candidates, their algorithm simply does an exhaustive search to find the GSML estimators. In this paper, it is shown that Yu and Wong's algorithm leads to the so-called dimension disaster. Based on their idea, a simulated annealing algorithm for finding the semi-parametric MLE is proposed, along with techniques to reduce computation. Experimental results show that the new algorithm runs much faster for multiple linear regression models while keeping the nice features of Yu and Wong's original one.
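A generic simulated-annealing search over a finite candidate set, of the kind the abstract proposes as a replacement for exhaustive search, might be sketched as follows. The cooling schedule, iteration budget, and toy scoring function are illustrative assumptions, not those of the paper.

```python
import math
import random

def simulated_annealing(candidates, score, n_iter=2000, t0=1.0, seed=0):
    """Maximize `score` over a finite candidate list by annealed random
    search instead of exhaustive enumeration."""
    rng = random.Random(seed)
    cur = best = rng.randrange(len(candidates))
    for step in range(1, n_iter + 1):
        j = rng.randrange(len(candidates))        # propose a random candidate
        t = t0 / math.log(step + 1)               # logarithmic cooling
        delta = score(candidates[j]) - score(candidates[cur])
        if delta > 0 or rng.random() < math.exp(delta / t):
            cur = j                               # accept (always if better)
        if score(candidates[cur]) > score(candidates[best]):
            best = cur                            # track the best seen so far
    return candidates[best]

# Toy objective with a unique maximum at 37.
found = simulated_annealing(list(range(100)), lambda v: -(v - 37) ** 2)
```

The point of the annealed acceptance rule is that the search can escape local optima early (high temperature) while settling on the best candidate late, visiting far fewer candidates than an exhaustive scan when the set is large.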

10.
This article addresses some problems in outlier detection and variable selection in linear regression models. First, in outlier detection there are problems known as smearing and masking. Smearing means that one outlier makes another, non-outlier observation appear to be an outlier; masking means that one outlier prevents another from being detected. Detecting outliers one by one may therefore give misleading results. In this article a genetic algorithm is presented which considers different possible groupings of the data into outlier and non-outlier observations; in this way all outliers are detected at the same time. Second, it is known that outlier detection and variable selection can influence each other, and that different results may be obtained depending on the order in which these two tasks are performed. It may therefore be useful to consider the tasks simultaneously, and a genetic algorithm for simultaneous outlier detection and variable selection is suggested. Two real data sets are used to illustrate the algorithms, which are shown to work well. In addition, the scalability of the algorithms is considered in an experiment using generated data. I would like to thank Dr Tero Aittokallio and an anonymous referee for useful comments.

11.
A new concept and method for imposing imprecise (fuzzy) input and output data on the conventional linear regression model is proposed. Allowing for fuzzy parameters and fuzzy arithmetic operations (fuzzy addition and multiplication), we propose a fuzzy linear regression model which has a form similar to that of the conventional one. We construct the h-level (conventional) linear regression models of the fuzzy linear regression model in order to invoke the statistical techniques of (conventional) linear regression analysis for real-valued data. To determine the sign (nonnegativity or nonpositivity) of the fuzzy parameters, we perform statistical hypothesis tests and evaluate confidence intervals. Using the least squares estimators obtained from the h-level linear regression models, we can construct the membership functions of the fuzzy least squares estimators via the form of the “Resolution Identity” well known in fuzzy set theory. To obtain the membership degree of any given estimate taken from a fuzzy least squares estimator, optimization problems have to be solved. We also provide two computational procedures to deal with those optimization problems.

12.
Partial F tests play a central role in model selection in multiple linear regression models. This paper studies partial F tests from the viewpoint of simultaneous confidence bands. It first shows that there is a simultaneous confidence band associated naturally with a partial F test. This confidence band provides more information than the partial F test, and the test can be regarded as a by-product of the confidence band. This viewpoint also leads to insight into the major weakness of partial F tests: a partial F test implicitly requires that the linear regression model hold over the entire range of the covariates in question. Improved tests are proposed; they are induced by simultaneous confidence bands over restricted regions of the covariates. Power comparisons between the partial F tests and the new tests have been carried out to assess when the new tests are more or less powerful than the partial F tests. Computer programmes have been developed for easy implementation of these new confidence-band-based inferential methods. An illustrative example is provided.
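For reference, the partial F statistic that these tests build on compares the residual sums of squares of nested models. A minimal sketch, with invented simulation data:

```python
import numpy as np

def partial_f(X_full, X_reduced, y):
    """Partial F statistic for a reduced model nested in a full model.
    Both design matrices are assumed to include an intercept column."""
    def rss(A):
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        r = y - A @ beta
        return r @ r
    n = len(y)
    df1 = X_full.shape[1] - X_reduced.shape[1]   # extra parameters tested
    df2 = n - X_full.shape[1]                    # residual df of full model
    F = ((rss(X_reduced) - rss(X_full)) / df1) / (rss(X_full) / df2)
    return F, df1, df2

# Invented data where the tested covariate x2 really matters.
rng = np.random.default_rng(3)
n = 100
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)
ones = np.ones(n)
F, df1, df2 = partial_f(np.column_stack([ones, x1, x2]),
                        np.column_stack([ones, x1]), y)
```

Under the reduced model, F follows an F(df1, df2) distribution; note that the computation implicitly trusts the linear model over the whole covariate range, which is exactly the weakness the paper's restricted-region bands address.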

13.
The Cox proportional hazards (PH) model usually assumes linearity of the covariates on the log hazard function, which may be violated because linearity cannot always be guaranteed. We propose a partially linear single-index proportional hazards regression model, which can model both linear and nonlinear covariate effects on the log hazard in the proportional hazards model. We adopt a polynomial spline smoothing technique to model the structured nonparametric single-index component for the nonlinear covariate effects. This method can reduce the dimensionality of the covariates being modeled, while, at the same time, can provide efficient estimates of the covariate effects. A two-step iterative algorithm to estimate the nonparametric component and the covariate effects is used for facilitating implementation. Asymptotic properties of the estimators are derived. Monte Carlo simulation studies are presented to compare the new method with the standard Cox linear PH model and some other comparable models. A case study with clinical trial data is presented for illustration.

14.
In this paper, we present a new nonparametric calibration method called ensemble of near-isotonic regression (ENIR). The method can be considered an extension of BBQ (Naeini et al., in: Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015b), a recently proposed calibration method, as well as of the commonly used calibration method based on isotonic regression (IsoRegC) (Zadrozny and Elkan, in: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2002). ENIR is designed to address the key limitation of IsoRegC, namely the monotonicity assumption on the predictions. Like BBQ, the method post-processes the output of a binary classifier to obtain calibrated probabilities, so it can be used with many existing classification models to generate accurate probabilistic predictions. We demonstrate the performance of ENIR on synthetic and real datasets for commonly applied binary classification models. Experimental results show that the method outperforms several common binary classifier calibration methods. In particular, on the real data we evaluated, ENIR commonly performs statistically significantly better than the other methods, and never worse. It is able to improve the calibration power of classifiers while retaining their discrimination power. The method is also computationally tractable for large-scale datasets, as it runs in \(O(N \log N)\) time, where N is the number of samples.
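The IsoRegC baseline that ENIR extends is ordinary isotonic regression, computable with the pool-adjacent-violators algorithm (PAVA). The sketch below implements plain PAVA only, not ENIR's near-isotonic relaxation:

```python
import numpy as np

def pava(y, w=None):
    """Pool-adjacent-violators: least-squares fit subject to the fitted
    values being nondecreasing (plain isotonic regression)."""
    y = np.asarray(y, dtype=float)
    w = np.ones_like(y) if w is None else np.asarray(w, dtype=float)
    vals, wts, sizes = [], [], []
    for yi, wi in zip(y, w):
        vals.append(yi); wts.append(wi); sizes.append(1)
        # merge adjacent blocks while they violate monotonicity
        while len(vals) > 1 and vals[-2] > vals[-1]:
            wtot = wts[-2] + wts[-1]
            vals[-2] = (wts[-2] * vals[-2] + wts[-1] * vals[-1]) / wtot
            wts[-2] = wtot
            sizes[-2] += sizes[-1]
            vals.pop(); wts.pop(); sizes.pop()
    return np.repeat(vals, sizes)
```

For calibration, one sorts examples by classifier score and runs `pava` over the corresponding 0/1 labels; the pooled block averages become the calibrated probabilities. ENIR's contribution is to penalize, rather than enforce, the monotonicity that this baseline hard-codes.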

15.
The Cox proportional hazards (PH) model usually assumes linearity of the covariates on the log hazard function, which may be violated because linearity cannot always be guaranteed. We propose a partially linear single-index proportional hazards regression model, which can model both linear and nonlinear covariate effects on the log hazard in the proportional hazards model. We adopt a polynomial spline smoothing technique to model the structured nonparametric single-index component for the nonlinear covariate effects. This method can reduce the dimensionality of the covariates being modeled, while, at the same time, can provide efficient estimates of the covariate effects. A two-step iterative algorithm to estimate the nonparametric component and the covariate effects is used for facilitating implementation. Asymptotic properties of the estimators are derived. Monte Carlo simulation studies are presented to compare the new method with the standard Cox linear PH model and some other comparable models. A case study with clinical trial data is presented for illustration.

16.
The Bayesian approach has become a commonly used method for inverse problems arising in signal and image processing. One of its main advantages is the possibility of proposing unsupervised methods in which the likelihood and prior model parameters are estimated jointly with the main unknowns. In this paper, we consider linear inverse problems in which the noise may be non-stationary and where we are looking for a sparse solution. To meet both of these requirements, we propose to use a Student-t prior model both for the noise of the forward model and for the unknown signal or image. The main interest of the Student-t prior model is its infinite Gaussian scale mixture (IGSM) property. Using the resulting hierarchical prior models, we obtain a joint posterior probability distribution of the unknowns of interest (input signal or image) and their associated hidden variables. To obtain practical methods, we use either a joint maximum a posteriori (JMAP) estimator or an appropriate variational Bayesian approximation (VBA) technique to compute the posterior mean (PM) values. The proposed method is applicable to many inverse problems such as deconvolution, image restoration and computed tomography. In this paper, we show only some results in signal deconvolution and in estimating periodic components of biological signals related to circadian clock dynamics for cancer studies.

17.
Testing methods are introduced in order to determine whether there is some ‘linear’ relationship between imprecise predictor and response variables in a regression analysis. The variables are assumed to be interval-valued. Within this context, the variables are formalized as compact convex random sets, and an interval arithmetic-based linear model is considered. Then, a suitable equivalence for the hypothesis of linear independence in this model is obtained in terms of the mid-spread representations of the interval-valued variables. That is, in terms of some moments of random variables. Methods are constructed to test this equivalent hypothesis; in particular, the one based on bootstrap techniques will be applicable in a wide setting. The methodology is illustrated by means of a real-life example, and some simulation studies are considered to compare techniques in this framework.

18.
Knowledge, 2002, 15(3): 169-175
Clustered linear regression (CLR) is a new machine learning algorithm that improves the accuracy of classical linear regression by partitioning the training space into subspaces. CLR makes some assumptions about the domain and the data set. First, the target value is assumed to be a function of the feature values. Second, there are assumed to be linear approximations to this function in each subspace. Finally, there are enough training instances to determine the subspaces and their linear approximations successfully. Tests indicate that if these assumptions hold, CLR outperforms all other well-known machine learning algorithms. Partitioning may continue until a linear approximation fits all the instances in the training set, which generally occurs when the number of instances in a subspace is less than or equal to the number of features plus one. Otherwise, each new subspace will have a better-fitting linear approximation. However, this will cause overfitting and give less accurate results for the test instances. The stopping point can be determined as no significant decrease, or an increase, in relative error. CLR uses a small portion of the training instances to determine the number of subspaces. The need for a high number of training instances makes this algorithm suitable for data mining applications.
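A toy version of the partition-then-fit idea: split the training space on one feature and fit a separate linear model in each subspace. This illustrates the principle only, not the CLR algorithm itself; the median split rule and the piecewise-linear target are assumptions made for the example.

```python
import numpy as np

def fit_clr_split(X, y, feature=0):
    """Toy CLR-style fit: partition the training space at the median of
    one feature and fit a separate OLS model in each subspace."""
    thr = np.median(X[:, feature])
    left = X[:, feature] <= thr
    models = {}
    for name, mask in (("left", left), ("right", ~left)):
        A = np.column_stack([np.ones(mask.sum()), X[mask]])
        beta, *_ = np.linalg.lstsq(A, y[mask], rcond=None)
        models[name] = beta
    return thr, models

def predict_clr(x, thr, models, feature=0):
    beta = models["left"] if x[feature] <= thr else models["right"]
    return beta[0] + beta[1:] @ x

# A target that is linear on each half but not globally linear.
x = np.linspace(-1.0, 1.0, 101).reshape(-1, 1)
y = np.abs(x).ravel()
thr, models = fit_clr_split(x, y)
```

A single global linear fit cannot represent `|x|`, but one linear model per subspace fits it exactly, which is the accuracy gain the abstract claims for CLR when its assumptions hold.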

19.
Klopfenstein, Quentin; Vaiter, Samuel. Machine Learning, 2021, 110(7): 1939-1974
This paper studies the addition of linear constraints to the Support Vector Regression when the kernel is linear. Adding those constraints into the problem allows to add prior...

20.
The development of flexible parametric classes of probability models is a very popular approach in Bayesian analysis. This study is designed for a heterogeneous population modelled by a two-component mixture of the Laplace probability distribution. When a process initially starts, the researcher expects the number of failure components to be very high, but after some improvement/inspection it is assumed that the failure components will decrease sufficiently. In such situations the Laplace model is more suitable than the normal distribution because of its fatter-tailed behaviour. We consider the derivation of the posterior distribution for censored data assuming different conjugate informative priors. Various kinds of loss functions are used to derive the Bayes estimators and their posterior risks. A method of eliciting the hyperparameters is discussed, based on a prior predictive approach. The results are also compared with those under non-informative priors. To examine the performance of these estimators we have evaluated their properties for different sample sizes, censoring rates and proportions of the components of the mixture through a simulation study. To highlight the practical significance we include an illustrative application example based on real-life mixture data.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号