Similar Articles
20 similar articles found.
1.
The statistical properties of k-NN estimators are investigated in a design-based framework, avoiding any assumption about the population under study. The issue of coupling remotely sensed digital imagery with data from forest inventories conducted under probabilistic sampling schemes is considered. General results are obtained for the k-NN estimator at the pixel level. When averages (or totals) of forest attributes for the whole study area or for sub-areas are of interest, the empirical difference estimator is proposed. This estimator is shown to be approximately unbiased, with a variance that admits unbiased or conservative estimators. Its performance is evaluated by an extensive simulation study on several populations whose sizes and covariate values are taken from a real case study. Samples are selected from the populations by simple random sampling without replacement. Comparisons with the generalized regression and Horvitz-Thompson estimators are also performed. An application to a local forest inventory of a test area in central Italy is presented.
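A minimal sketch of a difference estimator for a population mean under simple random sampling without replacement, assuming a k-NN prediction is available for every population unit (pixel) and treating those predictions as fixed. The function names are hypothetical, and the variance formula is the textbook SRSWOR form for fixed predictions, not necessarily the exact variance estimator derived in the paper.

```python
import statistics
from typing import Sequence


def difference_estimate(
    predictions: Sequence[float],
    sample_idx: Sequence[int],
    y_sample: Sequence[float],
) -> float:
    """Population-mean estimate: mean prediction over all N units
    plus the mean prediction error (residual) over the n sampled units."""
    if not sample_idx or len(sample_idx) != len(y_sample):
        raise ValueError("sample_idx and y_sample must be non-empty and aligned")
    synthetic = sum(predictions) / len(predictions)
    residuals = [y - predictions[i] for i, y in zip(sample_idx, y_sample)]
    return synthetic + sum(residuals) / len(residuals)


def difference_variance(
    predictions: Sequence[float],
    sample_idx: Sequence[int],
    y_sample: Sequence[float],
) -> float:
    """SRSWOR variance estimate (1 - n/N) * s_e^2 / n, where s_e^2 is the
    sample variance of the residuals; predictions are treated as fixed."""
    n, big_n = len(sample_idx), len(predictions)
    if n < 2:
        raise ValueError("need at least two sampled units")
    residuals = [y - predictions[i] for i, y in zip(sample_idx, y_sample)]
    return (1 - n / big_n) * statistics.variance(residuals) / n


# Four pixels with k-NN predictions; pixels 0 and 2 were field-sampled.
preds = [1.0, 2.0, 3.0, 4.0]
print(difference_estimate(preds, [0, 2], [2.0, 3.0]))  # 3.0
print(difference_variance(preds, [0, 2], [2.0, 3.0]))  # 0.125
```

The residual term corrects the purely synthetic (map-based) mean, which is what makes the estimator approximately design-unbiased even when the k-NN predictions are biased.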

2.
Recently, a least absolute deviation (LAD) estimator for median regression models with doubly censored data was proposed and its asymptotic normality was established; methods based on the bootstrap and on random weighting were then proposed to approximate the distribution of the LAD estimator. However, computing the estimator requires solving a non-convex, non-smooth minimization problem, which makes direct implementation of the bootstrap or random weighting computationally expensive. In this paper, computationally simple resampling methods are proposed to approximate the distribution of the doubly censored LAD estimator. The objective functions in the resampling stage of the new methods are piecewise linear and convex, so their minimizers can be obtained by linear programming, in the same way as for uncensored median regression.

3.
Nadeau, Claude; Bengio, Yoshua. Machine Learning, 2003, 52(3): 239-281
In order to compare learning algorithms, experimental results reported in the machine learning literature often use statistical tests of significance to support the claim that a new learning algorithm generalizes better. Such tests should take into account the variability due to the choice of training set, and not only that due to the test examples, as is often done. Ignoring the former can lead to gross underestimation of the variance of the cross-validation estimator, and to the wrong conclusion that the new algorithm is significantly better when it is not. We perform a theoretical investigation of the variance of a variant of the cross-validation estimator of the generalization error that takes into account the variability due to the randomness of the training set as well as of the test examples. Our analysis shows that all variance estimators based only on the results of the cross-validation experiment must be biased. This analysis allows us to propose new estimators of this variance. We show, via simulations, that tests of hypotheses about the generalization error using these new variance estimators have better properties than tests involving the variance estimators currently in use and listed in Dietterich (1998). In particular, the new tests have correct size and good power: they do not reject the null hypothesis too often when it is true, but they tend to reject it frequently when it is false.
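The variance correction proposed by the authors is commonly implemented as the "corrected resampled t-test": with J random train/test splits, n1 training and n2 test examples, and per-split differences d_j in test error between two algorithms, the naive variance S^2/J is inflated to (1/J + n2/n1) * S^2. The sketch below assumes that form; the function names are hypothetical, and the statistic would then be compared with a t distribution with J - 1 degrees of freedom (not computed here, since the standard library has no t CDF).

```python
import math
import statistics
from typing import Sequence


def corrected_resampled_t(diffs: Sequence[float], n_train: int, n_test: int) -> float:
    """t statistic for the mean difference in generalization error over J
    random train/test splits, using the (1/J + n_test/n_train) variance correction."""
    j = len(diffs)
    if j < 2:
        raise ValueError("need at least two splits")
    mean = statistics.fmean(diffs)
    s2 = statistics.variance(diffs)  # sample variance across the J splits
    corrected_var = (1.0 / j + n_test / n_train) * s2
    return mean / math.sqrt(corrected_var)


def naive_resampled_t(diffs: Sequence[float]) -> float:
    """Uncorrected version that treats the J splits as independent."""
    return statistics.fmean(diffs) / math.sqrt(statistics.variance(diffs) / len(diffs))


d = [1.0, 2.0, 3.0]
print(corrected_resampled_t(d, n_train=90, n_test=10))  # about 3.0
print(naive_resampled_t(d))                             # about 3.46
```

The extra n2/n1 term accounts for the overlap between training sets across splits, which is why the corrected statistic is smaller than the naive one for the same differences.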

4.
Standard errors for bagged and random forest estimators
Bagging and random forests are widely used ensemble methods. Each forms an ensemble of models by randomly perturbing the fitting of a base learner. Estimation of the standard error of the resulting regression function is considered, and three estimators are discussed. One, based on the jackknife, applies to bagged estimators and can be computed from the bagged ensemble itself. The other two target the bootstrap standard error estimator and require fitting multiple ensemble estimators, one for each bootstrap sample. It is shown that the ensembles fitted to these bootstrap samples can be small, which reduces the computation needed to form the estimator. The estimators are studied using both simulated and real data.
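The jackknife estimator mentioned above can be computed from the bagged ensemble alone. One common form is Efron's jackknife-after-bootstrap: for each observation i, average the ensemble members whose bootstrap sample did not contain i, then apply the usual jackknife variance formula to those n averages. The sketch below assumes that form, a one-dimensional target (e.g. the prediction at a single point), and a hypothetical `base_learner` callable; here the base learner is simply the sample mean so the result can be sanity-checked.

```python
import random
import statistics
from typing import Callable, List, Sequence, Set, Tuple


def bagged_with_jackknife_se(
    data: Sequence[float],
    base_learner: Callable[[List[float]], float],
    n_boot: int = 500,
    seed: int = 0,
) -> Tuple[float, float]:
    """Return (bagged estimate, jackknife-after-bootstrap standard error)."""
    rng = random.Random(seed)
    n = len(data)
    fits: List[float] = []
    in_bag: List[Set[int]] = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap resample
        fits.append(base_learner([data[i] for i in idx]))
        in_bag.append(set(idx))
    bagged = statistics.fmean(fits)

    # Leave-one-out ensemble averages, reusing the already-fitted members.
    loo: List[float] = []
    for i in range(n):
        out_of_bag = [f for f, s in zip(fits, in_bag) if i not in s]
        if not out_of_bag:
            raise ValueError("observation in every replicate; increase n_boot")
        loo.append(statistics.fmean(out_of_bag))
    center = statistics.fmean(loo)
    var = (n - 1) / n * sum((m - center) ** 2 for m in loo)
    return bagged, var ** 0.5


data = [float(i) for i in range(20)]
est, se = bagged_with_jackknife_se(data, statistics.fmean)
print(est, se)
```

With this data the bagged estimate should be near 9.5 and the standard error close to the textbook value 5.92 / sqrt(20), about 1.32; Monte Carlo noise in a finite ensemble tends to inflate the jackknife value slightly.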

5.
In this paper, we first discuss the origin and development of, and various researchers' views on, the generalized linear regression estimator (GREG) due to Deville and Särndal [Deville, J.C., Särndal, C.E., 1992. Calibration estimators in survey sampling. J. Amer. Statist. Assoc. 87, 376-382]. Then the problem of estimating the general parameter of interest considered by Rao [Rao, J.N.K., 1994. Estimating totals and distribution functions using auxiliary information at the estimation stage. J. Official Statist. 10 (2), 153-165] and Singh [Singh, S., 2001. Generalized calibration approach for estimating the variance in survey sampling. Ann. Inst. Statist. Math. 53 (2), 404-417; Singh, S., 2004. Golden and Silver Jubilee Year-2003 of the linear regression estimators. In: Proceedings of the Joint Statistical Meeting, Toronto (Available on the CD), 4382-4380; Singh, S., 2006. Survey statisticians celebrate Golden Jubilee Year-2003 of the linear regression estimator. Metrika 1-18] is investigated further. In addition, it is shown that the Farrell and Singh [Farrell, P.J., Singh, S., 2005. Model-assisted higher order calibration of estimators of variance. Australian & New Zealand J. Statist. 47 (3), 375-383] estimators are also a special case of the proposed methodology. Interestingly, it is noted that the single model-assisted calibration constraint studied by Farrell and Singh [Farrell, P.J., Singh, S., 2002. Re-calibration of higher order calibration weights. Presented at Statistical Society of Canada conference, Hamilton (Available on CD); Farrell, P.J., Singh, S., 2005. Model-assisted higher order calibration of estimators of variance. Australian & New Zealand J. Statist. 47 (3), 375-383] and Wu [Wu, C., 2003. Optimal calibration estimators in survey sampling. Biometrika 90, 937-951] is not helpful for calibrating the Sen [Sen, A.R., 1953. On the estimate of the variance in sampling with varying probabilities. J. Indian Soc. Agril. Statist. 5, 119-127] and Yates and Grundy [Yates, F., Grundy, P.M., 1953. Selection without replacement from within strata with probability proportional to size. J. Roy. Statist. Soc. Ser. B 15, 253-261] estimator of the variance of the linear regression estimator under the optimal designs of Godambe and Joshi [Godambe, V.P., Joshi, V.M., 1965. Admissibility and Bayes estimation in sampling finite populations—I. Ann. Math. Statist. 36, 1707-1722]. Three new estimators of the variance of the proposed linear regression type estimator of the general parameters of interest are introduced and compared with one another. The newly proposed two-dimensional linear regression models are found to be useful for comparing the variance estimators, in contrast to a simulation based on a couple of thousand random samples. Using knowledge of the model parameters to assist the variance estimators is found to be beneficial. Most notably, it is shown theoretically that the proposed calibration method always remains more efficient than the GREG estimator.
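To make the baseline calibration idea concrete, here is a minimal sketch of Deville and Särndal's linear (chi-square distance) calibration with a single auxiliary variable and no intercept: design weights d_i are adjusted to w_i = d_i(1 + lambda * x_i) so that the weighted auxiliary total equals the known population total X. With this distance function, sum(w_i * y_i) coincides with a GREG-type estimator. It illustrates only the GREG baseline, not the higher-order calibration proposed in the paper; the function names are hypothetical.

```python
from typing import List, Sequence


def linear_calibration_weights(
    d: Sequence[float], x: Sequence[float], x_total: float
) -> List[float]:
    """Chi-square-distance calibration: w_i = d_i * (1 + lam * x_i),
    with lam chosen so that sum(w_i * x_i) equals the known total x_total."""
    sdx = sum(di * xi for di, xi in zip(d, x))
    sdxx = sum(di * xi * xi for di, xi in zip(d, x))
    if sdxx == 0:
        raise ValueError("auxiliary variable is zero for all sampled units")
    lam = (x_total - sdx) / sdxx
    return [di * (1.0 + lam * xi) for di, xi in zip(d, x)]


def calibrated_total(w: Sequence[float], y: Sequence[float]) -> float:
    """Calibration (GREG-type) estimator of the population total of y."""
    return sum(wi * yi for wi, yi in zip(w, y))


d = [10.0, 10.0, 10.0]   # design weights (inverse inclusion probabilities)
x = [1.0, 2.0, 3.0]      # auxiliary values in the sample
w = linear_calibration_weights(d, x, x_total=66.0)
print(sum(wi * xi for wi, xi in zip(w, x)))  # reproduces 66.0 up to rounding
print(calibrated_total(w, [2.0, 4.0, 6.0]))  # y = 2x exactly, so about 132.0
```

Expanding sum(w_i * y_i) gives the Horvitz-Thompson total plus B(X - X_HT) with B = sum(d x y) / sum(d x^2), which is the GREG form for a no-intercept model.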

6.
International Journal of Computer Mathematics, 2012, 89(8): 1565-1572
Recently, the estimation of population quantiles has received considerable attention. Existing quantile estimators generally assume that the values of an auxiliary variable are known for the entire population, and most are defined under simple random sampling without replacement. Assuming two-phase sampling for stratification, with arbitrary sampling designs in each phase, a new quantile estimator and its variance estimator are defined. The proposed estimators can be used when population-level auxiliary information is unavailable, which is a common situation in practice. Desirable properties such as unbiasedness are derived. The suggested estimators are compared numerically with an alternative stratification estimator and its variance estimator, with favorable results. Confidence intervals based on the proposed estimators are also defined and compared via simulation studies with confidence intervals based on the stratification estimator. The proposed confidence intervals give satisfactory coverage probabilities with the smallest interval lengths.
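A basic building block behind many design-based quantile estimators is inversion of a weighted empirical distribution function, F(t) = sum(w_i * 1[y_i <= t]) / sum(w_i), with Q(p) = min{t : F(t) >= p}. The sketch below assumes generic positive weights; under two-phase sampling for stratification these might be, hypothetically, products of first-phase weights and within-stratum second-phase weights, but the paper's exact estimator and its variance estimator are not reproduced here.

```python
from typing import Sequence


def weighted_quantile(y: Sequence[float], w: Sequence[float], p: float) -> float:
    """Smallest sample value t with weighted CDF F(t) >= p (0 < p <= 1)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    if len(y) != len(w) or not y:
        raise ValueError("y and w must be non-empty and aligned")
    pairs = sorted(zip(y, w))
    total = sum(w)
    cum = 0.0
    for yi, wi in pairs:
        cum += wi
        if cum / total >= p:
            return yi
    return pairs[-1][0]  # guard against floating-point shortfall at p = 1


print(weighted_quantile([3.0, 1.0, 2.0, 4.0], [1.0, 1.0, 1.0, 1.0], 0.5))  # 2.0
print(weighted_quantile([3.0, 1.0, 2.0, 4.0], [1.0, 1.0, 1.0, 5.0], 0.5))  # 4.0
```

The second call shows how a large weight on one unit (e.g. a unit representing a big stratum) pulls the estimated median toward it.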

7.
A convenient and frequently used summary measure of firing variability in neurons is the coefficient of variation (CV), defined as the standard deviation divided by the mean. It is therefore important to find an estimator that gives reliable results from experimental data; that is, the estimator should be unbiased and have low estimation variance. When the CV is evaluated in the standard way (the empirical standard deviation of the interspike intervals divided by their average), the estimator is biased and underestimates the true CV, especially if the distribution of the interspike intervals is positively skewed. Moreover, the estimator has a large variance for commonly used distributions. The aim of this letter is to quantify the bias and propose alternative estimation methods. If the distribution is assumed known or can be determined from the data, parametric estimators are proposed that not only remove the bias but also decrease the estimation error. If no distribution is assumed and the data are strongly positively skewed, we propose a correction to the standard estimator. The corrected estimator exploits the fact that, for positively skewed distributions, it is more stable to work on the log scale. The estimators are evaluated through simulations and applied to experimental data from olfactory receptor neurons in rats.
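The contrast between the standard estimator and a log-scale estimator can be sketched as follows. The log-scale version assumes lognormally distributed interspike intervals, for which CV = sqrt(exp(sigma^2) - 1), with sigma^2 the variance of the log intervals; it is one parametric option in the spirit of the letter, not necessarily the exact corrected estimator proposed there. Function names are hypothetical.

```python
import math
import random
import statistics
from typing import Sequence


def cv_standard(isi: Sequence[float]) -> float:
    """Empirical SD of interspike intervals divided by their mean."""
    return statistics.stdev(isi) / statistics.fmean(isi)


def cv_lognormal(isi: Sequence[float]) -> float:
    """CV implied by a lognormal fit: sqrt(exp(s2) - 1), s2 = variance of log(ISI)."""
    if any(t <= 0 for t in isi):
        raise ValueError("interspike intervals must be positive")
    s2 = statistics.variance([math.log(t) for t in isi])
    return math.sqrt(math.expm1(s2))


rng = random.Random(42)
isi = [rng.lognormvariate(0.0, 1.0) for _ in range(20000)]
true_cv = math.sqrt(math.e - 1.0)  # about 1.311 for sigma = 1
print(true_cv, cv_standard(isi), cv_lognormal(isi))
```

Repeating `cv_standard` on many short spike trains (say 10 intervals each) and averaging is a quick way to see the downward bias described above for skewed distributions.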

8.
In random-effects meta-analysis, an overall effect is estimated using a weighted mean, with weights based on estimated marginal variances. The variance of the overall effect is often estimated as the inverse of the sum of the estimated weights, and inference about the overall effect is typically based on this "usual" variance estimator, which is not robust to errors in the estimated marginal variances. In this paper, robust estimation of the asymptotic variance of a weighted overall effect estimate is explored by comparing a robust variance estimator with the usual variance estimator and with a less frequently used estimator, a weighted version of the sample variance. Three illustrative examples are presented to demonstrate and compare the three estimation methods. Furthermore, a simulation study is conducted to assess the robustness of the three variance estimators when estimated weights are used. The simulation results show that the robust variance estimator and the weighted sample variance estimator both estimate the variance of an overall effect more accurately than the usual variance estimator when the weights are imprecise because they rely on estimated marginal variances, as is typically the case in practice. Therefore, we argue that inference about an overall effect should be based on the robust variance estimator or the weighted sample variance, which protect against the consequences of using estimated weights in meta-analytic inference.
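The three variance estimators compared above can be written in a few lines. The sketch assumes the between-study variance tau^2 has already been estimated (for example by DerSimonian-Laird) and is passed in, with weights w_i = 1/(v_i + tau^2). The robust form sum(w_i^2 (y_i - mu)^2) / (sum w_i)^2 and the weighted sample variance sum(w_i (y_i - mu)^2) / ((k - 1) sum w_i) are the standard textbook versions, assumed here rather than taken from the paper; the function name is hypothetical.

```python
from typing import Sequence, Tuple


def pooled_effect_variances(
    y: Sequence[float], v: Sequence[float], tau2: float
) -> Tuple[float, float, float, float]:
    """Return (pooled effect, usual, robust, weighted-sample variance)."""
    k = len(y)
    if k < 2 or len(v) != k:
        raise ValueError("need at least two studies with matching variances")
    w = [1.0 / (vi + tau2) for vi in v]  # estimated inverse-variance weights
    sw = sum(w)
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sw
    usual = 1.0 / sw
    robust = sum(wi ** 2 * (yi - mu) ** 2 for wi, yi in zip(w, y)) / sw ** 2
    weighted_sample = sum(wi * (yi - mu) ** 2 for wi, yi in zip(w, y)) / ((k - 1) * sw)
    return mu, usual, robust, weighted_sample


print(pooled_effect_variances([0.0, 2.0], [1.0, 1.0], tau2=0.0))  # (1.0, 0.5, 0.5, 1.0)
```

Only the usual estimator ignores the observed spread of the study effects; the other two use the residuals y_i - mu, which is what gives them some protection when the weights themselves are estimated.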

9.
Matrix models are often used to model the dynamics of age-structured or size-structured populations. The Usher model is an important special case that relies on the following hypothesis: between time steps t and t+1, individuals either remain in the same class, move up to the next class, or die. There are then two ways of handling data that do not meet this condition: remove them prior to analysis, or rectify them. These two approaches correspond to two estimators of the transition parameters. The former, which is the classical estimator, is obtained from the latter by data trimming. The two estimators are compared in terms of robustness in order to obtain a criterion for choosing between them. The influence curve of each estimator is computed first, followed by its gross-error sensitivity and asymptotic variance. The untrimmed estimator is more robust than the classical one. Its asymptotic variance can be lower or higher than that of the classical estimator, depending on the boundaries used for data trimming. The results are applied to a tropical rain forest in French Guiana, with a discussion of the role of the class width.
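A minimal sketch of the classical (trimmed) estimator and the resulting Usher matrix, assuming each record is a pair (class at t, class at t+1) with None for a death, classes numbered 0 to k-1, and recruitment omitted. Records that move down or jump more than one class are discarded before counting; the rectification used by the untrimmed estimator is not specified in the abstract and is not shown. All names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Record = Tuple[int, Optional[int]]  # (class at t, class at t+1, or None if dead)


def trimmed_usher_estimates(
    records: Sequence[Record], k: int
) -> Tuple[List[float], List[float]]:
    """Classical estimator: drop records violating the Usher hypothesis,
    then estimate P(stay) and P(move up one class) for each class."""
    stay, up, total = [0] * k, [0] * k, [0] * k
    for c0, c1 in records:
        if c1 is not None and (c1 not in (c0, c0 + 1) or c1 >= k):
            continue  # trimmed: not "stay, move up one class, or die"
        total[c0] += 1
        if c1 == c0:
            stay[c0] += 1
        elif c1 == c0 + 1:
            up[c0] += 1
    p = [stay[i] / total[i] if total[i] else 0.0 for i in range(k)]
    q = [up[i] / total[i] if total[i] else 0.0 for i in range(k)]
    return p, q


def usher_matrix(p: Sequence[float], q: Sequence[float]) -> List[List[float]]:
    """Survival/growth part of the Usher matrix: p on the diagonal,
    q on the sub-diagonal (recruitment into the first class omitted)."""
    k = len(p)
    a = [[0.0] * k for _ in range(k)]
    for i in range(k):
        a[i][i] = p[i]
        if i + 1 < k:
            a[i + 1][i] = q[i]
    return a


def project(a: Sequence[Sequence[float]], n: Sequence[float]) -> List[float]:
    """One time step: n(t+1) = A n(t)."""
    return [sum(a[i][j] * n[j] for j in range(len(n))) for i in range(len(a))]


recs = [(0, 0), (0, 1), (0, None), (0, 2), (1, 1), (1, None)]  # (0, 2) is trimmed
p, q = trimmed_usher_estimates(recs, k=2)
print(p, q)                                       # about [0.333, 0.5] and [0.333, 0.0]
print(project(usher_matrix(p, q), [30.0, 10.0]))  # about [10.0, 15.0]
```

Because trimmed records are dropped from the denominator as well, the trimming boundaries directly change the estimated survival probabilities, which is where the robustness comparison in the paper comes in.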

10.
In this paper, we propose a novel and highly robust estimator called MDPE (Maximum Density Power Estimator). This estimator applies nonparametric density estimation and density gradient estimation techniques to parametric estimation (model fitting). MDPE optimizes an objective function that measures more than just the size of the residuals: both the density distribution of the data points in residual space and the size of the residual corresponding to the local maximum of that density are treated as important characteristics. MDPE can tolerate more than 85% outliers. Compared with several other recently proposed estimators of this kind, MDPE is more robust to outliers and has lower error variance. We also present a new range image segmentation algorithm based on a modified version of MDPE (Quick-MDPE) and compare its performance with several other segmentation methods. Segmentation requires more than a simple-minded application of an estimator, no matter how good that estimator is; our segmentation algorithm overcomes several difficulties faced when applying a statistical estimator to this task.

11.
Several new estimators of the marginal likelihood for complex non-Gaussian models are developed. These estimators use the output of auxiliary mixture sampling for count data and for binary and multinomial data. One of them combines Chib's estimator with data augmentation as in auxiliary mixture sampling; the others are importance sampling and bridge sampling estimators based on an unsupervised importance density constructed from the output of auxiliary mixture sampling. The estimators are applied to a logit regression model, a Poisson regression model, a binomial model with random intercept, and state space modeling of count data.
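As a toy illustration of an importance-sampling estimator of the marginal likelihood (not the auxiliary-mixture construction of the paper), consider k successes in n Bernoulli trials with a uniform prior, where the exact marginal likelihood of the binomial count is 1/(n + 1). The importance density here is a hypothetical Beta proposal that is slightly wider than the posterior, chosen by hand rather than built from sampler output; the average of likelihood x prior / proposal is computed on the log scale for numerical stability.

```python
import math
import random


def log_beta_pdf(x: float, a: float, b: float) -> float:
    """Log density of Beta(a, b) at x in (0, 1)."""
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return (a - 1) * math.log(x) + (b - 1) * math.log1p(-x) - log_norm


def log_binom_lik(theta: float, n: int, k: int) -> float:
    """Log binomial likelihood of k successes in n trials."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(theta) + (n - k) * math.log1p(-theta)


def is_log_marginal(n: int, k: int, draws: int = 20000, seed: int = 1) -> float:
    """Importance-sampling estimate of log p(data) under a Uniform(0, 1) prior."""
    rng = random.Random(seed)
    # Proposal: Beta with flattened posterior exponents (heavier tails than the posterior).
    a, b = 1.0 + 0.8 * k, 1.0 + 0.8 * (n - k)
    log_w = []
    for _ in range(draws):
        t = rng.betavariate(a, b)
        if not 0.0 < t < 1.0:
            continue  # guard against an exact 0 or 1 draw
        # log prior is 0 for Uniform(0, 1)
        log_w.append(log_binom_lik(t, n, k) - log_beta_pdf(t, a, b))
    m = max(log_w)  # log-sum-exp for a stable average of the weights
    return m + math.log(sum(math.exp(lw - m) for lw in log_w) / len(log_w))


print(is_log_marginal(20, 7), -math.log(21))  # the two should nearly agree
```

Keeping the proposal's tails heavier than the posterior's keeps the importance weights bounded, which is the same concern that motivates constructing the importance density carefully from sampler output.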

12.
In this article, two semiparametric approaches are developed for analyzing randomized response data with missing covariates in a logistic regression model. One of the proposed estimators extends the validation likelihood estimator of Breslow and Cain [Breslow, N.E., Cain, K.C., 1988. Logistic regression for two-stage case-control data. Biometrika 75, 11-20]. The other is a joint conditional likelihood estimator based on both the validation and non-validation data sets. We present large-sample theory for the proposed estimators. Simulation results show that the joint conditional likelihood estimator is more efficient than the validation likelihood estimator, the weighted estimator, the complete-case estimator and the partial likelihood estimator. We also illustrate the methods using data from a cable TV study.

13.
For a linear two-level model with equal numbers of level-1 units per level-2 unit and a random intercept only, different empirical Bayes estimators of the random intercept are examined: the classical empirical Bayes estimator, the Morris version of the empirical Bayes estimator, and Rao's estimator. It is unclear which of these estimators performs best in terms of Bayes risk. Of the three, the Rao estimator is optimal when the covariance matrix of the random coefficients is allowed to be negative definite. In the multilevel model, however, this matrix is restricted to be positive semi-definite. The Morris version replaces the weights of the empirical Bayes estimator by unbiased estimates. This correction, however, is based on known level-1 variances, which are unknown in many empirical settings. A fourth estimator is proposed: a variant of Rao's estimator that restricts the estimated covariance matrix of the random coefficients to be positive semi-definite. Since closed-form expressions are unavailable for the estimators involved in the empirical Bayes estimators (except for the Rao estimator), Monte Carlo simulations are used to evaluate the performance of these estimators. Clear differences between them appear only for small sample sizes. Consequently, for larger sample sizes, the formula for the Bayes risk of the Rao estimator can be used to calculate the Bayes risk of the other estimators.
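For the balanced random-intercept case, a classical empirical Bayes prediction shrinks each group mean toward the grand mean by lambda = tau^2 / (tau^2 + sigma^2 / n). The sketch below uses ANOVA-type moment estimates and truncates the estimate of tau^2 at zero, which is the scalar analogue of restricting the estimated covariance matrix to be positive semi-definite; it is an illustration under those assumptions, not the Morris or Rao estimator specifically, and the function name is hypothetical.

```python
import statistics
from typing import List, Sequence


def eb_random_intercepts(groups: Sequence[Sequence[float]]) -> List[float]:
    """Empirical Bayes predictions of random intercepts u_j for balanced groups."""
    if len(groups) < 2:
        raise ValueError("need at least two level-2 units")
    n = len(groups[0])
    if n < 2 or any(len(g) != n for g in groups):
        raise ValueError("groups must be balanced with at least two units each")
    means = [statistics.fmean(g) for g in groups]
    grand = statistics.fmean(means)
    sigma2 = statistics.fmean([statistics.variance(g) for g in groups])  # within-group
    # Moment estimate of tau^2, truncated at 0 (the scalar PSD restriction).
    tau2 = max(0.0, statistics.variance(means) - sigma2 / n)
    denom = tau2 + sigma2 / n
    shrink = tau2 / denom if denom > 0 else 0.0
    return [shrink * (m - grand) for m in means]


print(eb_random_intercepts([[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]))  # [0.0, 0.0]
print(eb_random_intercepts([[0.0, 0.1], [10.0, 10.1]]))          # close to [-5.0, 5.0]
```

When the between-group spread is smaller than within-group noise would produce by chance, the truncated tau^2 is zero and every intercept is shrunk fully to the grand mean; with clearly separated groups the shrinkage factor approaches one.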

14.
We address the problem of estimating discrete, continuous, and conditional joint densities online, i.e., the algorithm is provided only with the current example and its current estimate for its update. The proposed family of online density estimators, estimation of densities online (EDO), uses classifier chains to model dependencies among features, where each classifier in the chain estimates the probability of one particular feature. Because a single chain may not provide a reliable estimate, we also consider ensembles of classifier chains and ensembles of weighted classifier chains. For all density estimators, we provide consistency proofs and propose algorithms to perform certain inference tasks. The estimators are evaluated empirically in several experiments on datasets of up to several million instances. In the discrete case, we compare our estimators with density estimates computed by Bayesian structure learners; in the continuous case, with a state-of-the-art online density estimator. Our experiments show that, although designed to work online, EDO delivers estimators whose accuracy is competitive with these alternatives (batch Bayesian structure learners on discrete datasets and the state-of-the-art online density estimator on continuous datasets). Besides achieving similar performance in these cases, EDO can also estimate densities over mixed variable types, i.e., discrete and continuous random variables together.

15.
Nearest neighbors techniques have been shown to be useful for estimating forest attributes, particularly when used with forest inventory and satellite image data. Published reports of positive results have been truly international in scope. However, for these techniques to be more useful, they must be able to contribute to scientific inference, which, for sample-based methods, requires estimates of uncertainty in the form of variances or standard errors. Several parametric approaches to estimating uncertainty for nearest neighbors techniques have been proposed, but they are complex and computationally intensive. For this study, two resampling estimators, the bootstrap and the jackknife, were investigated and compared with a parametric estimator of uncertainty for the k-Nearest Neighbors (k-NN) technique, using forest inventory and Landsat data from Finland, Italy, and the USA. The technical objectives of the study were threefold: (1) to evaluate the assumptions underlying a parametric approach to estimating k-NN variances; (2) to assess the utility of the bootstrap and jackknife methods with respect to the quality of variance estimates, ease of implementation, and computational intensity; and (3) to investigate adapting resampling methods to accommodate cluster sampling. The general conclusions were that the assumptions underlying the parametric approach were supported, the parametric and resampling estimators produced comparable variance estimates, care must be taken to ensure that bootstrap resampling mimics the original sampling, and the bootstrap is a viable approach to variance estimation for nearest neighbor techniques that use very small numbers of neighbors to calculate predictions.

16.
A fast algorithm for calculating the simplicial depth of a single parameter vector of a polynomial regression model is derived. Additionally, an algorithm for calculating the parameter vectors with maximum simplicial depth within an affine subspace of the parameter space or within a polyhedron is presented. Since the maximum simplicial depth estimator is not unique, L1 and L2 methods are used to make it unique. This estimator is compared with other estimators in examples of linear and quadratic regression. Furthermore, it is shown how the maximum simplicial depth can be used to derive distribution-free asymptotic α-level tests of hypotheses in polynomial regression models. The tests are applied to a problem of shape analysis, testing how the relative head length of the fish species Lepomis gibbosus depends on fish size. It is also tested whether this dependency can be described by the same polynomial regression function in different populations.

17.
Graphical Models, 2000, 62(1): 56-70
In this paper we study random walk estimators for radiosity with generalized absorption probabilities; that is, a path either dies or survives on a patch according to an arbitrary probability. The estimators studied so far, the infinite path length estimator and the finite path length estimator, can be considered particular cases. Practical applications of random walks with generalized probabilities are given. A necessary and sufficient condition for the existence of the variance is given, together with a heuristic for use in practical cases. The optimal probabilities are also found for the case in which the whole scene is of interest; they are equal to the reflectivities.

18.
A critical challenge in multistage process monitoring is the complex relationship between quality characteristics at different stages. A popular method for dealing with this problem is regression adjustment, in which each quality characteristic is regressed on its preceding quality characteristics and the resulting residual is monitored to detect changes in local variation. However, the performance of this method depends on the accuracy of the estimated regression coefficients. One source of estimation error is measurement error, which is common in practice. To provide guidance on the use of regression-adjusted monitoring methods, this study investigates, theoretically and numerically, the effect of measurement errors on the bias of regression estimation. Two estimators, the ordinary least squares (OLS) estimator and the total least squares (TLS) estimator, are compared, and insights into their performance are obtained.
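The attenuation effect can be seen in a small simulation: when the regressor is observed with error, the OLS slope is biased toward zero, whereas the TLS (orthogonal regression) slope, (s_yy - s_xx + sqrt((s_yy - s_xx)^2 + 4 s_xy^2)) / (2 s_xy), is consistent when the error variances of x and y are equal. The sketch assumes a single-stage simple regression with equal error variances, a simplification of the multistage setting studied in the paper; function names are hypothetical.

```python
import math
import random
import statistics
from typing import Sequence, Tuple


def _moments(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Centered sums of squares and cross-products (sxx, syy, sxy)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxx, syy, sxy


def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    sxx, _, sxy = _moments(x, y)
    return sxy / sxx


def tls_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Orthogonal-regression slope (TLS with equal error variances)."""
    sxx, syy, sxy = _moments(x, y)
    if sxy == 0:
        raise ValueError("slope undefined when the sample covariance is zero")
    return (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)


rng = random.Random(3)
x_true = [rng.gauss(0.0, 1.0) for _ in range(5000)]
y_obs = [xt + rng.gauss(0.0, 0.5) for xt in x_true]  # true slope 1
x_obs = [xt + rng.gauss(0.0, 0.5) for xt in x_true]  # measurement error in x
print(ols_slope(x_obs, y_obs))  # attenuated, roughly 1 / (1 + 0.25) = 0.8
print(tls_slope(x_obs, y_obs))  # roughly 1
```

If the two error variances differ, the TLS formula above is no longer consistent and a Deming-type weighting by their ratio would be needed.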

19.
Semiparametric models combining nonparametric trends and small area random effects are currently being investigated in small area estimation (SAE). These models can prevent bias when the functional form of the relationship between the response and the covariates is unknown. Furthermore, penalized spline regression can be a good tool for incorporating nonparametric regression models into SAE techniques, since it can be represented as a mixed effects model. A penalized spline model is considered to analyze trends in small areas and to forecast future values of the response. The prediction mean squared error (MSE) of the fitted and predicted values, together with estimators of these quantities, is derived. The procedure is illustrated with real data consisting of average prices per square meter of used dwellings in nine neighborhoods of the city of Vitoria, Spain, during the period 1993-2007. Dwelling prices for the next five years are also forecast. A simulation study is conducted to assess the performance of both the small area trend estimator and the prediction MSE estimators. The results confirm that the proposed estimators behave well in terms of bias and variability.

20.
Genetic adaptive state estimation
A genetic algorithm (GA) uses the principles of evolution, natural selection, and genetics to provide a method for parallel search of complex spaces. This paper describes a GA that can perform online adaptive state estimation for linear and nonlinear systems. First, it shows how to construct a genetic adaptive state estimator in which a GA evolves the model in a state estimator in real time so that the state estimation error is driven to zero. Next, several examples are used to illustrate the operation and performance of the genetic adaptive state estimator. Its performance is compared with that of the conventional adaptive Luenberger observer on two linear system examples. Then, a genetic adaptive state estimator is used to predict when surge and stall occur in a nonlinear jet engine. Our main conclusion is that the genetic adaptive state estimator has the potential to offer higher-performance estimators for nonlinear systems than current methods.
