Similar articles
20 similar articles retrieved
1.
Locally adaptive density estimation presents challenges for parametric and non-parametric estimators. Several useful properties of tessellation density estimators (TDEs), such as low bias, scale invariance and sensitivity to local data morphology, make them an attractive alternative to standard kernel techniques. However, simple TDEs are discontinuous and produce highly unstable estimates due to their susceptibility to sampling noise. To address these concerns, we propose applying TDEs within a bootstrap aggregation algorithm and incorporating model selection with complexity penalization. We implement complexity reduction of the TDE via sub-sampling, and use information-theoretic criteria for model selection, which leads to an automatic and approximately ideal bias/variance compromise. The procedure yields a stabilized estimator that automatically adapts to the complexity of the generating distribution and the quantity of information at hand, while retaining the highly desirable properties of the TDE. The simulation studies presented suggest that a high degree of stability and sensitivity can be obtained with this approach.
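A minimal sketch of the bagging-with-sub-sampling idea described above. A Gaussian kernel density estimator stands in for the tessellation density estimator, and the replicate count B and sub-sample size m are illustrative choices, not values from the paper.

import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
data = rng.normal(size=500)          # toy sample from the generating distribution
B, m = 50, 200                       # bootstrap replicates, sub-sample size

def bagged_density(x, data, B, m, rng):
    # Average B density estimates, each fitted to a sub-sample drawn with replacement.
    estimates = []
    for _ in range(B):
        sub = rng.choice(data, size=m, replace=True)
        estimates.append(gaussian_kde(sub)(x))
    return np.mean(estimates, axis=0)

grid = np.linspace(-4, 4, 200)
density = bagged_density(grid, data, B, m, rng)   # stabilized (bagged) estimate on the grid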

2.
3.
The principal response curve (PRC) model is useful for analysing multivariate data resulting from experiments involving repeated sampling in time. The time-dependent treatment effects are represented by PRCs, which are functional in nature. The sample PRCs can be estimated using a raw approach or the newly proposed smooth approach. The generalisability of the sample PRCs can be judged using confidence bands, and the quality of various bootstrap strategies for estimating such bands is evaluated. The best coverage was obtained with BCa intervals using a non-parametric bootstrap. Coverage was generally good, except when the population PRCs are exactly zero for all conditions; there the behaviour is irregular, owing to the sign indeterminacy of the PRCs. The insights obtained into the optimal bootstrap strategy are useful for applying the PRC model and, more generally, for estimating confidence intervals in methods based on the singular value decomposition.
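A minimal sketch of non-parametric BCa bootstrap intervals, the strategy reported above as giving the best coverage. For brevity the statistic is a pointwise mean curve over synthetic repeated-measures data, standing in for the sample PRC obtained from the reduced-rank decomposition; all data and settings are illustrative.

import numpy as np
from scipy.stats import bootstrap

rng = np.random.default_rng(1)
n_units, n_times = 30, 8
curves = rng.normal(loc=np.sin(np.linspace(0, 3, n_times)), scale=0.5,
                    size=(n_units, n_times))      # toy repeated-measures data

lower, upper = [], []
for t in range(n_times):
    res = bootstrap((curves[:, t],), np.mean, n_resamples=2000,
                    confidence_level=0.95, method='BCa')
    lower.append(res.confidence_interval.low)
    upper.append(res.confidence_interval.high)
# (lower, upper) now form a pointwise 95% BCa band around the sample curve.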

4.
Nearest neighbors techniques have been shown to be useful for estimating forest attributes, particularly when used with forest inventory and satellite image data, and published reports of positive results have been truly international in scope. However, for these techniques to be more useful, they must be able to contribute to scientific inference, which, for sample-based methods, requires estimates of uncertainty in the form of variances or standard errors. Several parametric approaches to estimating uncertainty for nearest neighbors techniques have been proposed, but they are complex and computationally intensive. For this study, two resampling estimators, the bootstrap and the jackknife, were investigated and compared to a parametric estimator for estimating uncertainty using the k-Nearest Neighbors (k-NN) technique with forest inventory and Landsat data from Finland, Italy, and the USA. The technical objectives of the study were threefold: (1) to evaluate the assumptions underlying a parametric approach to estimating k-NN variances; (2) to assess the utility of the bootstrap and jackknife methods with respect to the quality of variance estimates, ease of implementation, and computational intensity; and (3) to investigate adaptation of resampling methods to accommodate cluster sampling. The general conclusions were that the assumptions underlying the parametric approach were supported, that the parametric and resampling estimators produced comparable variance estimates, that care must be taken to ensure bootstrap resampling mimics the original sampling, and that the bootstrap is a viable approach to variance estimation for nearest neighbor techniques that use very small numbers of neighbors to calculate predictions.
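A minimal sketch of bootstrap variance estimation for a k-NN mean prediction, in the spirit of the comparison above. The synthetic features, response, and k=5 are illustrative assumptions; under cluster sampling the resampling unit would be the cluster rather than the individual plot.

import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(2)
X = rng.uniform(size=(200, 4))                              # plot-level image features
y = 50 + 100 * X[:, 0] + rng.normal(scale=10, size=200)     # e.g. stem volume per plot
X_target = rng.uniform(size=(500, 4))                       # target-area units to predict

def mean_prediction(X_ref, y_ref, X_target, k=5):
    model = KNeighborsRegressor(n_neighbors=k).fit(X_ref, y_ref)
    return model.predict(X_target).mean()

B = 200
boot = np.empty(B)
for b in range(B):
    idx = rng.integers(0, len(y), size=len(y))               # resample reference plots
    boot[b] = mean_prediction(X[idx], y[idx], X_target)
variance = boot.var(ddof=1)                                  # bootstrap variance of the mean prediction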

5.
沈乐君, 游志胜, 李晓峰. 《自动化学报》 (Acta Automatica Sinica), 2012, 38(10): 1663-1670
The main difficulty in multi-target visual tracking arises from the ambiguity caused by interactions among targets (partial or complete occlusion). A Markov random field (MRF) can resolve this ambiguity without explicit data association, but general-purpose probabilistic inference algorithms are computationally expensive. To address this, the paper makes three contributions: (1) a new recursive Bayesian tracking framework with a "decentralized-centralized-decentralized" structure, the bootstrap importance sampling particle filter, which uses an importance density incorporating the current observation to overcome the curse of dimensionality, reducing the computational complexity from exponential to linear growth; (2) a new Monte Carlo strategy, bootstrap importance sampling, which exploits the factorization of the MRF for importance sampling and uses the bootstrap to generate low-cost, high-quality samples, reduce the number of likelihood evaluations, and maintain multi-modal distributions; (3) a new marginalization technique that samples auxiliary variables for marginalization and uses bootstrap histograms for density estimation of the marginal posterior. Experimental results show that the proposed algorithm can track a large number of targets in real time, handle complex interactions between targets, and maintain multi-modal distributions after targets disappear.
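A minimal single-target bootstrap (SIR) particle filter, given only as a simplified stand-in for the recursive Bayesian framework above; the paper's method additionally exploits the MRF factorization, observation-informed importance densities, and bootstrap sampling. The random-walk model and noise levels here are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(3)
T, N = 50, 500                                    # time steps, particles
true_x = np.cumsum(rng.normal(size=T))            # random-walk target state
obs = true_x + rng.normal(scale=0.5, size=T)      # noisy observations

particles = rng.normal(size=N)
estimates = np.empty(T)
for t in range(T):
    particles = particles + rng.normal(size=N)             # propagate with the prior proposal
    log_w = -0.5 * ((obs[t] - particles) / 0.5) ** 2        # Gaussian likelihood weights
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    estimates[t] = np.sum(w * particles)                    # posterior-mean state estimate
    particles = particles[rng.choice(N, size=N, p=w)]       # resampling step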

6.
Importance sampling is an efficient strategy for reducing the variance of certain bootstrap estimates. It has found wide applications in bootstrap quantile estimation, proportional hazards regression, bootstrap confidence interval estimation, and other problems. Although estimation of the optimal sampling weights is a special case of convex programming, generic optimization methods are frustratingly slow on problems with large numbers of observations. For instance, interior point and adaptive barrier methods must cope with forming, storing, and inverting the Hessian of the objective function. In this paper, we present an efficient procedure for calculating the optimal importance weights and compare its performance to standard optimization methods on a representative data set. The procedure combines several potent ideas for large-scale optimization.
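A minimal sketch of importance sampling in the bootstrap, the setting targeted above: resamples are drawn with non-uniform (tilted) probabilities and re-weighted by the likelihood ratio, which sharpens tail estimates. The hand-picked exponential tilt below is an illustrative assumption, not the optimal weights whose efficient computation the paper addresses.

import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(size=50)
n = len(x)

lam = 0.5                                   # illustrative tilt toward large values
p = np.exp(lam * (x - x.mean()))
p /= p.sum()                                # tilted resampling probabilities

t = x.mean() + 2 * x.std(ddof=1) / np.sqrt(n)   # tail threshold of interest
B = 2000
est = np.empty(B)
for b in range(B):
    idx = rng.choice(n, size=n, replace=True, p=p)
    log_w = n * np.log(1.0 / n) - np.log(p[idx]).sum()   # uniform / tilted likelihood ratio
    est[b] = np.exp(log_w) * (x[idx].mean() > t)
tail_prob = est.mean()                      # importance-sampled estimate of P*(bootstrap mean > t)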

7.
Efron (1979) introduced the bootstrap method for independent data, but it cannot easily be applied to spatial data because of their dependence. For spatial data that are correlated through their locations in the underlying space, the moving block bootstrap is usually used to estimate the precision of estimators. The precision of moving block bootstrap estimators depends on the block size, which is difficult to select, and the moving block bootstrap also tends to underestimate the variance. In this paper, the semi-parametric bootstrap, which resamples using an estimate of the spatial correlation structure, is first used to estimate the precision of estimators in spatial data analysis. The semi-parametric bootstrap is then compared with the moving block bootstrap for variance estimation in a simulation study. Finally, the semi-parametric bootstrap is used to analyse the coal-ash data.
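A minimal sketch of the moving block bootstrap, the benchmark method discussed above. A 1-D autocorrelated series and a block length of 10 stand in for the spatial setting and the hard-to-choose block size; the semi-parametric alternative would instead simulate from a fitted spatial correlation model.

import numpy as np

rng = np.random.default_rng(5)
n, block = 400, 10
e = rng.normal(size=n)
z = np.empty(n)
z[0] = e[0]
for t in range(1, n):                        # AR(1) series as toy correlated data
    z[t] = 0.6 * z[t - 1] + e[t]

starts = np.arange(n - block + 1)            # all overlapping (moving) blocks
B = 500
means = np.empty(B)
for b in range(B):
    chosen = rng.choice(starts, size=n // block, replace=True)
    resample = np.concatenate([z[s:s + block] for s in chosen])
    means[b] = resample.mean()
var_mean = means.var(ddof=1)                 # block-bootstrap variance of the sample mean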

8.
The aim of this paper is to evaluate the effectiveness of bootstrap methods in improving the estimation of clutter properties in speckled imagery. Estimation is performed by standard maximum likelihood methods. We show that estimators obtained this way can be quite biased in finite samples, and we develop bias correction schemes using bootstrap resampling. In particular, we propose a bootstrapping scheme that adapts the one proposed by Efron (J. Amer. Statist. Assoc. 85 (1990) 79); unlike Efron's original proposal, it does not require the quantity of interest to have a closed form. This adaptation is particularly important since the maximum likelihood estimator of interest does not have a closed form. We show that this bootstrapping scheme outperforms alternative bias reduction mechanisms, thus delivering more accurate inference. We also consider interval estimation using bootstrap methods, and show that a particular parametric bootstrap-based confidence interval is typically more reliable than both the asymptotic confidence interval and other bootstrap-based confidence intervals. An application to real data is presented and discussed.
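A minimal sketch of bootstrap bias correction for a maximum likelihood estimator. The exponential toy data and rate MLE are illustrative stand-ins for the speckle/clutter model of the paper, whose MLE has no closed form and is obtained numerically.

import numpy as np

rng = np.random.default_rng(6)
x = rng.exponential(scale=2.0, size=30)      # small sample, where bias matters

def mle(sample):
    return 1.0 / sample.mean()               # MLE of the exponential rate

theta_hat = mle(x)
B = 1000
theta_star = np.array([mle(rng.choice(x, size=len(x), replace=True))
                       for _ in range(B)])
bias = theta_star.mean() - theta_hat         # bootstrap estimate of the bias
theta_bc = theta_hat - bias                  # bias-corrected estimate (= 2*theta_hat - mean of replicates)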

9.
We develop a tractable, consistent bootstrap algorithm for inference about Farrell–Debreu efficiency scores estimated by non-parametric data envelopment analysis (DEA) methods. The algorithm allows for very general situations where the distribution of the inefficiencies in the input-output space may be heterogeneous. Computational efficiency and tractability are achieved by avoiding the complex double-smoothing procedure in the algorithm proposed by Kneip et al. (Econometric Theory 24:1663–1697, 2008). In particular, we avoid technical difficulties in the earlier algorithm associated with smoothed estimates of a density with unknown, nonlinear, multivariate bounded support requiring complicated reflection methods. The new procedure described here is relatively simple and easy to implement: for particular values of a pair of smoothing parameters, the computational complexity is the same as the (inconsistent) naive bootstrap. The resulting computational speed allows the bootstrap to be iterated in order to optimize the smoothing parameters. From a practical viewpoint, only standard packages for computing DEA efficiency estimates, i.e., solving linear problems, are required for implementation. The performance of the method in finite samples is illustrated through some simulated examples.

10.
When measuring units is expensive or time consuming, while ranking them can be done easily, it is known that ranked set sampling (RSS) is preferred to simple random sampling (SRS). Available results for RSS are developed under specific parametric assumptions or are asymptotic in nature, with few results available for finite-size samples when the underlying distribution of the observed data is unknown. We investigate the use of resampling techniques to draw inferences on population characteristics. To obtain standard error and confidence interval estimates we discuss and compare three methods of resampling a given ranked set sample. Chen et al. (2004. Ranked Set Sampling: Theory and Applications. Springer, New York) suggest a natural method to obtain bootstrap samples from each row of a RSS. We prove that this method is consistent for a location estimator. We propose two other methods that are designed to obtain more stratified resamples from the given sample, and provide algorithms for them. We recommend a method that obtains a bootstrap RSS from the observations, and prove several of its properties, including consistency for a location parameter. We define two types of L-estimators for RSS and obtain expressions for their exact moments. We discuss an application to obtain confidence intervals for the Winsorized mean of a RSS.
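A minimal sketch of the row-wise bootstrap of Chen et al. (2004) mentioned above: each rank stratum (row) of the ranked set sample is resampled with replacement separately. The set size, number of cycles, underlying normal distribution, and the use of the sample mean as the location estimator are all illustrative assumptions.

import numpy as np

rng = np.random.default_rng(7)

def draw_rss(set_size, cycles, rng):
    # For each rank r and each cycle, draw a fresh set, rank it, keep the r-th value.
    rows = []
    for r in range(set_size):
        sets = rng.normal(size=(cycles, set_size))
        rows.append(np.sort(sets, axis=1)[:, r])
    return np.array(rows)                     # row r holds the rank-(r+1) stratum

rss = draw_rss(set_size=4, cycles=10, rng=rng)

B = 1000
means = np.empty(B)
for b in range(B):
    resample = np.vstack([rng.choice(row, size=rss.shape[1], replace=True)
                          for row in rss])    # resample within each row
    means[b] = resample.mean()
se = means.std(ddof=1)                        # bootstrap standard error of the RSS mean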

11.
Statistical inference in censored quantile regression is challenging, partly due to the non-smoothness of the quantile score function. A new procedure is developed to estimate the variance of the Bang and Tsiatis inverse-censoring-probability weighted estimator for censored quantile regression by employing the idea of induced smoothing. The proposed variance estimator is shown to be asymptotically consistent. In addition, a numerical study suggests that the proposed procedure performs well in finite samples, and it is computationally more efficient than the commonly used bootstrap method.

12.
Software aging is caused by resource exhaustion and can lead to progressive performance degradation or a crash. We develop experiments that simulate an on-line bookstore application, using the standard configuration of the TPC-W benchmark, and study application failures due to memory leaks using accelerated life testing (ALT). ALT significantly reduces the time needed to estimate the time to failure at the normal usage level. We then select a Weibull time-to-failure distribution at the normal level and use it in a semi-Markov model to optimize the software rejuvenation trigger interval, deriving the optimal rejuvenation schedule interval both by fixed-point iteration and by an alternative non-parametric estimation algorithm. Finally, we develop a simulation model using importance sampling (IS) to cross-validate the ALT experimental results and the semi-Markov model, and we apply the non-parametric method to cross-validate the optimized trigger intervals by comparing the availabilities obtained from the semi-Markov model with those from the IS simulation.

13.
Computing direct illumination efficiently is still a problem of major significance in computer graphics. The evaluation involves an integral over the surface areas of the light sources in the scene. Because this integral typically features many discontinuities, introduced by the visibility term and complex material functions, Monte Carlo integration is one of the few general techniques that can be used to compute it. In this paper, we propose to evaluate the direct illumination using line samples instead of point samples. A direct consequence of line sampling is that the two-dimensional integral over the area of the light source is reduced to a one-dimensional integral. We exploit this dimensional reduction by relying on the property that commonly used sampling patterns, such as stratified sampling and low-discrepancy sequences, converge faster when the dimension of the integration domain is reduced. We show that, while line sampling is generally more computationally intensive than point sampling, the variance of a line sample is smaller than that of a point sample, resulting in a higher order of convergence.
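A minimal sketch of the dimensional-reduction idea: for a toy integrand on the unit square, a "line sample" fixes one coordinate and integrates the other analytically, so the estimator averages 1-D integrals instead of point evaluations. The integrand below is an illustrative stand-in for the direct-illumination integral, which also contains visibility and material terms.

import numpy as np

rng = np.random.default_rng(8)
f = lambda x, y: np.exp(-x) * y ** 2          # toy integrand over [0,1]^2
line_integral = lambda x: np.exp(-x) / 3.0    # analytic integral of f over y in [0,1]

N, trials = 64, 2000
point_est = np.empty(trials)
line_est = np.empty(trials)
for t in range(trials):
    xs, ys = rng.uniform(size=N), rng.uniform(size=N)
    point_est[t] = f(xs, ys).mean()                            # N point samples
    line_est[t] = line_integral(rng.uniform(size=N)).mean()    # N line samples

print(point_est.var(), line_est.var())        # line sampling exhibits the lower variance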

14.
We report the results from modelling standing volume, above-ground biomass and stem count with the aim of exploring the potential of two non-parametric approaches to estimate forest attributes. The models were built based on spectral and 3D information extracted from airborne optical and laser scanner data. The survey was completed across two geographically adjacent temperate forest sites in southwestern Germany, using spatially and temporally comparable remote-sensing data collected by similar instruments. Samples from the auxiliary reference stands (called off-site samples) were combined with random, random stratified and systematically stratified samples from the target area for prediction of standing volume, above-ground biomass and stem count in the target area. A range of combinations was used for the modelling process, comprising the most similar neighbour (MSN) and random forest (RF) imputation methods, three sampling designs and two predictor subset sizes. An evolutionary genetic algorithm (GA) was applied to prune the predictor variables. Diagnostic tools, including root mean square error (RMSE), bias and standard error of imputation, were employed to evaluate the results. The results showed that RF produced more accurate results than MSN (average improvement of 3.5% for a single-neighbour case with selected predictors), yet was more biased than MSN (average bias of 5.13% with RF compared to 2.44% with MSN for stem volume in a single-neighbour case with selected predictors). Combining systematically stratified auxiliary samples from the target data set with the reference data set yielded more accurate results than random and stratified random samples. Combining additional data was most influential when an intensity of up to 40% of supplementary samples was appended to the reference set. The use of GA-selected predictors reduced the bias of the models. Bootstrap simulations of the RMSE showed the estimates to lie within the applied non-parametric confidence intervals. The results are concluded to be helpful for modelling these forest attributes from airborne remote-sensing data.

15.
We propose a general method for error estimation that displays low variance and generally low bias as well. This method is based on "bolstering" the original empirical distribution of the data. It has a direct geometric interpretation and can be easily applied to any classification rule and any number of classes. This method can be used to improve the performance of any error-counting estimation method, such as resubstitution and all cross-validation estimators, particularly in small-sample settings. We point out some similarities shared by our method with a previously proposed technique, known as smoothed error estimation. In some important cases, such as a linear classification rule with a Gaussian bolstering kernel, the integrals in the bolstered error estimate can be computed exactly. In the general case, the bolstered error estimate may be computed by Monte-Carlo sampling; however, our experiments show that a very small number of Monte-Carlo samples is needed. This results in a fast error estimator, in contrast to other resampling techniques, such as the bootstrap. We provide an extensive simulation study comparing the proposed method with resubstitution, cross-validation, and bootstrap error estimation, for three popular classification rules (linear discriminant analysis, k-nearest-neighbor, and decision trees), using several sample sizes, from small to moderate. The results indicate the proposed method vastly improves on resubstitution and cross-validation, especially for small samples, in terms of bias and variance. In that respect, it is competitive with, and on many occasions superior to, bootstrap error estimation, while being tens to hundreds of times faster. We provide a companion web site, which contains: (1) the complete set of tables and plots regarding the simulation study, and (2) C source code used to implement the bolstered error estimators proposed in this paper, as part of a larger library for classification and error estimation, with full documentation and examples. The companion web site can be accessed at the URL http://ee.tamu.edu/~edward/bolster.
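A minimal sketch of Monte-Carlo bolstered resubstitution for a linear classifier: each training point is replaced by a small cloud drawn from a Gaussian bolstering kernel centred on it, and the error is counted on the cloud. The kernel width sigma and the cloud size are illustrative assumptions; the paper derives data-driven kernel sizes and exact expressions for the linear/Gaussian case.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(9)
n = 40
X = np.vstack([rng.normal(0.0, 1.0, size=(n // 2, 2)),
               rng.normal(1.5, 1.0, size=(n // 2, 2))])
y = np.repeat([0, 1], n // 2)

clf = LinearDiscriminantAnalysis().fit(X, y)

sigma, n_mc = 0.5, 20
errors = []
for xi, yi in zip(X, y):
    cloud = xi + rng.normal(scale=sigma, size=(n_mc, 2))   # samples from the bolstering kernel
    errors.append(np.mean(clf.predict(cloud) != yi))
bolstered_error = np.mean(errors)            # bolstered resubstitution estimate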

16.
In this work, we discuss practical methods for the assessment, comparison, and selection of complex hierarchical Bayesian models. A natural way to assess the goodness of the model is to estimate its future predictive capability by estimating expected utilities. Instead of just making a point estimate, it is important to obtain the distribution of the expected utility estimate because it describes the uncertainty in the estimate. The distributions of the expected utility estimates can also be used to compare models, for example, by computing the probability of one model having a better expected utility than some other model. We propose an approach using cross-validation predictive densities to obtain expected utility estimates and the Bayesian bootstrap to obtain samples from their distributions. We also discuss the probabilistic assumptions made and the properties of two practical cross-validation methods, importance sampling and k-fold cross-validation. As illustrative examples, we use multilayer perceptron neural networks and Gaussian processes with Markov chain Monte Carlo sampling in one toy problem and two challenging real-world problems.
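A minimal sketch of the Bayesian bootstrap step: given per-observation cross-validation utilities (e.g. log predictive densities), Dirichlet(1,...,1) weights are drawn repeatedly to obtain samples from the distribution of the expected utility. The toy utilities below are an illustrative stand-in for values produced by k-fold cross-validation of an actual Bayesian model.

import numpy as np

rng = np.random.default_rng(10)
log_pred_dens = rng.normal(loc=-1.2, scale=0.4, size=100)   # toy per-observation CV utilities

B = 4000
weights = rng.dirichlet(np.ones(len(log_pred_dens)), size=B)
expected_utility = weights @ log_pred_dens     # B samples from the expected-utility distribution
# With paired draws for two models, e.g. np.mean(eu_A - eu_B > 0) estimates P(model A is better).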

17.
Bootstrapping is a simple technique typically used to assess the accuracy of estimates of model parameters, using simple plug-in principles and replacing sometimes unwieldy theory by computer simulation. Common uses include variance estimation and confidence interval construction for model parameters. It also provides a way to estimate the prediction accuracy of regression models for continuous and class-valued outcomes. In this paper we overview some of these applications of the bootstrap, focusing on bootstrap estimates of prediction error, and we also explore how the bootstrap can be used to improve the prediction accuracy of unstable models, like tree-structured classifiers, through aggregation. The improvements can typically be attributed to variance reduction in the classical regression setting and, more generally, to a smoothing of decision boundaries in the classification setting. These advancements have important implications for the way atmospheric prediction models can be improved, and illustrations of this are shown. For class-valued outcomes, an interesting graphic known as the CAT scan can be constructed to help understand the aggregated decision boundary. This is illustrated using simulated data.
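A minimal sketch of the aggregation idea described above: bagging an unstable tree-structured classifier and comparing its cross-validated accuracy to that of a single tree. The synthetic data and settings are illustrative; a CAT-scan style plot would visualise the aggregated decision boundary over a 2-D grid.

import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(11)
X = rng.normal(size=(300, 2))
y = (X[:, 0] ** 2 + X[:, 1] > 1).astype(int)     # toy nonlinear class boundary

tree = DecisionTreeClassifier(random_state=0)
bagged = BaggingClassifier(n_estimators=100, random_state=0)   # trees fit on bootstrap samples

print(cross_val_score(tree, X, y, cv=5).mean())
print(cross_val_score(bagged, X, y, cv=5).mean())   # typically higher: variance reduction via aggregation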

18.
Recently, a combined approach of bagging (bootstrap aggregating) and noise addition was proposed and shown to result in significantly improved generalization performance. However, the level of noise introduced, a crucial factor, was determined by trial and error. That procedure is not only ad hoc but also time consuming, since bagging involves training a committee of networks. Here we propose a principled and computationally cheaper procedure for choosing the level of noise. The idea comes from kernel density estimation (KDE), a non-parametric probability density estimation method in which appropriate kernel functions, such as the Gaussian, are placed on the data. A kernel bandwidth selector is a numerical method for finding the width of a kernel function (called the bandwidth), and the computed bandwidth can be used as the variance of the added noise. The proposed approach makes the trial-and-error procedure unnecessary and thus provides a much faster way of finding an appropriate level of noise. In addition, experimental results show that the proposed approach improves on bagging, particularly for noisy data.
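A minimal sketch of the noise-level choice described above: a kernel bandwidth selector (Silverman's rule here, as an illustrative choice of selector) is applied per input feature, and the resulting bandwidth is used as the standard deviation of noise added to each bootstrap training set before training a committee member.

import numpy as np

rng = np.random.default_rng(12)
X = rng.normal(size=(200, 3))                   # toy training inputs

def silverman_bandwidth(x):
    # 1-D rule-of-thumb bandwidth: 1.06 * sigma * n^(-1/5)
    return 1.06 * x.std(ddof=1) * len(x) ** (-0.2)

bandwidths = np.array([silverman_bandwidth(X[:, j]) for j in range(X.shape[1])])

B = 25
noisy_bags = []
for _ in range(B):
    idx = rng.integers(0, len(X), size=len(X))                     # bootstrap resample
    noisy = X[idx] + rng.normal(scale=bandwidths, size=X.shape)    # KDE-sized noise per feature
    noisy_bags.append(noisy)                                       # one training set per committee member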

19.
Bagging Equalizes Influence
Bagging constructs an estimator by averaging predictors trained on bootstrap samples. Bagged estimates almost consistently improve on the original predictor. It is thus important to understand the reasons for this success, and also for the occasional failures. It is widely believed that bagging is effective thanks to the variance reduction stemming from averaging predictors. However, seven years after its introduction, bagging is still not fully understood. This paper provides experimental evidence supporting the hypothesis that bagging stabilizes prediction by equalizing the influence of training examples. This effect is detailed in two different frameworks: estimation on the real line and regression. Bagging's improvements and deteriorations are explained by the goodness or badness of highly influential examples, in situations where the usual variance reduction argument is at best questionable. Finally, reasons for the equalization effect are advanced. They suggest that other resampling strategies, such as half-sampling, should provide qualitatively identical effects while being computationally less demanding than bootstrap sampling.

20.
The conventional wisdom in the field of statistical pattern recognition (SPR) is that the size of the finite test sample dominates the variance in the assessment of the performance of a classical or neural classifier. The present work shows that this result has only narrow applicability. In particular, when competing algorithms are compared, the finite training sample more commonly dominates this uncertainty. This general problem in SPR is analyzed using a formal structure recently developed for multivariate random-effects receiver operating characteristic (ROC) analysis. Monte Carlo trials within the general model are used to explore the detailed statistical structure of several representative problems in the subfield of computer-aided diagnosis in medicine. The scaling laws between the variance of accuracy measures and the numbers of training and test samples are investigated and found to be comparable to those discussed in the classic text of Fukunaga, but important interaction terms have been neglected by previous authors. Finally, the importance of the contribution of finite training samples to the uncertainties argues for some form of bootstrap analysis to sample that uncertainty. The leading contemporary candidate is an extension of the 0.632 bootstrap and associated error analysis, as opposed to the more commonly used cross-validation.
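A minimal sketch of the 0.632 bootstrap error estimate favoured in the conclusion above: the resubstitution error and the out-of-bag (leave-one-out bootstrap) error are combined with weights 0.368 and 0.632. The classifier and synthetic data are illustrative assumptions.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(13)
n = 60
X = np.vstack([rng.normal(0, 1, (n // 2, 3)), rng.normal(1, 1, (n // 2, 3))])
y = np.repeat([0, 1], n // 2)

clf = LinearDiscriminantAnalysis().fit(X, y)
err_resub = np.mean(clf.predict(X) != y)       # resubstitution error (optimistic)

B = 200
oob_errors = []
for _ in range(B):
    idx = rng.integers(0, n, size=n)            # bootstrap resample of the training set
    oob = np.setdiff1d(np.arange(n), idx)       # out-of-bag cases
    if len(oob) == 0 or len(np.unique(y[idx])) < 2:
        continue                                # skip degenerate resamples
    fit = LinearDiscriminantAnalysis().fit(X[idx], y[idx])
    oob_errors.append(np.mean(fit.predict(X[oob]) != y[oob]))
err_oob = np.mean(oob_errors)                   # leave-one-out bootstrap error

err_632 = 0.368 * err_resub + 0.632 * err_oob   # the 0.632 estimator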
