Similar Literature
20 similar records retrieved.
1.
2.
We consider the problem of using sampling to estimate the result of an aggregation operation over a subset-based SQL query, where a subquery is correlated to an outer query by a NOT EXISTS, NOT IN, EXISTS or IN clause. We design an unbiased estimator for our query and prove that it is indeed unbiased. We then provide a second, biased estimator that makes use of the superpopulation concept from statistics to minimize the mean squared error of the resulting estimate. The two estimators are tested over an extensive set of experiments. Material in this paper is based upon work supported by the National Science Foundation via grants 0347408 and 0612170.
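The abstract does not reproduce the estimators themselves; the sketch below shows only the generic Horvitz-Thompson style scale-up idea behind sampling such a subset-based aggregate, assuming Bernoulli sampling of the outer relation and an in-memory check of the NOT EXISTS condition. The table layout, column names, and sampling rate are hypothetical.

```python
import random

def estimate_subset_sum(outer_rows, inner_keys, value_col, key_col, p=0.1, seed=0):
    """Estimate SUM(value_col) over outer rows with no match in the inner
    relation (a NOT EXISTS / NOT IN condition), from a Bernoulli sample.

    Scaling each sampled contribution by 1/p makes this estimator unbiased
    under Bernoulli sampling; it is a generic scale-up sketch, not the
    paper's estimator.
    """
    rng = random.Random(seed)
    total = 0.0
    for row in outer_rows:
        if rng.random() < p:                    # keep the outer row with probability p
            if row[key_col] not in inner_keys:  # NOT EXISTS condition
                total += row[value_col] / p     # Horvitz-Thompson scale-up
    return total

# Hypothetical example: total amount of orders with no matching return.
orders = [{"id": i, "amount": float(i % 7)} for i in range(1000)]
returned_ids = {i for i in range(0, 1000, 5)}
print(estimate_subset_sum(orders, returned_ids, "amount", "id", p=0.2))
```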

3.
A nonlinear distributed estimation problem is solved by using reduced-order local models. Using local models with lower dimensions than the observed process model reduces the local processors' complexity and computational load. Fusion algorithms that combine local densities to construct the centralized density of a nonlinear random process are presented. The local densities are generated at each measurement time and communicated to a coordinator. The models used to produce these densities are reduced-order valid models. The validity of the local models guarantees that the coordinator reconstructs the centralized density function exactly.

4.
The first- and second-order sensitivities with respect to varying structural shape are discussed for an arbitrary stress, strain and displacement functional. It is assumed that only the traction-free boundary of a structure can undergo the shape modification described by a set of shape design parameters. The first derivatives of a functional with respect to these parameters are derived using both the direct and adjoint approaches. Next the second derivatives are obtained using the mixed approach in which both the direct and adjoint first-order solutions are used. The general results are particularized for the case of complementary and potential energy of a structure. Some simple examples illustrate the theory presented.
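For orientation, the generic first-order structure behind the direct and adjoint approaches can be stated for a functional J(u, p) constrained by a state equation R(u, p) = 0; this is a standard sketch in our notation, not the paper's specific shape-sensitivity derivation.

```latex
\frac{dJ}{dp} = \frac{\partial J}{\partial p} + \frac{\partial J}{\partial u}\,\frac{\partial u}{\partial p},
\qquad
\frac{\partial R}{\partial u}\,\frac{\partial u}{\partial p} = -\frac{\partial R}{\partial p}
\quad \text{(direct approach: one state-sensitivity solve per design parameter)},
```

```latex
\frac{dJ}{dp} = \frac{\partial J}{\partial p} - \lambda^{\mathsf T}\,\frac{\partial R}{\partial p},
\qquad
\left(\frac{\partial R}{\partial u}\right)^{\mathsf T}\lambda = \left(\frac{\partial J}{\partial u}\right)^{\mathsf T}
\quad \text{(adjoint approach: one adjoint solve, independent of the number of parameters)}.
```

The mixed second-order approach described in the abstract reuses both of these first-order solutions when assembling the second derivatives.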

5.
Standard errors for bagged and random forest estimators
Bagging and random forests are widely used ensemble methods. Each forms an ensemble of models by randomly perturbing the fitting of a base learner. Estimation of the standard error of the resulting regression function estimate is considered. Three estimators are discussed. One, based on the jackknife, is applicable to bagged estimators and can be computed using the bagged ensemble. The other two estimators target the bootstrap standard error estimator, and require fitting multiple ensemble estimators, one for each bootstrap sample. It is shown that these bootstrap ensemble sizes can be small, which reduces the computation involved in forming the estimator. The estimators are studied using both simulated and real data.
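As a concrete illustration of the first (jackknife-based) estimator, the sketch below computes a jackknife-after-bootstrap standard error from a single bagged ensemble. The base learner, data, and ensemble size are hypothetical, and the formula follows the usual jackknife-after-bootstrap construction rather than the paper's exact definitions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def bagged_prediction_with_jackknife_se(X, y, x0, B=500, seed=0):
    """Bag a regression tree and return (prediction at x0, jackknife SE).

    The jackknife-after-bootstrap SE uses, for each training point i, the
    average prediction over the bootstrap replicates that did NOT contain i.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    preds = np.empty(B)                      # ensemble member predictions at x0
    in_bag = np.zeros((B, n), dtype=bool)
    for b in range(B):
        idx = rng.integers(0, n, size=n)     # bootstrap sample
        in_bag[b, np.unique(idx)] = True
        tree = DecisionTreeRegressor().fit(X[idx], y[idx])
        preds[b] = tree.predict(x0.reshape(1, -1))[0]

    bagged = preds.mean()
    # Leave-one-out means over the replicates that exclude observation i.
    loo_means = np.array([preds[~in_bag[:, i]].mean() for i in range(n)])
    var_jack = (n - 1) / n * np.sum((loo_means - loo_means.mean()) ** 2)
    return bagged, np.sqrt(var_jack)

# Hypothetical data.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = X[:, 0] ** 2 + rng.normal(scale=0.5, size=200)
print(bagged_prediction_with_jackknife_se(X, y, np.zeros(3), B=200))
```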

6.
Chao Sima, Pattern Recognition, 2006, 39(9): 1763-1780
A cross-validation error estimator is obtained by repeatedly leaving out some data points, deriving classifiers on the remaining points, computing errors for these classifiers on the left-out points, and then averaging these errors. The 0.632 bootstrap estimator is obtained by averaging the errors of classifiers designed from points drawn with replacement and then taking a convex combination of this "zero bootstrap" error with the resubstitution error for the designed classifier. This gives a convex combination of the low-biased resubstitution and the high-biased zero bootstrap. Another convex error estimator suggested in the literature is the unweighted average of resubstitution and cross-validation. This paper treats the following question: given a feature-label distribution and classification rule, what is the optimal convex combination of two error estimators, i.e., what are the optimal weights for the convex combination? The problem is addressed by finding the weights that minimize the MSE of the convex estimator. Optimality is also considered under the constraint that the resulting estimator be unbiased. Owing to the large number of results arising from the various feature-label models and error estimators, only a portion of the results is presented herein; the main body of results appears on a companion website. In the tabulated results, each table treats the classification rules considered for the model, various Bayes errors, and various sample sizes. Each table includes the optimal weights, mean errors and standard deviations for the relevant error measures, and the MSE and MAE for the optimal convex estimator. Many observations can be made by considering the full set of experiments. Some general trends are outlined in the paper. The general conclusion is that optimizing the weights of a convex estimator can provide substantial improvement, depending on the classification rule, data model, sample size and component estimators. Optimal convex bootstrap estimators are applied to feature-set ranking to illustrate their potential advantage over non-optimized convex estimators.
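For the MSE-minimizing weight there is a simple closed form; the derivation below is standard and consistent with the setup described in the abstract, but the notation is ours rather than the paper's. Writing the component estimators' deviations from the true error ε as X = ε̂₁ − ε and Y = ε̂₂ − ε,

```latex
\mathrm{MSE}(a) = E\big[(a\hat\varepsilon_1 + (1-a)\hat\varepsilon_2 - \varepsilon)^2\big]
= a^2 E[X^2] + 2a(1-a)\,E[XY] + (1-a)^2 E[Y^2],
\qquad
a^\ast = \frac{E[Y^2] - E[XY]}{E[(X-Y)^2]},
```

provided E[(X−Y)²] > 0. Under the unbiasedness constraint instead, aE[X] + (1−a)E[Y] = 0 gives a = E[Y]/(E[Y] − E[X]), which lies in [0, 1] only when the component biases have opposite signs, as with resubstitution and the zero bootstrap.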

7.
The optimal linear prediction estimator and filter are derived for dynamic systems with time-averaged measurements using the Wiener-Hopf theory. The estimates are compared with other optimal estimators for various measurement systems.

8.
9.
We consider the following general problem modeling load balancing in a variety of distributed settings. Given an arbitrary undirected connected graph G=(V,E) and a weight distribution w_0 on the nodes, determine a schedule to move weights across edges in each step so as to (approximately) balance the weights on the nodes. We focus on diffusive schedules for this problem. All previously studied diffusive schedules can be modeled as w_{t+1} = M w_t, where w_t is the weight distribution after t steps and M is a doubly stochastic matrix. We call these the first-order schedules. First-order schedules, although widely used in practice, are often slow. In this paper we introduce a new direction in diffusive schedules by considering schedules that are modeled as w_1 = M w_0; w_{t+1} = β M w_t + (1-β) w_{t-1} for some appropriate β; we call these the second-order schedules. In the idealized setting of weights being real numbers, we adopt known results to show that β can be chosen so that the second-order schedule involves significantly fewer steps than the first-order method for approximate load balancing. In the realistic setting where the weights are positive integers, we simulate the idealized schedules by maintaining "I Owe You" units on the edges. Extensive experiments with simulated data and real-life data from JOSTLE, a mesh-partitioning software, show that the resultant realistic schedule is close to the idealized schedule, and it again involves fewer steps than the first-order schedules for approximate load balancing. Our main result is therefore a fast algorithm for coarse load balancing that can be used in a variety of applications.
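A minimal numerical sketch of the two schedules in the idealized (real-weight) setting follows, using the common diffusion matrix M = I − L/(Δ+1) for an undirected graph. The graph, weights, and iteration count are illustrative, the choice of β is a standard semi-iterative formula stated here as an assumption, and the integer "I Owe You" simulation is not reproduced.

```python
import numpy as np

def diffusion_matrix(adj):
    """Doubly stochastic diffusion matrix M = I - L/(deg_max + 1) for an
    undirected graph given by its adjacency matrix."""
    deg = adj.sum(axis=1)
    laplacian = np.diag(deg) - adj
    return np.eye(len(adj)) - laplacian / (deg.max() + 1)

def first_order(M, w0, steps):
    w = w0.copy()
    for _ in range(steps):
        w = M @ w                       # w_{t+1} = M w_t
    return w

def second_order(M, w0, steps, beta):
    """w_1 = M w_0;  w_{t+1} = beta * M w_t + (1 - beta) * w_{t-1}."""
    prev, cur = w0.copy(), M @ w0
    for _ in range(steps - 1):
        prev, cur = cur, beta * (M @ cur) + (1 - beta) * prev
    return cur

# Illustrative example: a ring of 8 nodes with all the weight on node 0.
n = 8
adj = np.zeros((n, n))
for i in range(n):
    adj[i, (i + 1) % n] = adj[(i + 1) % n, i] = 1.0
M = diffusion_matrix(adj)
w0 = np.zeros(n)
w0[0] = float(n)                                   # the balanced load is 1 per node

mu = np.sort(np.abs(np.linalg.eigvalsh(M)))[-2]    # second-largest eigenvalue magnitude
beta = 2.0 / (1.0 + np.sqrt(1.0 - mu ** 2))        # assumed semi-iterative choice of beta

print(np.abs(first_order(M, w0, 30) - 1.0).max())  # residual imbalance, first order
print(np.abs(second_order(M, w0, 30, beta) - 1.0).max())  # much smaller residual
```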

10.
This paper addresses two issues concerning the separate-bias Kalman estimator. The first deals with the derivation of the optimal estimator for the general case in which the bias vector is stochastic in nature, and the second deals with defining a suitable suboptimal realization of the generalized estimator.

11.
Formulae for computation of the first- and second-order sensitivity matrices of the eigenvalues and eigenvectors of a matrix with distinct eigenvalues are derived using matrix calculus and the algebra of Kronecker products. The sensitivities of eigenvalues and eigenvectors to all elements of the matrix can thus be expressed by concise matrix equations.
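For reference, the well-known first-order result for a simple (non-repeated) eigenvalue λ_i of A, with right eigenvector x_i and left eigenvector y_i, can be written in the Kronecker-product form this line of work uses; this is a textbook statement in our notation, not the paper's full set of formulae.

```latex
\frac{\partial \lambda_i}{\partial a_{pq}}
= \frac{y_i^{\mathsf T}\,\frac{\partial A}{\partial a_{pq}}\,x_i}{y_i^{\mathsf T} x_i}
= \frac{(y_i)_p\,(x_i)_q}{y_i^{\mathsf T} x_i},
\qquad
\frac{\partial \lambda_i}{\partial (\operatorname{vec} A)^{\mathsf T}}
= \frac{x_i^{\mathsf T} \otimes y_i^{\mathsf T}}{y_i^{\mathsf T} x_i},
```

which collects the sensitivities to all n² elements of A into a single 1 × n² row vector, using vec(yᵀ dA x) = (xᵀ ⊗ yᵀ) vec(dA).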

12.
For time-varying systems with noises that are correlated at the same time instant and at adjacent time instants, unified and general optimal noise estimators, including measurement-noise estimators and input-noise estimators, are proposed based on Kalman filtering theory. Unified and general fixed-point and fixed-interval optimal noise smoothers are also proposed. These provide new tools for solving state and signal estimation problems. A simulation example illustrates their effectiveness.

13.
Statistical summaries of IP traffic are at the heart of network operation and are used to recover aggregate information on subpopulations of flows. It is therefore of great importance to collect the most accurate and informative summaries given the router's resource constraints. A summarization algorithm, such as Cisco's sampled NetFlow, is applied to IP packet streams that consist of multiple interleaving IP flows. We develop sampling algorithms and unbiased estimators which address sources of inefficiency in current methods. First, we design tunable algorithms, whereas in current methods a single parameter (the sampling rate) controls utilization of both memory and processing/access speed, which means that it has to be set according to the bottleneck resource. Second, we make better use of the memory hierarchy, which involves exporting partial summaries to slower storage during the measurement period.
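The paper's tunable algorithms are not reproduced here; the sketch below only illustrates the basic unbiased (Horvitz-Thompson style) scale-up that underlies packet-sampled flow summaries such as sampled NetFlow. The packet stream, flow keys, and sampling probability are hypothetical.

```python
import random
from collections import defaultdict

def sampled_flow_bytes(packets, p, seed=0):
    """Estimate per-flow byte counts from independently sampled packets.

    Each packet is kept with probability p; dividing each kept packet's size
    by p gives an unbiased estimate of the flow's total bytes (a generic
    scale-up sketch, not the paper's tunable algorithms).
    """
    rng = random.Random(seed)
    est = defaultdict(float)
    for flow_key, size in packets:
        if rng.random() < p:
            est[flow_key] += size / p
    return dict(est)

# Hypothetical interleaved packet stream: (flow id, packet size in bytes).
packets = [(i % 5, 100 + 10 * (i % 7)) for i in range(10_000)]
true_bytes = defaultdict(int)
for k, s in packets:
    true_bytes[k] += s
print(sampled_flow_bytes(packets, p=0.01)[0], true_bytes[0])
```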

14.
Bayes estimates under both modified symmetric and asymmetric loss functions are obtained for the reliability function of the extreme value distribution (EV1) using Lindley's approximation procedure. These estimates are compared with each other and with maximum likelihood estimates (MLEs) in a simulation study. A noninformative prior (Jeffreys invariant prior) is used in the comparisons. Compared with the posterior mean, the Bayes estimator under the asymmetric loss function incorporates additional information about the possible consequences of overestimating or underestimating the true value of the reliability function. The MLE is superior to either of the Bayes estimates, except for small values of time t, where the Bayes estimates consistently perform well. While the Bayes approach is computationally intensive, the calculations can be easily computerized.
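The abstract does not spell out the asymmetric loss; a common choice in this literature is the LINEX loss, whose Bayes estimator has the closed form below. This is offered only for orientation and should not be read as the paper's exact loss function.

```latex
L(\Delta) = e^{a\Delta} - a\Delta - 1, \quad \Delta = \hat\theta - \theta,\ a \neq 0,
\qquad
\hat\theta_{\mathrm{LINEX}} = -\frac{1}{a}\,\ln E\!\left[e^{-a\theta} \mid \text{data}\right],
```

where a > 0 penalizes overestimation more heavily than underestimation, a < 0 does the reverse, and a → 0 recovers the posterior mean (squared-error loss).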

15.
The traditional statistical assumption for interpreting histograms and justifying approximate query processing methods based on them is that all elements in a bucket have the same frequency—this is called the uniform distribution assumption. In this paper, we analyze histograms from a statistical point of view. We show that a significantly less restrictive statistical assumption – that the elements within a bucket are randomly arranged even though they might have different frequencies – leads to identical formulas for approximating aggregate queries using histograms. Under this assumption, we analyze the behavior of both unidimensional and multidimensional histograms and provide tight error guarantees for the quality of approximations. We conclude that histograms are the best estimators if the assumption holds; sampling and sketching are significantly worse. As an example of how the statistical theory of histograms can be extended, we show how XSketches – an approximation technique for XML queries that uses histograms as building blocks – can be statistically analyzed. The combination of the random shuffling assumption and the other statistical assumptions associated with XSketch estimators ensures a complete statistical model and error analysis for XSketches.
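As a concrete example of the kind of approximation these assumptions justify, the sketch below answers a range-COUNT query from an equi-width histogram, attributing to a partially covered bucket a fraction of its count proportional to the overlap. This is a generic textbook estimator, not the paper's analysis; the data and bucket count are illustrative.

```python
import numpy as np

def histogram_range_count(counts, edges, lo, hi):
    """Approximate COUNT(*) WHERE lo <= x < hi from a histogram.

    counts[i] is the number of elements in bucket [edges[i], edges[i+1]).
    A partially covered bucket contributes in proportion to the covered
    width of the bucket.
    """
    total = 0.0
    for i, c in enumerate(counts):
        left, right = edges[i], edges[i + 1]
        overlap = max(0.0, min(hi, right) - max(lo, left))
        if overlap > 0:
            total += c * overlap / (right - left)
    return total

# Hypothetical data and an equi-width histogram with 20 buckets.
rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=100_000)
counts, edges = np.histogram(x, bins=20)
print(histogram_range_count(counts, edges, 1.0, 3.0),
      np.count_nonzero((x >= 1.0) & (x < 3.0)))
```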

16.
Several methods for estimating a sample-based discriminant's probability of correct classification are compared with respect to bias, variance, robustness, and computation cost. "Smooth" modification of the counting estimator (the sample success proportion) is recommended to reduce bias and variance while retaining robustness. Also, the "bootstrap" method of Efron(8) can approximately correct an additive estimator's bias using an ancillary computer simulation. In contrast, the bias reduction achieved by the popular "leave-one-out" modification of the counting method is vitiated by a corresponding increase in variance.
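The bootstrap bias correction referred to here can be sketched generically: estimate the optimism of the apparent (resubstitution) error by refitting on bootstrap samples and evaluating on the original data. The classifier, data, and replicate count below are hypothetical, and this is a generic sketch rather than the paper's procedure.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def bootstrap_corrected_error(X, y, B=200, seed=0):
    """Apparent error rate plus a bootstrap estimate of its optimism."""
    rng = np.random.default_rng(seed)
    n = len(y)

    def err(clf, Xe, ye):
        return np.mean(clf.predict(Xe) != ye)

    apparent = err(LinearDiscriminantAnalysis().fit(X, y), X, y)
    optimism = 0.0
    for _ in range(B):
        idx = rng.integers(0, n, size=n)                     # bootstrap sample
        clf = LinearDiscriminantAnalysis().fit(X[idx], y[idx])
        # Error on the original data minus the bootstrap sample's apparent error.
        optimism += err(clf, X, y) - err(clf, X[idx], y[idx])
    return apparent + optimism / B

# Hypothetical two-class Gaussian data.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(1.5, 1, (50, 2))])
y = np.repeat([0, 1], 50)
print(bootstrap_corrected_error(X, y))
```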

17.
Many approaches attempt to improve naive Bayes; they have been broadly divided into five main categories: (1) structure extension; (2) attribute weighting; (3) attribute selection; (4) instance weighting; (5) instance selection, also called local learning. In this paper, we work on the approach of structure extension and single out a random Bayes model by augmenting the structure of naive Bayes. We call it random one-dependence estimators (RODE). In RODE, each attribute has at most one parent from among the other attributes, and this parent is randomly selected from the log2(m) attributes (where m is the number of attributes) with the maximal conditional mutual information. Our work introduces randomness into Bayesian network classifiers. The experimental results on a large number of UCI data sets validate its effectiveness in terms of classification, class probability estimation, and ranking.
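A minimal sketch of the structure-selection step just described, for discrete data: each attribute's parent is drawn at random from its top ceil(log2 m) attributes by conditional mutual information given the class. The CMI computation is a plain empirical estimate and the training of the resulting one-dependence classifier is omitted; this is not the authors' implementation.

```python
import math
import numpy as np

def cond_mutual_info(xi, xj, y):
    """I(Xi; Xj | Y) for discrete arrays, from empirical joint counts."""
    mi = 0.0
    for yv in np.unique(y):
        mask = (y == yv)
        py = mask.mean()
        a, b = xi[mask], xj[mask]
        for av in np.unique(a):
            pa = np.mean(a == av)
            for bv in np.unique(b):
                pb = np.mean(b == bv)
                pab = np.mean((a == av) & (b == bv))
                if pab > 0:
                    mi += py * pab * math.log(pab / (pa * pb))
    return mi

def rode_parents(X, y, rng):
    """For each attribute, pick a parent at random from its ceil(log2 m)
    attributes with maximal conditional mutual information (a sketch of the
    selection rule only, not the full RODE classifier)."""
    m = X.shape[1]
    k = max(1, math.ceil(math.log2(m)))
    parents = {}
    for i in range(m):
        cmi = np.array([cond_mutual_info(X[:, i], X[:, j], y) if j != i else -np.inf
                        for j in range(m)])
        candidates = np.argsort(cmi)[-k:]        # the k attributes with largest CMI
        parents[i] = int(rng.choice(candidates))
    return parents

# Hypothetical discrete data: 6 binary attributes, binary class.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(300, 6))
y = (X[:, 0] ^ X[:, 1]).astype(int)
print(rode_parents(X, y, rng))
```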

18.
19.
This paper presents an alternative method for dealing with nonlinear estimation problems. The principle is to approximate the integration of the conditional densities by using Gauss quadrature formulas and, at the same time, to set up the grid for the current filtering density. The grid is centered at the filtering mean. The region where the grid is located changes according to the conditional distribution. Approximation errors, which depend on the choice of the number of nodes and the integration interval, are discussed. Numerical experiments indicate that approximation errors do not accumulate during the updating procedure.
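A minimal scalar sketch of the idea follows: a Gauss-Hermite grid is centered at the current filtering mean and used to approximate the integrals in a Bayes measurement update. The measurement model, node count, and quadrature rule are illustrative, and the paper's grid-adaptation scheme is not reproduced.

```python
import numpy as np

def gh_measurement_update(m, P, z, h, r_var, n_nodes=10):
    """One Bayes measurement update for a scalar state with prior N(m, P),
    measurement z = h(x) + noise with noise ~ N(0, r_var), using a
    Gauss-Hermite grid centered at the prior (filtering) mean m."""
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    x = m + np.sqrt(2.0 * P) * nodes     # quadrature grid centered at the mean
    w = weights / np.sqrt(np.pi)         # so sums approximate expectations under N(m, P)
    lik = np.exp(-0.5 * (z - h(x)) ** 2 / r_var)
    norm = np.sum(w * lik)               # approximate predictive density of z
    post_mean = np.sum(w * lik * x) / norm
    post_var = np.sum(w * lik * (x - post_mean) ** 2) / norm
    return post_mean, post_var

# Illustrative nonlinear measurement h(x) = x^3 / 20.
print(gh_measurement_update(m=1.0, P=0.5, z=0.3, h=lambda x: x ** 3 / 20.0, r_var=0.1))
```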

20.
Cyclostationarity (CS) has proven to be effective in the treatment and identification of signal components for diagnostic and prognostic purposes. CS research has focused on algorithms, in terms of simplicity and computational efficiency, and the performance of these algorithms largely depends on the signals being analyzed. The objective of this paper is to exploit the CS characteristics of signals in the context of the morphological component analysis (MCA) method. It proposes a novel methodology for separating the periodic (First-Order Cyclostationarity: CS1) and random (Second-Order Cyclostationarity: CS2) sources from a single sensor measurement. This MCACS2 methodology is based on MCA, where each of the two sources is sparsely represented by a dedicated dictionary: i) the CS1 periodic structure is sparsely represented by means of the Discrete Cosine Transform dictionary, and ii) the CS2 random component is sparsely represented by a newly proposed dictionary derived from Envelope Spectrum Analysis. A simulation study is performed to validate the proposed MCACS2 method, followed by tests on real GRF biomechanical signals. The results show that this novel algorithm provides an additional way to exploit cyclostationarity and may be useful in other application domains.
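Only the CS1 half of the method lends itself to a short sketch: extracting a periodic component by keeping the largest coefficients in a DCT dictionary. The envelope-spectrum dictionary for the CS2 part is the paper's contribution and is not reproduced; the signal, threshold level, and single-dictionary simplification below are illustrative only.

```python
import numpy as np
from scipy.fft import dct, idct

def separate_periodic(x, keep=0.02):
    """Split x into a DCT-sparse (periodic, CS1-like) component and a
    residual (random, CS2-like) component by hard-thresholding the DCT
    coefficients. A simplified single-dictionary sketch, not the paper's
    two-dictionary MCACS2 algorithm."""
    coeffs = dct(x, norm='ortho')
    thr = np.quantile(np.abs(coeffs), 1.0 - keep)   # keep only the largest coefficients
    coeffs[np.abs(coeffs) < thr] = 0.0
    periodic = idct(coeffs, norm='ortho')
    return periodic, x - periodic

# Illustrative signal: two sinusoids (periodic) buried in white noise (random).
rng = np.random.default_rng(0)
t = np.arange(4096) / 1000.0
x = (np.sin(2 * np.pi * 30 * t) + 0.5 * np.sin(2 * np.pi * 75 * t)
     + 0.8 * rng.normal(size=t.size))
periodic, random_part = separate_periodic(x)
print(np.std(periodic), np.std(random_part))
```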
