Similar articles
Found 20 similar articles (search time: 31 ms)
1.
When (X1, θ1), ..., (Xn, θn) are independent identically distributed random vectors from ℝ^d × {0, 1} distributed as (X, θ), and when θ is estimated by its nearest neighbor estimate θ(1), then Cover and Hart have shown that P{θ(1) ≠ θ} → 2E{η(X)(1 − η(X))} ≤ 2R*(1 − R*) as n → ∞, where R* is the Bayes probability of error and η(x) = P{θ = 1 | X = x}. Their result requires conditions on the distribution of (X, θ). We give two proofs, one due to Stone and a short original one, of the same result for all distributions of (X, θ). If ties are carefully taken care of, we also show that P{θ(1) ≠ θ | X1, θ1, ..., Xn, θn} converges in probability to a constant for all distributions of (X, θ), thereby strengthening results of Wagner and Fritz.
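As a rough numerical illustration of the Cover and Hart inequality quoted above, the sketch below evaluates both sides for a small discrete distribution. It writes eta(x) for the posterior probability that the label is 1 given X = x; the three-point distribution and the function names are made up for illustration and do not come from the paper.

```python
def bayes_risk(px, eta):
    """R* = E[min(eta(X), 1 - eta(X))] for a discrete X with P(X = x_i) = px[i]."""
    return sum(p * min(e, 1.0 - e) for p, e in zip(px, eta))


def nn_asymptotic_risk(px, eta):
    """Asymptotic nearest-neighbor error 2 E[eta(X) (1 - eta(X))]."""
    return sum(p * 2.0 * e * (1.0 - e) for p, e in zip(px, eta))


# Hypothetical three-point distribution (assumption, for illustration only).
px = [0.5, 0.3, 0.2]    # P(X = x_i)
eta = [0.1, 0.5, 0.9]   # P(label = 1 | X = x_i)

r_star = bayes_risk(px, eta)
r_nn = nn_asymptotic_risk(px, eta)
upper = 2.0 * r_star * (1.0 - r_star)
print(r_star, r_nn, upper)
```

For this example the asymptotic nearest-neighbor error lies between the Bayes error and the bound 2R*(1 − R*), as the inequality states.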

2.
Lower bounds for Bayes error estimation   Total citations: 1, self-citations: 0, citations by others: 1
We give a short proof of the following result. Let (X, Y) be any distribution on N × {0, 1}, and let (X1, Y1), ..., (Xn, Yn) be an i.i.d. sample drawn from this distribution. In discrimination, the Bayes error L* = inf_g P{g(X) ≠ Y} is of crucial importance. Here we show that without further conditions on the distribution of (X, Y), no rate-of-convergence results can be obtained. Let φn(X1, Y1, ..., Xn, Yn) be an estimate of the Bayes error, and let {φn(·)} be a sequence of such estimates. For any sequence {an} of positive numbers converging to zero, a distribution of (X, Y) may be found such that E{|L* − φn(X1, Y1, ..., Xn, Yn)|} ≥ an infinitely often.

3.
Let T = (V, E) be an edge-weighted tree with |V| = n vertices embedded in the Euclidean plane. Let IE denote the set of all points on the edges of T. Let X and Y be two subsets of IE and let r be a positive real number. A subset D ⊆ X is an X/Y/r-dominating set if every point in Y is within distance r of a point in D. The X/Y/r-dominating set problem is to find an X/Y/r-dominating set D* with minimum cardinality. Let p ≥ 1 be an integer. The X/Y/p-center problem is to find a subset C* ⊆ X of p points such that the maximum distance of any point in Y from C* is minimized. Let X and Y be either V or IE. In this paper, efficient parallel algorithms on the EREW PRAM are first presented for the X/Y/r-dominating set problem. The presented algorithms require O(log² n) time for all cases of X and Y. Parallel algorithms on the EREW PRAM are then developed for the X/Y/p-center problem. The presented algorithms require O(log³ n) time for all cases of X and Y. Previously, sequential algorithms for these two problems had been extensively studied in the literature. However, parallel solutions with polylogarithmic time existed only for their special cases. The algorithms presented in this paper are obtained by using an interesting approach which we call the dependency-tree approach. Our results are examples of parallelizing sequential dynamic-programming algorithms by using the approach.

4.
Ambiguities in Incremental Line Rastering   Total citations: 1, self-citations: 0, citations by others: 1
In implementing raster graphics algorithms, it is important to thoroughly understand the behavior and implicit defaults inherent in each algorithm. Design choices must balance performance with respect to drawing speed, circuit count, code space, picture fidelity, system complexity, and system consistency. For example, "close" may sound appealing when describing the match of the rastered representation to a geometric line. An implementation, however, must quantify an error metric (such as the minimum normal distance between candidate raster grid points and the geometric line) and resolve "ties" in which two candidate grid points have an equal error metric. Equal error metric ambiguity can permit algorithmic selection of raster points for a line from (X0, Y0) to (X1, Y1) to differ from the points selected when rastering the same line back from (X1, Y1) to (X0, Y0). Modifying a rastering algorithm to provide an exactly reversible path, though, will cause problems when lines are rastered in a context of approximating a circle with a polygon. Only by fully understanding any algorithm can designers determine whether such pel-level anomalies are worth the code space or circuit count necessary to provide explicit user resolution, or whether a fixed default must suffice. This article discusses implementation considerations relevant to selecting and customizing incremental line-drawing algorithms to cope with such anomalies as equal error metric instances, perturbation effects of clipping, intersections in raster space, EXOR interpretations for polylines, reversibility, and fractional endpoint rounding.

5.
This paper extends the notions of capacity and distribution-free error estimation to nonlinear Boolean classifiers on patterns with binary-valued features. We establish quantitative relationships between the dimensionality of the feature vectors (d), the combinational complexity of the decision rule (c), the number of samples in the training set (n), and the classification performance of the resulting classifier. Our results state that the discriminating capacity of Boolean classifiers is given by the product dc, and the probability of ambiguous generalization is asymptotically given by (n/dc − 1)^(−1) O((log d)/d) for large d and n = O(dc). In addition we show that if a fraction ? of the training samples is misclassified then the probability of error (?) in subsequent samples satisfies P(|?-?| ?) m=<2.773 exp (dc-e2n/8) for all distributions, regardless of how the classifier was discovered.

6.
A method for managing agile sensors to optimize detection and classification based on discrimination gain is presented. Expected discrimination gain is used to determine threshold settings and search order for a collection of discrete detection cells. This is applied in a low signal-to-noise environment where target-containing cells must be sampled many times before a target can be detected or classified with high confidence. The goal of sensor management is interpreted here as directing sensors to optimize the probability densities produced by the data fusion system that they feed. The use of discrimination is motivated by its interpretation as a measure of the relative likelihood for alternative probability densities. This is studied in a problem where a single sensor can be directed at any detection cell in the surveillance volume for each sample. Bayes' rule is used to construct a recursive estimator for the cell target probabilities. The expected discrimination gain is predicted for each cell using its current target probability estimates. This gain is used to select the optimal cell for the next sample. The expected discrimination gains can be maintained in a binary search tree structure for computational efficiency. The computational complexity of this algorithm is proportional to the height of the tree, which is logarithmic in the number of detection cells. In a test case for a single 0 dB Gaussian target, the error rate for discrimination-directed search was similar to the direct search result against a 6 dB target.
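The recursive Bayes estimator for a cell's target probability mentioned above can be sketched as follows. This is only the update step, under the simplifying assumption that each sample yields a binary detect / no-detect outcome; the detection and false-alarm probabilities are hypothetical values, and the discrimination-gain computation and cell selection are not shown.

```python
def update_cell_probability(prior, detected, p_detect=0.6, p_false_alarm=0.3):
    """One Bayes-rule update of P(target in cell) after a single sample.

    p_detect and p_false_alarm are assumed sensor parameters (hypothetical).
    """
    like_target = p_detect if detected else 1.0 - p_detect
    like_empty = p_false_alarm if detected else 1.0 - p_false_alarm
    numerator = like_target * prior
    return numerator / (numerator + like_empty * (1.0 - prior))


# Repeated sampling of one cell: each posterior becomes the next prior.
p = 0.5
for outcome in [True, True, False, True]:
    p = update_cell_probability(p, outcome)
print(p)
```

Because each posterior becomes the next prior, a weak target only reaches high confidence after many samples, which matches the low signal-to-noise setting the abstract describes.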

7.
Testing digital circuits accounts for an increasing part of the cost to design, manufacture, and service electric systems, a trend that is projected to continue and accelerate [1]. Test compression is known as a methodology to reduce the test cost. Test c…

8.
Constrained restoration and the recovery of discontinuities   Total citations: 31, self-citations: 0, citations by others: 31
The linear image restoration problem is to recover an original brightness distribution X0 given the blurred and noisy observations Y = KX0 + B, where K and B represent the point spread function and measurement error, respectively. This problem is typical of ill-conditioned inverse problems that frequently arise in low-level computer vision. A conventional method to stabilize the problem is to introduce a priori constraints on X0 and design a cost functional H(X) over images X, which is a weighted average of the prior constraints (regularization term) and posterior constraints (data term); the reconstruction is then the image X that minimizes H. A prominent weakness in this approach, especially with quadratic-type stabilizers, is the difficulty in recovering discontinuities. The authors therefore examine prior smoothness constraints of a different form, which permit the recovery of discontinuities without introducing auxiliary variables for marking the location of jumps and suspending the constraints in their vicinity. In this sense, discontinuities are addressed implicitly rather than explicitly.

9.
There are many important issues that need to be resolved for identification of a fuzzy rule-based system using clustering. We address three such important issues: 1) deciding on the proper domain(s) of clustering; 2) deciding on the number of rules; and 3) getting an initial estimate of parameters of the fuzzy systems. We justify that one should start with separate clustering of X (input) and Y (output). We propose a scheme to establish correspondence between the clusters obtained in X and Y. The correspondence dictates whether further splitting/merging of clusters is needed or not. If X and Y do not exhibit strong cluster substructures, then again clustering of X* (input data augmented by the output data) exploiting the results of separate clustering of X and Y, and of the correspondence scheme is recommended. We justify that usual cluster validity indices are not suitable for finding the number of rules, and the proposed scheme does not use any cluster validity index. Three methods are suggested to get the initial estimate of membership functions (MFs). The proposed scheme is used to identify the rule base needed to realize a self-tuning fuzzy PI-type controller and its performance is found to be quite satisfactory.

10.
Let X be a discrete random variable with a given probability distribution. For any α, 0 ≤ α ≤ 1, we obtain precise values for both the maximum and minimum variational distance between X and another random variable Y under which an α-coupling of these random variables is possible. We also give the maximum and minimum values for couplings of X and Y provided that the variational distance between these random variables is fixed. As a consequence, we obtain a new lower bound on the divergence through variational distance.
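For orientation, the two quantities related in this abstract can be computed directly for discrete distributions. The sketch below assumes the convention V(P, Q) = Σ|p_i − q_i| (some texts include a factor 1/2) and checks the classical Pinsker inequality D ≥ V²/2 in nats; this is the textbook bound, not the new bound announced in the abstract, and the two example distributions are made up.

```python
import math


def variational_distance(p, q):
    """V(P, Q) = sum_i |p_i - q_i| (assumed convention, without the factor 1/2)."""
    return sum(abs(a - b) for a, b in zip(p, q))


def kl_divergence(p, q):
    """D(P || Q) in nats; assumes q_i > 0 wherever p_i > 0."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)


# Hypothetical distributions on a two-point alphabet.
p = [0.5, 0.5]
q = [0.9, 0.1]

v = variational_distance(p, q)
d = kl_divergence(p, q)
print(v, d, v * v / 2.0)
```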

11.
The low sensitivity of the probability of error rule (Pe rule) for feature selection is demonstrated and discussed. It is shown that under certain conditions features with significantly different discrimination power are considered as equivalent by the Pe rule. The main reason for this phenomenon lies in the fact that, directly, the Pe rule depends only on the most probable class and that, under the stated condition, the prior most probable class remains the posterior most probable class regardless of the result for the observed feature. A rule for breaking ties is suggested to refine the feature ordering induced by the Pe rule. By this tie-breaking rule, when two features have the same value for the expected probability of error, the feature with the higher variance for the probability of error is preferred.
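The tie-breaking rule described above can be illustrated with a small sketch. It assumes discrete features, each given as a list of (P(value), class posteriors) pairs; the data layout, the rounding tolerance, and the two example features are assumptions made for illustration, not details from the paper.

```python
def error_stats(feature):
    """Return (expected error, variance of error) for one feature.

    `feature` is a list of (P(value), [P(class | value), ...]) pairs.
    """
    errors = [(p_v, 1.0 - max(posterior)) for p_v, posterior in feature]
    mean = sum(p_v * e for p_v, e in errors)
    var = sum(p_v * (e - mean) ** 2 for p_v, e in errors)
    return mean, var


def rank_features(features):
    """Order by expected error (Pe rule); break ties by higher error variance."""
    def key(name):
        mean, var = error_stats(features[name])
        # Rounding makes numerically equal expected errors compare as ties.
        return (round(mean, 9), -var)
    return sorted(features, key=key)


# Two hypothetical features with class priors (0.7, 0.3). In both, class 0
# stays the most probable class whatever value is observed.
features = {
    "uninformative": [(1.0, [0.7, 0.3])],
    "informative": [(0.5, [0.9, 0.1]), (0.5, [0.5, 0.5])],
}
ranking = rank_features(features)
print(ranking)
```

Both features have expected error 0.3, so the Pe rule alone cannot separate them; the variance tie-breaker puts the feature whose observed value actually changes the posterior first.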

12.
The problem of determining the maximum mutual information I(X; Y) and minimum entropy H(X, Y) of a pair of discrete random variables X and Y is considered under the condition that the probability distribution of X is fixed and the error probability Pr{Y ≠ X} takes a given value ε, 0 ≤ ε ≤ 1. Precise values for these quantities are found, which in several cases allows us to obtain explicit formulas for both the maximum information and minimum entropy in terms of the probability distribution of X and the parameter ε.

13.
In this paper, a new statistical model named the Center-Distance Continuous Probability Model (CDCPM) for speech recognition is described, which is based on the Center-Distance Normal (CDN) distribution. In a CDCPM, the probability transition matrix is omitted, and the observation probability density function (PDF) in each state takes the form of an embedded multiple-model (EMM) based on the Nearest Neighbour rule. The experimental results on two giant real-world Chinese speech databases and a real-world continuous-manner 2000-phrase system show that this model is a powerful one. Also, a distance measure for CDCPMs is proposed which is based on Bayesian minimum classification error (MCE) discrimination.

14.
A probabilistic procedure is suggested for the automatic correction of spelling and typing errors in printed English texts. The heart of the procedure is a probabilistic model for the generation of the garbled word from the correct word. The garbler can delete or insert symbols in the word or substitute one or more symbols by other symbols. An expression is derived for P(Y | X), the probability of generating a garbled word Y from a correct word X. The model is probabilistically consistent. Using the expression for P(Y | X), we can derive an estimate of the correct word from the garbled word Y so as to minimize the average probability of error in the decision. One of the important features of the expression P(Y | X) is that it can be computed recursively. Experiments conducted using a dictionary of the 1025 most common English words indicate that the accuracy of correction by this scheme is substantially greater than that which can be obtained by other algorithms, especially when dealing with garbled words derived from relatively short words of length less than 6.

15.
Fast computation of normalized edit distances   Total citations: 1, self-citations: 0, citations by others: 1
The normalized edit distance (NED) between two strings X and Y is defined as the minimum quotient between the sum of weights of the edit operations required to transform X into Y and the length of the editing path corresponding to these operations. An algorithm for computing the NED was introduced by Marzal and Vidal (1993) that exhibits O(mn²) computing complexity, where m and n are the lengths of X and Y. We propose here an algorithm that is observed to require in practice the same O(mn) computing resources as the conventional unnormalized edit distance algorithm does. The performance of this algorithm is illustrated through computational experiments with synthetic data, as well as with real data consisting of OCR chain-coded strings.
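For reference, the conventional unnormalized edit distance that the abstract uses as its O(mn) baseline can be sketched with the standard dynamic program below. Unit costs for insertion, deletion, and substitution are an assumption; this is not the normalized algorithm proposed in the paper, which also has to account for the length of the editing path.

```python
def edit_distance(x, y):
    """Unnormalized edit distance with unit costs, O(len(x) * len(y)) time."""
    m, n = len(x), len(y)
    prev = list(range(n + 1))  # distances from the empty prefix of x
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            substitution = prev[j - 1] + (x[i - 1] != y[j - 1])
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[n]


print(edit_distance("kitten", "sitting"))
```

Only two rows of the table are kept at a time, so the memory use is O(n) while the running time stays O(mn).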

16.
Let A be an n×n matrix of reals with sorted rows and columns and k an integer, 1 ≤ k ≤ n². We present an O(n) time algorithm for selecting the kth smallest element of A. If X and Y are sorted n-vectors of reals, then the Cartesian sum X + Y is such a matrix as A. One application of selection in X + Y can be found in statistics. The algorithm presented here is based on a new divide-and-conquer technique, which can be applied to similar order-related problems as well. Due to the fact that the algorithm has a relatively small constant time factor, this result is of practical as well as theoretical interest.
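To make the selection problem in X + Y concrete, here is a simple heap-based sketch. It is deliberately not the O(n) divide-and-conquer algorithm of the paper: it runs in O(n + k log n) time and only illustrates what "the kth smallest element of the Cartesian sum" means. The function name and the example vectors are hypothetical.

```python
import heapq


def kth_smallest_pair_sum(xs, ys, k):
    """Return the k-th smallest value of xs[i] + ys[j] (1-based k).

    xs and ys must be sorted in ascending order.
    """
    n, m = len(xs), len(ys)
    if not 1 <= k <= n * m:
        raise ValueError("k must satisfy 1 <= k <= len(xs) * len(ys)")
    # Start with the smallest entry of every row of the implicit matrix.
    heap = [(xs[i] + ys[0], i, 0) for i in range(n)]
    heapq.heapify(heap)
    for _ in range(k - 1):
        _, i, j = heapq.heappop(heap)
        if j + 1 < m:
            # Advance along row i, whose entries are sorted because ys is.
            heapq.heappush(heap, (xs[i] + ys[j + 1], i, j + 1))
    return heap[0][0]


print(kth_smallest_pair_sum([1, 3, 5], [2, 4, 6], 4))
```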

17.
18.
A causal rule between two variables, X → Y, captures the relationship that the presence of X causes the appearance of Y. Because of its usefulness (compared to association rules), techniques for mining causal rules are beginning to be developed. However, the effectiveness of existing methods (such as the LCD and CU-path algorithms) is limited to mining causal rules among simple variables, and they are inadequate to discover and represent causal rules among multi-value variables. In this paper, we propose that the causality between variables X and Y be represented in the form X → Y with conditional probability matrix M(Y|X). We also propose a new approach to discover causality in large databases based on partitioning. The approach partitions the items into item variables by decomposing "bad" item variables and composing "not-good" item variables. In particular, we establish a method to optimize causal rules that merges the "useless" information in conditional probability matrices of extracted causal rules.

19.
The paper aims to develop a generalized probability model describing the longevity of a system exposed to paired risks R1 and R2 which are dependent. The bivariate exponential model of Freund (1961), with failure times X and Y under risks R1 and R2 in a time-independent hazard rate set-up, has been generalized by incorporating an additional age factor, t, as a variable. The hazard rates due to R1 and R2 have been changed from α to α(t) = αt^(α−1), and from β to β(t) = βt^(β−1), where α, β > 0, which are Weibull hazard functions for α, β > 1. Further conditions are imposed such that α is changed to α' when R2 is off and β is changed to β' when R1 is off. The trivariate distribution of Freund so generalized has again been doubly truncated in the range a ≤ t ≤ b, for a, b > 0; and the conditional distribution of X and Y given t has been used to study the role of the component's age in the context of the system's survival under paired dependent risks in the finite age range.
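The age-dependent hazard used in this generalization can be written out for a single risk. The sketch below takes the hazard rate α·t^(α−1) and the survival function exp(−t^α) that follows from integrating it; it covers one risk in isolation only, not the dependent bivariate model of the paper, and the function names are made up.

```python
import math


def weibull_hazard(t, alpha):
    """Hazard rate alpha * t**(alpha - 1) at age t > 0."""
    if t <= 0 or alpha <= 0:
        raise ValueError("t and alpha must be positive")
    return alpha * t ** (alpha - 1)


def weibull_survival(t, alpha):
    """P(lifetime > t) = exp(-t**alpha), from integrating the hazard over [0, t]."""
    if t < 0 or alpha <= 0:
        raise ValueError("t must be non-negative and alpha positive")
    return math.exp(-(t ** alpha))


# alpha = 1 gives a constant hazard, i.e. the time-independent exponential case.
print(weibull_hazard(5.0, 1.0), weibull_hazard(2.0, 2.0), weibull_survival(1.0, 2.0))
```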

20.
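Study on detection of Bacillus target sequences with a piezoelectric gene sensor   Total citations: 1, self-citations: 0, citations by others: 1
Oligonucleotide probes were immobilized on a quartz crystal microbalance (QCM) by two different methods, the biotin-avidin method and the self-assembly method, to construct a piezoelectric gene sensor for real-time detection of Bacillus target sequences. The results show that the biotin-avidin method immobilizes the probe more effectively. For target sequence concentrations of 0.05 to 0.5 μmol/L there is a good linear relationship, with the linear regression equation Y = 93.88X + 10.88 (R = 0.9890, N = 6, P < 0.001) and a nonlinearity error of ±4.7%. The sensor shows good specificity and can distinguish sequences with three mismatched bases.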
