Similar Documents
20 similar documents found (search time: 109 ms)
1.
We introduce a nonparametric test intended for large-scale simultaneous inference in situations where the utility of distribution-free tests is limited because of their discrete nature. Such situations are frequently dealt with in microarray analysis where the number of tests is much larger than the sample size. The proposed test statistic is based on a certain distance between the distributions from which the samples under study are drawn. In a simulation study, the proposed permutation test is compared with permutation counterparts of the t-test and the Kolmogorov–Smirnov test. The usefulness of the proposed test is discussed in the context of microarray gene expression data and illustrated with an application to real datasets.
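The abstract does not specify the exact distance used, so the sketch below substitutes a Cramér–von Mises-type distance between empirical distribution functions purely as a stand-in; the permutation machinery is the part it illustrates, and names such as `perm_test_distance` are illustrative rather than taken from the paper.

```python
import numpy as np

def cvm_distance(x, y):
    """Cramer-von Mises-type distance between the empirical CDFs of x and y."""
    pooled = np.sort(np.concatenate([x, y]))
    fx = np.searchsorted(np.sort(x), pooled, side="right") / len(x)
    fy = np.searchsorted(np.sort(y), pooled, side="right") / len(y)
    return np.mean((fx - fy) ** 2)

def perm_test_distance(x, y, n_perm=2000, rng=None):
    """Two-sample permutation test using a distributional distance as the statistic."""
    rng = np.random.default_rng(rng)
    observed = cvm_distance(x, y)
    pooled = np.concatenate([x, y])
    n = len(x)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if cvm_distance(pooled[:n], pooled[n:]) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)   # permutation p-value

# Example: two small samples, as in microarray settings with few replicates
rng = np.random.default_rng(0)
p = perm_test_distance(rng.normal(0, 1, 8), rng.normal(1, 2, 8), rng=1)
print(f"permutation p-value: {p:.3f}")
```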

2.
Based on Läuter's [Läuter, J., 1996. Exact t and F tests for analyzing studies with multiple endpoints. Biometrics 52, 964-970] exact t test for biometrical studies related to the multivariate normal mean, we develop a generalized F-test for the multivariate normal mean and extend it to multiple comparison. The proposed generalized F-tests have simple approximate null distributions. A Monte Carlo study and two real examples show that the generalized F-test is at least as good as the optimal individual Läuter test and can improve its performance in some situations where the projection directions for Läuter's test may not be suitably chosen. The generalized F-test could be superior to individual Läuter tests and the classical Hotelling T²-test for the general purpose of testing the multivariate normal mean. It is shown by Monte Carlo studies that the extended generalized F-test outperforms the commonly-used classical test for multiple comparison of normal means in the case of high dimension with small sample sizes.
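As background for the Läuter construction being generalized, here is a minimal sketch of one member of that family, the standardized-sum (SS) score: weights are taken from the diagonal of the total sums-of-products matrix X'X, so they depend on the data only through X'X, which is what makes the one-sample t-test on the resulting scores exact under H0: μ = 0. This is an assumption-laden illustration of the general idea, not the paper's generalized F-test.

```python
import numpy as np
from scipy import stats

def lauter_ss_test(X):
    """Exact t-test for H0: mu = 0 using Lauter-style standardized-sum (SS) scores.

    X: (n, p) sample assumed multivariate normal. The weights depend on X only
    through the total sums-of-products matrix X'X, so the t statistic on the
    scores is exactly t(n-1) under the null.
    """
    n, p = X.shape
    W = X.T @ X                      # total sums-of-products matrix (not centered)
    d = 1.0 / np.sqrt(np.diag(W))    # SS weights
    z = X @ d                        # one scalar score per observation
    t = np.sqrt(n) * z.mean() / z.std(ddof=1)
    pval = 2 * stats.t.sf(abs(t), df=n - 1)
    return t, pval

rng = np.random.default_rng(0)
X = rng.normal(0.3, 1.0, size=(15, 40))   # high dimension, small sample size
print(lauter_ss_test(X))
```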

3.
In this paper we compare the size distortions and powers of Pearson's χ²-statistic, the likelihood ratio statistic LR, the score statistic SC and two statistics, which we call UT and VT here, proposed by [Potthoff, R.F., Whittinghill, M., 1966. Testing for homogeneity: II. The Poisson distribution. Biometrika 53, 183–190] for testing the equality of the rates of K Poisson processes. Asymptotic tests and parametric bootstrap tests are considered. It is found that the asymptotic UT test is too conservative to be recommended, while the other four asymptotic tests perform similarly and their powers are close to those of their parametric bootstrap counterparts when the observed counts are large enough. When the observed counts are not large, Monte Carlo simulation suggests that the asymptotic tests using the SC, LR and UT statistics are discouraged; none of the parametric bootstrap tests with the five statistics considered here is uniformly best or worst, and the asymptotic tests using Pearson's χ² and VT always have powers similar to their bootstrap counterparts. Thus, the asymptotic Pearson's χ² and VT tests have an advantage over all five parametric bootstrap tests in terms of their computational simplicity and convenience, and over the other four asymptotic tests in terms of their powers and size distortions.
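A minimal sketch of the asymptotic Pearson χ² test for equal Poisson rates together with its parametric bootstrap counterpart, assuming equal observation windows for the K processes; function names are illustrative.

```python
import numpy as np
from scipy import stats

def pearson_chi2(counts):
    """Pearson chi-square statistic for H0: all K Poisson rates are equal."""
    counts = np.asarray(counts, dtype=float)
    expected = counts.mean()
    return np.sum((counts - expected) ** 2 / expected)

def poisson_rate_test(counts, n_boot=5000, rng=None):
    counts = np.asarray(counts)
    k = len(counts)
    stat = pearson_chi2(counts)
    # asymptotic p-value: chi-square with K-1 degrees of freedom
    p_asym = stats.chi2.sf(stat, df=k - 1)
    # parametric bootstrap: resample counts under the fitted common rate
    rng = np.random.default_rng(rng)
    lam_hat = counts.mean()
    boot = np.array([pearson_chi2(rng.poisson(lam_hat, size=k))
                     for _ in range(n_boot)])
    p_boot = (np.sum(boot >= stat) + 1) / (n_boot + 1)
    return stat, p_asym, p_boot

print(poisson_rate_test([12, 18, 9, 15], rng=0))
```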

4.
郭小萍  袁杰  李元 《自动化学报》2014,40(1):135-142
For batch processes with non-Gaussian, nonlinear and multimode characteristics, a process monitoring method based on a k-nearest-neighbor statistic of feature variables is proposed. First, the raw data of the batch process under normal operating conditions are projected into the feature space, the principal component scores T and the squared prediction error SPE are extracted, and the sum of squared distances to the k nearest neighbors of these feature variables is computed. Then, kernel density estimation is used to obtain the probability density function and to determine the statistical monitoring control limit. The T and SPE features of the feature space fully represent the useful information in the raw data. Building the monitoring model on the k nearest neighbors of the feature variables saves storage space, raises the ratio of modeling samples to variables, and speeds up the detection of abnormal operating conditions. In addition, modeling with local neighboring data handles the nonlinearity and multiple operating modes of the process, while kernel density estimation handles the non-Gaussian distribution of the process data. Finally, a successful application to a semiconductor manufacturing process demonstrates the effectiveness of the proposed method.
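A minimal sketch of the monitoring statistic described above, assuming the feature matrix already holds the principal-component scores and SPE values per batch; the control limit is taken as a high quantile of a Gaussian kernel density estimate fitted to the training statistics. All names and the specific quantile are illustrative assumptions.

```python
import numpy as np
from scipy.stats import gaussian_kde

def knn_stat(features, train, k=5, exclude_self=False):
    """Sum of squared distances from each row of `features` to its k nearest
    neighbours among the training (normal-operation) feature vectors."""
    out = []
    for x in np.atleast_2d(features):
        d2 = np.sort(np.sum((train - x) ** 2, axis=1))
        if exclude_self:
            d2 = d2[1:]                     # drop the zero self-distance
        out.append(d2[:k].sum())
    return np.array(out)

def control_limit(train, k=5, alpha=0.99, grid=10_000):
    """Control limit as the alpha-quantile of a kernel density estimate
    fitted to the monitoring statistic on the reference (normal) data."""
    t = knn_stat(train, train, k, exclude_self=True)
    kde = gaussian_kde(t)
    xs = np.linspace(0.0, t.max() * 3, grid)
    cdf = np.cumsum(kde(xs))
    cdf /= cdf[-1]
    return xs[np.searchsorted(cdf, alpha)]

rng = np.random.default_rng(0)
train = rng.normal(size=(200, 3))           # e.g. columns [T1, T2, SPE] per batch
limit = control_limit(train)
new_batch = np.array([[4.0, -3.5, 6.0]])    # a clearly abnormal feature vector
print(knn_stat(new_batch, train) > limit)   # -> [ True ] flags a fault
```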

5.
In comparing the mean count of two independent samples, some practitioners would use the t-test or the Wilcoxon rank sum test while others may use methods based on a Poisson model. It is not uncommon to encounter count data that exhibit overdispersion where the Poisson model is no longer appropriate. This paper deals with methods for overdispersed data using the negative binomial distribution resulting from a Poisson-Gamma mixture. We investigate the small sample properties of the likelihood-based tests and compare their performances to those of the t-test and of the Wilcoxon test. We also illustrate how these procedures may be used to compute power and sample sizes to design studies with response variables that are overdispersed count data. Although methods are based on inferences about two independent samples, sample size calculations may also be applied to problems comparing more than two independent samples. It will be shown that there is gain in efficiency when using the likelihood-based methods compared to the t-test and the Wilcoxon test. In studies where each observation is very costly, the ability to derive smaller sample size estimates with the appropriate tests is not only statistically, but also financially, appealing.
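A hedged sketch of a likelihood-ratio test in the spirit described above: a negative binomial (Poisson-Gamma) model with a common dispersion, fitted by maximum likelihood under H0 (equal means) and H1 (separate means), with 2Δlog-likelihood referred to a χ²(1) distribution. The parameterization and function names are assumptions, not taken from the paper.

```python
import numpy as np
from scipy import stats, optimize

def nb_loglik(x, mu, alpha):
    """Negative binomial log-likelihood with mean mu and Var = mu + alpha*mu^2."""
    r = 1.0 / alpha
    p = r / (r + mu)
    return stats.nbinom.logpmf(x, r, p).sum()

def lrt_nb(x1, x2):
    """Likelihood-ratio test of equal means for two overdispersed count samples."""
    def neg_null(theta):          # common mean, common dispersion
        mu, alpha = np.exp(theta)
        return -(nb_loglik(x1, mu, alpha) + nb_loglik(x2, mu, alpha))
    def neg_alt(theta):           # separate means, common dispersion
        mu1, mu2, alpha = np.exp(theta)
        return -(nb_loglik(x1, mu1, alpha) + nb_loglik(x2, mu2, alpha))
    ll0 = -optimize.minimize(neg_null, np.log([np.r_[x1, x2].mean(), 0.5])).fun
    ll1 = -optimize.minimize(neg_alt, np.log([x1.mean(), x2.mean(), 0.5])).fun
    lr = 2 * (ll1 - ll0)
    return lr, stats.chi2.sf(lr, df=1)

rng = np.random.default_rng(0)
x1 = rng.negative_binomial(2, 2 / (2 + 4.0), 30)   # mean 4, overdispersed
x2 = rng.negative_binomial(2, 2 / (2 + 7.0), 30)   # mean 7, overdispersed
print(lrt_nb(x1, x2))
```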

6.
Multivariate significance testing and model calibration under uncertainty
The importance of modeling and simulation in the scientific community has drawn interest towards methods for assessing the accuracy and uncertainty associated with such models. This paper addresses the validation and calibration of computer simulations using the thermal challenge problem developed at Sandia National Laboratories for illustration. The objectives of the challenge problem are to use hypothetical experimental data to validate a given model, and then to use the model to make predictions in an untested domain. With regard to assessing the accuracy of the given model (validation), we illustrate the use of Hotelling's T² statistic for multivariate significance testing, with emphasis on the formulation and interpretation of such an analysis for validation assessment. In order to use the model for prediction, we next employ the Bayesian calibration method introduced by Kennedy and O'Hagan. Towards this end, we discuss how inherent variability can be reconciled with "lack-of-knowledge" and other uncertainties, and we illustrate a procedure that allows probability distribution characterization uncertainty to be included in the overall uncertainty analysis of the Bayesian calibration process.
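A minimal sketch of the Hotelling T² validation check described above: test whether the mean of n replicated multivariate experimental observations is consistent with the model prediction vector, using the exact F reference distribution. Variable names are illustrative.

```python
import numpy as np
from scipy import stats

def hotelling_t2_validation(experiments, model_prediction):
    """One-sample Hotelling T^2 test of H0: E[experiment] = model prediction.

    experiments: (n, p) array of replicated multivariate measurements.
    model_prediction: length-p vector of simulation outputs.
    """
    Y = np.asarray(experiments, dtype=float)
    m = np.asarray(model_prediction, dtype=float)
    n, p = Y.shape
    diff = Y.mean(axis=0) - m
    S = np.cov(Y, rowvar=False)                  # sample covariance matrix
    t2 = n * diff @ np.linalg.solve(S, diff)
    f = t2 * (n - p) / (p * (n - 1))             # exact F transformation
    pval = stats.f.sf(f, p, n - p)
    return t2, pval

rng = np.random.default_rng(0)
experiments = rng.multivariate_normal([1.0, 2.0, 3.0], np.eye(3) * 0.2, size=12)
print(hotelling_t2_validation(experiments, model_prediction=[1.0, 2.0, 3.1]))
```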

7.
A new local control spline with G³ continuity based on shape parameters, called BLC-spline, is proposed. Not only is the BLC-spline very smooth, but the spline curve's characteristic polygon has only three control vertices and the characteristic polyhedron has only nine control vertices. The local-control behavior of the BLC-spline is better than that of other splines such as the cubic Bézier, B-spline and Beta-spline. The three shape parameters β0, β1 and β2 of the BLC-spline, which are independent of the control vertices, may be altered to change the shape of the curve or surface. It is shown that the BLC-spline may be used to construct a space arc spline for DNC machining directly. This makes it a powerful tool for the design and manufacture of curves and surfaces in integrated CAD/CAM systems.

8.
A systematic comparison of two types of method for estimating the nitrogen concentration of rape is presented: the traditional statistical method based on linear regression and the emerging computationally powerful technique based on artificial neural networks (ANN). Five optimum bands were selected using stepwise regression. Comparison between the two methods was based primarily on analysis of the statistical parameters. The RMS error for the back-propagation network (BPN) was significantly lower than that for the stepwise regression method, and the T-value was higher for BPN. In particular, for the first-difference of inverse-log spectra (log 1/R)′, T-values performed with a 127.71% success rate using BPN. The results show that the neural network is more robust for training and estimating rape nitrogen concentrations using canopy hyperspectral reflectance data.

9.
We study the problem of PAC-learning Boolean functions with random attribute noise under the uniform distribution. We define a noisy distance measure for function classes and show that if this measure is small for a function class and an attribute noise distribution D, then the class is not learnable with respect to the uniform distribution in the presence of noise generated according to D. The noisy distance measure is then characterized in terms of Fourier properties of the function class. We use this characterization to show that the class of all parity functions is not learnable for any but very concentrated noise distributions D. On the other hand, we show that if a class is learnable with respect to the uniform distribution using a standard Fourier-based learning technique, then it is learnable in the presence of attribute noise with time and sample complexity also determined by the noisy distance. In fact, we show that this style of algorithm is nearly the best possible for learning in the presence of attribute noise. As an application of our results, we show how to extend such an algorithm for learning AC0 so that it handles certain types of attribute noise with relatively little impact on the running time.

10.
In this paper, we consider a Portmanteau-type test of randomness for symmetric α-stable random variables with exponent 0<α≤2, using a test statistic that differs from, but has the same general form as, the Box–Pierce Q-statistic, and is defined using the codifference function. We show that, unlike a similar test proposed in [Runde, R., 1997. The asymptotic null distribution of the Box–Pierce Q-statistic for random variables with infinite variance—with an application to German stock returns. Journal of Econometrics 78, 205–216], the asymptotic distribution of the proposed statistic is the same as in the classical case, that is, asymptotically χ² distributed, in both the finite and infinite variance cases. Simulation studies are performed to assess the small sample performance of the proposed statistic. We find that the proposed statistic works fairly well, in the sense that in the infinite variance case, under a suitable choice of the design parameter, its empirical levels are closer to the theoretical ones than those of Runde's statistic. In the finite variance case, its empirical level is approximately the same as that of the Ljung–Box statistic [Ljung, G.M. and Box, G.E.P., 1978. On a measure of lack of fit in time series models. Biometrika 65, 297–303]. Furthermore, the statistic also has good power against the AR(1) alternative. We provide an empirical example using some stocks chosen from the LQ45 index listed in the Indonesia Stock Exchange (IDX).
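For reference, a minimal sketch of the classical Ljung–Box portmanteau statistic mentioned above, computed from sample autocorrelations; the paper's statistic keeps this general form but replaces the autocorrelation with a codifference-based quantity, which is not reproduced here.

```python
import numpy as np
from scipy import stats

def ljung_box(x, m=10):
    """Ljung-Box Q-statistic and chi-square p-value for the first m autocorrelations."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    denom = np.sum(xc ** 2)
    acf = np.array([np.sum(xc[k:] * xc[:-k]) / denom for k in range(1, m + 1)])
    q = n * (n + 2) * np.sum(acf ** 2 / (n - np.arange(1, m + 1)))
    return q, stats.chi2.sf(q, df=m)

rng = np.random.default_rng(0)
print(ljung_box(rng.standard_normal(500)))   # iid noise: large p-value expected
ar1 = np.zeros(500)
for t in range(1, 500):                      # AR(1) alternative with phi = 0.5
    ar1[t] = 0.5 * ar1[t - 1] + rng.standard_normal()
print(ljung_box(ar1))                        # strong autocorrelation: tiny p-value
```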

11.
A novel simulation based approach to unit root testing is proposed in this paper. The test is constructed from the distinct orders in probability of the OLS parameter estimates obtained from a spurious and an unbalanced regression, respectively. While the parameter estimate from a regression of two integrated and uncorrelated time series is of order O_p(1), the estimate is of order O_p(T^{-1}) if the dependent variable is stationary. The test statistic is constructed as an interquantile range from the empirical distribution obtained from regressing the standardized data sufficiently often on controlled random walks. GLS detrending (Elliott et al., Econometrica 64(4):813–836, 1996) and spectral density variance estimators (Perron and Ng, Econom Theory 14(5):560–603, 1998) are applied to account for deterministic terms and residual autocorrelation in the data. A Monte Carlo study confirms that the proposed test has favorable empirical size properties and is powerful in local-to-unity neighborhoods. As an empirical illustration, we test the purchasing power parity hypothesis for a sample of G7 economies.

12.
Several tests for a zero random effect variance in linear mixed models are compared. This testing problem is non-regular because the tested parameter is on the boundary of the parameter space. Size and power of the different tests are investigated in an extensive simulation study that covers a variety of important settings. These include testing for polynomial regression versus a general smooth alternative using penalized splines. Among the test procedures considered, three are based on the restricted likelihood ratio test statistic (RLRT), while six are different extensions of the linear model F-test to the linear mixed model. Four of the tests with unknown null distributions are based on a parametric bootstrap, the other tests rely on approximate or asymptotic distributions. The parametric bootstrap-based tests all have a similar performance. Tests based on approximate F-distributions are usually the least powerful among the tests under consideration. The chi-square mixture approximation for the RLRT is confirmed to be conservative, with corresponding loss in power. A recently developed approximation to the distribution of the RLRT is identified as a rapid, powerful and reliable alternative to computationally intensive parametric bootstrap procedures. This novel method extends the exact distribution available for models with one random effect to models with several random effects.

13.
Clustering techniques play an important role in analyzing high dimensional data that is common in high-throughput screening such as microarray and mass spectrometry data. Effective use of the high dimensionality and some replications can help to increase clustering accuracy and stability. In this article a new partitioning algorithm with a robust distance measure is introduced to cluster variables in high dimensional low sample size (HDLSS) data that contain a large number of independent variables with a small number of replications per variable. The proposed clustering algorithm, PPCLUST, considers data from a mixture distribution and uses p-values from nonparametric rank tests of homogeneous distribution as a measure of similarity to separate the mixture components. PPCLUST is able to efficiently cluster a large number of variables in the presence of very few replications. Inherited from the robustness of rank procedure, the new algorithm is robust to outliers and invariant to monotone transformations of data. Numerical studies and an application to microarray gene expression data for colorectal cancer study are discussed.
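A hedged sketch of the core idea attributed to PPCLUST, using the p-value of a nonparametric rank test of homogeneous distribution as a similarity between variables; here the Kruskal-Wallis test and a simple threshold-based greedy merge stand in for the paper's actual partitioning algorithm, and all names are illustrative.

```python
import numpy as np
from scipy import stats

def rank_similarity(a, b):
    """p-value of a rank test that replications of a and b share one distribution."""
    return stats.kruskal(a, b).pvalue

def pvalue_partition(data, threshold=0.05):
    """Greedy partition: put a variable into an existing cluster if its rank-test
    p-value against every cluster member exceeds the threshold (i.e., the test
    does not reject a common distribution); otherwise start a new cluster."""
    clusters = []
    for idx, reps in enumerate(data):           # data[i]: replications of variable i
        for cluster in clusters:
            if all(rank_similarity(reps, data[j]) > threshold for j in cluster):
                cluster.append(idx)
                break
        else:
            clusters.append([idx])
    return clusters

rng = np.random.default_rng(0)
# 6 variables, 5 replications each; two underlying mixture components
data = [rng.normal(0, 1, 5) for _ in range(3)] + [rng.normal(4, 1, 5) for _ in range(3)]
print(pvalue_partition(data))   # expected to roughly separate {0,1,2} from {3,4,5}
```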

14.
Every endofunctor F of Set has an initial algebra and a final coalgebra, but in general they are classes. Consequently, the endofunctor of the category of classes induced by F generates a completely iterative monad T. Solutions of arbitrary guarded systems of iterative equations w.r.t. F therefore exist, and can be found in naturally defined subsets of the classes TY. More generally, starting from any category, we can form its free cocompletion under small-filtered colimits (e.g., the free cocompletion of Set is the category of classes), and we give sufficient conditions under which analogous results hold for arbitrary endofunctors of the original category.

15.
An f-sensitivity distance oracle for a weighted undirected graph G(V,E) is a data structure capable of answering restricted distance queries between vertex pairs, i.e., calculating distances on a subgraph avoiding some forbidden edges. This paper presents an efficiently constructible f-sensitivity distance oracle that, given a triplet (s,t,F), where s and t are vertices and F is a set of forbidden edges such that |F| ≤ f, returns an estimate of the distance between s and t in G(V, E∖F). For an integer parameter k ≥ 1, the size of the data structure is O(f·k·n^{1+1/k}·log(nW)), where W is the weight of the heaviest edge in G, the stretch (approximation ratio) of the returned distance is (8k−2)(f+1), and the query time is O(|F|·log²n·log log n·log log d), where d is the distance between s and t in G(V, E∖F).
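To make the query semantics concrete, here is a naive exact baseline (not the oracle itself): recompute the s-t shortest path with Dijkstra's algorithm on G with the forbidden edge set F removed. The oracle in the paper answers the same kind of query approximately but far faster; the code below only illustrates what is being asked, and all names are illustrative.

```python
import heapq

def distance_avoiding(adj, s, t, forbidden):
    """Exact s-t distance in G(V, E \\ F) via Dijkstra, skipping forbidden edges.

    adj: dict mapping vertex -> list of (neighbor, weight) pairs (undirected graph).
    forbidden: set of frozenset({u, v}) edges to avoid.
    """
    dist = {s: 0.0}
    heap = [(0.0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == t:
            return d
        if d > dist.get(u, float("inf")):
            continue                      # stale heap entry
        for v, w in adj.get(u, []):
            if frozenset((u, v)) in forbidden:
                continue                  # edge is forbidden in this query
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")                   # t unreachable once F is removed

adj = {1: [(2, 1.0), (3, 5.0)], 2: [(1, 1.0), (3, 1.0)], 3: [(2, 1.0), (1, 5.0)]}
print(distance_avoiding(adj, 1, 3, forbidden=set()))                    # 2.0
print(distance_avoiding(adj, 1, 3, forbidden={frozenset((2, 3))}))      # 5.0
```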

16.
In this paper the theory of fixed sample nonparametric m-interval partition detectors is extended to include sequential operation. A formulation for a sequential m-interval partition detector is given which requires knowledge of only (m − 1) quantiles under a Lehmann alternative and, in addition, the slope of the distribution at these quantiles under a shift-of-the-mean alternative. A theorem is proven that permits the use of Fisher's information as a bound on performance and assures the existence of a locally most powerful sequential partition detector. In addition, a reduced partition space concept is introduced to combat impulsive noise interference.

17.
In this paper, we define double Horn functions, which are the Boolean functions f such that both f and its complement (i.e., negation) ¬f are Horn, and investigate their semantical and computational properties. Double Horn functions embody a balanced treatment of positive and negative information in the course of the extension problem of partially defined Boolean functions (pdBfs), where a pdBf is a pair (T, F) of disjoint sets T, F ⊆ {0, 1}^n of true and false vectors, respectively, and an extension of (T, F) is a Boolean function f that is compatible with T and F. We derive syntactic and semantic characterizations of double Horn functions, and determine the number of such functions. The characterizations are then exploited to give polynomial time algorithms (i) that recognize double Horn functions from Horn DNFs (disjunctive normal forms), and (ii) that compute the prime DNF from an arbitrary formula, as well as its complement and its dual. Furthermore, we consider the problem of determining a double Horn extension of a given pdBf. We describe a polynomial time algorithm for this problem and moreover an algorithm that enumerates all double Horn extensions of a pdBf with polynomial delay. However, finding a shortest double Horn extension (in terms of the size of a formula representing it) is shown to be intractable.

18.
The objective of the study is to select the best possible array size of Indian Remote Sensing Satellite (IRS-1B) linear imaging self scanning (LISS-IIA) digital data for the estimation of the suspended solids concentration of a surface water body. For this purpose a lake, namely Hussain Sagar in Hyderabad (India), has been considered. The lake water samples were collected on 21 February 1992 in concurrence with the date of the IRS-1B overpass. These water samples have been analysed to determine the suspended solids concentration at predetermined sample locations. Different pixel array sizes of IRS-1B LISS-IIA digital data have been analysed for the selection of the size of the pixel array for the estimation of water quality variables. This selection has been conducted by using various statistical methods such as analysis of variance, the paired t-test and linear regression techniques. Analysis of variance and the paired t-test are used for the selection of the minimum pixel array size, while linear regression techniques have been used for the selection of the most favourable band and pixel array for the estimation of suspended solids concentration. The relations between digital data and measured values of suspended solids concentrations have been quantified using simple linear and multiple regression. Possible combinations of bands, i.e., model 1 and model 2, are developed. From these, model 1 has been chosen for the estimation of suspended solids concentration based on the highest coefficient of determination (R²), the lowest standard error of estimate and an F-ratio more than four times greater than the critical F-ratio (Fcr). Based on the results of this study it is observed that the statistical approach has a strong potential for the application of remote sensing data to the quantification of suspended solids concentration.
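A minimal sketch of the kind of band-selection regression described above, assuming a table of mean digital numbers for each band over a candidate pixel array together with measured suspended-solids concentrations; column and variable names are illustrative, not from the study.

```python
import numpy as np

def fit_and_score(dn, concentration):
    """Ordinary least squares of concentration on band digital numbers,
    returning coefficients, R^2 and the overall F-ratio."""
    X = np.column_stack([np.ones(len(dn)), dn])          # intercept + band DNs
    beta, *_ = np.linalg.lstsq(X, concentration, rcond=None)
    fitted = X @ beta
    resid = concentration - fitted
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((concentration - concentration.mean()) ** 2)
    r2 = 1 - ss_res / ss_tot
    p = dn.shape[1]                                       # number of bands
    n = len(concentration)
    f_ratio = (r2 / p) / ((1 - r2) / (n - p - 1))
    return beta, r2, f_ratio

rng = np.random.default_rng(0)
dn = rng.uniform(20, 120, size=(25, 2))                   # mean DN of two bands at 25 sites
conc = 5 + 0.4 * dn[:, 0] - 0.1 * dn[:, 1] + rng.normal(0, 2, 25)
print(fit_and_score(dn, conc))
```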

19.
This paper considers the problem of nonparametric comparison of counting processes with panel count data, which arise naturally when recurrent events are considered. For the problem considered, we construct a new nonparametric test statistic based on the nonparametric maximum likelihood estimator of the mean function of the counting processes over observation times. The asymptotic distribution of the proposed statistic is derived and its finite-sample property is examined through Monte Carlo simulations. The simulation results show that the proposed method is good for practical use and also more powerful than the existing nonparametric tests based on the nonparametric maximum pseudo-likelihood estimator. A set of panel count data from a floating gallstone study is analyzed and presented as an illustrative example.

20.
The forward search provides data-driven flexible trimming of a Cp statistic for the choice of regression models that reveals the effect of outliers on model selection. An informed robust model choice follows. Even in small samples, the statistic has a null distribution indistinguishable from an F distribution. Limits on acceptable values of the Cp statistic follow. Two examples of widely differing size are discussed. A powerful graphical tool is the generalized candlestick plot, which summarizes the information on all forward searches and on the choice of models. A comparison is made with the use of M-estimation in robust model choice.
