Similar Documents
20 similar documents found (search time: 156 ms)
1.
The Kruskal-Wallis (KW) nonparametric analysis of variance is often used instead of a standard one-way ANOVA when data are from a suspected non-normal population. The KW omnibus procedure tests for some difference between groups but provides no specific post hoc pairwise comparisons. This paper provides a SAS® macro implementation of a multiple comparison test based on significant Kruskal-Wallis results from the SAS NPAR1WAY procedure. The implementation is designed for up to 20 groups at a user-specified alpha significance level. A Monte Carlo simulation compared this nonparametric procedure to commonly used parametric multiple comparison tests.
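As an illustration of the omnibus statistic behind this procedure (not the paper's SAS macro), the Kruskal-Wallis H statistic can be computed in a few lines of pure Python; the group data below are invented, and a pairwise follow-up such as Dunn's test would then compare mean ranks between group pairs. The tie-correction divisor is omitted for brevity.

```python
def ranks(values):
    """Midranks of values (average rank assigned to ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def kruskal_wallis_h(groups):
    pooled = [x for g in groups for x in g]
    r = ranks(pooled)
    n_total = len(pooled)
    h, start = 0.0, 0
    for g in groups:
        rsum = sum(r[start:start + len(g)])
        h += rsum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n_total * (n_total + 1)) * h - 3 * (n_total + 1)

groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
H = kruskal_wallis_h(groups)  # compare to the chi-square(df=2) critical value 5.991 at alpha = 0.05
```

With these toy groups H = 7.2, which exceeds 5.991, so the omnibus test would flag some difference and pairwise comparisons would follow.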

2.
Null hypothesis significance testing is routinely used for comparing the performance of machine learning algorithms. Here, we provide a detailed account of the major underrated problems that this common practice entails. For example, omnibus tests, such as the widely used Friedman test, are not appropriate for the comparison of multiple classifiers over diverse data sets. In contrast to the view that significance tests are essential to a sound and objective interpretation of classification results, our study suggests that no such tests are needed. Instead, greater emphasis should be placed on the magnitude of the performance difference and the investigator’s informed judgment. As an effective tool for this purpose, we propose confidence curves, which depict nested confidence intervals at all levels for the performance difference. These curves enable us to assess the compatibility of an infinite number of null hypotheses with the experimental results. We benchmarked several classifiers on multiple data sets and analyzed the results with both significance tests and confidence curves. Our conclusion is that confidence curves effectively summarize the key information needed for a meaningful interpretation of classification results while avoiding the intrinsic pitfalls of significance tests.
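The nesting idea can be sketched with a minimal normal-approximation confidence curve for the mean accuracy difference between two classifiers; the per-data-set differences below are invented, and a real curve would sweep the level continuously rather than at five points.

```python
# Minimal confidence-curve sketch: nested confidence intervals at several levels
# for a mean accuracy difference. The per-data-set differences are made up.
from statistics import NormalDist, mean, stdev
from math import sqrt

diffs = [0.021, 0.034, 0.015, 0.042, 0.027, 0.038, 0.019, 0.031]
n, m, se = len(diffs), mean(diffs), stdev(diffs) / sqrt(len(diffs))

def interval(level):
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return (m - z * se, m + z * se)

curve = {lvl: interval(lvl) for lvl in (0.50, 0.80, 0.90, 0.95, 0.99)}
# The 50% interval lies inside the 80% interval, and so on: reading off the
# curve shows at which level each hypothesized difference stops being compatible.
```

A plotted version of `curve` against the level axis is what the paper calls a confidence curve.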

3.
At present, domestic design researchers have carried out a large number of full-scale transmission tower tests, but most of these tests focus on verifying the load-bearing capacity of the tower as a whole. Comparative analyses of the measured internal forces of the main load-bearing members under test loading against theoretically calculated internal forces are rare, and in the existing comparative studies the member test results often differ considerably from the finite-element results. In view of this, this paper studies a method for comparing theoretical calculations with test results for the main load-bearing members of a tower, analyzes the stress and strain distribution characteristics of different member types, and proposes a converted-internal-force comparison method: measured strain values from the test are converted into test internal-force values and compared with the internal forces calculated by finite-element analysis, yielding satisfactory results. This method is of practical significance for increasing the application value of test results, helping designers understand the load-transfer paths of the structure, and further optimizing structural design.
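For an axial (truss-type) member, the converted-internal-force idea reduces to multiplying the measured strain by the member's axial stiffness, N = E·A·ε, and comparing that with the finite-element value. The sketch below uses entirely invented numbers (modulus, area, strain, and FE force) purely to show the conversion and comparison step.

```python
# Hypothetical converted-internal-force comparison for one axial tower member.
# All numbers are invented for illustration.
E = 206e9                  # Young's modulus of steel, Pa
A = 1.2e-3                 # member cross-sectional area, m^2
strain_measured = 4.5e-4   # averaged strain-gauge reading (dimensionless)

N_test = E * A * strain_measured        # internal force converted from measured strain, N
N_fem = 1.08e5                          # internal force from finite-element analysis, N (assumed)
deviation = (N_test - N_fem) / N_fem    # relative difference between test and theory
```

Here the converted test force is 111.24 kN against an assumed FE value of 108 kN, a 3% deviation, which is the kind of member-level comparison the method formalizes.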

4.
In the statistics literature, a number of procedures have been proposed for testing the equality of several groups' covariance matrices when data are complete, but this problem has not been considered for incomplete data in a general setting. This paper proposes statistical tests for equality of covariance matrices when data are missing. A Wald test (denoted by T1) and a likelihood ratio test (LRT) (denoted by R), both based on the assumption of normal populations, are developed. It is well known that in the complete-data case the classic LRT and the Wald test constructed under the normality assumption perform poorly when data are not from multivariate normal distributions. As expected, this is also true for incomplete data, which led us to construct a robust Wald test (denoted by T2) that performs well for both normal and non-normal data. A re-scaled LRT (denoted by R*) is also proposed. A simulation study assesses the performance of T1, T2, R, and R* in terms of the closeness of their observed significance level to the nominal significance level, as well as the power of these tests. T2 is found to perform very well for both normal and non-normal data in both small and large samples. In addition to its usual applications, we discuss the use of the proposed tests for testing whether a set of data is missing completely at random (MCAR).
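The complete-data baseline that these tests extend is the classical likelihood ratio statistic for equality of covariance matrices, -2 ln Λ = N ln|S_pooled| − Σᵢ nᵢ ln|Sᵢ| with divide-by-n (MLE) covariances. The sketch below computes it for two invented bivariate groups; it is not the paper's missing-data procedure, only the familiar statistic it generalizes.

```python
# Classical complete-data LRT statistic for equality of two 2x2 covariance matrices.
from math import log

def mle_cov_2d(points):
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    return [[sxx, sxy], [sxy, syy]]

def det2(s):
    return s[0][0] * s[1][1] - s[0][1] * s[1][0]

def lrt_equal_cov(groups):
    ns = [len(g) for g in groups]
    n_total = sum(ns)
    covs = [mle_cov_2d(g) for g in groups]
    # pooled MLE covariance under H0 (group means remain unrestricted)
    pooled = [[sum(n * c[i][j] for n, c in zip(ns, covs)) / n_total
               for j in range(2)] for i in range(2)]
    return n_total * log(det2(pooled)) - sum(n * log(det2(c)) for n, c in zip(ns, covs))

g1 = [(0, 0), (1, 0), (0, 1), (1, 1)]
g2 = [(0, 0), (2, 0), (0, 2), (2, 2)]
stat = lrt_equal_cov([g1, g2])  # always >= 0; large values indicate unequal covariances
```

The statistic is nonnegative by construction (the restricted likelihood cannot exceed the unrestricted one); its null distribution is approximately chi-square.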

5.
The clustered logrank test is a nonparametric method of significance testing for correlated survival data. Examples of its application include cluster randomized trials where groups of patients rather than individuals are randomized to either a treatment or a control intervention. We describe a SAS macro that implements the 2-sample clustered logrank test for data where the entire cluster is randomized to the same treatment group. We discuss the theory and applications behind this test as well as details of the SAS code.
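The building block of the clustered test is the ordinary two-sample logrank statistic; the clustered version replaces the variance estimator below with one that is robust to within-cluster correlation. This pure-Python sketch uses invented event times and, for simplicity, assumes every subject experiences the event (no censoring).

```python
# Ordinary (unclustered) two-sample logrank statistic on toy data, no censoring.
from math import sqrt

group1 = [1.0, 2.0]   # event times, treatment arm (invented)
group2 = [1.5, 3.0]   # event times, control arm (invented)

times = sorted(set(group1 + group2))
o_minus_e, var = 0.0, 0.0
for t in times:
    n1 = sum(1 for x in group1 if x >= t)          # at risk in group 1 at t
    n = n1 + sum(1 for x in group2 if x >= t)      # total at risk at t
    d = group1.count(t) + group2.count(t)          # events at t
    o_minus_e += group1.count(t) - d * n1 / n      # observed minus expected, group 1
    if n > 1:
        var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)

z = o_minus_e / sqrt(var)   # approximately N(0, 1) under the null
```

For clustered data, `var` would instead be estimated from cluster-level sums of the observed-minus-expected contributions.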

6.
A new bootstrap test is introduced that allows for assessing the significance of the differences between stochastic algorithms in an experimental setup using cross-validation with repeated folds. Intervals are used to model the variability of the data that can be attributed to the repetition of learning and testing stages over the same folds in cross-validation. Numerical experiments are provided that support the following three claims: (1) bootstrap tests can be more powerful than ANOVA or the Friedman test for comparing multiple classifiers; (2) in the presence of outliers, interval-valued bootstrap tests achieve a better discrimination between stochastic algorithms than nonparametric tests; (3) choosing between ANOVA, the Friedman test, and the bootstrap can produce different conclusions in experiments involving actual data from machine learning tasks.
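A plain percentile bootstrap for the mean accuracy difference between two algorithms illustrates the general mechanism (the paper's interval-valued variant is more elaborate); the accuracy values per repetition are invented.

```python
# Percentile-bootstrap sketch for the difference in mean accuracy between two
# algorithms over repeated cross-validation runs. Accuracy values are invented.
import random

random.seed(0)
acc_a = [0.90, 0.92, 0.91, 0.89, 0.93]   # algorithm A, per repetition
acc_b = [0.70, 0.72, 0.69, 0.71, 0.68]   # algorithm B, per repetition

def boot_diffs(a, b, reps=10_000):
    diffs = []
    for _ in range(reps):
        ra = [random.choice(a) for _ in a]      # resample A's runs with replacement
        rb = [random.choice(b) for _ in b]      # resample B's runs with replacement
        diffs.append(sum(ra) / len(ra) - sum(rb) / len(rb))
    return sorted(diffs)

d = boot_diffs(acc_a, acc_b)
lo, hi = d[int(0.025 * len(d))], d[int(0.975 * len(d)) - 1]
significant = not (lo <= 0.0 <= hi)   # 95% interval excludes zero -> difference flagged
```

With this wide a gap every resampled difference is positive, so the interval excludes zero and the difference is declared significant.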

7.
We present an ongoing research project aimed at developing a framework for component-based testing, in which we re-use and suitably combine existing tools: the system architecture and the components are specified in UML, specifically the recently proposed UML Components methodology; the test cases are derived with Cow_Suite, an environment for UML-based testing originally conceived for the integration testing of OO systems; and the tests are codified and executed within CDT, a framework under development that decouples the abstract specification of tests, made against an architectural model, from their concrete execution, which must take the component implementations into account.

8.
Given a data set and a number of supervised learning algorithms, we would like to find the algorithm with the smallest expected error. Existing pairwise tests allow a comparison of two algorithms only; range tests and ANOVA check whether multiple algorithms have the same expected error and cannot be used for finding the smallest. We propose a methodology, the multitest algorithm, whereby we order supervised learning algorithms taking into account 1) the result of pairwise statistical tests on expected error (what the data tells us), and 2) our prior preferences, e.g., due to complexity. We define the problem in graph-theoretic terms and propose an algorithm to find the "best" learning algorithm in terms of these two criteria, or in the more general case, order learning algorithms in terms of their "goodness." Simulation results using five classification algorithms on 30 data sets indicate the utility of the method. Our proposed method can be generalized to regression and other loss functions by using a suitable pairwise test.
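A heavily simplified sketch of the ordering idea: build pairwise "wins" from a test on expected error, then break remaining ties with a prior preference. Everything here is invented, and the pairwise "test" is a crude mean-difference threshold standing in for a proper paired statistical test.

```python
# Simplified multitest-style ordering: pairwise preferences plus a prior.
errors = {"knn": [0.20, 0.22, 0.21],    # per-fold error rates (invented)
          "svm": [0.10, 0.12, 0.11],
          "tree": [0.15, 0.16, 0.14]}
prior = {"tree": 0, "knn": 1, "svm": 2}  # lower = simpler, preferred a priori

def prefers(i, j, threshold=0.02):
    """Stand-in pairwise test: i beats j if its mean error is lower by > threshold."""
    mi = sum(errors[i]) / len(errors[i])
    mj = sum(errors[j]) / len(errors[j])
    return mj - mi > threshold

algs = list(errors)
wins = {a: sum(prefers(a, b) for b in algs if b != a) for a in algs}
# Order by number of pairwise wins (the data), ties broken by the prior.
ordering = sorted(algs, key=lambda a: (-wins[a], prior[a]))
```

On these toy numbers the ordering is svm, tree, knn; the paper's graph-theoretic formulation handles intransitive and tied outcomes more carefully than this win count does.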

9.
Insulin sensitivity (SI) is useful in the diagnosis, screening and treatment of diabetes. However, most current tests cannot provide an accurate, immediate or real-time estimate. The DISTq method does not require insulin or C-peptide assays like most SI tests, thus enabling real-time, low-cost SI estimation. The method uses a posteriori parameter estimations in the absence of insulin or C-peptide assays to simulate accurate, patient-specific insulin concentrations that enable SI identification.

Mathematical functions for the a posteriori parameter estimates were generated using data from 46 fully sampled DIST tests (glucose, insulin and C-peptide). SI values found using the DISTq from the 46-test pilot cohort and a second independent 218-test cohort correlated R = 0.890 and R = 0.825, respectively, with the fully sampled (including insulin and C-peptide assays) DIST SI metrics. When the a posteriori insulin estimation functions were derived using the second cohort, correlations for the pilot and second cohorts reduced to 0.765 and 0.818, respectively.

These results show accurate SI estimation is possible in the absence of insulin or C-peptide assays using the proposed method. Such estimates may only need to be generated once and then used repeatedly in the future for isolated cohorts. The reduced correlation using the second cohort was due to this cohort's bias towards low-SI, insulin-resistant subjects, limiting the data set's ability to generalise over a wider range. All the correlations remain high enough for the DISTq to be a useful test for a number of clinical applications. The unique real-time results can be generated within minutes of testing, as no insulin or C-peptide assays are required, and may enable new clinical applications.

10.
Suppose learners use their free time to go online to review course materials, and they do so by taking optional tests that consist of multiple-choice questions (MCQs). What will happen if, for every practice question, there is always a choice (out of four possible choices) that is marked as "the (current) hot choice"? Will this make any difference in learning effects? To answer this question, an educational experiment was conducted. It was found that "hot designations" helped the experimental group perform significantly better in both the immediate post-test exam and a delayed post-test exam, and that learners with higher levels of initial knowledge benefited more from this review strategy. From the results of a follow-up questionnaire and one-on-one interviews, it was found that the proposed review strategy promoted a more thorough thinking style in subjects of the experimental group.

11.
To address the large number of units and test items involved in flywheel-assembly testing, together with the practical requirement for powerful post-processing of test data, a test system was designed that can measure up to six flywheel assemblies of the same model simultaneously; by selecting different adapter boxes, it can also measure several other flywheel models. The composition of the system, the function of each part, the overall structure, the hardware configuration, and the software design are described in detail, and the measurement principle and test results of the system are given. The system not only moves flywheel-assembly testing from manual to automatic operation but also uses the network to achieve true remote control, eliminating the long-standing problem that staff could only run tests on site. Practical application shows that the test system is reliable, convenient, and flexible, and meets the test requirements for flywheel assemblies.

12.
Randomness testing is an important part of the foundational theory of cryptographic algorithms. To effectively test and evaluate the security of the cryptographic algorithms used by the Zigbee protocol in Internet-of-Things systems, this paper, taking the characteristics of Zigbee networks into account, organizes and partitions the test procedure on the basis of the binary matrix rank test and proposes a Zigbee randomness-testing method based on a matrix probability test. The method overcomes the one-sidedness of the binary matrix rank test, which judges a random sequence solely by linear dependence, and can effectively determine whether the Zigbee protocol applies an encryption mechanism and how strong that encryption is. Simulation results show that the algorithm has small error and high reliability, its test results are more convincing, and it provides theoretical and practical guidance for the information-security evaluation of IoT systems.
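The core computation of the binary matrix rank test is the rank of a bit matrix over GF(2); in NIST-style suites, ranks of many fixed-size matrices built from the bit stream are compared with their expected distribution via a chi-square statistic. A compact pure-Python rank routine (not the paper's full method):

```python
# Rank of a 0/1 matrix over GF(2) by elimination on integer-packed rows.
def gf2_rank(rows):
    """rows: list of lists of 0/1 bits. Returns the rank over GF(2)."""
    m = [int("".join(map(str, r)), 2) for r in rows]  # pack each row into an int
    rank = 0
    while m:
        pivot = max(m)                 # row with the highest leading bit
        if pivot == 0:
            break
        rank += 1
        lead = pivot.bit_length()
        # XOR the pivot into every other row sharing its leading bit; drop the pivot
        # (and any duplicates of it, which would reduce to zero anyway).
        m = [x ^ pivot if x.bit_length() == lead else x for x in m if x != pivot]
    return rank
```

A full-rank matrix from a good random source is the common case; an over-representation of deficient ranks signals structure in the sequence.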

13.
Statistical tests are powerful tools for data analysis. The Kruskal–Wallis test is a nonparametric statistical test that evaluates whether two or more samples are drawn from the same distribution, and it is commonly used in many areas. Sometimes, however, its use is impeded by privacy concerns in fields such as biomedical research and clinical data analysis, because of the confidential information contained in the data. In this work, we give a privacy-preserving solution for the Kruskal–Wallis test which enables two or more parties to jointly perform the test on the union of their data without compromising their data privacy. To the best of our knowledge, this is the first work that addresses the privacy issues in applying the Kruskal–Wallis test to distributed data.
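One standard primitive for this kind of protocol is additive secret sharing: each party publishes only random-looking shares, yet the shares sum to the private totals the statistic needs. The toy below shows only that masking-and-summing step (the harder step, ranking the union of the data privately, is deliberately omitted, and all values are invented).

```python
# Toy additive secret sharing of per-party rank-sum contributions.
import random

P = 2**61 - 1   # a large prime modulus

def share(value, n_parties):
    """Split value into n additive shares modulo P."""
    shares = [random.randrange(P) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % P)
    return shares

private_values = [37, 122, 58]                     # each party's contribution (invented)
all_shares = [share(v, 3) for v in private_values]
# Summing all shares modulo P reconstructs the total without exposing any one value:
total = sum(sum(s) for s in all_shares) % P
```

No individual share reveals anything about its party's value, but the reconstructed `total` is exact, which is all the Kruskal–Wallis rank sums require.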

14.
Multiple t tests at a fixed p level are frequently used to analyse biomedical data where analysis of variance followed by multiple comparisons, or adjustment of the p values according to Bonferroni, would be more appropriate. The Kruskal-Wallis test is a nonparametric 'analysis of variance' which may be used to compare several independent samples. The present program is written in an elementary subset of BASIC and will perform the Kruskal-Wallis test, followed by multiple comparisons between the groups, on practically any computer programmable in BASIC.

15.
Markov chain usage models support test planning, test automation, and analysis of test results. In practice, transition probabilities for Markov chain usage models are often specified using a cycle of assigning, verifying, and revising specific values for individual transition probabilities. For large systems, such an approach can be difficult for a variety of reasons. We describe an improved approach that represents transition probabilities by explicitly preserving the information concerning test objectives and the relationships between transition probabilities in a format that is easy to maintain and easy to analyze. Using mathematical programming, transition probabilities are automatically generated to satisfy test management objectives and constraints. A more mathematical treatment of this approach is given in References [1] (Poore JH, Walton GH, Whittaker JA. A constraint-based approach to the representation of software usage models. Information and Software Technology 2000; in press) and [2] (Walton GH. Generating transition probabilities for Markov chain usage models. PhD Thesis, University of Tennessee, Knoxville, TN, May 1995). In contrast, this paper is targeted at the software engineering practitioner, software development manager, and test manager. It also adds to the published literature on Markov chain usage modeling and model-based testing by describing and illustrating an iterative process for usage model development and optimization, and by providing recommendations for embedding model-based testing activities within an incremental development process. Copyright © 2000 John Wiley & Sons, Ltd.
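In the simplest case, the constraint-preserving idea amounts to storing relative arc weights derived from test objectives and deriving probabilities by normalization, so that adjusting one weight never violates the row-sum-to-one constraint. (The paper uses full mathematical programming; this sketch, with an invented usage model, shows only the normalization step.)

```python
# Derive transition probabilities from maintainable relative weights.
weights = {
    "Start":  {"Login": 9, "Help": 1},   # arc weights reflecting test objectives (invented)
    "Login":  {"Browse": 7, "Exit": 3},
    "Browse": {"Browse": 5, "Exit": 5},
    "Help":   {"Start": 1},
    "Exit":   {},                        # terminal state
}

def to_probabilities(w):
    probs = {}
    for state, arcs in w.items():
        total = sum(arcs.values())
        probs[state] = {t: c / total for t, c in arcs.items()} if total else {}
    return probs

P = to_probabilities(weights)
# Every non-terminal row of P sums to 1 regardless of how the weights are revised.
```

Changing, say, the `Start -> Help` weight updates one number in `weights` and the model stays a valid Markov chain, which is precisely the maintainability benefit the abstract describes.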

16.
Many methods have been proposed for comparing the medians of J independent groups. Generally, however, extant techniques require very restrictive assumptions or they are known to perform in an unsatisfactory manner in simulations. Included are many well-known rank-based methods plus certain types of bootstrap techniques. One goal here is to point out that two recently proposed methods also perform poorly when there are tied values. Another goal is to examine the small sample properties of several alternative methods that have not been previously studied. The main result is that a multiple comparison technique (called method R), is the only method to perform well in all the situations considered here. For an omnibus test with J>2 groups and no tied values, two methods are found that control Type I error probabilities reasonably well, one of which is based in part on results in Liu and Singh [1997. Notions of limiting p-values based on data depth and bootstrap. J. Amer. Statist. Assoc. 92, 266-277]. With tied values, the second method is found to be more satisfactory, but even it can perform poorly.

17.
Computers & Geosciences 1987, 13(2): 185-208
ANGLE interprets measurements of either directional (modulo 360°) or axial (modulo 180°) orientation data on the circle, such as vertical dike azimuths, tectonic lineaments, glacial striae, crossbed directions, and crystal or fossil orientations on foliation surfaces. The user is allowed extensive selection of input-data formats, notations, etc. Single samples (sets of measurements) can be tested against a null hypothesis of uniformity (i.e. lack of preferred orientation), using the nonparametric Hodges-Ajne, Kuiper, and Watson U2 tests and the parametric Rayleigh test, against the following alternative hypotheses: (1) a single, unspecified preferred orientation; (2) a single, prespecified mean orientation; or (3) for directional data only, a preferred axial orientation (i.e. a bimodal distribution with two diagonal modes). Estimated mean directions and concentration parameters are also calculated for each input file, assuming a von Mises distribution. For two or more samples, the nonparametric Mardia (uniform scores) and Watson U2 tests, and the parametric Watson-Williams test, can additionally be used to indicate whether the samples may come from populations with (1) equal mean directions, or (2) mean directions differing by a predetermined amount.
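The parametric Rayleigh test and the mean-direction estimate are short enough to sketch directly; the azimuths below are invented, and axial data would first be doubled modulo 360° before applying the same computation. The p-value uses the simple first-order approximation p ≈ exp(-z).

```python
# Rayleigh test for circular uniformity and mean-direction estimate on toy azimuths.
from math import sin, cos, atan2, radians, degrees, sqrt, exp

azimuths_deg = [10, 20, 15, 25, 18, 22]   # e.g. dike azimuths, invented

n = len(azimuths_deg)
s = sum(sin(radians(a)) for a in azimuths_deg)
c = sum(cos(radians(a)) for a in azimuths_deg)
r_bar = sqrt(s * s + c * c) / n           # mean resultant length, in [0, 1]
mean_dir = degrees(atan2(s, c)) % 360     # estimated mean direction, degrees

z = n * r_bar ** 2
p_value = exp(-z)                         # first-order Rayleigh approximation
# Small p_value rejects uniformity in favor of a single preferred orientation.
```

For this tightly clustered sample the mean resultant length is near 1 and uniformity is rejected, matching the intuition that the azimuths share one preferred orientation near 18°.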

18.
"Missing is useful": missing values in cost-sensitive decision trees   总被引:3,自引:0,他引:3  
Many real-world data sets for machine learning and data mining contain missing values, and much previous research regards them as a problem and attempts to impute missing values before training and testing. In this paper, we study this issue in cost-sensitive learning, which considers both test costs and misclassification costs. If some attributes (tests) are too expensive to obtain values for, it is more cost-effective to leave their values missing, similar to skipping expensive and risky tests (missing values) in patient diagnosis (classification). That is, "missing is useful": missing values actually reduce the total cost of tests and misclassifications, and it is therefore not meaningful to impute them. We discuss and compare several strategies that use only known values and that exploit "missing is useful" for cost reduction in cost-sensitive decision tree learning.
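The trade-off reduces to a comparison of expected totals: the cost of acquiring an attribute value plus the resulting misclassification risk, versus the (higher) misclassification risk of leaving it missing. All costs and error rates below are invented purely to make the arithmetic concrete.

```python
# Toy expected-cost comparison behind "missing is useful": sometimes skipping an
# expensive test costs less in total than performing it. All numbers invented.
test_cost = 50.0            # cost of performing the attribute test
mc_cost = 200.0             # cost of one misclassification
p_err_with_value = 0.05     # error rate if the attribute value is obtained
p_err_missing = 0.20        # error rate if the value is left missing

cost_acquire = test_cost + p_err_with_value * mc_cost   # 50 + 10 = 60
cost_skip = p_err_missing * mc_cost                     # 40
# Here cost_skip < cost_acquire: leaving the value missing is the cheaper strategy.
```

Flip the numbers (a cheap test, or a large accuracy gain) and acquiring the value wins instead; the decision-tree strategies in the paper make this comparison attribute by attribute.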

19.
许瑾晨  黄永忠  郭绍忠  周蓓  赵捷 《软件学报》2015,26(6):1306-1321
As an important component of CPU software, the mathematical function library plays a key role in scientific and engineering numerical computing on high-performance computing platforms. Existing test tools exercise function libraries only partially, without considering correctness, accuracy, and performance together, and they usually target a single class of architectures, limiting their applicability. To address these shortcomings, this paper proposes BMltest (basic math library test), a comprehensive, reusable, integrated test platform for multiple target architectures. The platform constructs its test sets from characteristic function values, IEEE-754 special numbers, and exponentially distributed IEEE-754 normal numbers spanning the whole floating-point domain, generated from floating-point construction rules, effectively improving the floating-point coverage of the test set. It proposes an accuracy-testing method based on the multiple-precision library MPFR (multiple-precision floating-point reliable library), improving the reliability of accuracy testing, and a performance-testing method based on code isolation, minimizing interference from the external environment on performance measurements. A reasonable evaluation scheme is given for the large volume of floating-point test results. The test data and the functions under test are maximally decoupled, ensuring the generality of the test method. Tests of all 855 functions in the GNU, Open64, and Mlib libraries show that BMltest's test sets are more complete and effective and its accuracy-testing method more reliable; compared with other test platforms, its performance results are more accurate and stable.
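The accuracy-testing idea (compare a libm result against a higher-precision reference and report the error in ulps) can be sketched without MPFR by letting Python's `decimal` module stand in as the high-precision reference; `math.exp` here is just an example function under test, not part of BMltest.

```python
# Measure the error of math.exp in ulps against a 50-digit decimal reference.
import math
from decimal import Decimal, getcontext

getcontext().prec = 50                      # reference precision: 50 significant digits

def ulp_error(x):
    computed = math.exp(x)                  # function under test (double precision)
    reference = Decimal(x).exp()            # high-precision reference value
    # Error expressed in units in the last place of the computed result:
    return abs(Decimal(computed) - reference) / Decimal(math.ulp(computed))

err = ulp_error(1.5)                        # ulp error of math.exp at x = 1.5
```

A well-implemented libm routine keeps this figure below 1 ulp for most inputs; a platform like BMltest runs such a measurement over the characteristic values, special numbers, and exponentially distributed normals described above.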

20.
Three likelihood-based tests, namely the likelihood ratio test, the Rao score test, and the Wald test, together with two further asymptotic tests that use Srivastava's estimator of the intraclass correlation coefficient, are considered for testing the null hypothesis of equality of intraclass correlation coefficients when families have unequal numbers of children. The methods are illustrated on Galton's data set. Using a simulation experiment, we compute the sizes and powers of these tests and compare them. It is found that our proposed test using Srivastava's estimator and the score test perform best among all the tests.
