Similar Documents
20 similar documents retrieved.
1.
Calculation of the statistical power of statistical tests is important in planning and interpreting the results of research studies, including meta-analyses. It is particularly important in moderator analyses in meta-analysis, which are often used as sensitivity analyses to rule out moderator effects but also may have low statistical power. This article describes how to compute statistical power of both fixed- and mixed-effects moderator tests in meta-analysis that are analogous to the analysis of variance and multiple regression analysis for effect sizes. It also shows how to compute power of tests for goodness of fit associated with these models. Examples from a published meta-analysis demonstrate that power of moderator tests and goodness-of-fit tests is not always high. (PsycINFO Database Record (c) 2010 APA, all rights reserved)
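As an illustration of this kind of calculation, the sketch below follows the general fixed-effects recipe: under the alternative, the between-groups (ANOVA-analog) moderator statistic is approximately noncentral chi-square, so power is the tail probability beyond the central chi-square critical value. The group means and variances are made-up illustrative values, and the function is a minimal sketch of the recipe rather than the article's exact derivations.

```python
import numpy as np
from scipy import stats

def power_Q_between(group_means, group_variances, alpha=0.05):
    """Approximate power of the fixed-effects between-groups (ANOVA-analog)
    moderator test in meta-analysis.

    group_means     : hypothesized true mean effect size in each moderator group
    group_variances : sampling variance of each group's mean effect
                      (i.e., 1 / sum of the study weights within that group)
    Under the alternative, Q_between is approximately noncentral chi-square with p-1 df.
    """
    means = np.asarray(group_means, dtype=float)
    weights = 1.0 / np.asarray(group_variances, dtype=float)
    grand_mean = np.sum(weights * means) / np.sum(weights)
    ncp = np.sum(weights * (means - grand_mean) ** 2)   # noncentrality parameter
    df = len(means) - 1
    crit = stats.chi2.ppf(1 - alpha, df)                 # central chi-square critical value
    return stats.ncx2.sf(crit, df, ncp)                  # P(Q_between > crit | alternative)

# Example: 3 moderator groups of 10 studies each, per-study sampling variance 0.04
print(power_Q_between([0.20, 0.40, 0.60], [0.04 / 10] * 3))
```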

2.
Shortcut approximate equations are described that provide estimates of the sample size required for 50% power (α = 0.05, two-tailed) for 1 degree of freedom tests of significance for simple correlations, differences between 2 independent group means, and Pearson's chi-square test for 2 × 2 contingency tables. These sample sizes should be thought of as minima, because power equal to 50% means that the chance of a significant finding is that of flipping a fair coin. A more desirable sample size can be computed by simply doubling the 50% sample sizes, which is shown to result in power between 80% and 90%. With these simple tools, power can be estimated rapidly, which, it is hoped, will lead to greater use and understanding of power in the teaching of statistics and in research. (PsycINFO Database Record (c) 2010 APA, all rights reserved)
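The shortcut rests on the fact that power is 50% exactly when the expected test statistic sits at the critical value. The sketch below reproduces that logic for the simple-correlation case with an ordinary Fisher-z normal approximation rather than the article's own equations; the population correlation of .30 is illustrative.

```python
import math
from scipy import stats

Z_CRIT = stats.norm.ppf(0.975)           # two-tailed alpha = .05

def n_for_50pct_power_correlation(r):
    """Total N at which the expected test statistic equals the critical value (~50% power)."""
    zp = math.atanh(r)                   # Fisher z transform of the population r
    return (Z_CRIT / zp) ** 2 + 3        # because the z statistic is roughly z' * sqrt(N - 3)

def power_correlation(r, n):
    """Approximate power of the two-tailed test of rho = 0 via the Fisher z approximation."""
    zp = math.atanh(r)
    return stats.norm.sf(Z_CRIT - zp * math.sqrt(n - 3))

n50 = n_for_50pct_power_correlation(0.30)
print(round(n50), round(power_correlation(0.30, n50), 2))          # ~43 cases, power ~ .50
print(round(2 * n50), round(power_correlation(0.30, 2 * n50), 2))  # doubled N, power ~ .80
```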

3.
I. Olkin and J. D. Finn (1995) presented 2 methods for comparing squared multiple correlation coefficients for 2 independent samples. In 1 method, the researcher constructs a confidence interval for the difference between 2 population squared coefficients; in the 2nd method, a Fisher-type transformation of the sample squared correlation coefficient is used to obtain a test statistic. Both methods are based on asymptotic theory and use approximations to the sampling variance. The approximations are incorrect when the population multiple correlation coefficient is zero. The 2 procedures were examined for equal and unequal population multiple correlation coefficients in combination with equal and unequal sample sizes. As expected, the procedures were inaccurate when the population multiple correlation coefficients were zero or very small and, in some conditions, were inaccurate when sample sizes and coefficients were unequal. (PsycINFO Database Record (c) 2010 APA, all rights reserved)
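For the confidence-interval method, a minimal sketch is given below using the common large-sample approximation Var(R²) ≈ 4R²(1 − R²)²/n for each sample. This is an assumed stand-in for the article's exact expressions and, as the abstract notes, approximations of this kind degenerate when a population coefficient is at or near zero. The sample values are illustrative.

```python
import math
from scipy import stats

def ci_diff_rsquared(r2_1, n1, r2_2, n2, conf=0.95):
    """Approximate CI for the difference between two independent squared multiple
    correlations, using the large-sample variance Var(R^2) ~ 4*R^2*(1-R^2)^2 / n.
    Unreliable when either population R^2 is at or near zero."""
    var1 = 4 * r2_1 * (1 - r2_1) ** 2 / n1
    var2 = 4 * r2_2 * (1 - r2_2) ** 2 / n2
    diff = r2_1 - r2_2
    z = stats.norm.ppf(0.5 + conf / 2)
    half = z * math.sqrt(var1 + var2)
    return diff - half, diff + half

print(ci_diff_rsquared(0.40, 200, 0.25, 150))
```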

4.
Suggests that significance tests for 9 of the most common statistical procedures (simple correlation, t test for independent samples, multiple regression analysis, 1-way ANOVA, factorial ANOVA, analysis of covariance, t test for correlated samples, discriminant analysis, and chi-square test of independence) can all be treated as special cases of the test of the null hypothesis in canonical correlation analysis for 2 sets of variables. (15 ref) (PsycINFO Database Record (c) 2010 APA, all rights reserved)
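One concrete instance of this equivalence is easy to verify: with a single dependent variable and a dummy-coded two-group predictor, the canonical correlation reduces to the point-biserial correlation, whose significance test reproduces the pooled-variance independent-samples t test. The data below are simulated purely for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
group = np.repeat([0, 1], 30)                      # dummy-coded group membership
y = rng.normal(loc=group * 0.5, scale=1.0)         # DV with a modest group difference

# Independent-samples t test (pooled variance)
t, p_t = stats.ttest_ind(y[group == 1], y[group == 0])

# Point-biserial correlation between the dummy code and the DV:
# the 1-DV, 1-dummy-variable special case of canonical correlation
r, p_r = stats.pearsonr(group, y)

print(p_t, p_r)   # identical p-values: the t test is a special case of the correlation test
```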

5.
This article discusses power and sample size calculations for observational studies in which the values of the independent variables cannot be fixed in advance but are themselves outcomes of the study. It reviews the mathematical framework applicable when a multivariate normal distribution can be assumed and describes a method for calculating exact power and sample sizes using a series expansion for the distribution of the multiple correlation coefficient. A table of exact sample sizes for level .05 tests is provided. Approximations to the exact power are discussed, most notably those of J. Cohen (1977). A rigorous justification of Cohen's approximations is given. Comparisons with exact answers show that the approximations are quite accurate in many situations of practical interest. More extensive tables and a computer program for exact calculations can be obtained from the authors. (PsycINFO Database Record (c) 2010 APA, all rights reserved)
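Cohen's approximation mentioned in the abstract can be sketched as a noncentral-F calculation with noncentrality L = f²(u + v + 1), where f² = R²/(1 − R²). The code below implements that approximation, not the authors' exact series expansion, and the inputs are illustrative.

```python
from scipy import stats

def power_R2_cohen(R2, n_predictors, N, alpha=0.05):
    """Cohen-style approximate power for testing H0: population R^2 = 0,
    using a noncentral F with noncentrality L = f^2 * (u + v + 1)."""
    u = n_predictors                 # numerator df
    v = N - u - 1                    # denominator (error) df
    f2 = R2 / (1 - R2)               # Cohen's effect size f^2
    ncp = f2 * (u + v + 1)           # noncentrality parameter
    crit = stats.f.ppf(1 - alpha, u, v)
    return stats.ncf.sf(crit, u, v, ncp)

print(power_R2_cohen(R2=0.13, n_predictors=3, N=80))   # a 'medium' effect at an illustrative N
```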

6.
[Correction Notice: An erratum for this article was reported in Vol 95(5) of Journal of Applied Psychology (see record 2010-18410-004). There was an error in Formula 6 on page 731 for the pooled standard deviation of the ESSD index. The correct formula is given in the erratum. Related to this, in Table 8 on page 739, the ETSSD statistic should have been .094 for the cross-cultural comparison and .001 for the Administration Format example.] Much progress has been made in the past 2 decades with respect to methods of identifying measurement invariance or a lack thereof. Until now, the focus of these efforts has been to establish criteria for statistical significance in items and scales that function differently across samples. The power associated with tests of differential functioning, as with all significance tests, is affected by sample size and other considerations. Additionally, statistical significance need not imply practical importance. There is a strong need as such for meaningful effect size indicators to describe the extent to which items and scales function differently. Recently developed effect size measures show promise for providing a metric to describe the amount of differential functioning present between groups. Expanding upon recent developments, this article presents a taxonomy of potential differential functioning effect sizes; several new indices of item and scale differential functioning effect size are proposed and illustrated with 2 data samples. Software created for computing these indices and graphing item- and scale-level differential functioning is described. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

7.
Reports an error in "A taxonomy of effect size measures for the differential functioning of items and scales" by Adam W. Meade (Journal of Applied Psychology, 2010[Jul], Vol 95[4], 728-743). There was an error in Formula 6 on page 731 for the pooled standard deviation of the ESSD index. The correct formula is given in the erratum. Related to this, in Table 8 on page 739, the ETSSD statistic should have been .094 for the cross-cultural comparison and .001 for the Administration Format example. (The following abstract of the original article appeared in record 2010-13313-009.) Much progress has been made in the past 2 decades with respect to methods of identifying measurement invariance or a lack thereof. Until now, the focus of these efforts has been to establish criteria for statistical significance in items and scales that function differently across samples. The power associated with tests of differential functioning, as with all significance tests, is affected by sample size and other considerations. Additionally, statistical significance need not imply practical importance. There is a strong need as such for meaningful effect size indicators to describe the extent to which items and scales function differently. Recently developed effect size measures show promise for providing a metric to describe the amount of differential functioning present between groups. Expanding upon recent developments, this article presents a taxonomy of potential differential functioning effect sizes; several new indices of item and scale differential functioning effect size are proposed and illustrated with 2 data samples. Software created for computing these indices and graphing item- and scale-level differential functioning is described. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

8.
Comments that criterion-related validity studies are often not technically feasible because sample sizes are inadequate for necessary statistical power (e.g., .90). Effect sizes are frequently overestimated because of a failure to consider the combined effects of range restriction and criterion unreliability, both of which attenuate validity coefficients. Restricted validities must therefore be estimated by applying appropriate correction formulas. In this study the corrections are made for the multiple prediction case. Required sample sizes, determined using the univariate power model, are presented for a range of unit-weighted predictors, for varying degrees of restriction, and for power levels of .50 and .90. The advantage of multiple predictors is shown by comparing their required sample size to that of the best single predictor. For a given power, effect size is clearly the major determinant of required sample size. Implications for applied research are discussed. (13 ref) (PsycINFO Database Record (c) 2010 APA, all rights reserved)
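A hedged sketch of the general idea follows: attenuate an assumed operational validity for criterion unreliability and direct range restriction (Thorndike Case II), then compute the sample size needed to detect the attenuated coefficient with a Fisher-z approximation. These are standard textbook formulas for the single-predictor case, not the article's multiple-predictor corrections or its univariate power model tables, and the numbers are illustrative.

```python
import math
from scipy import stats

def attenuate(rho, u, ryy):
    """Expected observed validity after criterion unreliability and direct range restriction.
    rho : unrestricted operational validity
    u   : ratio of restricted to unrestricted predictor SD (u < 1 means restriction)
    ryy : criterion reliability
    """
    r = rho * math.sqrt(ryy)                                    # attenuation for unreliability
    return u * r / math.sqrt(1 - r ** 2 + (u ** 2) * r ** 2)    # Thorndike Case II restriction

def required_n(r, power=0.90, alpha=0.05):
    """Approximate N to detect correlation r (two-tailed) via the Fisher z approximation."""
    za, zb = stats.norm.ppf(1 - alpha / 2), stats.norm.ppf(power)
    return ((za + zb) / math.atanh(r)) ** 2 + 3

rho_operational = 0.35
r_study = attenuate(rho_operational, u=0.7, ryy=0.60)
print(round(r_study, 3), math.ceil(required_n(r_study)))   # the attenuated effect needs a far larger N
print(math.ceil(required_n(rho_operational)))              # compare with the unattenuated requirement
```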

9.
We previously proposed a class of ordered weighted logrank tests for analysing censored survival data under order restrictions. However, the power of these tests is asymmetrical with respect to possible alternative configurations. While it is superior in most cases, the power can be inferior to the non-ordered logrank test in extreme cases. We propose a modified ordered logrank test which performs uniformly better than the non-ordered test. The power of the modified test is equivalent to the generalized Jonckheere's test but its computation is much simpler. We give sample size requirements for sufficient power to reject the global null hypothesis at specified hazard ratios between the control group and the best group. Following Fisher's least significant difference (LSD) strategy for multiple comparisons, power investigations indicate that the nominal power for the global test carries over to the control versus best comparison during pairwise testing. The power for detecting intermediate survival differences is inadequate but the sample sizes required to detect such differences may be impractical in most applications.
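For the control-versus-best pairwise comparison, a rough point of reference is the standard Schoenfeld approximation for a two-group logrank test, sketched below. This is not the article's modified ordered logrank derivation, and the hazard ratio, allocation, and event probability are illustrative assumptions.

```python
import math
from scipy import stats

def schoenfeld_events(hazard_ratio, power=0.80, alpha=0.05, allocation=0.5):
    """Total number of events needed for a two-group logrank test (Schoenfeld approximation)."""
    za = stats.norm.ppf(1 - alpha / 2)
    zb = stats.norm.ppf(power)
    return (za + zb) ** 2 / (allocation * (1 - allocation) * math.log(hazard_ratio) ** 2)

def total_n(events, event_probability):
    """Convert the required number of events into a total sample size."""
    return math.ceil(events / event_probability)

d = schoenfeld_events(hazard_ratio=1.75, power=0.80)
print(math.ceil(d), total_n(d, event_probability=0.6))   # ~100 events, ~167 patients in this scenario
```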

10.
Advantages of using equal sample sizes in 2 sample tests for means, correlations, and proportions are well known. However, in applied research there are frequently circumstances that limit the size of 1 of the 2 samples. This article draws attention to a simple method of determining, from J. Cohen's (1988) tables, the effects that this constraint (on the size of 1 sample) has on the maximum attainable power of these tests. These effects can be extremely serious in the case of small, medium, and even large effect sizes and clearly indicate that availability of a very large 2nd sample may not compensate for a constraint on the size of the 1st sample. (PsycINFO Database Record (c) 2010 APA, all rights reserved)
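The ceiling on power can be shown directly with a normal approximation for a two-sample comparison of means: with n1 fixed, letting n2 grow only shrinks the standard error toward 1/√n1, so power converges to a limit below the usual target. The effect size and sample sizes below are illustrative rather than taken from Cohen's tables.

```python
import math
from scipy import stats

def power_two_sample(d, n1, n2, alpha=0.05):
    """Approximate power of a two-tailed two-sample test of a standardized
    mean difference d (normal approximation, unit variances)."""
    se = math.sqrt(1 / n1 + 1 / n2)
    z_crit = stats.norm.ppf(1 - alpha / 2)
    return stats.norm.sf(z_crit - d / se)

d, n1 = 0.5, 20                                    # medium effect, first sample capped at 20
for n2 in (20, 100, 1000, 10**6):
    print(n2, round(power_two_sample(d, n1, n2), 3))

# Even an effectively infinite second sample cannot push power above this limit:
print(round(stats.norm.sf(stats.norm.ppf(0.975) - d * math.sqrt(n1)), 3))
```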

11.
Correlational analysis is a cornerstone method of statistical analysis, yet most presentations of correlational techniques deal primarily with tests of significance. The focus of this article is obtaining explicit expressions for confidence intervals for functions of simple, partial, and multiple correlations. Not only do these permit tests of hypotheses about differences but they also allow a clear statement about the degree to which correlations differ. Several important differences of correlations for which tests and confidence intervals are not widely known are included among the procedures discussed. Among these is the comparison of 2 multiple correlations based on independent samples. (PsycINFO Database Record (c) 2010 APA, all rights reserved)
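Two of the simpler cases the article covers are widely known and easy to sketch: the Fisher-z interval for a single correlation and the interval for the difference between two independent correlations (the latter expressed on the z scale). The sketch below uses these textbook forms, not the article's expressions for partial and multiple correlations, and the inputs are illustrative.

```python
import math
from scipy import stats

def ci_r(r, n, conf=0.95):
    """Fisher z confidence interval for a single Pearson correlation."""
    z = math.atanh(r)
    half = stats.norm.ppf(0.5 + conf / 2) / math.sqrt(n - 3)
    return math.tanh(z - half), math.tanh(z + half)

def ci_diff_independent_r(r1, n1, r2, n2, conf=0.95):
    """CI for the difference of two independent correlations, on the Fisher z scale."""
    diff = math.atanh(r1) - math.atanh(r2)
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    half = stats.norm.ppf(0.5 + conf / 2) * se
    return diff - half, diff + half        # an interval excluding 0 indicates the correlations differ

print(ci_r(0.45, 120))
print(ci_diff_independent_r(0.45, 120, 0.25, 150))
```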

12.
13.
Compares the accuracy of several formulas for the standard error of the mean uncorrected correlation in meta-analytic and validity generalization studies. The effect of computing the mean correlation by weighting the correlation in each study by its sample size is also studied. On the basis of formal analysis and simulation studies, it is concluded that the common formula for the sampling variance of the mean correlation, Var(r̄) = Var(r)/K, where K is the number of studies in the meta-analysis, gives reasonably accurate results. This formula gives accurate results even when sample sizes and ρs are unequal and regardless of whether or not the statistical artifacts vary from study to study. It is also shown that using sample-size weighting may result in underestimation of the standard error of the mean uncorrected correlation when there are outlier sample sizes. (PsycINFO Database Record (c) 2010 APA, all rights reserved)
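A minimal sketch of that formula: estimate Var(r) from the spread of the study correlations and divide by K, optionally computing the mean with sample-size weights. The input correlations and sample sizes below are made up for illustration.

```python
import numpy as np

def se_mean_r(correlations, sample_sizes=None):
    """Standard error of the mean uncorrected correlation across K studies:
    sqrt(Var(r) / K), with an optional sample-size-weighted mean r."""
    r = np.asarray(correlations, dtype=float)
    k = len(r)
    if sample_sizes is None:
        mean_r = r.mean()
    else:
        w = np.asarray(sample_sizes, dtype=float)
        mean_r = np.sum(w * r) / np.sum(w)     # N-weighted mean (can understate the SE with outlier Ns)
    var_r = np.sum((r - mean_r) ** 2) / k      # observed variance of the study correlations
    return mean_r, np.sqrt(var_r / k)

print(se_mean_r([0.12, 0.25, 0.31, 0.18, 0.22], [50, 80, 200, 60, 120]))
```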

14.
Minimal measurement error (reliability) during the collection of interval- and ratio-type data is critically important to sports medicine research. The main components of measurement error are systematic bias (e.g. general learning or fatigue effects on the tests) and random error due to biological or mechanical variation. Both error components should be meaningfully quantified for the sports physician to relate the described error to judgements regarding 'analytical goals' (the requirements of the measurement tool for effective practical use) rather than the statistical significance of any reliability indicators. Methods based on correlation coefficients and regression provide an indication of 'relative reliability'. Since these methods are highly influenced by the range of measured values, researchers should be cautious in: (i) concluding acceptable relative reliability even if a correlation is above 0.9; (ii) extrapolating the results of a test-retest correlation to a new sample of individuals involved in an experiment; and (iii) comparing test-retest correlations between different reliability studies. Methods used to describe 'absolute reliability' include the standard error of measurements (SEM), coefficient of variation (CV) and limits of agreement (LOA). These statistics are more appropriate for comparing reliability between different measurement tools in different studies. They can be used in multiple retest studies from ANOVA procedures, help predict the magnitude of a 'real' change in individual athletes and be employed to estimate statistical power for a repeated-measures experiment. These methods vary considerably in the way they are calculated and their use also assumes the presence (CV) or absence (SEM) of heteroscedasticity. Most methods of calculating SEM and CV represent approximately 68% of the error that is actually present in the repeated measurements for the 'average' individual in the sample. LOA represent the test-retest differences for 95% of a population. The associated Bland-Altman plot shows the measurement error schematically and helps to identify the presence of heteroscedasticity. If there is evidence of heteroscedasticity or non-normality, one should logarithmically transform the data and quote the bias and random error as ratios. This allows simple comparisons of reliability across different measurement tools. It is recommended that sports clinicians and researchers should cite and interpret a number of statistical methods for assessing reliability. We encourage the inclusion of the LOA method, especially the exploration of heteroscedasticity that is inherent in this analysis. We also stress the importance of relating the results of any reliability statistic to 'analytical goals' in sports medicine.
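A compact sketch of common computational definitions follows: SEM taken as the SD of the test-retest differences divided by √2, CV expressed as a percentage of the grand mean, and Bland-Altman 95% limits of agreement. The data are invented, and the sketch omits the ANOVA-based multiple-retest machinery and the log-transformation step for heteroscedastic data that the review describes.

```python
import numpy as np

def test_retest_reliability(test1, test2):
    """Absolute reliability statistics for a simple test-retest design."""
    t1, t2 = np.asarray(test1, float), np.asarray(test2, float)
    diff = t2 - t1
    bias = diff.mean()                                     # systematic bias (learning/fatigue effects)
    sd_diff = diff.std(ddof=1)                             # random error component
    sem = sd_diff / np.sqrt(2)                             # standard error of measurement (~68% of error)
    cv = 100 * sem / np.concatenate([t1, t2]).mean()       # CV as a percentage of the grand mean
    loa = (bias - 1.96 * sd_diff, bias + 1.96 * sd_diff)   # Bland-Altman 95% limits of agreement
    # If the differences grow with the size of the measured values (heteroscedasticity),
    # log-transform the data first and report the bias and LOA as ratios.
    return {"bias": bias, "SEM": sem, "CV%": cv, "LOA": loa}

t1 = [51.2, 48.9, 55.4, 60.1, 47.3, 52.8]
t2 = [52.0, 49.5, 54.8, 61.0, 48.1, 53.5]
print(test_retest_reliability(t1, t2))
```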

15.
RATIONALE AND OBJECTIVES: Traditionally, multireader receiver operating characteristic (ROC) studies have used a "paired-case, paired-reader" design. The statistical power of such a design for inferences about the relative accuracies of the tests was assessed and compared with alternative designs. METHODS: The noncentrality parameter of an F statistic was used to compute power as a function of the reader and patient sample sizes and the variability and correlation between readings. RESULTS: For a fixed power and Type I error rate, the traditional design reduces the number of verified cases required. A hybrid design, in which each reader interprets a different sample of patients, reduces the number of readers, total readings, and readings required per reader. The drawback is a substantial increase in the number of verified cases. CONCLUSION: The ultimate choice of study design depends on the nature of the tests being compared, limiting resources, a priori knowledge of the magnitude of the correlations and variability, and logistic complexity.

16.
Monte Carlo simulations were conducted to examine the degree to which the statistical power of moderated multiple regression (MMR) to detect the effects of a dichotomous moderator variable was affected by the main and interactive effects of (a) predictor variable range restriction, (b) total sample size, (c) sample sizes for 2 moderator variable-based subgroups, (d) predictor variable intercorrelation, and (e) magnitude of the moderating effect. Results showed that the main and interactive influences of these variables may have profound effects on power. Thus, future attempts to detect moderating effects with MMR should consider the power implications of both the main and interactive effects of the variables assessed in the present study. Otherwise, even moderating effects of substantial magnitude may go undetected. (PsycINFO Database Record (c) 2010 APA, all rights reserved)
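A compact Monte Carlo sketch in the same spirit is shown below: simulate a dichotomous moderator, optionally truncate the predictor to mimic direct range restriction, fit the MMR model, and count how often the interaction term is significant. The regression weights, subgroup sizes, and truncation point are illustrative assumptions, not the study's design.

```python
import numpy as np
from scipy import stats

def mmr_power(n1=60, n2=60, beta_interaction=0.2, restriction=None,
              n_sims=1000, alpha=0.05, seed=1):
    """Monte Carlo power of the moderated multiple regression (MMR) test of the
    x-by-group interaction, with a dichotomous moderator and optional direct
    range restriction on the predictor (x truncated at +/- `restriction` SDs)."""
    rng = np.random.default_rng(seed)
    n = n1 + n2
    g = np.concatenate([np.zeros(n1), np.ones(n2)])          # moderator subgroup codes
    hits = 0
    for _ in range(n_sims):
        if restriction is None:
            x = rng.normal(size=n)
        else:
            x = stats.truncnorm.rvs(-restriction, restriction, size=n, random_state=rng)
        y = 0.3 * x + beta_interaction * x * g + rng.normal(size=n)
        X = np.column_stack([np.ones(n), x, g, x * g])        # MMR design matrix
        beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        df = n - X.shape[1]
        sigma2 = resid @ resid / df
        cov = sigma2 * np.linalg.inv(X.T @ X)
        t_int = beta[3] / np.sqrt(cov[3, 3])                  # t statistic for the interaction term
        if 2 * stats.t.sf(abs(t_int), df) < alpha:
            hits += 1
    return hits / n_sims

print(mmr_power())                    # unrestricted predictor
print(mmr_power(restriction=1.0))     # restricting the predictor's range lowers power
```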

17.
The reporting and interpretation of effect sizes in addition to statistical significance tests is becoming increasingly recognized as good research practice, as evidenced by the editorial policies of at least 23 journals that now require effect sizes. Statistical significance tests are limited in the information they provide readers about results, and effect sizes can be useful when evaluating result importance. The current article (a) summarizes statistical versus practical significance, (b) briefly discusses various effect size options, (c) presents a review of research articles published in the International Journal of Play Therapy (1993-2003) regarding use of effect sizes and statistical significance tests, and (d) provides recommendations for improved research practice in the journal and elsewhere. (PsycINFO Database Record (c) 2010 APA, all rights reserved)
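As a small illustration of the recommended practice, the snippet below reports Cohen's d and a variance-accounted-for index (η² computed from t) alongside the t-test p-value; the data are simulated.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
treatment = rng.normal(0.4, 1.0, 40)
control = rng.normal(0.0, 1.0, 40)

t, p = stats.ttest_ind(treatment, control)
n1, n2 = len(treatment), len(control)
sp = np.sqrt(((n1 - 1) * treatment.var(ddof=1) + (n2 - 1) * control.var(ddof=1)) / (n1 + n2 - 2))
d = (treatment.mean() - control.mean()) / sp       # Cohen's d (standardized mean difference)
eta_sq = t ** 2 / (t ** 2 + n1 + n2 - 2)           # proportion of variance accounted for

print(f"t = {t:.2f}, p = {p:.3f}, d = {d:.2f}, eta^2 = {eta_sq:.3f}")
```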

18.
The effects of the medium of test administration, paper and pencil vs computerized, were examined for timed power and speeded tests of cognitive abilities for populations of young adults and adults. Meta-analytic techniques were used to estimate the cross-mode correlation after correcting for measurement error. A total of 159 correlations was meta-analyzed: 123 from timed power tests and 36 from speeded tests. The corrected cross-mode correlation was found to be .91 when all correlations were analyzed simultaneously. Speededness was found to moderate the effects of administration mode in that the cross-mode correlation was estimated to be .97 for timed power tests but only .72 for speeded tests. No difference in equivalence was observed between adaptively and conventionally administered computerized tests. Some limitations on the generality of these results are discussed, and directions for future research are outlined. (PsycINFO Database Record (c) 2010 APA, all rights reserved)
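The basic corrections involved can be sketched with the usual psychometric meta-analysis steps: a sample-size-weighted mean correlation followed by disattenuation for measurement error in both measures. The correlations, sample sizes, and reliabilities below are illustrative, not the article's data.

```python
import numpy as np

def weighted_mean_r(rs, ns):
    """Sample-size-weighted mean of observed correlations."""
    rs, ns = np.asarray(rs, float), np.asarray(ns, float)
    return np.sum(ns * rs) / np.sum(ns)

def disattenuate(r, rel_x, rel_y):
    """Correct an observed cross-mode correlation for measurement error in both measures."""
    return r / np.sqrt(rel_x * rel_y)

rs = [0.78, 0.85, 0.81, 0.62]      # observed paper-and-pencil vs computerized correlations
ns = [120, 340, 95, 210]
r_bar = weighted_mean_r(rs, ns)
print(round(r_bar, 3), round(disattenuate(r_bar, rel_x=0.85, rel_y=0.85), 3))
```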

19.
Epidemiologic and public health researchers frequently include several dependent variables, repeated assessments, or subgroup analyses in their investigations. These factors result in multiple tests of statistical significance and may produce Type I experimental errors. This study examined the Type I error rate in a sample of public health and epidemiologic research. A total of 173 articles chosen at random from 1996 issues of the American Journal of Public Health and the American Journal of Epidemiology were examined to determine the incidence of Type I errors. Three different methods of computing Type I error rates were used: experiment-wise error rate, error rate per experiment, and percent error rate. The results indicate a Type I error rate substantially higher than the traditionally assumed level of 5% (p < 0.05). No practical or statistically significant difference was found between Type I error rates across the two journals. Methods to determine and correct Type I errors should be reported in epidemiologic and public health research investigations that include multiple statistical tests.
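The two most common definitions used in such audits are easy to compute: the experiment-wise error rate, 1 − (1 − α)^k for k independent tests, and the error rate per experiment, kα. The sketch below also prints the Bonferroni-adjusted per-test α as one simple correction; the values of k are illustrative.

```python
def experimentwise_error_rate(k, alpha=0.05):
    """P(at least one Type I error) across k independent tests at level alpha."""
    return 1 - (1 - alpha) ** k

def error_rate_per_experiment(k, alpha=0.05):
    """Expected number of Type I errors across k tests (independence not required)."""
    return k * alpha

for k in (1, 5, 10, 20):
    print(k,
          round(experimentwise_error_rate(k), 3),
          round(error_rate_per_experiment(k), 2),
          round(0.05 / k, 4))          # Bonferroni-adjusted per-test alpha
```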

20.
The more commonly known statistical procedures, such as the t-test, analysis of variance, or chi-squared test, can handle only one dependent variable (DV) at a time. Two types of problems can arise when there is more than one DV: 1. a greater probability of erroneously concluding that there is a significant difference between the groups when in fact there is none (a Type I error); and 2. failure to detect differences between the groups in terms of the patterns of DVs (a Type II error). Multivariate statistics are designed to overcome both of these problems. However, there are costs associated with these benefits, such as increased complexity, decreased power, multiple ways of answering the same question, and ambiguity in the allocation of shared variance. This is the first of a series of articles on multivariate statistical tests, which will address these issues and explain their possible uses.
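A standard example of the multivariate alternative is the two-sample Hotelling's T² test, which examines all dependent variables in a single test instead of running one t test per DV. The sketch below uses the textbook T²-to-F conversion with simulated data; it is an illustration of the general approach, not drawn from the article.

```python
import numpy as np
from scipy import stats

def hotelling_t2(X1, X2):
    """Two-sample Hotelling's T^2 test: one multivariate test across all DVs at once."""
    X1, X2 = np.asarray(X1, float), np.asarray(X2, float)
    n1, n2, p = len(X1), len(X2), X1.shape[1]
    diff = X1.mean(axis=0) - X2.mean(axis=0)
    S_pooled = ((n1 - 1) * np.cov(X1, rowvar=False) +
                (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)
    t2 = (n1 * n2) / (n1 + n2) * diff @ np.linalg.solve(S_pooled, diff)
    F = (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * t2      # convert T^2 to an F statistic
    return F, stats.f.sf(F, p, n1 + n2 - p - 1)

rng = np.random.default_rng(3)
group_a = rng.normal(size=(40, 4))        # 4 DVs, no true group difference
group_b = rng.normal(size=(40, 4))
print(hotelling_t2(group_a, group_b))     # one test instead of four separate t tests
```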
