Similar Documents
20 similar documents found.
1.
Most psychology journals now require authors to report a sample value of effect size along with hypothesis testing results. The sample effect size value can be misleading because it contains sampling error. Authors often incorrectly interpret the sample effect size as if it were the population effect size. A simple solution to this problem is to report a confidence interval for the population value of the effect size. Standardized linear contrasts of means are useful measures of effect size in a wide variety of research applications. New confidence intervals for standardized linear contrasts of means are developed and may be applied to between-subjects designs, within-subjects designs, or mixed designs. The proposed confidence interval methods are easy to compute, do not require equal population variances, and perform better than the currently available methods when the population variances are not equal. (PsycINFO Database Record (c) 2010 APA, all rights reserved)
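
The article's closed-form intervals are not reproduced here; as a rough illustration of the quantity being interval-estimated, the sketch below bootstraps a percentile CI for a standardized linear contrast of three group means without pooling variances. The data, contrast weights, and the average-variance standardizer are illustrative assumptions, not the article's exact method.

```python
import numpy as np

rng = np.random.default_rng(0)
# Illustrative data: three groups with unequal variances (assumed, not from the article)
groups = [rng.normal(0.0, 1, 25), rng.normal(0.4, 2, 25), rng.normal(0.8, 3, 25)]
c = np.array([1.0, -0.5, -0.5])  # contrast weights, sum to zero

def std_contrast(gs):
    means = np.array([g.mean() for g in gs])
    vars_ = np.array([g.var(ddof=1) for g in gs])
    # Standardize by the square root of the average variance (one common choice)
    return (c @ means) / np.sqrt(vars_.mean())

boots = [std_contrast([rng.choice(g, g.size, replace=True) for g in groups])
         for _ in range(5000)]
lo, hi = np.percentile(boots, [2.5, 97.5])
print(f"standardized contrast = {std_contrast(groups):.3f}, 95% CI ({lo:.3f}, {hi:.3f})")
```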

2.
Calculating and reporting appropriate measures of effect size are becoming standard practice in psychological research. One of the most common scenarios encountered involves the comparison of 2 groups, which includes research designs that are experimental (e.g., random assignment to treatment vs. placebo conditions) and nonexperimental (e.g., testing for gender differences). Familiar measures such as the standardized mean difference (d) or the point-biserial correlation (r_pb) characterize the magnitude of the difference between groups, but these effect size measures are sensitive to a number of additional influences. For example, R. E. McGrath and G. J. Meyer (2006) showed that r_pb is sensitive to sample base rates, and extending their analysis to situations of unequal variances reveals that d is, too. The probability-based measure A, the nonparametric generalization of what K. O. McGraw and S. P. Wong (1992) called the common language effect size statistic, is insensitive to base rates and more robust to several other factors (e.g., extreme scores, nonlinear transformations). In addition to its excellent generalizability across contexts, A is easy to understand and can be obtained from standard computer output or through simple hand calculations. (PsycINFO Database Record (c) 2010 APA, all rights reserved)
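
A is the probability that a randomly drawn score from one group exceeds a randomly drawn score from the other, with ties counted half; it can be computed over all pairs of observations. A minimal sketch, assuming two numeric samples:

```python
import numpy as np

def prob_superiority(x, y):
    """A = Pr(X > Y) + 0.5 * Pr(X = Y), estimated over all n_x * n_y pairs."""
    x, y = np.asarray(x), np.asarray(y)
    greater = (x[:, None] > y[None, :]).sum()
    ties = (x[:, None] == y[None, :]).sum()
    return (greater + 0.5 * ties) / (x.size * y.size)

print(prob_superiority([3, 4, 5, 6], [1, 2, 3, 4]))  # 0.875
```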

3.
K. O. McGraw and S. P. Wong (see record 1992-18415-001) described an appealing index of effect size that requires no prior knowledge of statistics to understand. They termed this index the common language effect size indicator (CL): the probability that a score randomly sampled from 1 distribution will be larger than a randomly sampled score from a 2nd distribution. In extending this concept to a bivariate normal distribution, with correlation r, one may think again of randomly sampling 2 individuals; if the 1st individual has a higher score on the 1st variable than the 2nd individual, the CL_R in this case is the probability that the 1st individual will also have a higher score on the 2nd variable. An equation for this probability is derived that permits converting any value of r into CL_R, the common language effect size index for a bivariate correlation. (PsycINFO Database Record (c) 2010 APA, all rights reserved)
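
The derived equation is commonly stated as CL_R = arcsin(r)/π + 1/2; the closed form below follows that standard statement, which should be verified against the original article before relying on it.

```python
import math

def cl_r(r):
    """Common language effect size for a bivariate normal correlation r:
    the probability that the individual higher on variable 1 is also
    higher on variable 2. Commonly stated closed form: arcsin(r)/pi + 1/2."""
    return math.asin(r) / math.pi + 0.5

for r in (0.0, 0.3, 0.5, 0.8):
    print(r, round(cl_r(r), 3))  # r = 0.5 gives CL_R = 2/3
```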

4.
This article presents a statistic for tests of mean equality in between-subjects and within-subjects designs when variances are heterogeneous. The approximate degrees of freedom statistic of S. Johansen (1980) can be used to test main and interaction effects, as well as multiple comparison hypotheses related to these effects. Thus, researchers need only be familiar with a single statistic, rather than the many statistics that have been defined in the literature, to perform these tests of significance. Also included is a computer program to obtain a numerical solution. (PsycINFO Database Record (c) 2010 APA, all rights reserved)
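
Johansen's general statistic is not re-derived here; its simplest special case, the two-group Welch test with Satterthwaite approximate degrees of freedom, conveys the idea of an approximate-df test under heterogeneous variances. The data are illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
a = rng.normal(0.0, 1.0, 20)   # illustrative groups with unequal variances
b = rng.normal(0.5, 3.0, 40)

# Welch's t (unpooled variances, Satterthwaite approximate df) is the
# two-group special case of the approximate-df approach described above.
t, p = stats.ttest_ind(a, b, equal_var=False)
print(f"Welch t = {t:.3f}, p = {p:.4f}")
```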

5.
Tests for experiments with matched groups or repeated measures designs use error terms that involve the correlation between the measures as well as the variance of the data. The larger the correlation between the measures, the smaller the error and the larger the test statistic. If an effect size is computed from the test statistic without taking the correlation between the measures into account, effect size will be overestimated. Procedures for computing effect size appropriately from matched groups or repeated measures designs are discussed. (PsycINFO Database Record (c) 2010 APA, all rights reserved)
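
One correction along these lines, commonly attributed to this literature, computes d from the paired t while backing out the correlation: d = t * sqrt(2(1 - r)/n) instead of the independent-groups conversion d = t * sqrt(2/n). Treat the exact formula as an assumption to verify against the article; the simulated data below are illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, r_true = 30, 0.8
pre = rng.normal(0, 1, n)
post = pre * r_true + rng.normal(0, np.sqrt(1 - r_true**2), n) + 0.5  # correlated measures

t_paired = stats.ttest_rel(post, pre).statistic
r = np.corrcoef(pre, post)[0, 1]

d_naive = t_paired * np.sqrt(2 / n)                # treats measures as independent: inflated
d_corrected = t_paired * np.sqrt(2 * (1 - r) / n)  # accounts for the correlation
print(f"naive d = {d_naive:.3f}, corrected d = {d_corrected:.3f}")
```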

6.
This article shows that measurement invariance (defined in terms of an invariant measurement model in different groups) is generally inconsistent with selection invariance (defined in terms of equal sensitivity and specificity across groups). In particular, when a unidimensional measurement instrument is used and group differences are present in the location but not in the variance of the latent distribution, sensitivity and positive predictive value will be higher in the group at the higher end of the latent dimension, whereas specificity and negative predictive value will be higher in the group at the lower end of the latent dimension. When latent variances are unequal, the differences in these quantities depend on the size of group differences in variances relative to the size of group differences in means. The effect originates as a special case of Simpson's paradox, which arises because the observed score distribution is collapsed into an accept-reject dichotomy. Simulations show the effect can be substantial in realistic situations. It is suggested that the effect may be partly responsible for overprediction in minority groups as typically found in empirical studies on differential academic performance. A methodological solution to the problem is suggested, and social policy implications are discussed. (PsycINFO Database Record (c) 2010 APA, all rights reserved)
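
A minimal simulation in the spirit of the article's: two groups share an identical (invariant) measurement model but differ in latent means, and sensitivity and specificity at a common cutoff diverge in the predicted directions. All parameter values and cutoffs are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
for label, mu in (("higher group", 0.5), ("lower group", -0.5)):
    theta = rng.normal(mu, 1.0, n)      # latent trait; groups differ in mean only
    x = theta + rng.normal(0, 0.7, n)   # identical measurement model in both groups
    suitable = theta > 0                # latent criterion
    accepted = x > 0                    # observed accept-reject cutoff
    sens = (accepted & suitable).sum() / suitable.sum()
    spec = (~accepted & ~suitable).sum() / (~suitable).sum()
    print(f"{label}: sensitivity = {sens:.3f}, specificity = {spec:.3f}")
```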

7.
Estimation of the effect size parameter, D, the standardized difference between population means, is sensitive to heterogeneity of variance (heteroscedasticity), which seems to abound in psychological data. Pooling s²s assumes homoscedasticity, as do methods for constructing a confidence interval for D, estimating D from t or analysis of variance results, formulas that adjust estimates for inflation by main effects or covariates, and the Q statistic. The common language effect size statistic as an estimate of Pr(X₁ > X₂), the probability that a randomly sampled member of Population 1 will outscore a randomly sampled member of Population 2, also assumes normality and homoscedasticity. Various proposed solutions are reviewed, including measures that do not make these assumptions, such as the probability of superiority estimate of Pr(X₁ > X₂). Ways to reconceptualize effect size when treatments may affect moments such as the variance are also discussed. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

8.
Much behavioral research involves comparing the central tendencies of different groups, or of the same Ss under different conditions, and the usual analysis is some form of mean comparison. This article suggests that an ordinal statistic, d, is often more appropriate. d compares the number of times a score from one group or condition is higher than one from the other, compared with the reverse. Compared to mean comparisons, d is more robust and equally or more powerful; it is invariant under transformation; and it often conforms more closely to the experimenter's research hypothesis. It is suggested that inferences from d be based on sample estimates of its variance rather than on the more traditional assumption of identical distributions. The statistic is extended to simple repeated measures designs, and ways of extending its use to more complex designs are suggested. (PsycINFO Database Record (c) 2010 APA, all rights reserved)
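
The dominance statistic described here is the difference between the proportion of pairs in which the first group's score is higher and the proportion in which it is lower, d = Pr(X > Y) - Pr(Y > X). The article bases inference on analytic estimates of d's variance; the sketch below substitutes a bootstrap standard error, with illustrative data.

```python
import numpy as np

def dominance_d(x, y):
    """Ordinal dominance: Pr(X > Y) - Pr(Y > X), estimated over all pairs."""
    x, y = np.asarray(x), np.asarray(y)
    return np.sign(x[:, None] - y[None, :]).mean()

rng = np.random.default_rng(4)
x, y = rng.normal(0.5, 1, 40), rng.normal(0.0, 1, 40)
d = dominance_d(x, y)
# Bootstrap SE as a stand-in for the article's analytic variance estimate
boots = [dominance_d(rng.choice(x, x.size), rng.choice(y, y.size))
         for _ in range(2000)]
print(f"d = {d:.3f}, bootstrap SE = {np.std(boots, ddof=1):.3f}")
```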

9.
L. Wilkinson and the Task Force on Statistical Inference (1999) recommended reporting confidence intervals for measures of effect sizes. If the sample size is too small, the confidence interval may be too wide to provide meaningful information. Recently, K. Kelley and J. R. Rausch (2006) used an iterative approach to computer-generate tables of sample size requirements for a standardized difference between 2 means in between-subjects designs. Sample size formulas are derived here for general standardized linear contrasts of k ≥ 2 means for both between-subjects designs and within-subjects designs. Special sample size formulas also are derived for the standardizer proposed by G. V. Glass (1976). (PsycINFO Database Record (c) 2010 APA, all rights reserved)
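
The article's formulas are not reproduced here; a sketch of the underlying idea finds the per-group n for a two-group standardized difference by iterating until the expected CI is narrow enough, using the large-sample approximation var(d) ≈ 2/n + d²/(4n) (an assumption, not the article's derivation).

```python
from scipy import stats

def n_for_ci_width(d, width, conf=0.95):
    """Smallest per-group n so the expected CI for a two-group standardized
    mean difference is no wider than `width`. Uses the large-sample
    approximation var(d) ~ 2/n + d^2/(4n) (assumed, not the article's formula)."""
    z = stats.norm.ppf(1 - (1 - conf) / 2)
    n = 4
    while 2 * z * ((2 / n + d**2 / (4 * n)) ** 0.5) > width:
        n += 1
    return n

print(n_for_ci_width(d=0.5, width=0.4))  # per-group n for a CI no wider than 0.4
```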

10.
This paper investigates the urn sampling analogue for the score statistic relating survival to covariates assuming a proportional hazard model. The exact permutation distribution can be calculated as well as the exact low order moments for arbitrary censoring patterns. The asymptotic distribution of the score statistic is an easy consequence. The method is naturally extended to deal with the multivariate case, time varying covariates and interval censoring. Finally, the relationships among the censoring process, the survival times, and the covariates are studied by considering different reference sets for the distribution of the score statistic. Some assumptions about the censoring process are investigated, and as a consequence the effect of censoring is clarified.

11.
A general rationale and specific procedures for examining the statistical power characteristics of psychology-of-aging (POA) empirical studies are provided. First, 4 basic ingredients of statistical hypothesis testing are reviewed. Then, 2 measures of effect size are introduced (standardized mean differences and the proportion of variation accounted for by the effect of interest), and methods are given for estimating these measures from already-completed studies. Power and sample size formulas, examples, and discussion are provided for common comparison-of-means designs, including independent samples 1-factor and factorial ANOVA designs, analysis of covariance (ANCOVA) designs, repeated measures (correlated samples) ANOVA designs, and split-plot (combined between- and within-Ss) ANOVA designs. Because of past conceptual differences, special attention is given to the power associated with statistical interactions, and cautions about applying the various procedures are indicated. Illustrative power estimations also are applied to a published study from the literature. POA researchers will be better informed consumers of what they read and more "empowered" with respect to what they research by understanding the important roles played by power and sample size in statistical hypothesis testing. (PsycINFO Database Record (c) 2010 APA, all rights reserved)
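
The article's own formulas are not reproduced; for the simplest of the designs it covers, a one-way ANOVA, power can be sketched with the noncentral F distribution and the common convention λ = N·f² for Cohen's f (that convention is an assumption here).

```python
from scipy import stats

def anova_power(f_effect, k, n_per_group, alpha=0.05):
    """Power of a one-way ANOVA with k groups, using the noncentral F
    distribution with noncentrality lambda = N * f^2 (Cohen's f)."""
    N = k * n_per_group
    df1, df2 = k - 1, N - k
    f_crit = stats.f.ppf(1 - alpha, df1, df2)
    return 1 - stats.ncf.cdf(f_crit, df1, df2, N * f_effect**2)

# A 'medium' effect (f = .25), 3 groups, 52 per group: power near .80
print(f"power = {anova_power(f_effect=0.25, k=3, n_per_group=52):.3f}")
```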

12.
The behavior of the L. V. Hedges's (see record 1983-00213-001) Q test for the fixed-effects meta-analytic model was investigated for small and unequal study sample sizes paired with larger numbers of studies, nonnormal score distributions, and unequal variances. The results of a Monte Carlo study indicate that the hypothesis of equal effect sizes tends to be rejected less than expected if smaller study sample sizes are paired with larger numbers of studies; pairing smaller variances with larger sample sizes (or vice versa) leads to this hypothesis being rejected more than expected. The power of the Q test is also less than expected when small study sample sizes are paired with larger numbers of studies. These findings suggest conditions for which the Q test should be used cautiously. (PsycINFO Database Record (c) 2010 APA, all rights reserved)
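
For reference, the Q statistic under study is Σ wᵢ(dᵢ - d̄)² with wᵢ = 1/vᵢ, referred to a chi-square distribution with k - 1 df. A minimal sketch with assumed study-level effect sizes and variances:

```python
import numpy as np
from scipy import stats

# Illustrative study effect sizes and their estimated sampling variances
d = np.array([0.30, 0.10, 0.55, 0.20, 0.45])
v = np.array([0.04, 0.06, 0.05, 0.03, 0.08])

w = 1 / v
d_bar = (w * d).sum() / w.sum()      # fixed-effects weighted mean effect size
Q = (w * (d - d_bar) ** 2).sum()     # homogeneity statistic
p = stats.chi2.sf(Q, df=d.size - 1)  # reference distribution: chi-square, k-1 df
print(f"Q = {Q:.2f}, df = {d.size - 1}, p = {p:.3f}")
```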

13.
Conventional statistical approaches rely heavily on the properties of the central limit theorem to bridge the gap between the characteristics of a sample and some theoretical sampling distribution. Problems associated with nonrandom sampling, unknown population distributions, heterogeneous variances, small sample sizes, and missing data jeopardize the assumptions of such approaches and cast skepticism on conclusions. Conventional nonparametric alternatives offer freedom from distribution assumptions, but design limitations and loss of power can be serious drawbacks. With the data-processing capacity of today's computers, a new dimension of distribution-free statistical methods has evolved that addresses many of the limitations of conventional parametric and nonparametric methods. Computer-intensive statistical methods involve reshuffling, resampling, or simulating a data set thousands of times to empirically define a sampling distribution for a chosen test statistic. The only assumption necessary for valid results is the random assignment of experimental units to the test groups or treatments. Application to a real data set illustrates the advantages of these methods, including freedom from distribution assumptions without loss of power, complete choice over test statistics, easy adaptation to design complexities and missing data, and considerable intuitive appeal. The illustrations also reveal that computer-intensive methods can be more time consuming than conventional methods and the amount of computer code required to orchestrate reshuffling, resampling, or simulation procedures can be appreciable.
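
A minimal reshuffling (permutation) test of a two-group mean difference, in the spirit of the methods described; as the passage notes, the only assumption used is random assignment. The data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
treat = np.array([12.1, 9.8, 11.5, 13.0, 10.9, 12.7])
control = np.array([9.9, 10.2, 8.7, 11.0, 9.5, 10.1])

observed = treat.mean() - control.mean()
pooled = np.concatenate([treat, control])
count, n_perm = 0, 10_000
for _ in range(n_perm):
    perm = rng.permutation(pooled)  # reshuffle group labels
    if abs(perm[:treat.size].mean() - perm[treat.size:].mean()) >= abs(observed):
        count += 1
print(f"observed diff = {observed:.2f}, permutation p = {count / n_perm:.4f}")
```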

14.
One of the main objectives in meta-analysis is to estimate the overall effect size by calculating a confidence interval (CI). The usual procedure consists of assuming a standard normal distribution and a sampling variance defined as the inverse of the sum of the estimated weights of the effect sizes. But this procedure does not take into account the uncertainty due to the fact that the heterogeneity variance (τ²) and the within-study variances have to be estimated, leading to CIs that are too narrow with the consequence that the actual coverage probability is smaller than the nominal confidence level. In this article, the performances of 3 alternatives to the standard CI procedure are examined under a random-effects model and 8 different τ² estimators to estimate the weights: the t distribution CI, the weighted variance CI (with an improved variance), and the quantile approximation method (recently proposed). The results of a Monte Carlo simulation showed that the weighted variance CI outperformed the other methods regardless of the τ² estimator, the value of τ², the number of studies, and the sample size. (PsycINFO Database Record (c) 2010 APA, all rights reserved)
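
A sketch of the weighted variance CI in the form usually associated with Hartung-Knapp/Sidik-Jonkman: the usual variance is replaced by a weighted residual variance and the normal quantile by a t quantile with k - 1 df. The DerSimonian-Laird τ² estimator stands in for the weights here (one plausible choice among the several the article compares), and the data are illustrative.

```python
import numpy as np
from scipy import stats

y = np.array([0.30, 0.10, 0.55, 0.20, 0.45])  # study effect estimates (illustrative)
v = np.array([0.04, 0.06, 0.05, 0.03, 0.08])  # within-study variances
k = y.size

# DerSimonian-Laird tau^2 (one of several estimators the article compares)
w_f = 1 / v
Q = (w_f * (y - (w_f * y).sum() / w_f.sum()) ** 2).sum()
c = w_f.sum() - (w_f**2).sum() / w_f.sum()
tau2 = max(0.0, (Q - (k - 1)) / c)

w = 1 / (v + tau2)
mu = (w * y).sum() / w.sum()
# Weighted variance CI: improved (weighted residual) variance, t reference
var_w = (w * (y - mu) ** 2).sum() / ((k - 1) * w.sum())
half = stats.t.ppf(0.975, k - 1) * np.sqrt(var_w)
print(f"mu = {mu:.3f}, 95% CI ({mu - half:.3f}, {mu + half:.3f})")
```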

15.
Despite the development of procedures for calculating sample size as a function of relevant effect size parameters, rules of thumb tend to persist in designs of multiple regression studies. One explanation for their persistence may be the difficulty in formulating a reasonable a priori value of an effect size to be detected. This article presents methods for calculating effect sizes in multiple regression from a variety of perspectives and also introduces a new method based on an exchangeability structure among predictor variables. No single method is deemed superior, but rather examples show that a combination of methods is likely to be most valuable in many situations. A simulation provides a 2nd explanation for why rules of thumb for choosing sample size have persisted but also shows that the outcome of such underpowered studies will be a literature consisting of seemingly contradictory results. (PsycINFO Database Record (c) 2010 APA, all rights reserved)
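
One standard translation from a hypothesized R² to a sample size uses f² = R²/(1 - R²) and the noncentral F distribution; the noncentrality convention λ = f²·n below is a common one but is an assumption here, not the article's method.

```python
from scipy import stats

def n_for_regression_power(r2, n_predictors, power=0.80, alpha=0.05):
    """Smallest n giving the target power for the overall R^2 test in
    multiple regression, via f^2 = R^2/(1 - R^2) and a noncentral F with
    lambda = f^2 * n (a common convention, assumed here)."""
    f2 = r2 / (1 - r2)
    n = n_predictors + 2
    while True:
        df1, df2 = n_predictors, n - n_predictors - 1
        f_crit = stats.f.ppf(1 - alpha, df1, df2)
        if 1 - stats.ncf.cdf(f_crit, df1, df2, f2 * n) >= power:
            return n
        n += 1

print(n_for_regression_power(r2=0.13, n_predictors=5))  # a 'medium' R^2, 5 predictors
```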

16.
Combined significance tests (combined p values) and tests of the weighted mean effect size are used to combine information across studies in meta-analysis. A combined significance test (Stouffer test) is compared with a test based on the weighted mean effect size as tests of the same null hypothesis. The tests are compared analytically in the case in which the within-group variances are known and compared through large-sample theory in the more usual case in which the variances are unknown. Generalizations suggested are then explored through a simulation study. This work demonstrates that the test based on the average effect size is usually more powerful than the Stouffer test unless there is a substantial negative correlation between within-study sample size and effect size. Thus, the test based on the average effect size is generally preferable, and there is little reason to also calculate the Stouffer test. (PsycINFO Database Record (c) 2010 APA, all rights reserved)
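
Both tests applied to the same assumed study-level estimates, for side-by-side comparison; the data are illustrative.

```python
import numpy as np
from scipy import stats

d = np.array([0.30, 0.10, 0.55, 0.20, 0.45])  # study effect sizes (illustrative)
v = np.array([0.04, 0.06, 0.05, 0.03, 0.08])  # their sampling variances

# Combined significance (Stouffer): sum of study z statistics over sqrt(k)
z_i = d / np.sqrt(v)
z_stouffer = z_i.sum() / np.sqrt(d.size)

# Test based on the weighted mean effect size
w = 1 / v
d_bar = (w * d).sum() / w.sum()
z_mean = d_bar / np.sqrt(1 / w.sum())

for name, z in (("Stouffer", z_stouffer), ("weighted mean", z_mean)):
    print(f"{name}: z = {z:.2f}, p = {2 * stats.norm.sf(abs(z)):.4f}")
```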

17.
18.
We consider a clinical trial in which the outcome can be assessed by a continuous measure and where dropouts tend to have poorer efficacy than completers. When each subject can act as his/her own control, efficacy is measured by the difference between the outcome measurements at two times. When all subjects complete the protocol, a paired t-test can be used to test for a treatment effect, i.e., whether or not the mean difference is zero. When a patient does not return for the final evaluation, a measure of efficacy cannot be computed for that subject. Often, data from dropouts are ignored and only the observed pairs are used to analyze the data. When the reason for dropping out is not random, the result may be misleading. In this paper, we assume that (1) the distribution of the measure of efficacy (i.e., the change between two outcome measurements) is Gaussian, (2) dropouts would have worse efficacy than the median if they were observed, and (3) the dropout rate is less than 50%. We propose a median-based t-like statistic using the sample median in place of the sample mean. The variance of the median is estimated using only data from the complete half-sample, i.e., the half-sample with better efficacy. Simulations under five patterns of dropouts are performed to compare the proposed statistic with the paired t-test. The results show that the median-based statistic provides a conservative bound for the test of significance of the treatment. In contrast, because the paired t-test does not preserve its level of significance, except when the dropout mechanism is uniform, the paired t-test should not be used for trials in which dropouts tend to have poorer efficacy than completers.  
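
Under the abstract's own assumptions (dropouts fall below the median, dropout rate below 50%), the full-sample median is still identified from completers alone: it is the ((n+1)/2)-th largest value, and the entire upper half of the sample is observed. The sketch below shows that identification step; the bootstrap SE at the end is a stand-in, not the article's half-sample variance estimator, and all data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 51                            # randomized subjects (odd, so the median is unique)
change = rng.normal(0.4, 1.0, n)  # within-subject change scores (illustrative)

# Informative dropout: some of the poorest responders never return
cut = np.median(change)
dropout = (change < cut) & (rng.random(n) < 0.6)
observed = change[~dropout]       # what the analyst actually sees

# With all dropouts below the full-sample median and fewer than half missing,
# the ((n+1)/2)-th largest value is always among the completers.
med = np.sort(observed)[::-1][(n + 1) // 2 - 1]
print(f"recovered median = {med:.3f}, true full-sample median = {cut:.3f}")

# A t-like test divides the median by an SE; the article estimates that SE
# from the better (complete) half-sample. A bootstrap stands in here.
boots = [np.sort(rng.choice(observed, observed.size))[::-1][(n + 1) // 2 - 1]
         for _ in range(2000)]
print(f"stand-in SE = {np.std(boots, ddof=1):.3f}")
```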

19.
Extends statistical theory for procedures based on the Glass estimator of effect size for methods used in the quantitative synthesis of research. An unbiased estimator of effect size is given. A weighted estimator of effect size based on data from several experiments is defined and shown to be optimal (asymptotically efficient). An approximate (large-sample) test for homogeneity of effect size across experiments is also given. The results of an empirical sampling study show that the large-sample distributions of the weighted estimator and the homogeneity statistic are quite accurate when the experimental and control group sample sizes exceed 10 and the effect sizes are smaller than about 1.5. (12 ref) (PsycINFO Database Record (c) 2010 APA, all rights reserved)
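
The (approximately) unbiased estimator multiplies the Glass/Cohen d by a small-sample correction factor, commonly approximated as J(m) ≈ 1 - 3/(4m - 1) with m the degrees of freedom; treat the exact constant as an assumption to verify against the article.

```python
def unbiased_d(d, n1, n2):
    """Approximately unbiased effect size: d times the small-sample
    correction J(m) ~ 1 - 3/(4m - 1), where m = n1 + n2 - 2."""
    m = n1 + n2 - 2
    return d * (1 - 3 / (4 * m - 1))

print(round(unbiased_d(d=0.60, n1=12, n2=12), 3))  # correction matters most at small n
```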

20.
Combined 3 factors, parameter used, technique used, and method of control of Type I errors, into a model that includes 100 different statistical tests, of which 64 are defensible. Tests on complex hypotheses about correlations, ρ, proportions, P, and variances, σ², comparable to tests on means, μ, are available. For the equal n case, the statistics needed can all be formulated either as t statistics or as omnibus F statistics. The technique factor with 5 levels includes 3 variations whereby a t is contrasted with 1 of 3 critical values appropriate for a given set of contrasts. The F statistic may be used on 1-way or multifactor designs on any of the above parameters. The experiment's design and experimental hypotheses dictate which cells of the crossing of these 2 factors are appropriate. The experimenter's major choice is the method of control of Type I errors. A simultaneous and 4 stepwise methods are discussed as general methods that could be used with most statistics. Setting alpha as the familywise rate of Type I errors and the use of simultaneous methods are recommended. (38 ref) (PsycINFO Database Record (c) 2010 APA, all rights reserved)
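
The abstract's specific 100-test taxonomy is not reconstructed here; as one concrete instance of a stepwise familywise-error method of the kind it discusses, Holm's procedure tests ordered p values against successively less stringent thresholds. The p values below are illustrative.

```python
def holm(pvals, alpha=0.05):
    """Holm's stepwise familywise-error control: compare ordered p values
    against alpha/k, alpha/(k-1), ...; stop at the first nonsignificant one."""
    order = sorted(range(len(pvals)), key=lambda i: pvals[i])
    reject = [False] * len(pvals)
    for step, i in enumerate(order):
        if pvals[i] > alpha / (len(pvals) - step):
            break
        reject[i] = True
    return reject

print(holm([0.001, 0.02, 0.04, 0.30]))  # [True, False, False, False]
```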
