20 similar documents found (search time: 10 ms)
1.
Critics have put forth several arguments against the use of tests of statistical significance (TOSSes). Among these, the converse inequality argument stands out but remains sketchy, as does criticism of it. The argument states that we want P(H|D) (where H and D represent hypothesis and data, respectively), we get P(D|H), and the 2 do not equal one another. Each of the terms in 'P(D|H) ≠ P(H|D)' requires clarification. Furthermore, the argument as a whole allows for multiple interpretations. If the argument questions the logic of TOSSes, then defenses of TOSSes fall into 2 distinct types. Clarification and analysis of the argument suggest more moderate conclusions than previously offered by friends and critics of TOSSes. Furthermore, the general method of clarification through formalization may offer a way out of the current impasse.
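The converse inequality is easy to demonstrate numerically with Bayes' rule. The following minimal Python sketch uses purely hypothetical probabilities, chosen only to show how far P(D|H) and P(H|D) can diverge:

    # Toy demonstration that P(D|H) need not equal P(H|D).
    # All numbers are hypothetical.
    p_h = 0.01            # prior P(H)
    p_d_given_h = 0.95    # P(D|H)
    p_d_given_not_h = 0.10

    # Bayes' rule: P(H|D) = P(D|H) * P(H) / P(D)
    p_d = p_d_given_h * p_h + p_d_given_not_h * (1 - p_h)
    p_h_given_d = p_d_given_h * p_h / p_d

    print(f"P(D|H) = {p_d_given_h:.2f}")  # 0.95
    print(f"P(H|D) = {p_h_given_d:.2f}")  # ~0.09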
2.
"Inasmuch as explicit terminology is needed to convey the probabilities of committing statistical errors in the respective areas of interval estimation and testing hypotheses, the concept of confidence should never be associated with the statistical test of an H regardless of the nature of the test being employed." (PsycINFO Database Record (c) 2010 APA, all rights reserved) 相似文献
3.
The author has defended the use of parametric tests (Boneau, 1960) and has been challenged on more than one occasion to justify the use of the t test in many typical psychological situations where there are measurement considerations. Intelligence is often given as an instance, the point being that intelligence is actually measured on an ordinal scale; that is, equal differences between scores represent different magnitudes at different places on the underlying continuum. This is seen as somehow invalidating the use of the t test with such scores. Burke (1953) has presented an argument which should have ended further discussion, but, in view of the present concern, a restatement of the argument and the addition of a few comments would seem indicated. The present concern seems to have been stimulated by the publication by psychologists of two texts in the field of statistics (Senders, 1958; Siegel, 1956), both of which are organized around Stevens' (1951) system of classifying measurement scales. Siegel and Senders belabor the point that parametric statistics, specifically the t and F tests, should be avoided when the measurement scales are no stronger than ordinal, a state of affairs purportedly typical in psychology.
4.
Calculation of the statistical power of statistical tests is important in planning and interpreting the results of research studies, including meta-analyses. It is particularly important in moderator analyses in meta-analysis, which are often used as sensitivity analyses to rule out moderator effects but also may have low statistical power. This article describes how to compute statistical power of both fixed- and mixed-effects moderator tests in meta-analysis that are analogous to the analysis of variance and multiple regression analysis for effect sizes. It also shows how to compute power of tests for goodness of fit associated with these models. Examples from a published meta-analysis demonstrate that power of moderator tests and goodness-of-fit tests is not always high.
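The core of such a computation is a noncentral chi-square lookup. The following minimal Python sketch (not the article's own code; the group-mean effect sizes and weights are hypothetical) computes the power of a fixed-effects moderator test with two groups of studies, assuming scipy is available:

    # Power of a fixed-effects moderator (Q-between) test in meta-analysis:
    # under the alternative, Q-between follows a noncentral chi-square.
    from scipy.stats import chi2, ncx2

    group_means = [0.2, 0.5]        # hypothesized group-mean effect sizes
    group_weights = [40.0, 35.0]    # per-group sums of inverse sampling variances

    grand = sum(w * m for w, m in zip(group_weights, group_means)) / sum(group_weights)
    # Noncentrality: weighted squared deviations of group means from the grand mean.
    lam = sum(w * (m - grand) ** 2 for w, m in zip(group_weights, group_means))

    df = len(group_means) - 1
    crit = chi2.ppf(0.95, df)             # alpha = .05 critical value
    power = 1 - ncx2.cdf(crit, df, lam)   # power under the alternative
    print(f"noncentrality = {lam:.2f}, power = {power:.2f}")

With these hypothetical inputs the power comes out near .25, well below the conventional .80, which illustrates the article's point that moderator tests are not always adequately powered.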
5.
Reviews the misuse of statistical tests in psychotherapy research studies published in the Journal of Consulting and Clinical Psychology in the years 1967–1968, 1977–1978, and 1987–1988. It focuses on 3 major problems in statistical practice: inappropriate uses of null hypothesis tests and p values, neglect of effect size, and inflation of Type I error rate. The impressive frequency of these problems is documented, and changes in statistical practices over the past 3 decades are interpreted in light of trends in psychotherapy research. The article concludes with practical suggestions for rational application of statistical tests.
6.
Mallinckrodt Brent; Abraham W. Todd; Wei Meifen; Russell Daniel W. Journal of Counseling Psychology, 2006, 53(3): 372
P. A. Frazier, A. P. Tix, and K. E. Barron (2004) highlighted a normal theory method popularized by R. M. Baron and D. A. Kenny (1986) for testing the statistical significance of indirect effects (i.e., mediator variables) in multiple regression contexts. However, simulation studies suggest that this method lacks statistical power relative to some other approaches. The authors describe an alternative developed by P. E. Shrout and N. Bolger (2002) based on bootstrap resampling methods. An example and step-by-step guide for performing bootstrap mediation analyses are provided. The test of joint significance is also briefly described as an alternative to both the normal theory and bootstrap methods. The relative advantages and disadvantages of each approach in terms of precision in estimating confidence intervals of indirect effects, Type I error, and Type II error are discussed.
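The following is a minimal Python sketch of the bootstrap approach for an indirect effect a*b (not the authors' step-by-step guide; the data are simulated and all numbers are hypothetical):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 200
    x = rng.normal(size=n)               # predictor
    m = 0.4 * x + rng.normal(size=n)     # mediator
    y = 0.5 * m + rng.normal(size=n)     # outcome

    def indirect_effect(x, m, y):
        a = np.polyfit(x, m, 1)[0]       # slope of M regressed on X
        # slope of Y on M, controlling for X
        design = np.column_stack([m, x, np.ones_like(x)])
        b = np.linalg.lstsq(design, y, rcond=None)[0][0]
        return a * b

    boot = []
    for _ in range(2000):
        idx = rng.integers(0, n, size=n)          # resample cases with replacement
        boot.append(indirect_effect(x[idx], m[idx], y[idx]))

    lo, hi = np.percentile(boot, [2.5, 97.5])     # percentile 95% CI
    print(f"95% bootstrap CI for a*b: [{lo:.3f}, {hi:.3f}]")

If the interval excludes zero, the indirect effect is declared statistically significant, with no normality assumption on the sampling distribution of a*b.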
7.
Hausknecht John P.; Halpert Jane A.; Di Paolo Nicole T.; Moriarty Gerrard Meghan O. Journal of Applied Psychology, 2007, 92(2): 373
Previous studies have indicated that as many as 25% to 50% of applicants in organizational and educational settings are retested with measures of cognitive ability. Researchers have shown that practice effects are found across measurement occasions such that scores improve when these applicants retest. In this study, the authors used meta-analysis to summarize the results of 50 studies of practice effects for tests of cognitive ability. Results from 107 samples and 134,436 participants revealed an adjusted overall effect size of .26. Moderator analyses indicated that effects were larger when practice was accompanied by test coaching and when identical forms were used. Additional research is needed to understand the impact of retesting on the validity inferences drawn from test scores.
8.
Dunlap William P.; Burke Michael J.; Smith-Crowe Kristin Journal of Applied Psychology, 2003, 88(2): 356
The authors demonstrated that the most common statistical significance test used with rWG-type interrater agreement indexes in applied psychology, based on the chi-square distribution, is flawed and inaccurate. The chi-square test is shown to be extremely conservative even for modest, standard significance levels (e.g., .05). The authors present an alternative statistical significance test, based on Monte Carlo procedures, that produces the equivalent of an approximate randomization test for the null hypothesis that the actual distribution of responding is rectangular and demonstrate its superiority to the chi-square test. Finally, the authors provide tables of critical values and offer downloadable software to implement the approximate randomization test for rWG-type and average deviation (AD)-type interrater agreement indexes. The implications of these results for studying a broad range of interrater agreement problems in applied psychology are discussed.
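A minimal Python sketch of such a Monte Carlo test for a single-item rWG index under the rectangular null (not the authors' downloadable software; the number of raters, the number of response options, and the observed value are hypothetical):

    import numpy as np

    rng = np.random.default_rng(42)
    k, A = 10, 5                     # raters and response options (hypothetical)
    var_eu = (A**2 - 1) / 12         # variance of a rectangular (uniform) null

    null_rwg = []
    for _ in range(10_000):
        ratings = rng.integers(1, A + 1, size=k)   # uniform responding
        null_rwg.append(1 - ratings.var(ddof=1) / var_eu)

    crit = np.percentile(null_rwg, 95)             # .05 critical value
    observed = 0.70                                # hypothetical observed rWG
    p = np.mean(np.array(null_rwg) >= observed)
    print(f"critical rWG = {crit:.2f}, p = {p:.4f}")

The observed rWG is significant when it exceeds the simulated critical value, i.e., when agreement is greater than would plausibly arise from purely random responding.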
9.
"Construct validation was introduced in order to specify types of research required in developing tests for which the conventional views on validation are inappropriate. Personality tests, and some tests of ability, are interpreted in terms of attributes for which there is no adequate criterion. This paper indicates what sorts of evidence can substantiate such an interpretation, and how such evidence is to be interpreted." 60 references. (PsycINFO Database Record (c) 2010 APA, all rights reserved) 相似文献
10.
A decade ago, a meta-analysis showed that identification of a suspect from a sequential lineup versus a simultaneous lineup was more diagnostic of guilt (Steblay, Dysart, Fulero, & Lindsay, 2001). Since then, controversy and debate regarding sequential superiority has emerged. We report the results of a new meta-analysis involving 72 tests of simultaneous and sequential lineups from 23 different labs involving 13,143 participant-witnesses. The results are very similar to the 2001 results in showing that the sequential lineup is less likely to result in an identification of the suspect, but also more diagnostic of guilt than is the simultaneous lineup. An examination of the full diagnostic design dataset (27 tests that used the full simultaneous/sequential × culprit-present/culprit-absent design) showed that the average gap in correct identifications favoring the simultaneous lineup over the sequential lineup (8%) is smaller than the 15% figure obtained from the 2001 meta-analysis (and from the current full 72-test dataset). The lower error rate incurred for culprit-absent lineups with use of a sequential format remains consistent across the years, with 22% fewer errors than simultaneous lineups. A Bayesian analysis shows that the posterior probability of guilt following an identification of the suspect is higher for the sequential lineup across the entire base rate for culprit presence/absence. New ways to think about policy issues are discussed.
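The Bayesian step is a direct application of Bayes' rule across base rates, as in the following toy Python sketch (the hit and false-identification rates are hypothetical placeholders, not the meta-analytic estimates):

    hit_rate = 0.50        # P(suspect ID | culprit present), hypothetical
    false_id_rate = 0.10   # P(suspect ID | culprit absent), hypothetical

    for base_rate in (0.2, 0.5, 0.8):   # prior P(culprit present)
        p_id = hit_rate * base_rate + false_id_rate * (1 - base_rate)
        posterior = hit_rate * base_rate / p_id
        print(f"base rate {base_rate:.1f}: P(guilt | ID) = {posterior:.2f}")

A lineup format with a lower false-identification rate yields a higher posterior probability of guilt at every base rate, which is the sense in which the sequential procedure is "more diagnostic."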
11.
Null hypothesis statistical testing (NHST) has been debated extensively but always successfully defended. The technical merits of NHST are not disputed in this article. The widespread misuse of NHST has created a human factors problem that this article intends to ameliorate. This article describes an integrated, alternative inferential confidence interval approach to testing for statistical difference, equivalence, and indeterminacy that is algebraically equivalent to standard NHST procedures and therefore exacts the same evidential standard. The combined numeric and graphic tests of statistical difference, equivalence, and indeterminacy are designed to avoid common interpretive problems associated with NHST procedures. Multiple comparisons, power, sample size, test reliability, effect size, and cause-effect ratio are discussed. A section on the proper interpretation of confidence intervals is followed by a decision rule summary and caveats.
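One way to picture the approach, reconstructed here under standard normal-theory assumptions for two independent means (not necessarily the article's exact procedure; all statistics are hypothetical), is to rescale each group's confidence interval so that non-overlap is algebraically equivalent to a significant two-group test:

    import math

    m1, se1 = 10.0, 1.2    # group 1 mean and standard error (hypothetical)
    m2, se2 = 13.5, 1.4    # group 2 mean and standard error (hypothetical)
    z = 1.96               # two-sided .05 criterion

    # Reduction factor: non-overlap of the rescaled ("inferential") intervals
    # is then equivalent to |m2 - m1| > z * sqrt(se1**2 + se2**2).
    e = math.sqrt(se1**2 + se2**2) / (se1 + se2)
    ci1 = (m1 - z * e * se1, m1 + z * e * se1)
    ci2 = (m2 - z * e * se2, m2 + z * e * se2)
    print(ci1, ci2)   # here the intervals overlap slightly: no difference at .05

Non-overlap of the inferential intervals corresponds to a statistical difference; the article extends the same graphic logic to verdicts of equivalence and indeterminacy.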
12.
13.
Hertzog Christopher; Lindenberger Ulman; Ghisletta Paolo; von Oertzen Timo Psychological Methods, 2006, 11(3): 244
We evaluated the statistical power of single-indicator latent growth curve models (LGCMs) to detect correlated change between two variables (covariance of slopes) as a function of sample size, number of longitudinal measurement occasions, and reliability (measurement error variance). Power approximations following the method of Satorra and Saris (1985) were used to evaluate the power to detect slope covariances. Even with large samples (N = 500) and several longitudinal occasions (4 or 5), statistical power to detect covariance of slopes was moderate to low unless growth curve reliability at study onset was above .90. Studies using LGCMs may fail to detect slope correlations because of low power rather than a lack of relationship of change between variables. The present findings allow researchers to make more informed design decisions when planning a longitudinal study and aid in interpreting LGCM results regarding correlated interindividual differences in rates of development.
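The Satorra and Saris (1985) method reduces the power question to a noncentral chi-square lookup: fit the model with the slope covariance constrained to zero, use (N - 1) times the resulting minimized misfit as the noncentrality parameter, and evaluate the 1-df likelihood-ratio test. A minimal Python sketch, with a hypothetical misfit value and assuming scipy:

    from scipy.stats import chi2, ncx2

    n = 500
    f_min = 0.005               # hypothetical misfit under the zero-covariance constraint
    lam = (n - 1) * f_min       # noncentrality parameter
    crit = chi2.ppf(0.95, df=1)
    power = 1 - ncx2.cdf(crit, 1, lam)
    print(f"power = {power:.2f}")   # roughly .35 here: modest even at N = 500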
14.
From some 150 factors in objective personality tests, 18 potentially invariant patterns have been found by cross matching in all possible ways. These are divided into 12 of a satisfactory degree of invariance and universality, and 6 of lesser statistical significance. The former are discussed in this paper. 63 references.
15.
The authors (see record 1979-00153-001) argued that the reliability coefficient for the dependent variable in a controlled experiment has no direct relevance for hypothesis testing. Specifically, they demonstrated that increasing the reliability coefficient for the dependent variable did not necessarily increase the power of standard statistical tests. The authors present further evidence that large reliability coefficients are not always desirable in true experiments, and reply to J. P. Sutcliffe's (see record 1980-29332-001) basic criticisms of Nicewander and Price's contentions. (6 ref)
16.
Reviews the book, Randomization Tests by Eugene S. Edgington (1980). Edgington begins his preface by suggesting that his book has two goals: "a practical guide for experimenters" and "a textbook for courses in applied statistics." As indicated above, the book is not the detailed and authoritative volume which experimenters need as a guide to randomization tests. However, Edgington's cogent criticisms of "the long-standing fiction of random sampling in experimental research" (p. iii) will lead experimenters to consider the merits of randomization tests. Similarly, the book is not thorough enough to be a successful textbook, but it should alert all teachers of statistics and experimental design to the importance of randomization and to the weakness of the random-sampling assumption in most statistical tests.
17.
We conducted a reliability-generalization meta-analysis of 7 of the most frequently used measures of relationship satisfaction: the Locke–Wallace Marital Adjustment Test (LWMAT), the Kansas Marital Satisfaction Scale (KMS), the Quality of Marriage Index, the Relationship Assessment Scale, the Marital Opinion Questionnaire, Karney and Bradbury's (1997) semantic differential scale, and the Couples Satisfaction Index. Six hundred thirty-nine reliability coefficients from 398 articles and 636,806 individuals provided internal consistency reliability estimates for this meta-analysis. We present the average score reliabilities for each measure, characterize the variance in score reliabilities across studies, and consider sample and study characteristics that are predictive of score reliability. Overall, the KMS and the LWMAT appear to be the strongest and weakest measures, respectively, from a reliability perspective. We discuss the importance of considering reliability invariance when making cross-group comparisons and provide recommendations for researchers when selecting a measure of relationship satisfaction.
18.
The test of significance does not provide the information concerning psychological phenomena characteristically attributed to it; and a great deal of mischief has been associated with its use. The basic logic associated with the test of significance is reviewed. The null hypothesis is characteristically false under any circumstances. Publication practices foster the reporting of small effects in populations. Psychologists have "adjusted" by misinterpretation, taking the p value as a "measure," assuming that the test of significance provides automaticity of inference, and confusing the aggregate with the general. The difficulties are illuminated by bringing to bear the contributions from the decision-theory school on the Fisher approach. The Bayesian approach is suggested.
19.
A report on one university's program and problems in teaching courses in the area of tests and measurements.
20.
Points out the values of high-speed computers in psychological statistical work and proposes that the APA acquire such equipment.