Similar literature: 20 similar records found.
1.
Because the probability of obtaining an experimental finding given that the null hypothesis is true [p(F|H0)] is not the same as the probability that the null hypothesis is true given a finding [p(H0|F)], calculating the former probability does not justify conclusions about the latter one. As the standard null-hypothesis significance-testing procedure does just that, it is logically invalid (J. Cohen, 1994). Theoretically, Bayes's theorem yields p(H0|F), but in practice, researchers rarely know the correct values for 2 of the variables in the theorem. Nevertheless, by considering a wide range of possible values for the unknown variables, it is possible to calculate a range of theoretical values for p(H0|F) and to draw conclusions about both hypothesis testing and theory evaluation.
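A minimal sketch of the kind of range calculation the abstract describes, assuming hypothetical values for the two quantities researchers rarely know (the prior p(H0) and the likelihood of the finding under the alternative, p(F|H1)); the function and all numbers are illustrative, not the author's procedure:

```python
# Illustrative sketch: p(H0|F) via Bayes' theorem over a grid of assumed
# values for the two unknowns. Treating the p value as p(F|H0) is itself
# a common simplification, adopted here only for illustration.

def posterior_null(p_F_H0, prior_H0, p_F_H1):
    """p(H0|F) = p(F|H0)p(H0) / [p(F|H0)p(H0) + p(F|H1)(1 - p(H0))]."""
    num = p_F_H0 * prior_H0
    return num / (num + p_F_H1 * (1.0 - prior_H0))

p_F_H0 = 0.05  # a just-significant result at the conventional level
for prior_H0 in (0.2, 0.5, 0.8):        # assumed prior probability of the null
    for p_F_H1 in (0.2, 0.5, 0.8):      # assumed likelihood under the alternative
        print(f"p(H0)={prior_H0}, p(F|H1)={p_F_H1}: "
              f"p(H0|F)={posterior_null(p_F_H0, prior_H0, p_F_H1):.3f}")
```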

2.
Null hypothesis significance testing has dominated quantitative research in education and psychology. However, the statistical significance of a test as indicated by a p-value does not speak to the practical significance of the study. Thus, reporting effect size to supplement the p-value is highly recommended by scholars, journal editors, and academic associations. As a measure of practical significance, effect size quantifies the size of mean differences or the strength of associations and directly answers the research questions. Furthermore, a comparison of effect sizes across studies facilitates meta-analytic assessment of the effect size and the accumulation of knowledge. In the current comprehensive review, we investigated the most recent effect size reporting and interpreting practices in 1,243 articles published in 14 academic journals from 2005 to 2007. Overall, 49% of the articles reported effect size, and 57% of those also interpreted it. To model good research methodology in education and psychology, we also provide an illustrative example of reporting and interpreting effect size in a published study. Finally, a 7-step guideline for quantitative researchers is summarized, along with some recommended resources on how to understand and interpret effect size.

3.
The test of significance does not provide the information concerning psychological phenomena characteristically attributed to it; and a great deal of mischief has been associated with its use. The basic logic associated with the test of significance is reviewed. The null hypothesis is characteristically false under any circumstances. Publication practices foster the reporting of small effects in populations. Psychologists have "adjusted" by misinterpretation, taking the p value as a "measure," assuming that the test of significance provides automaticity of inference, and confusing the aggregate with the general. The difficulties are illuminated by bringing to bear the contributions from the decision-theory school on the Fisher approach. The Bayesian approach is suggested.

4.
Increasing emphasis has been placed on the use of effect size reporting in the analysis of social science data. Nonetheless, the use of effect size reporting remains inconsistent, and interpretation of effect size estimates continues to be confused. Researchers are presented with numerous effect size estimates, not all of which are appropriate for every research question. Clinicians also may have little guidance in the interpretation of effect sizes relevant for clinical practice. The current article provides a primer of effect size estimates for the social sciences. Common effect size estimates, their use, and their interpretations are presented as a guide for researchers.
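As one hedged illustration of the estimates such a primer covers, the sketch below computes Cohen's d from summary statistics and converts it to an r-type effect size using the standard pooled-SD and d-to-r formulas; all input values are invented:

```python
import math

def cohens_d(mean1, mean2, sd1, sd2, n1, n2):
    """Standardized mean difference using the pooled standard deviation."""
    pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    return (mean1 - mean2) / pooled

def d_to_r(d, n1, n2):
    """Convert d to a point-biserial r; a = 4 when group sizes are equal."""
    a = (n1 + n2) ** 2 / (n1 * n2)  # correction for unequal group sizes
    return d / math.sqrt(d**2 + a)

d = cohens_d(105, 100, 15, 15, 50, 50)  # hypothetical two-group summary data
print(f"d = {d:.2f}, r = {d_to_r(d, 50, 50):.2f}")
```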

5.
In his article, “An alternative to null-hypothesis significance tests,” Killeen (2005) urged the discipline to abandon the practice of p_obs-based null hypothesis testing and to quantify the signal-to-noise characteristics of experimental outcomes with replication probabilities. He described the coefficient that he invented, p_rep, as the probability of obtaining “an effect of the same sign as that found in an original experiment” (Killeen, 2005, p. 346). The journal Psychological Science quickly came to encourage researchers to employ p_rep, rather than p_obs, in the reporting of their experimental findings. In the current article, we (a) establish that Killeen's derivation of p_rep contains an error, the result of which is that p_rep is not, in fact, the probability that Killeen set out to derive; (b) establish that p_rep is not a replication probability of any kind but, rather, is a quasi-power coefficient; and (c) suggest that Killeen has mischaracterized both the relationship between replication probabilities and statistical inference, and the kinds of claims that are licensed by knowledge of the value assumed by the replication probability that he attempted to derive.

6.
Some methodologists have recently suggested that scientific psychology's overreliance on null hypothesis significance testing (NHST) impedes the progress of the discipline. In response, a number of defenders have maintained that NHST continues to play a vital role in psychological research. Both sides of the argument to date have been presented abstractly. The authors take a different approach to this issue by illustrating the use of NHST along with 2 possible alternatives (meta-analysis as a primary data analysis strategy and Bayesian approaches) in a series of 3 studies. Comparing and contrasting the approaches on actual data brings out the strengths and weaknesses of each approach. The exercise demonstrates that the approaches are not mutually exclusive but instead can be used to complement one another.

7.
Null hypothesis statistical testing (NHST) has been debated extensively but always successfully defended. The technical merits of NHST are not disputed in this article. The widespread misuse of NHST has created a human factors problem that this article intends to ameliorate. This article describes an integrated, alternative inferential confidence interval approach to testing for statistical difference, equivalence, and indeterminacy that is algebraically equivalent to standard NHST procedures and therefore exacts the same evidential standard. The combined numeric and graphic tests of statistical difference, equivalence, and indeterminacy are designed to avoid common interpretive problems associated with NHST procedures. Multiple comparisons, power, sample size, test reliability, effect size, and cause-effect ratio are discussed. A section on the proper interpretation of confidence intervals is followed by a decision rule summary and caveats.
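The sketch below shows only the generic three-way decision logic (difference, equivalence, indeterminacy) that such CI-based testing involves; it is not Tryon's specific inferential confidence interval construction, and the equivalence bound delta is a user-supplied assumption:

```python
def ci_decision(ci_low, ci_high, delta):
    """Classify a CI for a mean difference against equivalence bounds [-delta, delta].

    A generic three-way rule, not the article's exact inferential-CI procedure:
      'difference'    if the CI excludes zero,
      'equivalence'   if the CI lies wholly inside the bounds,
      'indeterminate' otherwise.
    Note: a reliably nonzero but trivially small effect can satisfy both of
    the first two conditions; this sketch reports 'difference' first.
    """
    if ci_low > 0 or ci_high < 0:
        return "difference"
    if -delta < ci_low and ci_high < delta:
        return "equivalence"
    return "indeterminate"

print(ci_decision(0.4, 2.1, delta=1.0))   # difference
print(ci_decision(-0.3, 0.5, delta=1.0))  # equivalence
print(ci_decision(-0.2, 1.4, delta=1.0))  # indeterminate
```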

8.
Reports an error in "Effect sizes for experimenting psychologists" by Ralph L. Rosnow and Robert Rosenthal (Canadian Journal of Experimental Psychology/Revue canadienne de psychologie expérimentale, 2003[Sep], Vol 57[3], 221-237; an erratum was reported in Vol 63[1], record 2009-03064-004). A portion of the note to Table 1 (page 222) was incorrect. The second sentence of the note should read as follows: Fisher's z_r is the log transformation of r, that is, z_r = (1/2) log_e[(1 + r)/(1 − r)]. (The following abstract of the original article appeared in record 2003-08374-009.) This article describes three families of effect size estimators and their use in situations of general and specific interest to experimenting psychologists. The situations discussed include both between- and within-group (repeated measures) designs. Also described is the counternull statistic, which is useful in preventing common errors of interpretation in null hypothesis significance testing. The emphasis is on correlation (r-type) effect size indicators, but a wide variety of difference-type and ratio-type effect size estimators are also described.
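The corrected transformation is easy to verify numerically; the sketch below implements it and checks it against the mathematically equivalent inverse hyperbolic tangent:

```python
import math

def fisher_z(r):
    """Fisher's z_r = (1/2) * ln((1 + r) / (1 - r)), i.e., atanh(r)."""
    return 0.5 * math.log((1 + r) / (1 - r))

r = 0.30
print(fisher_z(r))      # 0.3095...
print(math.atanh(r))    # same value: the two forms are identical
```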

9.
Researchers have looked at comparisons between medical epidemiological research and psychological research using effect size r in an effort to compare relative effects. Often the outcomes of such efforts have demonstrated comparatively low effects for medical epidemiology research in comparison with effect sizes seen in psychology. The conclusion has often been that relatively small effects seen in psychology research are as strong as those found in important epidemiological medical research. The author suggests that many of the calculated effect sizes from medical epidemiological research on which this conclusion has been based are flawed. Specifically, rather than calculating effect sizes for treatment, many results have been for a Treatment Effect × Disease Effect interaction that was irrelevant to the main study hypothesis. A technique for developing a “hypothesis-relevant” effect size r is proposed.

10.
The authors demonstrated that the most common statistical significance test used with r_WG-type interrater agreement indexes in applied psychology, based on the chi-square distribution, is flawed and inaccurate. The chi-square test is shown to be extremely conservative even for modest, standard significance levels (e.g., .05). The authors present an alternative statistical significance test, based on Monte Carlo procedures, that produces the equivalent of an approximate randomization test for the null hypothesis that the actual distribution of responding is rectangular and demonstrate its superiority to the chi-square test. Finally, the authors provide tables of critical values and offer downloadable software to implement the approximate randomization test for r_WG-type and average deviation (AD)-type interrater agreement indexes. The implications of these results for studying a broad range of interrater agreement problems in applied psychology are discussed.
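A minimal sketch of the general Monte Carlo idea (not the authors' published software or exact procedure): simulate groups of raters responding uniformly, build the null distribution of single-item r_WG, and read off a critical value. Group size, number of response options, and the observed ratings are all hypothetical:

```python
import numpy as np

rng = np.random.default_rng(seed=1)

def r_wg(ratings, n_options):
    """Single-item r_WG: 1 - observed variance / uniform-null variance."""
    var_uniform = (n_options**2 - 1) / 12.0  # variance of a discrete uniform
    return 1.0 - ratings.var(ddof=1) / var_uniform

# Null distribution of r_WG when k raters respond uniformly (rectangularly)
k, n_options, n_sims = 10, 5, 20_000
null = np.array([r_wg(rng.integers(1, n_options + 1, size=k), n_options)
                 for _ in range(n_sims)])
critical = np.quantile(null, 0.95)  # one-tailed 5% critical value

observed = r_wg(np.array([4, 4, 5, 4, 3, 4, 5, 4, 4, 5]), n_options)
print(f"observed r_WG = {observed:.3f}, critical value = {critical:.3f}")
print("Monte Carlo p =", (null >= observed).mean())
```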

11.
Reviews Kline's book (see record 2004-13019-000), which surveys the controversy regarding significance testing, offers methods for effect size and confidence interval estimation, and suggests some alternative methodologies. Whether or not one accepts Kline's view of the future of statistical significance testing, there is much of value in this book. As a textbook, it could serve as a reference for an upper level undergraduate course, but it would be more appropriate for a graduate course. The book is a thought-provoking examination of the uneasy alliance between null hypothesis significance testing, and effect size and confidence interval estimation. There is much in this book for those on both sides of the null hypothesis testing debate and for those unsure where they stand. Whatever the future holds, Kline has done well in illustrating recent advances to statistical decision-making.

12.
Selected literature related to statistical testing is reviewed to compare the theoretical models underlying parametric and nonparametric inference. Specifically, we show that these models evaluate different hypotheses, are based on different concepts of probability and resultant null distributions, and support different substantive conclusions. We suggest that cognitive scientists should be aware of both models, thus providing them with a better appreciation of the implications and consequences of their choices among potential methods of analysis. This is especially true when it is recognized that most cognitive science research employs design features that do not justify parametric procedures, but that do support nonparametric methods of analysis, particularly those based on the method of permutation/randomization.
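As a hedged illustration of the permutation/randomization approach the review favors, the sketch below runs a two-sample Monte Carlo permutation test on the difference in means; the data are invented:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def permutation_test(x, y, n_perm=10_000):
    """Two-sample Monte Carlo permutation test on the difference in means."""
    observed = x.mean() - y.mean()
    pooled = np.concatenate([x, y])
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random reassignment to the two groups
        diff = pooled[: len(x)].mean() - pooled[len(x):].mean()
        if abs(diff) >= abs(observed):
            count += 1
    return observed, count / n_perm  # two-tailed Monte Carlo p value

x = np.array([12.1, 9.8, 11.4, 10.9, 13.0])
y = np.array([9.2, 10.1, 8.7, 9.9, 10.4])
obs, p = permutation_test(x, y)
print(f"mean difference = {obs:.2f}, p = {p:.4f}")
```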

13.
When psychologists test a commonsense (CS) hypothesis and obtain no support, they tend to erroneously conclude that the CS belief is wrong. In many such cases it appears, after many years, that the CS hypothesis was valid after all. It is argued that this error of accepting the "theoretical" null hypothesis reflects confusion between the operationalized hypothesis and the theory or generalization that it is designed to test. That is, on the basis of reliable null data one can accept the operationalized null hypothesis (e.g., "A measure of attitude x is not correlated with a measure of behavior y"). In contrast, one cannot generalize from the findings and accept the abstract or theoretical null (e.g., "We know that attitudes do not predict behavior"). The practice of accepting the theoretical null hypothesis hampers research and reduces the trust of the public in psychological research.

14.
The purpose of the recently proposed p_rep statistic is to estimate the probability of concurrence, that is, the probability that a replicate experiment yields an effect of the same sign (Killeen, 2005a). The influential journal Psychological Science endorses p_rep and recommends its use over that of traditional methods. Here we show that p_rep overestimates the probability of concurrence. This is because p_rep was derived under the assumption that all effect sizes in the population are equally likely a priori. In many situations, however, it is advisable also to entertain a null hypothesis of no or approximately no effect. We show how the posterior probability of the null hypothesis is sensitive to a priori considerations and to the evidence provided by the data; and the higher the posterior probability of the null hypothesis, the smaller the probability of concurrence. When the null hypothesis and the alternative hypothesis are equally likely a priori, p_rep may overestimate the probability of concurrence by 30% or more. We conclude that p_rep provides an upper bound on the probability of concurrence, a bound that brings with it the danger of having researchers believe that their experimental effects are much more reliable than they actually are.
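A small simulation can reproduce the qualitative claim, under stated assumptions: a spike-and-slab prior with a 50% point null, normal sampling, and the standard conversion p_rep = Φ(|d|/(SE·√2)). Every numeric choice here is illustrative, not the authors' analysis:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(seed=2)

# Hypothetical setup: half of all studied effects are truly null
# (spike-and-slab prior); the rest are drawn from a unit normal.
n_sims, se = 100_000, 0.5  # se: standard error of the observed effect
true_null = rng.random(n_sims) < 0.5
delta = np.where(true_null, 0.0, rng.normal(0.0, 1.0, n_sims))

d_original = rng.normal(delta, se)    # effect observed in the original study
d_replicate = rng.normal(delta, se)   # effect observed in an exact replicate

# Killeen's coefficient: predictive P(replicate has the same sign),
# derived under a flat prior on the effect size.
p_rep = norm.cdf(np.abs(d_original) / (se * np.sqrt(2)))

concurrence = np.sign(d_replicate) == np.sign(d_original)
print(f"mean p_rep              = {p_rep.mean():.3f}")
print(f"actual concurrence rate = {concurrence.mean():.3f}")
# With point-null mass in the prior, mean p_rep exceeds the actual rate.
```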

15.
In a sample of 12,030 subjects, ranging in age from 8 to 99 years, significant decreases in both mixed and consistent left-handedness were found as age increased. There were also significant sex differences, with males more likely to be left- or mixed-handed. These age and sex differences were reported as non-significant in Porac's (1993) smaller sample of 654. Methodological issues associated with asserting the null hypothesis in handedness studies when statistical power is low are also discussed.

16.
Wider use in psychology of confidence intervals (CIs), especially as error bars in figures, is a desirable development. However, psychologists seldom use CIs and may not understand them well. The authors discuss the interpretation of figures with error bars and analyze the relationship between CIs and statistical significance testing. They propose 7 rules of eye to guide the inferential use of figures with error bars. These include general principles: Seek bars that relate directly to effects of interest, be sensitive to experimental design, and interpret the intervals. They also include guidelines for inferential interpretation of the overlap of CIs on independent group means. Wider use of interval estimation in psychology has the potential to improve research communication substantially.
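The two most commonly cited overlap rules of eye are easy to state mechanically; the sketch below applies them to 95% CIs on two independent means. The thresholds are the usual approximations, and their caveats (similar margins of error, adequate sample sizes) are noted as assumptions in the comments:

```python
def overlap_rule_of_eye(m1, moe1, m2, moe2):
    """Rules of eye for 95% CIs on two independent group means.

    Approximations that assume similar margins of error and roughly
    n >= 10 per group:
      overlap of at most ~half the average margin of error -> p roughly < .05
      no overlap at all (a gap between intervals)           -> p roughly < .01
    """
    overlap = min(m1 + moe1, m2 + moe2) - max(m1 - moe1, m2 - moe2)
    avg_moe = (moe1 + moe2) / 2.0
    if overlap < 0:
        return "p roughly < .01 (intervals do not overlap)"
    if overlap <= 0.5 * avg_moe:
        return "p roughly < .05 (overlap at most half the average margin of error)"
    return "no inference from overlap alone"

print(overlap_rule_of_eye(10.0, 2.0, 13.5, 2.0))  # moderate overlap
print(overlap_rule_of_eye(10.0, 2.0, 15.0, 2.0))  # gap between intervals
print(overlap_rule_of_eye(10.0, 2.0, 12.0, 2.0))  # substantial overlap
```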

17.
A new approach for using path analysis to appraise the verisimilitude of theories is described. Rather than trying to test a model's truth (correctness), this method corroborates a class of path diagrams by determining how well they predict intradata relations in comparison with other diagrams. The observed correlation matrix is partitioned into disjoint sets. One set is used to estimate the model parameters, and a nonoverlapping set is used to assess the model's verisimilitude. Computer code was written to generate competing models and to test the conjectured model's superiority (relative to the generated set) using diagram combinatorics and is available on the Web.

18.
Classic parametric statistical significance tests, such as analysis of variance and least squares regression, are widely used by researchers in many disciplines, including psychology. For classic parametric tests to produce accurate results, the assumptions underlying them (e.g., normality and homoscedasticity) must be satisfied. These assumptions are rarely met when analyzing real data. The use of classic parametric methods with violated assumptions can result in the inaccurate computation of p values, effect sizes, and confidence intervals. This may lead to substantive errors in the interpretation of data. Many modern robust statistical methods alleviate the problems inherent in using parametric methods with violated assumptions, yet modern methods are rarely used by researchers. The authors examine why this is the case, arguing that most researchers are unaware of the serious limitations of classic methods and are unfamiliar with modern alternatives. A range of modern robust and rank-based significance tests suitable for analyzing a wide range of designs is introduced. Practical advice on conducting modern analyses using software such as SPSS, SAS, and R is provided. The authors conclude by discussing robust effect size indices.
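As one example of a modern robust alternative, SciPy (1.7 and later) exposes Yuen's trimmed t test through the trim argument of ttest_ind; the comparison below on skewed, invented data is a sketch, not the authors' recommended workflow:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=3)

# Skewed, heavy-tailed samples of the kind that violate the normality
# assumption of the classic t test.
a = rng.lognormal(mean=0.0, sigma=1.0, size=40)
b = rng.lognormal(mean=0.3, sigma=1.0, size=40)

classic = stats.ttest_ind(a, b)                          # Student's t
welch = stats.ttest_ind(a, b, equal_var=False)           # Welch's t
yuen = stats.ttest_ind(a, b, equal_var=False, trim=0.2)  # 20% trimmed (Yuen)

for name, res in [("classic", classic), ("Welch", welch), ("Yuen 20% trim", yuen)]:
    print(f"{name}: t = {res.statistic:.2f}, p = {res.pvalue:.4f}")
```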

19.
Underpowered studies persist in the psychological literature. This article examines reasons for their persistence and the effects on efforts to create a cumulative science. The "curse of multiplicities" plays a central role in the presentation. Most psychologists realize that testing multiple hypotheses in a single study affects the Type I error rate, but corresponding implications for power have largely been ignored. The presence of multiple hypothesis tests leads to 3 different conceptualizations of power. Implications of these 3 conceptualizations are discussed from the perspective of the individual researcher and from the perspective of developing a coherent literature. Supplementing significance tests with effect size measures and confidence intervals is shown to address some but not necessarily all problems associated with multiple testing.
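Assuming independent tests, the divergence among conceptualizations of power under multiplicity can be made concrete in a few lines; this is one common way to formalize the distinction, and the per-test power values are hypothetical:

```python
from math import prod

# Three ways "power" can be read when one study tests several hypotheses,
# assuming (for illustration only) independent tests.
powers = [0.80, 0.70, 0.60]  # hypothetical per-test power values

all_significant = prod(powers)                     # P(every test rejects)
any_significant = 1 - prod(1 - p for p in powers)  # P(at least one rejects)

print(f"per-test power:     {powers}")
print(f"P(all significant): {all_significant:.3f}")  # 0.336
print(f"P(at least one):    {any_significant:.3f}")  # 0.976
```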

20.
The relationship between a classic 1953 study by R. L. Solomon and L. C. Wynne on traumatic avoidance learning and the pioneering efforts by Robert Bush and Frederick Mosteller and others to develop mathematical models of learning is analyzed. The main purpose is to explore how Bush and Mosteller disembedded a carefully selected set of Solomon and Wynne's data from its original context, which allowed something as seemingly humble as a set of numbers to become a widely available and valuable resource for the newly emerging field of mathematical learning theory (MLT). The creative use that the MLT community made of these data once Bush and Mosteller had systematically reduced the empirical and conceptual uncertainties within Solomon and Wynne's study is also discussed.
