Similar literature
Found 20 similar records (search time: 500 ms)
1.
The test of significance does not provide the information concerning psychological phenomena characteristically attributed to it; and a great deal of mischief has been associated with its use. The basic logic associated with the test of significance is reviewed. The null hypothesis is characteristically false under any circumstances. Publication practices foster the reporting of small effects in populations. Psychologists have "adjusted" by misinterpretation, taking the p value as a "measure," assuming that the test of significance provides automaticity of inference, and confusing the aggregate with the general. The difficulties are illuminated by bringing to bear the contributions from the decision-theory school on the Fisher approach. The Bayesian approach is suggested. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

2.
When psychologists test a commonsense (CS) hypothesis and obtain no support, they tend to erroneously conclude that the CS belief is wrong. In many such cases it appears, after many years, that the CS hypothesis was valid after all. It is argued that this error of accepting the "theoretical" null hypothesis reflects confusion between the operationalized hypothesis and the theory or generalization that it is designed to test. That is, on the basis of reliable null data one can accept the operationalized null hypothesis (e.g., "A measure of attitude x is not correlated with a measure of behavior y"). In contrast, one cannot generalize from the findings and accept the abstract or theoretical null (e.g., "We know that attitudes do not predict behavior"). The practice of accepting the theoretical null hypothesis hampers research and reduces the trust of the public in psychological research. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

3.
Evidence of group matching frequently takes the form of a nonsignificant test of statistical difference. Theoretical hypotheses of no difference are also tested in this way. These practices are flawed in that null hypothesis statistical testing provides evidence against the null hypothesis and failing to reject H₀ is not evidence supportive of it. Tests of statistical equivalence are needed. This article corrects the inferential confidence interval (ICI) reduction factor introduced by W. W. Tryon (2001) and uses it to extend his discussion of statistical equivalence. This method is shown to be algebraically equivalent with D. J. Schuirmann's (1987) use of 2 one-sided t tests, a highly regarded and accepted method of testing for statistical equivalence. The ICI method provides an intuitive graphic method for inferring statistical difference as well as equivalence. Trivial difference occurs when a test of difference and a test of equivalence are both passed. Statistical indeterminacy results when both tests are failed. Hybrid confidence intervals are introduced that impose ICI limits on standard confidence intervals. These intervals are recommended as replacements for error bars because they facilitate inferences. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

4.
Reports an error in the original article by E. E. Levitt (Psychological Bulletin, 1955[Sep], Vol 53[5], 347-370). On page 368, right-hand column, the text "1. After eight years of research, evidence for the validity of the water-jar test as a measure of validity is still lacking." should read: "1. After eight years of research, evidence for the validity of the water-jar test as a measure of rigidity is still lacking." (The following abstract of this article originally appeared in record 1958-02905-001.) "The primary purpose of the present paper is to examine the validity of the water-jar test as a rigidity measure by critically reviewing studies involving its use as such an index." Correlations between the water-jar test (WJT) and numerous criterion measures are generally statistically nonsignificant. On the basis of several studies it is tentatively concluded that a low negative correlation between the WJT and intelligence exists. The notion that rigidity increases under stress is not supported by the research evidence. The author concludes that evidence for the validity of the WJT is lacking and that the WJT, from a psychometric point of view, is poor. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

5.
Adverse impact evaluations often call for evidence that the disparity between groups in selection rates is statistically significant, and practitioners must choose which test statistic to apply in this situation. To identify the most effective testing procedure, the authors compared several alternate test statistics in terms of Type I error rates and power, focusing on situations with small samples. Significance testing was found to be of limited value because of low power for all tests. Among the alternate test statistics, the widely used Z-test on the difference between two proportions performed reasonably well, except when sample size was extremely small. A test suggested by G. J. G. Upton (1982) provided slightly better control of Type I error under some conditions but generally produced results similar to the Z-test. Use of the Fisher Exact Test and Yates's continuity-corrected chi-square test is not recommended because of overly conservative Type I error rates and substantially lower power than the Z-test. (PsycINFO Database Record (c) 2010 APA, all rights reserved)
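The Z-test on the difference between two proportions mentioned in this abstract can be sketched as follows. This is a minimal illustration, assuming the usual pooled-variance form and a two-sided alternative; the function name and the example counts are hypothetical and do not come from the study.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(selected_a, total_a, selected_b, total_b):
    """Pooled two-sided Z-test on the difference between two selection rates."""
    rate_a = selected_a / total_a
    rate_b = selected_b / total_b
    # Pooled selection rate under the null hypothesis of equal rates.
    pooled = (selected_a + selected_b) / (total_a + total_b)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if std_error == 0:
        raise ValueError("pooled rate is 0 or 1; the Z statistic is undefined")
    z = (rate_a - rate_b) / std_error
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: 30 of 50 selected in group A, 20 of 50 in group B.
z, p = two_proportion_z_test(30, 50, 20, 50)
print(f"z = {z:.3f}, p = {p:.4f}")
```

The normal approximation behind this statistic weakens as the groups get smaller, which matches the abstract's caution about extremely small samples.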

6.
We examine the operating characteristics of 17 methods for correcting p-values for multiple testing on synthetic data with known statistical properties. These methods use derived p-values only and not the raw data. With the test cases, we systematically varied the number of p-values, the proportion of false null hypotheses, the probability that a false null hypothesis would result in a p-value less than 5 per cent and the degree of correlation between p-values. We examined the effect of each of these factors on family-wise and false negative error rates and compared the false negative error rates of methods with an acceptable family-wise error. Only four methods were not bettered in this comparison. Unfortunately, however, a uniformly best method of those examined does not exist. A suggested strategy for examining corrections uses a succession of methods that are increasingly lax in family-wise error. A computer program for these corrections is available.

7.
Comments on the article by R. L. Hagen (see record 1997-02239-002) supporting use of the null hypothesis statistical test (NHST). Hagen did an admirable job of reminding readers that the NHST represents a brilliant and useful innovation, but does not offer a strong case for its continued use as the primary inferential strategy in psychology. The question is not "Is it useless?" but "Is there something better?" Popular opinion holds that interval estimation represents a superior strategy to NHST in many ways. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

8.
Reviewed studies on the relation between gender self-concept and performance on spatial, mathematical, and verbal tasks to evaluate S. C. Nash's (1979) hypothesis that individuals will perform better on cognitive tasks when their self-concepts match the gender stereotyping of the tasks. Meta-analytic techniques were used to estimate the average effect sizes and to determine the significance of the combined probabilities. The influence of Ss' sex and age, date of study, type of spatial task, and type of self-concept measure on these associations was also examined. In general, the results from spatial and mathematical tasks, which are usually stereotyped as masculine, support Nash's hypothesis. Higher masculine and lower feminine self-concept scores were associated with better performance. These relations were observed more consistently for females than for males. There was some evidence of better spatial and mathematical performance among adolescent boys who described themselves as feminine. Nash's hypothesis was not supported for verbal tasks. There was no evidence that androgyny, defined either as high masculine and high feminine scores or as a balance between masculine and feminine scores, is associated with better cognitive performance. (3? p ref) (PsycINFO Database Record (c) 2010 APA, all rights reserved)

9.
J. R. Keith et al. (2002) examined the effects of cardiopulmonary bypass surgery on cognition and suggest that the use of parametric, inferential statistics may have advantages over incidence reports. This commentary addresses several issues that arise when conducting outcomes studies within the context of evidence-based medicine. "Consumer friendly" research within the context of evidence-based medicine must carefully attend to the selection of appropriate and relevant reference groups and recognize that clinicians practice and record outcomes as individual rather than as group events. Traditional null hypothesis significance testing and inferential statistics are useful in establishing the reliability of group differences but do not provide the statistical indexes and base-rate information that clinicians can easily use in their treatment of individual patients. Data analysis and presentation of results using methods from clinical epidemiology can make neuropsychological outcomes research consumer friendly and help bridge the all too frequent schism between academic research and clinical practice. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

10.
Reports an error in the article "A-B Therapist Status, Patient Diagnosis, and Psychotherapy Outcome in a Psychiatric Outpatient Population" by Jerry G. Matthews and Barry R. Burkhart (Journal of Consulting and Clinical Psychology, 1977, Vol. 45, No. 3, pp. 475-482). The next-to-last sentence of the abstract is incorrect. The sentence reads: "Separate analyses of variance demonstrated further support for the super-A hypothesis with therapists' ratings as the dependent variable, whereas the interaction hypothesis received support, with number of sessions as the dependent measure." The sentence should read: "Separate analyses of variance demonstrated further support for the super-A hypothesis with number of sessions attended as the dependent measure, whereas the interaction hypothesis received support, with therapists' ratings as the dependent variable." (The following abstract originally appeared in record 1978-03783-001) Previous research generally has supported the hypothesis that A therapists obtain better therapy outcomes with schizophrenics, while B therapists do better with neurotics. Based on recent evidence, a 2nd hypothesis (super A) has been advanced which predicts that A therapists do at least as well with neurotic patients as do B therapists and that As obtain significantly more positive outcomes with schizophrenics. To examine these hypotheses, the therapy outcomes of 7 A and 4 B therapists, differentiated by their scores on the 23-item Whitehorn and Betz (1957) A-B scale, with their 18 schizophrenic and 18 neurotic patients were examined. A multivariate ANOVA computed for the 2 outcome measures, therapists' ratings of patient improvement and number of therapy sessions, clearly supported the super-A hypothesis. Separate ANOVAs demonstrated further support for the super-A hypothesis with therapists' ratings as the dependent variable, whereas the interaction hypothesis received support with number of sessions attended as the dependent measure. Of considerable importance was the fact that the addition of ataractic medication to the treatment of schizophrenics did not attenuate the effect of the A-B therapist distinction on therapeutic outcome. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

11.
Develops the notion of multivariate association among the variables in a set. Procedures stemming from an earlier inferential test are presented for comparing, between 2 populations, the strength of association that is present simultaneously in all pairs in a set of variables. Results are outlined of an investigation, using computer simulation methods, of the test statistic's sampling properties. The statistic is shown to follow closely the central F distribution, thus permitting adequate control of Type-I error. Applications are discussed. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

12.
The 2 × 2 table has received an enormous amount of attention in the research literature. Most studies have focused on Type I error rates and the power of the chi-square statistic, but some have been more concerned with the theoretical justification behind methods of analysis. Little consensus has been achieved in either area. The reason for this is that 2 basic inferential paradigms that underlie much of the work in 2 × 2 tables are incompatible. Thus, empirical studies of Type I error rates of the chi-square test within the Neyman–Pearson framework are considered irrelevant by advocates of R. A. Fisher's exact test. Both approaches are described in this article. G. A. Barnard's (1947) test is shown to be theoretically superior to the chi-square test and all of its corrected cousins. However, Fisher's exact test is advocated as the most rational choice. (PsycINFO Database Record (c) 2010 APA, all rights reserved)
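For readers unfamiliar with Fisher's exact test, the sketch below shows one common way to compute a two-sided p-value for a 2 × 2 table: fix the margins, then sum the hypergeometric probabilities of every table that is no more likely than the observed one. The function name and the example table are hypothetical, and other two-sided conventions exist, so this is an illustration rather than the procedure used in the article.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def table_prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    observed = table_prob(a)
    smallest = max(0, col1 - row2)
    largest = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    # The small relative tolerance guards against floating-point ties.
    return sum(
        prob
        for prob in (table_prob(x) for x in range(smallest, largest + 1))
        if prob <= observed * (1 + 1e-9)
    )


# Hypothetical table: rows are groups, columns are outcomes.
print(fisher_exact_two_sided(3, 1, 1, 3))
```

Because the margins are treated as fixed, the p-value takes only a limited set of discrete values, which is one source of the conservative Type I error rates debated in this literature.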

13.
This study, based on Freudian theory, used a forced-choice word association format to test the hypothesis that regressed schizophrenic Ss would prefer children's to adult's association. 16 schizophrenic, 16 sociopathic, and 16 normal male patients matched for age and education were tested on a 51-item test in which they were forced to choose their associations from among randomly arranged adult preferred, children preferred, and irrelevant alternatives. Using choice of children's responses minus choice of irrelevant response as a measure to control for random error markings, schizophrenic Ss differed significantly from normal Ss as predicted. Normal and sociopathic Ss did not differ. Sociopathic and schizophrenic Ss differed at p

14.
Hypothesis testing with multiple outcomes requires adjustments to control Type I error inflation, which reduces power to detect significant differences. Maintaining the prechosen Type I error level is challenging when outcomes are correlated. This problem concerns many research areas, including neuropsychological research in which multiple, interrelated assessment measures are common. Standard p value adjustment methods include Bonferroni-, Sidak-, and resampling-class methods. In this report, the authors aimed to develop a multiple hypothesis testing strategy to maximize power while controlling Type I error. The authors conducted a sensitivity analysis, using a neuropsychological dataset, to offer a relative comparison of the methods and a simulation study to compare the robustness of the methods with respect to varying patterns and magnitudes of correlation between outcomes. The results lead them to recommend the Hochberg and Hommel methods (step-up modifications of the Bonferroni method) for mildly correlated outcomes and the step-down minP method (a resampling-based method) for highly correlated outcomes. The authors note caveats regarding the implementation of these methods using available software. (PsycINFO Database Record (c) 2010 APA, all rights reserved)
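Three of the adjustment methods named in this abstract (Bonferroni, Sidak, and the Hochberg step-up procedure) are simple enough to sketch directly. The code below is an illustrative sketch only: the function names and example p-values are hypothetical, and the Hommel and resampling-based minP methods are not covered.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, m * p) for p in p_values]


def sidak(p_values):
    """Sidak-adjusted p-values, exact when the tests are independent."""
    m = len(p_values)
    return [1.0 - (1.0 - p) ** m for p in p_values]


def hochberg(p_values):
    """Hochberg step-up adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Indices sorted from the largest to the smallest p-value.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for rank, idx in enumerate(order):
        # The largest p-value is multiplied by 1, the next largest by 2, etc.
        running_min = min(running_min, (rank + 1) * p_values[idx])
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values from three correlated outcome measures.
raw = [0.01, 0.04, 0.03]
print(bonferroni(raw))
print(sidak(raw))
print(hochberg(raw))
```

An adjusted p-value is compared directly with the chosen Type I error level. In this example the Hochberg values never exceed the Bonferroni values, which illustrates why step-up procedures retain more power.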

15.
Wider use in psychology of confidence intervals (CIs), especially as error bars in figures, is a desirable development. However, psychologists seldom use CIs and may not understand them well. The authors discuss the interpretation of figures with error bars and analyze the relationship between CIs and statistical significance testing. They propose 7 rules of eye to guide the inferential use of figures with error bars. These include general principles: Seek bars that relate directly to effects of interest, be sensitive to experimental design, and interpret the intervals. They also include guidelines for inferential interpretation of the overlap of CIs on independent group means. Wider use of interval estimation in psychology has the potential to improve research communication substantially. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

16.
47 (81%) of 53 personnel of a small plant guessed how much and in which direction the "erroneous" (but actually correct) company clocks were in error, in order to test the hypothesis that, contrary to popular notion, time drag is estimated as greater in the variety-type than in the repetitive-type jobs. Significant correlations (p

17.
The article, "Single-Sample Tests for Many Correlations," by Robert E. Larzelere and Stanley A. Mulaik (Psychological Bulletin, 1977, Vol. 84, No. 3, pp. 557-569) contains the following errors: On page 558, the sentence that includes Equation 1 should read: "Assuming that the X and Y variables have a joint multivariate normal distribution, one then regards the multiple correlation as significantly different from zero if: (1) $F = R^2 (N - p - 1) / [(1 - R^2) p]$ is greater than $F_\alpha$, the corresponding critical value of the F distribution with $p$ and $N - p - 1$ degrees of freedom at the $100(1 - \alpha)$ percentile level, with $N$ the sample size." On page 559, the sentence that includes Equation 2 should read: "The population correlation $\rho(W, Y)$ between W and Y is then regarded as significantly different from zero if: (2) $F = r_{wy}^2 (N - p - 1) / [(1 - r_{wy}^2) p]$ is greater than the critical value $F_\alpha$ of the F distribution with $p$ and $N - p - 1$ degrees of freedom at the $100(1 - \alpha)$ percentile level." On page 559, the sentence that reads, "If $\rho(W, Y)$ is regarded as equal to zero, then any variable $X_i$ with a nonzero weight $w_i$ in the linear combination W is also considered to have a zero correlation with Y," should be deleted. Thanks are due to Paul A. Games. (The following abstract originally appeared in record 1978-00149-001) When more than one correlation coefficient is tested for significance in a study, the probability of making at least one Type I error rises rapidly as the number of tests increases, and the probability of making a Type I error after a Type I error on a previous test is usually greater than the nominal significance level used in each test. To avoid excessive Type I errors with multiple tests of correlations, it is noted that researchers should use procedures that answer research questions with a single statistical test and/or should use special multiple-test procedures. A review of simultaneous-test and multiple-test procedures for correlations (e.g., Bartlett and Rajalakshman's test, multistage Bonferroni procedure, union-intersection tests, and the rank adjusted method) is presented, and several new procedures are described. (PsycINFO Database Record (c) 2010 APA, all rights reserved)
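The F statistic in corrected Equation 1, and the inflation of Type I error that motivates the article, can be illustrated with a short sketch. The function names and example numbers are hypothetical, and the family-wise error formula assumes independent tests, which is a simplification the abstract does not make (it notes that errors on successive tests are typically dependent).

```python
def multiple_correlation_f(r_squared, n, p):
    """F statistic for testing that a multiple correlation is zero.

    r_squared: squared multiple correlation R^2
    n: sample size N
    p: number of predictor variables
    Returns (F, numerator df, denominator df).
    """
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("r_squared must lie in [0, 1)")
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1")
    df1, df2 = p, n - p - 1
    f_stat = (r_squared * df2) / ((1.0 - r_squared) * df1)
    return f_stat, df1, df2


def familywise_error(alpha, k):
    """Chance of at least one Type I error in k tests, assuming independence."""
    return 1.0 - (1.0 - alpha) ** k


# Hypothetical example: R^2 = 0.5 from N = 23 cases and p = 2 predictors.
print(multiple_correlation_f(0.5, 23, 2))
# Ten independent tests at alpha = .05 give roughly a 40% family-wise rate.
print(round(familywise_error(0.05, 10), 3))
```

The computed F would then be compared with the critical value of the F distribution on the returned degrees of freedom, which this sketch leaves to a statistical table or library.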

18.
Reviews "Design for decision," by Irwin D. J. Bross (see record 1954-05210-000). Some American statisticians maintain that in Wald's decision theory modern statistics, as a system of inductive logic, has progressed as far beyond Fisher as Fisher advanced it beyond the Pearsonian era. In this book, Bross successfully describes, in a nontechnical style, how statistical tests and estimation relate to the broad modern conceptions of statistical decision and game theory. This he does with frequent humorous, or even facetious asides. The book is singularly free of error, because Bross is capable of dealing with each item at a far more technical level than was required for his present task. I do believe he could have let his readers know, in many instances, that there are effective standard mathematical methods for obtaining decision makers. Also, he fails to clarify the differences between experiments and normative studies with all their critical implications. I highly recommend "Design for Decision" to all who want a painless injection of the simple, basic ideas which have revolutionized modern statistics. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

19.
In response to the growing need for statistical information regarding slope stability risk analysis, this work applies inferential analysis to a compiled database of 157 failed slopes and corresponding 301 safety factor (SF) calculations. As presented in the companion paper, this database also includes a number of slope stability factors, including analytical method used, stress approach (effective versus total), assumed slip surface geometry, slope type, applied correction factors, and soil Atterberg limits. Although the SF data were found to be fairly well fit by a lognormal distribution, pronounced curvature of the residuals was observed, likely related to various unaccounted slope factors. In response, inferential statistics are used in this paper to analyze the effects of analytical method, slope type, soil plasticity, and effective versus total stress analysis. ANOVA hypothesis testing indicated significant differences between analytical methods and significant interactions between slope types and pore-water stress approaches. Direct SF calculation methods, such as infinite slope, wedge, and the ordinary method of slices were found to produce SF near 1 as expected, but higher order methods in general, and force methods in particular, predicted safety factors significantly greater than 1. Clay content alone was not a discernible influence on SF calculations. A reduced factor ANOVA model was developed to predict SF, given analytical method (a main effect) and the interactions between analytical method with both slope type and pore-water pressure approach.

20.
In contrast to the standard use of regression, in which an individual's score on the dependent variable is unknown, neuropsychologists are often interested in comparing a predicted score with a known obtained score. Existing inferential methods use the standard error for a new case ($s_{N+1}$) to provide confidence limits on a predicted score and hence are tailored to the standard usage. However, $s_{N+1}$ can be used to test whether the discrepancy between a patient's predicted and obtained scores was drawn from the distribution of discrepancies in a control population. This method simultaneously provides a point estimate of the percentage of the control population that would exhibit a larger discrepancy. A method for obtaining confidence limits on this percentage is also developed. These methods can be used with existing regression equations and are particularly useful when the sample used to generate a regression equation is modest in size. Monte Carlo simulations confirm the validity of the methods, and computer programs that implement them are described and made available. (PsycINFO Database Record (c) 2010 APA, all rights reserved)
