Similar Documents
 20 similar documents were found.
1.
The authors (see record 1979-00153-001) argued that the reliability coefficient for the dependent variable in a controlled experiment has no direct relevance for hypothesis testing. Specifically, they demonstrated that increasing the reliability coefficient for the dependent variable did not necessarily increase the power of standard statistical tests. The authors present further evidence that large reliability coefficients are not always desirable in true experiments, and reply to J. P. Sutcliffe's (see record 1980-29332-001) basic criticisms of Nicewander and Price's contentions. (6 ref) (PsycINFO Database Record (c) 2010 APA, all rights reserved)

2.
Assuming that the linear models for classical test theory and ANOVA hold simultaneously for some dependent variable, it is shown that 2 contradictory statements concerning the relationship between reliability and statistical power are both correct. J. E. Overall and J. A. Woodward (see PA, Vol 53:8623, 57:7284) showed that when the reliability of a difference or change score is zero, the power of a statistical test of a hypothesis of no change can be at a maximum. J. L. Fleiss (see record 1977-07259-001) found the opposite result (i.e., that the power of a statistical test of no pre–post change is at a maximum when the reliability of the difference or gain scores is equal to one). The role of the reliability of the dependent variable in statistical evaluations of controlled experiments is examined. It is argued that the conditions that yield high reliability coefficients are not necessarily optimal for significance testing. (PsycINFO Database Record (c) 2010 APA, all rights reserved)
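
A small simulation, assuming the classical-test-theory setup sketched above (all names and parameter values below are illustrative, not taken from the cited articles), makes the Overall and Woodward side of the contradiction concrete: when every subject changes by the same amount, the reliability of the difference score is essentially zero, yet the paired t test of "no change" is highly powerful.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 50

# True scores: stable individual differences plus a constant treatment effect.
true_pre = rng.normal(100, 15, n)
effect = 5.0                      # everyone changes by the same amount
true_post = true_pre + effect

# Observed scores = true score + measurement error.
err_sd = 5.0
pre = true_pre + rng.normal(0, err_sd, n)
post = true_post + rng.normal(0, err_sd, n)

diff = post - pre

# Reliability of the difference score: the true-difference variance is zero by
# construction, so the reliability of the observed difference is essentially zero.
true_diff_var = np.var(true_post - true_pre)
rel_diff = true_diff_var / np.var(diff)
print(f"reliability of difference score: {rel_diff:.3f}")

# Yet the paired t test of "no pre-post change" is highly powerful.
t, p = stats.ttest_rel(post, pre)
print(f"paired t = {t:.2f}, p = {p:.2g}")
```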

3.
Analysis of continuous variables sometimes proceeds by selecting individuals on the basis of extreme scores of a sample distribution and submitting only those extreme scores to further analysis. This sampling method is known as the extreme groups approach (EGA). EGA is often used to achieve greater statistical power in subsequent hypothesis tests. However, there are several largely unrecognized costs associated with EGA that must be considered. The authors illustrate the effects EGA can have on power, standardized effect size, reliability, model specification, and the interpretability of results. Finally, the authors discuss alternative procedures, as well as possible legitimate uses of EGA. The authors urge researchers, editors, reviewers, and consumers to carefully assess the extent to which EGA is an appropriate tool in their own research and in that of others. (PsycINFO Database Record (c) 2010 APA, all rights reserved)
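
A brief, hypothetical simulation (sample size, cut-offs, and the true correlation are arbitrary choices, not values from the article) shows the kind of distortion the authors warn about: selecting only the upper and lower quartiles of the predictor inflates both the observed correlation and the standardized effect size relative to the full-sample analysis.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 400
rho = 0.30

# Bivariate normal predictor x and outcome y with a modest true correlation.
x = rng.normal(size=n)
y = rho * x + np.sqrt(1 - rho**2) * rng.normal(size=n)

# Full-sample analysis.
r_full, p_full = stats.pearsonr(x, y)

# Extreme groups approach: keep only the top and bottom quartiles of x.
lo, hi = np.quantile(x, [0.25, 0.75])
keep = (x <= lo) | (x >= hi)
x_e, y_e = x[keep], y[keep]
r_ega, p_ega = stats.pearsonr(x_e, y_e)

# Standardized mean difference on y between the two extreme groups.
g_hi, g_lo = y_e[x_e >= hi], y_e[x_e <= lo]
pooled_sd = np.sqrt((g_hi.var(ddof=1) + g_lo.var(ddof=1)) / 2)
d = (g_hi.mean() - g_lo.mean()) / pooled_sd

print(f"full sample:    r = {r_full:.2f}")
print(f"extreme groups: r = {r_ega:.2f}, d = {d:.2f}")
```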

4.
Demonstrating a specific cognitive deficit usually involves comparing patients' performance on 2 or more tests. The psychometric confound occurs if the psychometric properties of these tests lead patients to show greater cognitive deficits in 1 domain. One way to avoid the psychometric confound is to use tests with a similar level of discriminating power, which is a test's ability to index true individual differences in classic psychometric theory. One suggested way to measure discriminating power is to calculate true score variance (L. J. Chapman & J. P. Chapman, 1978). Despite the centrality of these formulations, there has been no systematic examination of the relationship between the observable property of true score variance and the latent property of discriminating power. The authors simulated administrations of free response tests and forced choice tests by creating different replicable ability scores for 2 groups across a wide range of psychometric properties (i.e., difficulty, reliability, observed variance, and number of items) and computing an ideal index of discriminating power. Simulation results indicated that true score variance had only limited ability to predict discriminating power (explaining about 10% of the variance in replicable ability scores). Furthermore, this predictive ability varied across tests with wide ranges of psychometric variables, such as difficulty, observed variance, reliability, and number of items. Discriminating power depends on a complicated interaction of psychometric properties that is not well estimated solely by a test's true score variance. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

5.
Researchers often use 3-way interactions in moderated multiple regression analysis to test the joint effect of 3 independent variables on a dependent variable. However, further probing of significant interaction terms varies considerably and is sometimes error prone. The authors developed a significance test for slope differences in 3-way interactions and illustrate its importance for testing psychological hypotheses. Monte Carlo simulations revealed that sample size, magnitude of the slope difference, and data reliability affected test power. Application of the test to published data yielded detection of some slope differences that were undetected by alternative probing techniques and led to changes of results and conclusions. The authors conclude by discussing the test's applicability for psychological research. (PsycINFO Database Record (c) 2010 APA, all rights reserved)
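
The general logic of such a test can be sketched as a contrast on regression coefficients: the simple slope of X at particular values of the two moderators is a linear combination of coefficients, so the difference between two simple slopes can be tested with a t test on that combination. The sketch below (simulated data; not necessarily the authors' exact formula) uses statsmodels for this.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 300

# Simulated predictor (x) and two moderators (w, z), all standardized.
x, w, z = rng.normal(size=(3, n))
y = 0.3*x + 0.2*w + 0.1*z + 0.25*x*w + 0.15*x*z + 0.2*x*w*z + rng.normal(size=n)

# Moderated regression with all two-way terms and the three-way term.
X = sm.add_constant(np.column_stack([x, w, z, x*w, x*z, w*z, x*w*z]))
fit = sm.OLS(y, X).fit()

# Simple slope of x at chosen moderator levels:
#   slope(w, z) = b_x + b_xw*w + b_xz*z + b_xwz*w*z
# The difference between the slope at (w=+1, z=+1) and at (w=+1, z=-1) is a
# linear combination of coefficients, tested as a single contrast.
def slope_contrast(w_val, z_val):
    c = np.zeros(8)                       # [const, x, w, z, xw, xz, wz, xwz]
    c[1], c[4], c[5], c[7] = 1, w_val, z_val, w_val * z_val
    return c

contrast = slope_contrast(1, 1) - slope_contrast(1, -1)
print(fit.t_test(contrast))
```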

6.
We evaluated the statistical power of single-indicator latent growth curve models (LGCMs) to detect correlated change between two variables (covariance of slopes) as a function of sample size, number of longitudinal measurement occasions, and reliability (measurement error variance). Power approximations following the method of Satorra and Saris (1985) were used to evaluate the power to detect slope covariances. Even with large samples (N=500) and several longitudinal occasions (4 or 5), statistical power to detect covariance of slopes was moderate to low unless growth curve reliability at study onset was above .90. Studies using LGCMs may fail to detect slope correlations because of low power rather than a lack of relationship of change between variables. The present findings allow researchers to make more informed design decisions when planning a longitudinal study and aid in interpreting LGCM results regarding correlated interindividual differences in rates of development. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

7.
OBJECTIVE: Reliability of platform posturography tests is essential for the identification and treatment of balance-related disorders. The purposes of this study were to establish the reliability of the limits of stability (LOS) test and to determine the relative variance contributions from identified sources of measurement error. DESIGN: Generalizability theory was used to calculate (1) variance estimates and percentage of variation for the sources of measurement error, and (2) generalizability coefficients. Random effects repeated measures analysis of variance (RM ANOVA) was used to assess consistency of measurements across both days and targets. PARTICIPANTS: Thirty-eight community-dwelling older adults with no recent history of falls. MAIN OUTCOME MEASURES: Outcome measures derived from the LOS tests included movement velocity (MV), maximum center of gravity (COG) excursion (ME), end point COG excursion (EE), and directional control (DC). RESULTS: Estimated generalizability coefficients for 2 and 3 days of testing ranged from .69 to .91. Relative contributions of the day facet were minimal. The RM ANOVA results indicated that for three of the movement variables, no significant differences in scores were observed across days. CONCLUSIONS: The 75% and 100% LOS tests are reliable tests of dynamic balance when administered to healthy older adults with no recent history of falls. Dynamic balance measures were generally consistent across multiple evaluations.
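
As a rough illustration of the generalizability-theory calculations described above, the sketch below estimates variance components for a single crossed persons x days design and the resulting relative G coefficient for protocols averaging over 2 or 3 days. The data, the variance magnitudes, and the restriction to a single facet (days only, ignoring targets) are assumptions made for the example.

```python
import numpy as np

# Hypothetical persons x days matrix for one LOS outcome (illustrative only).
rng = np.random.default_rng(3)
n_p, n_d = 38, 3
person = rng.normal(0, 1.0, n_p)[:, None]        # person "universe scores"
day = rng.normal(0, 0.2, n_d)[None, :]           # small day effect
scores = 50 + 5*person + 2*day + rng.normal(0, 2.5, (n_p, n_d))

grand = scores.mean()
ss_p = n_d * np.sum((scores.mean(axis=1) - grand)**2)
ss_d = n_p * np.sum((scores.mean(axis=0) - grand)**2)
ss_res = np.sum((scores - grand)**2) - ss_p - ss_d
ms_p = ss_p / (n_p - 1)
ms_d = ss_d / (n_d - 1)
ms_res = ss_res / ((n_p - 1) * (n_d - 1))

# Variance component estimates for a fully crossed persons x days design.
var_pd_e = ms_res
var_p = max((ms_p - ms_res) / n_d, 0)
var_d = max((ms_d - ms_res) / n_p, 0)

# Relative G coefficient for a protocol that averages over n_days days.
for n_days in (2, 3):
    g = var_p / (var_p + var_pd_e / n_days)
    print(f"G coefficient ({n_days} days): {g:.2f}")
```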

8.
The attenuation paradox (lack of a monotonic relationship between reliability and validity) becomes more meaningful if the usual assumption of a normal continuous distribution in the criterion variable is dropped. For present-day tests, these assumptions are highly questionable. The author concludes that "there is no paradox if the criterion distributions can assume any shape." A sequence of issues to be considered in the construction of tests is presented. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

9.
Consider a study in which 2 groups are followed over time to assess group differences in the average rate of change, rate of acceleration, or higher degree polynomial effect. In designing such a study, one must decide on the duration of the study, frequency of observation, and number of participants. The authors consider how these choices affect statistical power and show that power depends on a standardized effect size, the sample size, and a person-specific reliability coefficient. This reliability, in turn, depends on study duration and frequency. These relations enable researchers to weigh alternative designs with respect to feasibility and power. The authors illustrate the approach using data from published studies of antisocial thinking during adolescence and vocabulary growth during infancy. (PsycINFO Database Record (c) 2010 APA, all rights reserved)
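
A simplified, hypothetical version of this reasoning for linear change is sketched below: study duration and frequency of observation determine how precisely each person's slope is estimated, which yields a person-specific reliability, which in turn (together with the sample size and the group difference in mean slopes) determines power. The parameter values and the two-sample z approximation are illustrative simplifications, not the authors' exact formulas.

```python
import numpy as np
from scipy import stats

def slope_power(duration, n_waves, n_per_group, tau, sigma2, delta, alpha=0.05):
    """Approximate power to detect a group difference in mean linear rates of
    change, treating each person's OLS slope as the outcome (a simplification)."""
    times = np.linspace(0, duration, n_waves)
    sst = np.sum((times - times.mean())**2)
    slope_err_var = sigma2 / sst                 # error variance of a person's slope
    reliability = tau / (tau + slope_err_var)    # person-specific reliability
    se_diff = np.sqrt(2 * (tau + slope_err_var) / n_per_group)
    z_crit = stats.norm.ppf(1 - alpha / 2)
    power = stats.norm.sf(z_crit - delta / se_diff)
    return reliability, power

# Illustrative values (not from the cited studies): a 4-year study with annual
# versus semi-annual assessment and 50 participants per group.
for waves in (5, 9):
    rel, pwr = slope_power(duration=4, n_waves=waves, n_per_group=50,
                           tau=0.04, sigma2=0.5, delta=0.1)
    print(f"{waves} waves: slope reliability = {rel:.2f}, power = {pwr:.2f}")
```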

10.
Through the application of the statistical tools that compose item response theory—coupled with the ideas of local independence and local dependence and the concept of the testlet—the authors illustrate item analysis, scale assembly, and scoring rules for 2 scales measuring aspects of violent circumstances and tendencies. The concepts and procedures used are general and have much broader applicability for psychological measurement. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

11.
Considers the work of J. E. Overall and J. A. Woodward (see record 1975-08623-001), in which the authors assumed a restrictive statistical model for representing the results of a pre–post repeated-measurement study and showed that the t test on pre–post differences attains its greatest power when the reliability of the difference is zero. It is argued that their result, far from being paradoxical, is a consequence of an unrealistic statistical model. If a more general and more realistic model (one with interaction) is assumed, the t test attains its greatest power when the reliability of the difference is unity. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

12.
This article directly addresses explicit contradictions in the literature regarding the relation between the power of multivariate analysis of variance (MANOVA) and the intercorrelations among the dependent variables. Artificial data sets, as well as analytical methods, revealed that (1) power increases as correlations between dependent variables with large consistent effect sizes (that are in the same direction) move from near 1.0 toward –1.0, (2) power increases as correlations become more positive or more negative between dependent variables that have very different effect sizes (i.e., one large and one negligible), and (3) power increases as correlations between dependent variables with negligible effect sizes shift from positive to negative (assuming that there are dependent variables with large effect sizes still in the design). Implications for the reliability of dependent variables and strategies for selecting these variables in MANOVA designs are discussed. (PsycINFO Database Record (c) 2010 APA, all rights reserved)
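
Point (1) can be illustrated analytically for the simplest case, a two-group Hotelling's T² design with two dependent variables of equal, same-direction effect size (a hedged sketch; the designs examined in the article are more general): power rises as the correlation between the dependent variables moves from positive toward –1.

```python
import numpy as np
from scipy import stats

def manova_power_two_groups(deltas, r, n_per_group, alpha=0.05):
    """Power of Hotelling's T^2 (two groups, p correlated DVs with unit variance)
    as a function of the correlation r between the dependent variables."""
    p = len(deltas)
    sigma = np.full((p, p), r) + (1 - r) * np.eye(p)
    delta = np.asarray(deltas, dtype=float)
    n1 = n2 = n_per_group
    lam = (n1 * n2 / (n1 + n2)) * delta @ np.linalg.solve(sigma, delta)
    df1, df2 = p, n1 + n2 - p - 1
    f_crit = stats.f.ppf(1 - alpha, df1, df2)
    return stats.ncf.sf(f_crit, df1, df2, lam)   # noncentral F power

# Two DVs with equal, same-direction effects: power rises as r moves toward -1.
for r in (0.6, 0.0, -0.6):
    print(f"r = {r:+.1f}: power = {manova_power_two_groups([0.5, 0.5], r, 30):.2f}")
```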

13.
Because causal relations are neither observable nor deducible, they must be induced from observable events. The 2 dominant approaches to the psychology of causal induction—the covariation approach and the causal power approach—are each crippled by fundamental problems. This article proposes an integration of these approaches that overcomes these problems. The proposal is that reasoners innately treat the relation between covariation (a function defined in terms of observable events) and causal power (an unobservable entity) as that between scientists' law or model and their theory explaining the model. This solution is formalized in the power PC theory, a causal power theory of the probabilistic contrast model (P. W. Cheng & L. R. Novick, 1990). The article reviews diverse old and new empirical tests discriminating this theory from previous models, none of which is justified by a theory. The results uniquely support the power PC theory. (PsycINFO Database Record (c) 2011 APA, all rights reserved)
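
The core quantity of the theory can be stated in a few lines: causal power is estimated from the probabilistic contrast ΔP = P(e|c) - P(e|¬c), scaled by how much room the background causes leave for the candidate cause to act. The snippet below implements that estimate (the example probabilities are made up for illustration).

```python
def causal_power(p_e_given_c, p_e_given_not_c):
    """Power PC estimate of causal power from covariation data, assuming the
    candidate cause occurs independently of other (background) causes."""
    delta_p = p_e_given_c - p_e_given_not_c
    if delta_p >= 0:
        # Generative power: how strongly c produces e in the absence of background causes.
        return delta_p / (1 - p_e_given_not_c)
    # Preventive power: how strongly c stops e that background causes would produce.
    return -delta_p / p_e_given_not_c

# Same contrast (delta P = 0.25), different base rates, different estimated power.
print(causal_power(0.75, 0.50))   # 0.50
print(causal_power(0.30, 0.05))   # ~0.26
```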

14.
Minimal measurement error (reliability) during the collection of interval- and ratio-type data is critically important to sports medicine research. The main components of measurement error are systematic bias (e.g. general learning or fatigue effects on the tests) and random error due to biological or mechanical variation. Both error components should be meaningfully quantified for the sports physician to relate the described error to judgements regarding 'analytical goals' (the requirements of the measurement tool for effective practical use) rather than the statistical significance of any reliability indicators. Methods based on correlation coefficients and regression provide an indication of 'relative reliability'. Since these methods are highly influenced by the range of measured values, researchers should be cautious in: (i) concluding acceptable relative reliability even if a correlation is above 0.9; (ii) extrapolating the results of a test-retest correlation to a new sample of individuals involved in an experiment; and (iii) comparing test-retest correlations between different reliability studies. Methods used to describe 'absolute reliability' include the standard error of measurement (SEM), coefficient of variation (CV) and limits of agreement (LOA). These statistics are more appropriate for comparing reliability between different measurement tools in different studies. They can be used in multiple retest studies from ANOVA procedures, help predict the magnitude of a 'real' change in individual athletes and be employed to estimate statistical power for a repeated-measures experiment. These methods vary considerably in the way they are calculated and their use also assumes the presence (CV) or absence (SEM) of heteroscedasticity. Most methods of calculating SEM and CV represent approximately 68% of the error that is actually present in the repeated measurements for the 'average' individual in the sample. LOA represent the test-retest differences for 95% of a population. The associated Bland-Altman plot shows the measurement error schematically and helps to identify the presence of heteroscedasticity. If there is evidence of heteroscedasticity or non-normality, one should logarithmically transform the data and quote the bias and random error as ratios. This allows simple comparisons of reliability across different measurement tools. It is recommended that sports clinicians and researchers should cite and interpret a number of statistical methods for assessing reliability. We encourage the inclusion of the LOA method, especially the exploration of heteroscedasticity that is inherent in this analysis. We also stress the importance of relating the results of any reliability statistic to 'analytical goals' in sports medicine.
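
For a simple test-retest design, the absolute reliability statistics discussed above can be computed in a few lines. The data below are hypothetical, and the CV is approximated here as the typical error expressed as a percentage of the grand mean; the recommendation to check for heteroscedasticity via a Bland-Altman plot is noted in the comments.

```python
import numpy as np

# Hypothetical test-retest data (e.g., two trials of a jump or sprint measure).
rng = np.random.default_rng(4)
true = rng.normal(30, 4, 40)
test1 = true + rng.normal(0, 1.0, 40)
test2 = true + 0.5 + rng.normal(0, 1.0, 40)     # 0.5 = systematic bias (learning)

diff = test2 - test1
mean_pair = (test1 + test2) / 2

# Absolute reliability statistics.
bias = diff.mean()                               # systematic bias
sd_diff = diff.std(ddof=1)
sem = sd_diff / np.sqrt(2)                       # standard error of measurement
cv = 100 * sem / mean_pair.mean()                # typical error as a CV (%)
loa = (bias - 1.96 * sd_diff, bias + 1.96 * sd_diff)   # 95% limits of agreement

print(f"bias = {bias:.2f}, SEM = {sem:.2f}, CV = {cv:.1f}%")
print(f"95% limits of agreement: {loa[0]:.2f} to {loa[1]:.2f}")

# A Bland-Altman plot is simply diff plotted against mean_pair; a fan shape
# (larger differences at larger means) indicates heteroscedasticity, in which
# case the data can be log-transformed and bias and random error quoted as ratios.
```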

15.
16.
Organizational research and practice involving ratings are rife with what the authors term ill-structured measurement designs (ISMDs): designs in which raters and ratees are neither fully crossed nor nested. This article explores the implications of ISMDs for estimating interrater reliability. The authors first provide a mock example that illustrates potential problems that ISMDs create for common reliability estimators (e.g., Pearson correlations, intraclass correlations). Next, the authors propose an alternative reliability estimator, G(q,k), that resolves problems with traditional estimators and is equally appropriate for crossed, nested, and ill-structured designs. By using Monte Carlo simulation, the authors evaluate the accuracy of traditional reliability estimators compared with that of G(q,k) for ratings arising from ISMDs. Regardless of condition, G(q,k) yielded estimates as precise or more precise than those of traditional estimators. The advantage of G(q,k) over the traditional estimators became more pronounced with increases in (a) the overlap between the sets of raters that rated each ratee and (b) the ratio of rater main effect variance to true score variance. Discussion focuses on implications of this work for organizational research and practice. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

17.
On the basis of an empirical study of measures of constructs from the cognitive domain, the personality domain, and the domain of affective traits, the authors of this study examine the implications of transient measurement error for the measurement of frequently studied individual-differences variables. The authors clarify relevant reliability concepts as they relate to transient error and present a procedure for estimating the coefficient of equivalence and stability (L. J. Cronbach, 1947), the only classical reliability coefficient that assesses all 3 major sources of measurement error (random response, transient, and specific factor errors). The authors conclude that transient error exists in all 3 trait domains and is especially large in the domain of affective traits. Their findings indicate that the nearly universal use of the coefficient of equivalence (Cronbach's alpha; L. J. Cronbach, 1951), which fails to assess transient error, leads to overestimates of reliability and undercorrections for biases due to measurement error. (PsycINFO Database Record (c) 2010 APA, all rights reserved)
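
The contrast between the two coefficients can be illustrated with simulated parallel forms given on two occasions (a sketch under assumed variance components, not the authors' estimation procedure): coefficient alpha, computed within one occasion, counts occasion-specific (transient) variance as true-score variance, whereas a cross-occasion parallel-forms correlation does not, so alpha comes out higher.

```python
import numpy as np

def cronbach_alpha(items):
    """Coefficient of equivalence (alpha) for an items-in-columns score matrix."""
    k = items.shape[1]
    item_var = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)

# Hypothetical data: two parallel forms of a trait measure, one given at each
# of two occasions separated by a short interval.
rng = np.random.default_rng(5)
n, k = 200, 10
trait = rng.normal(size=(n, 1))
transient = {t: rng.normal(scale=0.5, size=(n, 1)) for t in (1, 2)}  # occasion-specific state

def make_form(t):
    """One parallel form at occasion t: trait + transient state + item noise."""
    return trait + transient[t] + rng.normal(scale=1.0, size=(n, k))

form_a_t1, form_b_t2 = make_form(1), make_form(2)

alpha = cronbach_alpha(form_a_t1)                 # single occasion; ignores transient error
ces = np.corrcoef(form_a_t1.sum(axis=1), form_b_t2.sum(axis=1))[0, 1]

print(f"coefficient alpha (single occasion): {alpha:.2f}")
print(f"cross-occasion parallel-forms correlation: {ces:.2f}")
```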

18.
Classic parametric statistical significance tests, such as analysis of variance and least squares regression, are widely used by researchers in many disciplines, including psychology. For classic parametric tests to produce accurate results, the assumptions underlying them (e.g., normality and homoscedasticity) must be satisfied. These assumptions are rarely met when analyzing real data. The use of classic parametric methods with violated assumptions can result in the inaccurate computation of p values, effect sizes, and confidence intervals. This may lead to substantive errors in the interpretation of data. Many modern robust statistical methods alleviate the problems inherent in using parametric methods with violated assumptions, yet modern methods are rarely used by researchers. The authors examine why this is the case, arguing that most researchers are unaware of the serious limitations of classic methods and are unfamiliar with modern alternatives. A range of modern robust and rank-based significance tests suitable for analyzing a wide range of designs is introduced. Practical advice on conducting modern analyses using software such as SPSS, SAS, and R is provided. The authors conclude by discussing robust effect size indices. (PsycINFO Database Record (c) 2010 APA, all rights reserved)
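
The article discusses SPSS, SAS, and R; as a rough Python/SciPy counterpart (illustrative data, and the trimmed test requires SciPy 1.7 or later), the sketch below contrasts the classic Student t test with Welch's test, Yuen's trimmed-means test, and a rank-based alternative on skewed, heteroscedastic data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)

# Skewed, heavy-tailed data that violate normality and homoscedasticity.
group1 = rng.lognormal(mean=0.0, sigma=0.8, size=40)
group2 = rng.lognormal(mean=0.4, sigma=1.2, size=40)

# Classic Student t test (assumes normality and equal variances).
t_classic = stats.ttest_ind(group1, group2)

# Welch's t test (drops the equal-variance assumption).
t_welch = stats.ttest_ind(group1, group2, equal_var=False)

# Yuen's trimmed-means t test (robust to outliers; SciPy >= 1.7).
t_trim = stats.ttest_ind(group1, group2, equal_var=False, trim=0.2)

# Rank-based alternative.
mw = stats.mannwhitneyu(group1, group2, alternative="two-sided")

for name, res in [("Student t", t_classic), ("Welch t", t_welch),
                  ("Yuen trimmed t", t_trim), ("Mann-Whitney U", mw)]:
    print(f"{name:16s} p = {res.pvalue:.3f}")
```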

19.
20.
Calculation of the statistical power of statistical tests is important in planning and interpreting the results of research studies, including meta-analyses. It is particularly important in moderator analyses in meta-analysis, which are often used as sensitivity analyses to rule out moderator effects but also may have low statistical power. This article describes how to compute statistical power of both fixed- and mixed-effects moderator tests in meta-analysis that are analogous to the analysis of variance and multiple regression analysis for effect sizes. It also shows how to compute power of tests for goodness of fit associated with these models. Examples from a published meta-analysis demonstrate that power of moderator tests and goodness-of-fit tests is not always high. (PsycINFO Database Record (c) 2010 APA, all rights reserved)
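
For the fixed-effects case, the power of the between-groups (moderator) Q test can be approximated with a noncentral chi-square whose noncentrality is the weighted sum of squared deviations of the category mean effects from the weighted grand mean. The sketch below follows that logic; the number of studies, effect sizes, and weights are invented for illustration.

```python
import numpy as np
from scipy import stats

def qb_power(group_effects, group_weights, alpha=0.05):
    """Approximate power of the fixed-effects between-groups (moderator) Q test.
    group_weights[j] is the sum of inverse sampling variances of the effect
    sizes in moderator category j."""
    theta = np.asarray(group_effects, dtype=float)
    w = np.asarray(group_weights, dtype=float)
    grand = np.sum(w * theta) / np.sum(w)
    lam = np.sum(w * (theta - grand)**2)          # noncentrality parameter
    df = len(theta) - 1
    crit = stats.chi2.ppf(1 - alpha, df)
    return stats.ncx2.sf(crit, df, lam)

# Illustrative scenario: two moderator categories whose population effects differ
# by 0.2, each holding 10 studies with a sampling variance of about 0.04 per
# effect size, so each category weight is roughly 10 / 0.04 = 250.
print(f"power = {qb_power([0.2, 0.4], [250, 250]):.2f}")
```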
