Similar articles
 20 similar articles retrieved (search time: 604 ms)
1.
The goal of this study was to explore similarities and differences in person-fit assessment under item response theory (IRT) and covariance structure analysis (CSA) measurement models. The responses of 3,245 individuals who completed 3 personality scales were analyzed under an IRT model and a CSA model. The authors then computed person-fit statistics for individual examinees under both IRT and CSA models. To be specific, for each examinee, the authors computed a standardized person-fit index for the IRT models, called Zl; in addition, an individual's contribution to chi-square, called IND{chi}, was used as a person-fit indicator for CSA models. Findings indicated that these indices are relatively free of confounds with examinee trait level. However, the relationship between Zl and IND{chi} values was small, suggesting that the indices identify different examinees as not fitting a model. Implications of the results and directions for future inquiry are discussed. (PsycINFO Database Record (c) 2010 APA, all rights reserved)
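A standardized person-fit index of the Zl (often written lz) type can be sketched for a dichotomous 2PL model; this is a simplified stand-in for the personality-scale models used in the study, and the function name and parameterization are illustrative:

```python
import numpy as np

def lz_person_fit(u, a, b, theta):
    """Standardized log-likelihood person-fit index (lz / Zl) under a 2PL model.

    u: 0/1 item responses for one examinee; a, b: item discrimination and
    difficulty parameters; theta: the examinee's trait estimate.
    """
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))               # 2PL response probabilities
    l0 = np.sum(u * np.log(p) + (1 - u) * np.log(1 - p))     # observed log-likelihood
    mean = np.sum(p * np.log(p) + (1 - p) * np.log(1 - p))   # its expectation
    var = np.sum(p * (1 - p) * np.log(p / (1 - p)) ** 2)     # its variance
    return (l0 - mean) / np.sqrt(var)                        # large negative => misfit
```

A model-consistent response pattern yields a value near zero or above, while an aberrant pattern (failing easy items, passing hard ones) yields a strongly negative value.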

2.
Investigated the utility of confirmatory factor analysis (CFA) and item response theory (IRT) models for testing the comparability of psychological measurements. Both procedures were used to investigate whether mood ratings collected in Minnesota and China were comparable. Several issues were addressed. The 1st issue was that of establishing a common measurement scale across groups, which involves full or partial measurement invariance of trait indicators. It is shown that using CFA or IRT models, test items that function differentially as trait indicators across groups need not interfere with comparing examinees on the same trait dimension. Second, the issue of model fit was addressed. It is proposed that person-fit statistics be used to judge the practical fit of IRT models. Finally, topics for future research are suggested. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

3.
Person-fit statistics test whether the likelihood of a respondent's complete vector of item scores on a test is low given the hypothesized item response theory model. This binary information may be insufficient for diagnosing the cause of a misfitting item-score vector. The authors propose a comprehensive methodology for person-fit analysis in the context of nonparametric item response theory. The methodology (a) includes H. Van der Flier's (1982) global person-fit statistic U3 to make the binary decision about fit or misfit of a person's item-score vector, (b) uses kernel smoothing (J. O. Ramsay, 1991) to estimate the person-response function for the misfitting item-score vectors, and (c) evaluates unexpected trends in the person-response function using a new local person-fit statistic (W. H. M. Emons, 2003). An empirical data example shows how to use the methodology for practical person-fit analysis. (PsycINFO Database Record (c) 2010 APA, all rights reserved)
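Step (b), estimating a person-response function by kernel smoothing, can be sketched with a Nadaraya-Watson smoother; the function name is hypothetical and Ramsay's implementation differs in detail:

```python
import numpy as np

def person_response_function(scores, difficulty, grid, h=0.5):
    """Kernel-smoothed person-response function: the expected item score as a
    function of item difficulty, for a single respondent (Nadaraya-Watson
    estimator with a Gaussian kernel of bandwidth h)."""
    # weight each item by its kernel distance from each evaluation point
    w = np.exp(-0.5 * ((grid[:, None] - difficulty[None, :]) / h) ** 2)
    return (w * scores).sum(axis=1) / w.sum(axis=1)
```

For a fitting respondent, the curve should decrease as item difficulty increases; local increases flag the unexpected trends the local statistic evaluates.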

4.
In this article, the authors developed a common strategy for identifying differential item functioning (DIF) items that can be implemented in both the mean and covariance structures method (MACS) and item response theory (IRT). They proposed examining the loadings (discrimination) and the intercept (location) parameters simultaneously using the likelihood ratio test with a free-baseline model and Bonferroni corrected critical p values. They compared the relative efficacy of this approach with alternative implementations for various types and amounts of DIF, sample sizes, numbers of response categories, and amounts of impact (latent mean differences). Results indicated that the proposed strategy was considerably more effective than an alternative approach involving a constrained-baseline model. Both MACS and IRT performed similarly well in the majority of experimental conditions. As expected, MACS performed slightly worse in dichotomous conditions but better than IRT in polytomous cases where sample sizes were small. Also, contrary to popular belief, MACS performed well in conditions where DIF was simulated on item thresholds (item means), and its accuracy was not affected by impact. (PsycINFO Database Record (c) 2010 APA, all rights reserved)
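The free-baseline likelihood-ratio comparison with a Bonferroni-corrected critical p value can be sketched as follows; the closed-form chi-square tail below holds for even degrees of freedom (which covers the two-parameters-per-item tests described), and the function names are illustrative:

```python
import math

def chi2_sf_even_df(x, df):
    """Chi-square survival function, closed form for even df:
    P(X > x) = exp(-x/2) * sum_{j<df/2} (x/2)^j / j!"""
    k = df // 2
    return math.exp(-x / 2) * sum((x / 2) ** j / math.factorial(j) for j in range(k))

def lr_dif_test(ll_free, ll_constrained, df, n_items, alpha=0.05):
    """Free-baseline likelihood-ratio DIF test for one studied item,
    flagged against a Bonferroni-corrected critical p value (alpha / n_items)."""
    g2 = 2.0 * (ll_free - ll_constrained)  # LR statistic, ~ chi2(df) under no DIF
    p = chi2_sf_even_df(g2, df)
    return g2, p, p < alpha / n_items      # (statistic, p value, DIF flag)
```

With df = 2, both the loading and the intercept of the studied item are tested simultaneously, as the strategy proposes.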

5.
An item response theory (IRT) approach to test linking based on summed scores is presented and demonstrated by calibrating a modified 23-item version of the Center for Epidemiologic Studies Depression Scale (CES-D) to the standard 20-item CES-D. Data are from the Depression Patient Outcomes Research Team II, which used a modified CES-D to measure risk for depression. Responses (N = 1,120) to items on both the original and modified versions were calibrated simultaneously using F. Samejima's (1969, 1997) graded IRT model. The 2 scales were linked on the basis of derived summed-score-to-IRT-score translation tables. The established cut score of 16 on the standard CES-D corresponded most closely to a summed score of 20 on the modified version. The IRT summed-score approach to test linking is a straightforward, valid, and practical method that can be applied in a variety of situations. (PsycINFO Database Record (c) 2010 APA, all rights reserved)
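A summed-score-to-IRT-score translation table can be built from the Lord-Wingersky recursion plus an EAP estimate per summed score. The sketch below uses a dichotomous 2PL model as a simplified stand-in for the graded model used in the study; all names are illustrative:

```python
import numpy as np

def summed_score_likelihoods(p):
    """Lord-Wingersky recursion: P(summed score s | theta) for dichotomous
    items with correct-response probabilities p (evaluated at one theta)."""
    lik = np.array([1.0])
    for pi in p:
        new = np.zeros(len(lik) + 1)
        new[:-1] += lik * (1 - pi)  # item answered incorrectly
        new[1:] += lik * pi         # item answered correctly
        lik = new
    return lik

def summed_to_eap(a, b, grid=np.linspace(-4, 4, 81)):
    """EAP(theta | summed score) translation table for a 2PL test,
    assuming a standard normal prior on theta."""
    prior = np.exp(-0.5 * grid ** 2)
    prior /= prior.sum()
    # rows: theta grid points; columns: summed scores 0..n_items
    L = np.array([summed_score_likelihoods(1 / (1 + np.exp(-a * (t - b))))
                  for t in grid])
    post = L * prior[:, None]
    return (grid[:, None] * post).sum(0) / post.sum(0)
```

Two such tables (one per test form) let a cut score on one form be mapped through theta to the closest summed score on the other form.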

6.
Mixed models take the dependency between observations based on the same cluster into account by introducing 1 or more random effects. Common item response theory (IRT) models introduce latent person variables to model the dependence between responses of the same participant. Assuming a distribution for the latent variables, these IRT models are formally equivalent with nonlinear mixed models. It is shown how a variety of IRT models can be formulated as particular instances of nonlinear mixed models. The unifying framework offers the advantage that relations between different IRT models become explicit and that it is rather straightforward to see how existing IRT models can be adapted and extended. The approach is illustrated with a self-report study on anger. (PsycINFO Database Record (c) 2010 APA, all rights reserved)
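The formal equivalence can be made concrete by simulating a Rasch model exactly as a logistic mixed model: a normally distributed random person effect plus fixed item effects inside an inverse-logit link. The sample sizes and item difficulties below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n_persons, n_items = 500, 10
theta = rng.normal(0.0, 1.0, n_persons)      # random person effect = latent trait
b = np.linspace(-1.5, 1.5, n_items)          # fixed item effects = item difficulties
eta = theta[:, None] - b[None, :]            # linear predictor of the mixed model
p = 1.0 / (1.0 + np.exp(-eta))               # inverse-logit link
y = rng.random((n_persons, n_items)) < p     # Bernoulli responses: a Rasch data set
```

Read as a mixed model, `theta` is the cluster-level random intercept and `b` the item fixed effects; read as IRT, the same equation is the Rasch item response function.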

7.
Popular methods for fitting unidimensional item response theory (IRT) models to data assume that the latent variable is normally distributed in the population of respondents, but this can be unreasonable for some variables. Ramsay-curve IRT (RC-IRT) was developed to detect and correct for this nonnormality. The primary aims of this article are to introduce RC-IRT less technically than it has been described elsewhere; to evaluate RC-IRT for ordinal data via simulation, including new approaches for model selection; and to illustrate RC-IRT with empirical examples. The empirical examples demonstrate the utility of RC-IRT for real data, and the simulation study indicates that when the latent distribution is skewed, RC-IRT results can be more accurate than those based on the normal model. Along with a plot of candidate curves, the Hannan-Quinn criterion is recommended for model selection. (PsycINFO Database Record (c) 2010 APA, all rights reserved)
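The Hannan-Quinn criterion recommended for model selection is simple to compute from a fitted model's log-likelihood, parameter count, and sample size (smaller is better when comparing, e.g., normal vs. Ramsay-curve latent-density models):

```python
import math

def hannan_quinn(loglik, n_params, n_obs):
    """Hannan-Quinn criterion: HQ = -2*logL + 2*k*ln(ln(n)).
    Penalizes extra parameters more than AIC but less than BIC for large n."""
    return -2.0 * loglik + 2.0 * n_params * math.log(math.log(n_obs))
```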

8.
The authors describe the initial development of the Wagner Assessment Test (WAT), an instrument designed to assess critical thinking, using the 5-faceted view popularized by the Watson-Glaser Critical Thinking Appraisal (WGCTA; G. B. Watson & E. M. Glaser, 1980). The WAT was designed to reduce the degree of successful guessing relative to the WGCTA by increasing the number of response alternatives (i.e., 80% of WGCTA items are 2-alternative, multiple-choice), a change that was hypothesized to result in more desirable test information and standard-error functions. Analyses using the 3-parameter logistic item response theory (IRT) model in a sample of undergraduates (N = 407) supported this prediction, even when the WAT item pool was shortened to match the length of the WGCTA. Convergent validity between full-pool IRT score estimates was r = .69. Implications for subsequent research on IRT-based measurement of critical thinking are discussed. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

9.
Statistical methods based on item response theory (IRT) were used to bidirectionally evaluate the measurement equivalence of translated American and German intelligence tests. Items that displayed differential item functioning (DIF) were identified, and content analysis was used to determine probable sources of DIF, either cultural or linguistic. The benefits of using an IRT analysis in examining the fidelity of translated tests are described. In addition, the influence of cultural differences on test translations and the use of DIF items to elucidate cultural differences are discussed. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

10.
The Psychopathy Checklist--Revised (PCL-R) is an important measure in both applied and research settings. Evidence for its validity is mostly derived from male Caucasian participants. PCL-R ratings of 359 Caucasian and 356 African American participants were compared using confirmatory factor analysis (CFA) and item response theory (IRT) analyses. Previous research has indicated that 13 items of the PCL-R can be described by a 3-factor hierarchical model. This model was replicated in this sample. No cross-group difference in factor structure could be found using CFA; the structure of psychopathy is the same in both groups. IRT methods indicated significant but small differences in the performance of 5 of the 20 PCL-R items. No significant differential test functioning was found, indicating that the item differences canceled each other out. It is concluded that the PCL-R can be used, in an unbiased way, with African American participants. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

11.
This study demonstrated the application of an innovative item response theory (IRT) based approach to evaluating measurement equivalence, comparing a newly developed Spanish version of the Posttraumatic Stress Disorder Checklist-Civilian Version (PCL-C) with the established English version. Basic principles and practical issues faced in the application of IRT methods for instrument evaluation are discussed. Data were derived from a study of the mental health consequences of community violence in both Spanish speakers (n = 102) and English speakers (n = 284). Results of differential item functioning (DIF) analyses revealed that the 2 versions were not fully equivalent on an item-by-item basis in that 6 of the 17 items displayed uniform DIF. No bias was observed, however, at the level of the composite PCL-C scale score, indicating that the 2 language versions can be combined for scale-level analyses. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

12.
13.
Self-report measures of adult attachment are typically scored in ways (e.g., averaging or summing items) that can lead to erroneous inferences about important theoretical issues, such as the degree of continuity in attachment security and the differential stability of insecure attachment patterns. To determine whether existing attachment scales suffer from scaling problems, the authors conducted an item response theory (IRT) analysis of 4 commonly used self-report inventories: Experiences in Close Relationships scales (K. A. Brennan, C. L. Clark, & P. R. Shaver, 1998), Adult Attachment Scales (N. L. Collins & S. J. Read, 1990), Relationship Styles Questionnaire (D. W. Griffin & K. Bartholomew, 1994), and J. Simpson's (1990) attachment scales. Data from 1,085 individuals were analyzed using F. Samejima's (1969) graded response model. The authors' findings indicate that commonly used attachment scales can be improved in a number of important ways. Accordingly, the authors show how IRT techniques can be used to develop new attachment scales with desirable psychometric properties. (PsycINFO Database Record (c) 2010 APA, all rights reserved)
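Samejima's graded response model, used here to analyze the attachment items, defines ordered-category probabilities as differences of adjacent cumulative 2PL curves; a minimal sketch with illustrative names:

```python
import numpy as np

def grm_category_probs(a, thresholds, theta):
    """Samejima graded response model: probability of each ordered category.
    thresholds must be strictly increasing; returns len(thresholds)+1
    category probabilities that sum to 1."""
    # cumulative curves P*(k) = P(response >= k), with P*(0)=1 and P*(K+1)=0
    pstar = 1.0 / (1.0 + np.exp(-a * (theta - np.asarray(thresholds))))
    pstar = np.concatenate(([1.0], pstar, [0.0]))
    return -np.diff(pstar)  # adjacent differences give category probabilities
```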

14.
The purpose of this article is to illustrate the power of item response theory (IRT) for the item analysis of measurement instruments in psychology. Through illustration, we show that IRT latent variable models fit data from a wide variety of sources and that interpretation of the features of these fitted models leads to interesting insights into the psychology underlying the data. The illustrations involve personality and attitude measurement as well as the evaluation of cognitive proficiency. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

15.
An item response theory (IRT) analysis was used to identify unique cultural response patterns by comparing single-culture groups with a multicultural composite. A survey designed to measure attitudes toward mental health was administered in their native languages to American, German, and French working, retired, and student teachers. Item characteristic curves (ICCs) for each national group were compared with ICCs generated by a composite reference containing all 3 cultural groups, thus providing an omnicultural reference point. Items that exhibited differential item functioning, that is, items with dissimilar ICCs for the composite reference and focal groups, were indicative of unique cultural response patterns to the attitude survey items. The advantages and disadvantages of this method in an IRT framework are discussed. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

16.
A battery of 7 tasks composed of 105 items thought to measure phonological awareness skills was administered to 945 children in kindergarten through 2nd grade. Results from confirmatory factor analysis at the task level and modified parallel analysis at the item level indicated that performance on these tasks was well represented by a single latent dimension. A 2-parameter logistic item response (IRT) model was also fit to the performance on the 105 items. Information obtained from the IRT model demonstrated that the tasks varied in the information they provided about a child's phonological awareness skills. These results showed that phonological awareness, as measured by these tasks, appears to be well represented as a unidimensional construct, but the tasks best suited to measure phonological awareness vary across development. (PsycINFO Database Record (c) 2010 APA, all rights reserved)
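The task-level differences in information follow from the 2PL item information function, which peaks at the item's difficulty, so tasks with items at different difficulty levels are informative at different points in development. A minimal sketch:

```python
import numpy as np

def item_information_2pl(a, b, theta):
    """Fisher information of a 2PL item at trait level theta:
    I(theta) = a^2 * P(theta) * (1 - P(theta)), maximized at theta = b."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a ** 2 * p * (1 - p)
```

Summing this over a task's items gives the task information function, which shows where on the latent dimension each task measures most precisely.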

17.
A framework is presented to model instances of local dependence between items within the context of unidimensional item response theory (IRT). A distinction is made between item main effects and item interactions. Four types of models for interdependent items are considered, on the basis of the distinction between order dependency and combination dependency on the one hand, and dimension-dependent versus constant interaction on the other hand. For each of the 4 model types, variants of the 1-parameter logistic model can be formulated as well as variants of the 2-parameter logistic model. A number of existing IRT models for polytomous items that are variants of the partial credit model may be reconsidered in these terms. Two examples are given to demonstrate the approach. (PsycINFO Database Record (c) 2011 APA, all rights reserved)

18.
The population-dependent concept of reliability is used in test score models such as classical test theory and the binomial error model, whereas in item response models, the population-independent concept of information is used. Reliability and information apply to both test score and item response models. Information is a conditional definition of precision, that is, the precision for a given subject; reliability is an unconditional definition, that is, the precision for a population of subjects. Information and reliability do not distinguish test score and item response models. The main distinction is that the parameters are specific for the test and the subject in test score models, whereas in item response models, the item parameters are separated from the subject parameters. (PsycINFO Database Record (c) 2010 APA, all rights reserved)
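The conditional/unconditional distinction can be written down directly: test information yields a subject-specific standard error, while reliability summarizes precision over a population. The marginal-reliability formula below is one common summary, not the only one, and the function names are illustrative:

```python
import numpy as np

def conditional_se(test_information):
    """Conditional (subject-level) precision: SE(theta) = 1 / sqrt(I(theta))."""
    return 1.0 / np.sqrt(test_information)

def marginal_reliability(theta_var, mean_error_var):
    """Unconditional (population-level) precision: reliability =
    true-score variance / (true-score variance + average error variance)."""
    return theta_var / (theta_var + mean_error_var)
```

The first quantity varies over subjects with their trait level; the second collapses that variation into a single number for a population, which is why it is population dependent.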

19.
Describes item bias analysis in attitude and personality measurement using the techniques of item response theory (IRT). Data from 179 male and 119 female college students on the Mosher Forced-Choice Sex Guilt Inventory illustrate the procedures developed to distinguish true group differences in a psychologically meaningful construct from artifactual differences due to some aspect of the test construction process. This analysis suggests that the sex difference in scores on this inventory reflects the item composition of the measure rather than a true group difference on a global guilt continuum. Recommendations for the application of IRT item analysis are presented. (31 ref) (PsycINFO Database Record (c) 2010 APA, all rights reserved)

20.
Recent legal developments appear to sanction the use of psychometrically unsound procedures for examining differential item functioning (DIF) on standardized tests. More appropriate approaches involve the use of item response theory (IRT). However, many IRT-based DIF studies have used F. M. Lord's (see record 1987-17535-001) joint maximum likelihood procedure, which can lead to incorrect and misleading results. A Monte Carlo simulation was conducted to evaluate the effectiveness of two other methods of parameter estimation: marginal maximum likelihood estimation and Bayes modal estimation. Sample size and data dimensionality were manipulated in the simulation. Results indicated that both estimation methods (a) provided more accurate parameter estimates and less inflated Type I error rates than joint maximum likelihood, (b) were robust to multidimensionality, and (c) produced more accurate parameter estimates and higher rates of identifying DIF with larger samples. (PsycINFO Database Record (c) 2010 APA, all rights reserved)


Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.)  京ICP备09084417号