期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Detecting differential item functioning with confirmatory factor analysis and item response theory: Toward a unified strategy.

Stark Stephen; Chernyshenko Oleksandr S.; Drasgow Fritz 《Canadian Metallurgical Quarterly》2006,91(6):1292

In this article, the authors developed a common strategy for identifying differential item functioning (DIF) items that can be implemented in both the mean and covariance structures method (MACS) and item response theory (IRT). They proposed examining the loadings (discrimination) and the intercept (location) parameters simultaneously using the likelihood ratio test with a free-baseline model and Bonferroni corrected critical p values. They compared the relative efficacy of this approach with alternative implementations for various types and amounts of DIF, sample sizes, numbers of response categories, and amounts of impact (latent mean differences). Results indicated that the proposed strategy was considerably more effective than an alternative approach involving a constrained-baseline model. Both MACS and IRT performed similarly well in the majority of experimental conditions. As expected, MACS performed slightly worse in dichotomous conditions but better than IRT in polytomous cases where sample sizes were small. Also, contrary to popular belief, MACS performed well in conditions where DIF was simulated on item thresholds (item means), and its accuracy was not affected by impact. (PsycINFO Database Record (c) 2010 APA, all rights reserved) 相似文献

2.

Evaluation of two methods for estimating item response theory parameters when assessing differential item functioning.

Lim Rodney G.; Drasgow Fritz 《Canadian Metallurgical Quarterly》1990,75(2):164

Recent legal developments appear to sanction the use of psychometrically unsound procedures for examining differential item functioning (DIF) on standardized tests. More appropriate approaches involve the use of item response theory (IRT). However, many IRT-based DIF studies have used F. M. Lord's (see record 1987-17535-001) joint maximum likelihood procedure, which can lead to incorrect and misleading results. A Monte Carlo simulation was conducted to evaluate the effectiveness of two other methods of parameter estimation: marginal maximum likelihood estimation and Bayes modal estimation. Sample size and data dimensionality were manipulated in the simulation. Results indicated that both estimation methods (a) provided more accurate parameter estimates and less inflated Type I error rates than joint maximum likelihood, (b) were robust to multidimensionality, and (c) produced more accurate parameter estimates and higher rates of identifying DIF with larger samples. (PsycINFO Database Record (c) 2010 APA, all rights reserved) 相似文献

3.

Applications of item response theory to analysis of attitude scale translations.

Hulin Charles L.; Drasgow Fritz; Komocar John 《Canadian Metallurgical Quarterly》1982,67(6):818

相似文献

4.

Psychometric equivalence of a translation of the Job Descriptive Index into Hebrew.

Hulin Charles L.; Mayer Laura J. 《Canadian Metallurgical Quarterly》1986,71(1):83

相似文献

5.

Differential item functioning in a Spanish translation of the PTSD Checklist: Detection and evaluation of impact.

Orlando Maria; Marshall Grant N. 《Canadian Metallurgical Quarterly》2002,14(1):50

This study demonstrated the application of an innovative item response theory (IRT) based approach to evaluating measurement equivalence, comparing a newly developed Spanish version of the Posttraumatic Stress Disorder Checklist-Civilian Version (PCL-C) with the established English version. Basic principles and practical issues faced in the application of IRT methods for instrument evaluation are discussed. Data were derived from a study of the mental health consequences of community violence in both Spanish speakers (n = 102) and English speakers (n = 284). Results of differential item functioning (DIF) analyses revealed that the 2 versions were not fully equivalent on an item-by-item basis in that 6 of the 17 items displayed uniform DIF. No bias was observed, however, at the level of the composite PCL-C scale score, indicating that the 2 language versions can be combined for scale-level analyses. (PsycINFO Database Record (c) 2010 APA, all rights reserved) 相似文献

6.

Identification of unique cultural response patterns by means of item response theory.

Ellis Barbara B.; Kimmel Herbert D. 《Canadian Metallurgical Quarterly》1992,77(2):177

An item response theory (IRT) analysis was used to identify unique cultural response patterns by comparing single-culture groups with a multicultural composite. A survey designed to measure attitudes toward mental health was administered in their native languages to American, German, and French working, retired, and student teachers. Item characteristic curves (ICCs) for each national group were compared with ICCs generated by composite reference containing all 3 cultural groups, thus providing an omnicultural reference point. Items that exhibited differential item functioning, that is, items with dissimilar ICCs for the composite reference and focal groups, were indicative of unique cultural response patterns to the attitude survey items. The advantages and disadvantages of this method in an IRT are discussed. (PsycINFO Database Record (c) 2010 APA, all rights reserved) 相似文献

7.

Relationship between culture and responses to biodata employment items. 总被引：1，自引：0，他引：1

Whitney David J.; Schmitt Neal 《Canadian Metallurgical Quarterly》1997,82(1):113

The relationship between Black–White cultural value differences and responses to biodata employment items was investigated. Black and White college students were found to differ in endorsement of cultural values pertaining to basic human nature, the relationship between the individual and nature, temporal focus, and interpersonal relations. Using the loglinear approach suggested by B. F. Green et al (see record 1990-02999-001), the researchers found that over one quarter of the biodata employment items they examined exhibited differential item functioning (DIF) between racial subgroups. Although cultural values of the respondent were related to biodata response option selection, only limited evidence was found for the hypothesis that cultural values are associated with the observed differences in Black–White response choices. Recommendations regarding the further investigation of cultural influences on DIF are discussed. (PsycINFO Database Record (c) 2010 APA, all rights reserved) 相似文献

8.

Are cross-cultural comparisons of personality profiles meaningful? Differential item and facet functioning in the Revised NEO Personality Inventory.

Church A. Timothy; Alvarez Juan M.; Mai Nhu T. Q.; French Brian F.; Katigbak Marcia S.; Ortiz Fernando A. 《Canadian Metallurgical Quarterly》2011,101(5):1068

Measurement invariance is a prerequisite for confident cross-cultural comparisons of personality profiles. Multigroup confirmatory factor analysis was used to detect differential item functioning (DIF) in factor loadings and intercepts for the Revised NEO Personality Inventory (P. T. Costa, Jr., & R. R. McCrae, 1992) in comparisons of college students in the United States (N = 261), Philippines (N = 268), and Mexico (N = 775). About 40%–50% of the items exhibited some form of DIF and item-level noninvariance often carried forward to the facet level at which scores are compared. After excluding DIF items, some facet scales were too short or unreliable for cross-cultural comparisons, and for some other facets, cultural mean differences were reduced or eliminated. The results indicate that considerable caution is warranted in cross-cultural comparisons of personality profiles. (PsycINFO Database Record (c) 2011 APA, all rights reserved) 相似文献

9.

Beyond group-mean differences: The concept of item bias.

Thissen David; Steinberg Lynne; Gerrard Meg 《Canadian Metallurgical Quarterly》1986,99(1):118

Describes item bias analysis in attitude and personality measurement using the techniques of item response theory (IRT). Data from 179 male and 119 female college students on the Mosher Forced-Choice Sex Guilt Inventory illustrate the procedures developed to distinguish true group differences in a psychologically meaningful construct from artifactual differences due to some aspect of the test construction process. This analysis suggests that the sex difference in scores on this inventory reflects the item composition of the measure rather than a true group difference on a global guilt continuum. Recommendations for the application of IRT item analysis are presented. (31 ref) (PsycINFO Database Record (c) 2010 APA, all rights reserved) 相似文献

10.

Gender differences on negative affectivity: an IRT study of differential item functioning on the Multidimensional Personality Questionnaire Stress Reaction Scale

LL Smith SP Reise 《Canadian Metallurgical Quarterly》1998,75(5):1350-1362

Item response theory methods were used to study differential item functioning (DIF) between gender groups on a measure of stress reaction. Results revealed that women were more likely to endorse items describing emotional vulnerability and sensitivity, whereas men were more likely to endorse items describing tension, irritability, and being easily upset. Item factor analysis yielded 5 correlated factors, and the DIF analysis, in turn, revealed differential gender mean differences on these factors. This finding illustrates how even in an essentially unidimensional scale, comparison of group mean differences can be affected by multidimensionality caused by item clusters that share similar content. Results do not support arguments that measures of negative affective dispositions "artificially" produce gender mean differences by focusing on specific selected content areas. 相似文献

11.

Racial and gender bias in ability and achievement tests: Resolving the apparent paradox.

Hunter John E.; Schmidt Frank L. 《Canadian Metallurgical Quarterly》2000,6(1):151

The study of potential racial and gender bias in individual test items is a major research area today. The fact that research has established that total scores on ability and achievement tests are predictively unbiased raises the question of whether there is in fact any real bias at the item level. No theoretical rationale for expecting such bias has been advanced. It appears that findings of item bias (differential item functioning; DIF) can be explained by three factors: failure to control for measurement error in ability estimates, violations of the unidimensionality assumption required by DIF detection methods, and reliance on significance testing (causing tiny artifactual DIF effects to be statistically significant because sample sizes are very large). After taking into account these artifacts, there appears to be no evidence that items on currently used tests function differently in different racial and gender groups. (PsycINFO Database Record (c) 2010 APA, all rights reserved) 相似文献

12.

Differential item functioning in the Danish translation of the SF-36

JB Bjorner S Kreiner JE Ware MT Damsgaard P Bech 《Canadian Metallurgical Quarterly》1998,51(11):1189-1202

Statistical analyses of Differential Item Functioning (DIF) can be used for rigorous translation evaluations. DIF techniques test whether each item functions in the same way, irrespective of the country, language, or culture of the respondents. For a given level of health, the score on any item should be independent of nationality. This requirement can be tested through contingency-table methods, which are efficient for analyzing all types of items. We investigated DIF in the Danish translation of the SF-36 Health Survey, using two general population samples (USA, n = 1,506; Denmark, n = 3,950). DIF was identified for 12 out of 35 items. These results agreed with independent ratings of translation quality, but the statistical techniques were more sensitive. When included in scales, the items exhibiting DIF had only a little impact on conclusions about cross-national differences in health in the general population. However, if used as single items, the DIF items could seriously bias results from cross-national comparisons. Also, the DIF items might have larger impact on cross-national comparison of groups with poorer health status. We conclude that analysis of DIF is useful for evaluating questionnaire translations. 相似文献

13.

Assessing differential functioning in a satisfaction scale.

Collins William C.; Raju Nambury S.; Edwards Jack E. 《Canadian Metallurgical Quarterly》2000,85(3):451

In this study, an item response theory-based differential functioning of items and tests (DFIT) framework (N. S. Raju, W. J. van der Linden, & P. F. Fleer, 1995) was applied to a Likert-type scale. Several differential item functioning (DIF) analyses compared the item characteristics of a 10-item satisfaction scale for Black and White examinees and for female and male examinees. F. M. Lord's (1980) chi-square and the extended signed area (SA) measures were also used. The results showed that the DFIT indices consistently performed in the expected manner. The results from Lord's chi-square and the SA procedures were somewhat varied across comparisons. A discussion of these results along with an illustration of an item with significant DIF and suggestions for future DIF research are presented. (PsycINFO Database Record (c) 2010 APA, all rights reserved) 相似文献

14.

Cognitive psychology meets psychometric theory: On the relation between process models for decision making and latent variable models for individual differences.

van der Maas Han L. J.; Molenaar Dylan; Maris Gunter; Kievit Rogier A.; Borsboom Denny 《Canadian Metallurgical Quarterly》2011,118(2):339

This article analyzes latent variable models from a cognitive psychology perspective. We start by discussing work by Tuerlinckx and De Boeck (2005), who proved that a diffusion model for 2-choice response processes entails a 2-parameter logistic item response theory (IRT) model for individual differences in the response data. Following this line of reasoning, we discuss the appropriateness of IRT for measuring abilities and bipolar traits, such as pro versus contra attitudes. Surprisingly, if a diffusion model underlies the response processes, IRT models are appropriate for bipolar traits but not for ability tests. A reconsideration of the concept of ability that is appropriate for such situations leads to a new item response model for accuracy and speed based on the idea that ability has a natural zero point. The model implies fundamentally new ways to think about guessing, response speed, and person fit in IRT. We discuss the relation between this model and existing models as well as implications for psychology and psychometrics. (PsycINFO Database Record (c) 2011 APA, all rights reserved) 相似文献

15.

Confirmatory factor analysis and item response theory: Two approaches for exploring measurement invariance.

Reise Steven P.; Widaman Keith F.; Pugh Robin H. 《Canadian Metallurgical Quarterly》1993,114(3):552

Investigated the utility of confirmatory factor analysis (CFA) and item response theory (IRT) models for testing the comparability of psychological measurements. Both procedures were used to investigate whether mood ratings collected in Minnesota and China were comparable. Several issues were addressed. The 1st issue was that of establishing a common measurement scale across groups, which involves full or partial measurement invariance of trait indicators. It is shown that using CFA or IRT models, test items that function differentially as trait indicators across groups need not interfere with comparing examinees on the same trait dimension. Second, the issue of model fit was addressed. It is proposed that person-fit statistics be used to judge the practical fit of IRT models. Finally, topics for future research are suggested. (PsycINFO Database Record (c) 2010 APA, all rights reserved) 相似文献

16.

Assessing the fit of measurement models at the individual level: A comparison of item response theory and covariance structure approaches.

Reise Steven P.; Widaman Keith F. 《Canadian Metallurgical Quarterly》1999,4(1):3

The goal of this study was to explore similarities and differences in person-fit assessment under item response theory (IRT) and covariance structure analysis (CSA) measurement models. The responses of 3,245 individuals who completed 3 personality scales were analyzed under an IRT model and a CSA model. The authors then computed person-fit statistics for individual examinees under both IRT and CSA models. To be specific, for each examinee, the authors computed a standardized person-fit index for the IRT models, called Zl; in addition, an individual's contribution to chi-square, called IND{chi}, was used as a person-fit indicator for CSA models. Findings indicated that these indices are relatively free of confounds with examinee trait level. However, the relationship between Zl, and IND{chi}, values was small, suggesting that the indices identify different examinees as not fitting a model. Implications of the results and directions for future inquiry are discussed. (PsycINFO Database Record (c) 2010 APA, all rights reserved) 相似文献

17.

Use of log-linear models for assessing differential item functioning in a measure of psychological functioning.

Dancer L. Suzanne; Anderson Arthur J.; Derlin Roberta L. 《Canadian Metallurgical Quarterly》1994,62(4):710

Log-linear models are used to investigate contingency tables that cross-classify respondents according to item response, mental health status (MHS), and the background variables of ethnicity and gender. Specifically, log-linear models are used to examine item validity, defined as an item response by MHS interaction, and differential item functioning (DIF), defined as an interaction between item response and a background variable. The investigation focused on a set of items that measure subjective well-being and coping behavior. Female (n?=?627) and male (n?=?338) respondents represented 3 ethnic groups: African American, Anglo-American, and Hispanic/Latino. Strong evidence of item validity and some evidence of DIF was found. Most of the interaction between item response and either ethnicity or gender occurred among Ss with diminished mental health. (PsycINFO Database Record (c) 2010 APA, all rights reserved) 相似文献

18.

Psychopathy and ethnicity: Structural, item, and test generalizability of the Psychopathy Checklist—Revised (PCL-R) in Caucasian and African American participants.

Cooke David J.; Kosson David S.; Michie Christine 《Canadian Metallurgical Quarterly》2001,13(4):531

The Psychopathy Checklist--Revised (PCL-R) is an important measure in both applied and research settings. Evidence for its validity is mostly derived from male Caucasian participants. PCL-R ratings of 359 Caucasian and 356 African American participants were compared using confirmatory factor analysis (CFA) and item response theory (IRT) analyses. Previous research has indicated that 13 items of the PCL-R can be described by a 3-factor hierarchical model. This model was replicated in this sample. No cross-group difference in factor structure could be found using CFA; the structure of psychopathy is the same in both groups. IRT methods indicated significant but small differences in the performance of 5 of the 20 PCL-R items. No significant differential test functioning was found, indicating that the item differences canceled each other out. It is concluded that the PCL-R can be used, in an unbiased way, with African American participants. (PsycINFO Database Record (c) 2010 APA, all rights reserved) 相似文献

19.

Measurement Equivalence and Differential Item Functioning in Family Psychology.

Bingenheimer Jeffrey B.; Raudenbush Stephen W.; Leventhal Tama; Brooks-Gunn Jeanne 《Canadian Metallurgical Quarterly》2005,19(3):441

Several hypotheses in family psychology involve comparisons of sociocultural groups. Yet the potential for cross-cultural inequivalence in widely used psychological measurement instruments threatens the validity of inferences about group differences. Methods for dealing with these issues have been developed via the framework of item response theory. These methods deal with an important type of measurement inequivalence, called differential item functioning (DIF). The authors introduce DIF analytic methods, linking them to a well-established framework for conceptualizing cross-cultural measurement equivalence in psychology (C.H. Hui & H.C. Triandis, 1985). They illustrate the use of DIF methods using data from the Project on Human Development in Chicago Neighborhoods (PHDCN). Focusing on the Caregiver Warmth and Environmental Organization scales from the PHDCN's adaptation of the Home Observation for Measurement of the Environment Inventory, the authors obtain results that exemplify the range of outcomes that may result when these methods are applied to psychological measurement instruments. (PsycINFO Database Record (c) 2010 APA, all rights reserved) 相似文献

20.

Analysis of item response and differential item functioning of alcohol expectancies in middle school youths.

McCarthy Denis M.; Pedersen Sarah L.; D’Amico Elizabeth J. 《Canadian Metallurgical Quarterly》2009,21(3):444

Drinking behavior in preadolescence is a significant predictor of both short- and long-term negative consequences. This study examined the psychometric properties of 1 known risk factor for drinking in this age group, alcohol expectancies, within an item response theory framework. In a sample of middle school youths (N = 1,273), the authors tested differential item functioning (DIF) in positive and negative alcohol expectancies across grade, gender, and ethnicity. Multiple-indicator multiple-cause model analyses tested differences in alcohol use as a potential explanation for observed DIF across groups. Results showed that most expectancy items did not exhibit DIF. For items where DIF was indicated, differences in alcohol use did not explain differences in item parameters. Positive and negative expectancies also systematically differed in the location parameter. Latent variable scale scores of both positive and negative expectancies were associated with drinking behavior cross-sectionally, while only positive expectancies predicted drinking prospectively. Improving the measurement of alcohol expectancies can help researchers better assess this important risk factor for drinking in this population, particularly the identification of those with either very high positive or very low negative alcohol expectancies. (PsycINFO Database Record (c) 2010 APA, all rights reserved) 相似文献