Similar documents
20 similar documents found (search time: 26 ms)
1.
Statistical analyses of Differential Item Functioning (DIF) can be used for rigorous translation evaluations. DIF techniques test whether each item functions in the same way, irrespective of the country, language, or culture of the respondents. For a given level of health, the score on any item should be independent of nationality. This requirement can be tested through contingency-table methods, which are efficient for analyzing all types of items. We investigated DIF in the Danish translation of the SF-36 Health Survey, using two general population samples (USA, n = 1,506; Denmark, n = 3,950). DIF was identified for 12 out of 35 items. These results agreed with independent ratings of translation quality, but the statistical techniques were more sensitive. When included in scales, the items exhibiting DIF had only a small impact on conclusions about cross-national differences in health in the general population. However, if used as single items, the DIF items could seriously bias results from cross-national comparisons. Also, the DIF items might have a larger impact on cross-national comparisons of groups with poorer health status. We conclude that analysis of DIF is useful for evaluating questionnaire translations.
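The contingency-table approach this abstract describes is commonly operationalized as the Mantel-Haenszel procedure: respondents are stratified on total score, and a common odds ratio and chi-square are pooled across the resulting 2 × 2 tables. A minimal sketch (the function name and the synthetic tables are illustrative, not taken from the study):

```python
import numpy as np

def mantel_haenszel_dif(tables):
    """Mantel-Haenszel test for uniform DIF on one item.

    tables: (K, 2, 2) array; for each total-score stratum k,
    rows = group (reference, focal), cols = response (correct, incorrect).
    Returns (alpha_MH, MH chi-square). An alpha far from 1 and a
    chi-square above 3.84 (5% level, 1 df) suggest uniform DIF.
    """
    t = np.asarray(tables, dtype=float)
    A, B = t[:, 0, 0], t[:, 0, 1]   # reference group: correct, incorrect
    C, D = t[:, 1, 0], t[:, 1, 1]   # focal group: correct, incorrect
    N = t.sum(axis=(1, 2))          # stratum sizes
    alpha = (A * D / N).sum() / (B * C / N).sum()   # common odds ratio
    E = (A + B) * (A + C) / N                        # E[A_k] under no DIF
    V = (A + B) * (C + D) * (A + C) * (B + D) / (N**2 * (N - 1))
    chi2 = (abs(A.sum() - E.sum()) - 0.5) ** 2 / V.sum()  # continuity-corrected
    return alpha, chi2
```

An alpha_MH near 1 with a nonsignificant chi-square is consistent with the item functioning equivalently across countries at a given level of health.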

2.
Statistical methods based on item response theory (IRT) were used to bidirectionally evaluate the measurement equivalence of translated American and German intelligence tests. Items that displayed differential item functioning (DIF) were identified, and content analysis was used to determine probable sources of DIF, either cultural or linguistic. The benefits of using an IRT analysis in examining the fidelity of translated tests are described. In addition, the influence of cultural differences on test translations and the use of DIF items to elucidate cultural differences are discussed. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

3.
The Mattis Dementia Rating Scale (MDRS) is a commonly used cognitive measure designed to assess the course of decline in progressive dementias. However, little information is available about possible systematic racial bias on the items presented in this test. We investigated race as a potential source of test bias and differential item functioning in 40 pairs of African American and Caucasian dementia patients (N = 80), matched on age, education, and gender. Principal component analysis revealed similar patterns and magnitudes across component loadings for each racial group, indicating no clear evidence of test bias on account of race. An item analysis of the MDRS revealed differential item functioning across groups on only 4 of 36 items; these items could be dropped to produce a modified MDRS that may be less sensitive to cultural factors. Given the absence of test bias because of race, the observed racial differences on the total MDRS score are most likely associated with group differences in dementia severity. We conclude that the MDRS shows no appreciable evidence of test bias and minimal differential item functioning (item bias) because of race, suggesting that the MDRS may be used with both African American and Caucasian dementia patients to assess dementia severity.

4.
The study of potential racial and gender bias in individual test items is a major research area today. The fact that research has established that total scores on ability and achievement tests are predictively unbiased raises the question of whether there is in fact any real bias at the item level. No theoretical rationale for expecting such bias has been advanced. It appears that findings of item bias (differential item functioning; DIF) can be explained by three factors: failure to control for measurement error in ability estimates, violations of the unidimensionality assumption required by DIF detection methods, and reliance on significance testing (causing tiny artifactual DIF effects to be statistically significant because sample sizes are very large). After taking into account these artifacts, there appears to be no evidence that items on currently used tests function differently in different racial and gender groups. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

5.
Drinking behavior in preadolescence is a significant predictor of both short- and long-term negative consequences. This study examined the psychometric properties of 1 known risk factor for drinking in this age group, alcohol expectancies, within an item response theory framework. In a sample of middle school youths (N = 1,273), the authors tested differential item functioning (DIF) in positive and negative alcohol expectancies across grade, gender, and ethnicity. Multiple-indicator multiple-cause model analyses tested differences in alcohol use as a potential explanation for observed DIF across groups. Results showed that most expectancy items did not exhibit DIF. For items where DIF was indicated, differences in alcohol use did not explain differences in item parameters. Positive and negative expectancies also systematically differed in the location parameter. Latent variable scale scores of both positive and negative expectancies were associated with drinking behavior cross-sectionally, while only positive expectancies predicted drinking prospectively. Improving the measurement of alcohol expectancies can help researchers better assess this important risk factor for drinking in this population, particularly the identification of those with either very high positive or very low negative alcohol expectancies. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

6.
In this article, the authors developed a common strategy for identifying differential item functioning (DIF) items that can be implemented in both the mean and covariance structures method (MACS) and item response theory (IRT). They proposed examining the loadings (discrimination) and the intercept (location) parameters simultaneously using the likelihood ratio test with a free-baseline model and Bonferroni corrected critical p values. They compared the relative efficacy of this approach with alternative implementations for various types and amounts of DIF, sample sizes, numbers of response categories, and amounts of impact (latent mean differences). Results indicated that the proposed strategy was considerably more effective than an alternative approach involving a constrained-baseline model. Both MACS and IRT performed similarly well in the majority of experimental conditions. As expected, MACS performed slightly worse in dichotomous conditions but better than IRT in polytomous cases where sample sizes were small. Also, contrary to popular belief, MACS performed well in conditions where DIF was simulated on item thresholds (item means), and its accuracy was not affected by impact. (PsycINFO Database Record (c) 2010 APA, all rights reserved)
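The likelihood-ratio logic described here can be illustrated with a simplified stand-in: a logistic-regression DIF test, which compares a compact model (response predicted by a matching score) against an augmented model adding group membership, and screens each item's p value against a Bonferroni-corrected alpha. A hedged sketch (function names and the simulated data are invented for the example; the article's actual comparisons used full MACS and IRT models):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2

def _nll(beta, X, y):
    """Negative log-likelihood of a logistic model."""
    z = X @ beta
    return np.sum(np.logaddexp(0.0, z)) - y @ z

def lr_uniform_dif(score, group, response):
    """Free- vs constrained-model likelihood-ratio test for uniform DIF.

    Compact model:   logit P(correct) = b0 + b1*score
    Augmented model: logit P(correct) = b0 + b1*score + b2*group
    Returns (G2, p); compare p against a Bonferroni-corrected alpha.
    """
    Xc = np.column_stack([np.ones_like(score), score])
    Xa = np.column_stack([Xc, group])
    ll_c = -minimize(_nll, np.zeros(2), args=(Xc, response), method="BFGS").fun
    ll_a = -minimize(_nll, np.zeros(3), args=(Xa, response), method="BFGS").fun
    g2 = 2.0 * (ll_a - ll_c)          # nested models, 1 df difference
    return g2, chi2.sf(g2, df=1)
```

Screening m items would compare each p value against alpha/m, mirroring the Bonferroni-corrected critical values the authors propose.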

7.
The authors examined gender bias in the diagnostic criteria for Diagnostic and Statistical Manual of Mental Disorders (4th ed., text revision; American Psychiatric Association, 2000) personality disorders. Participants (N=599) were selected from 2 large, nonclinical samples on the basis of information from self-report questionnaires and peer nominations that suggested the presence of personality pathology. All were interviewed with the Structured Interview for DSM-IV Personality (B. Pfohl, N. Blum, & M. Zimmerman, 1997). Using item response theory methods, the authors compared data from 315 men and 284 women, searching for evidence of differential item functioning in the diagnostic features of 10 personality disorder categories. Results indicated significant but moderate measurement bias pertaining to gender for 6 specific criteria. In other words, men and women with equivalent levels of pathology endorsed the items at different rates. For 1 paranoid personality disorder criterion and 3 antisocial criteria, men were more likely to endorse the biased items. For 2 schizoid personality disorder criteria, women were more likely to endorse the biased items. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

8.
Log-linear models are used to investigate contingency tables that cross-classify respondents according to item response, mental health status (MHS), and the background variables of ethnicity and gender. Specifically, log-linear models are used to examine item validity, defined as an item response by MHS interaction, and differential item functioning (DIF), defined as an interaction between item response and a background variable. The investigation focused on a set of items that measure subjective well-being and coping behavior. Female (n = 627) and male (n = 338) respondents represented 3 ethnic groups: African American, Anglo-American, and Hispanic/Latino. Strong evidence of item validity and some evidence of DIF was found. Most of the interaction between item response and either ethnicity or gender occurred among subjects with diminished mental health. (PsycINFO Database Record (c) 2010 APA, all rights reserved)
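The log-linear logic above — DIF as an interaction between item response and a background variable — can be sketched by fitting the model with all two-way associations but no three-way interaction and inspecting the deviance G²; a large G² means an interaction term is needed. A minimal illustration via iterative proportional fitting (function names and the tables are invented for the example, not from the study):

```python
import numpy as np

def fit_no_three_way(obs, iters=200):
    """Fit the log-linear model [XY][XZ][YZ] (all two-way associations,
    no three-way interaction) by iterative proportional fitting."""
    obs = np.asarray(obs, dtype=float)
    fit = np.full(obs.shape, obs.mean())
    for _ in range(iters):
        for ax in (2, 1, 0):  # match each two-way margin in turn
            fit *= obs.sum(axis=ax, keepdims=True) / fit.sum(axis=ax, keepdims=True)
    return fit

def g_squared(obs, fit):
    """Likelihood-ratio (deviance) statistic G^2 between observed and fitted."""
    obs = np.asarray(obs, dtype=float)
    m = obs > 0
    return 2.0 * np.sum(obs[m] * np.log(obs[m] / fit[m]))
```

With axes (item response, MHS, background variable), a significant G² against the chi-square reference points to the kind of interaction the authors interpret as DIF or item validity.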

9.
This study investigated the differential responding of 49 male and 92 female college students to subtle and obvious MCMI scale items. It had been predicted that item subtlety would be correlated positively with item endorsement. This prediction was supported across all 175 MCMI items (r = .34). In addition, subjects endorsed a greater percentage of subtle than obvious subscale items for eight Basic Personality scales and two of three Pathological Personality scales. However, this pattern was not consistent for the nine Symptom Disorder scales. It also had been predicted that gender would moderate subjects' differential responding to subtle and obvious items, whereby males would show a greater tendency than females to endorse relatively more subtle than obvious items. This prediction was not supported.

10.
Measurement invariance is a prerequisite for confident cross-cultural comparisons of personality profiles. Multigroup confirmatory factor analysis was used to detect differential item functioning (DIF) in factor loadings and intercepts for the Revised NEO Personality Inventory (P. T. Costa, Jr., & R. R. McCrae, 1992) in comparisons of college students in the United States (N = 261), Philippines (N = 268), and Mexico (N = 775). About 40%–50% of the items exhibited some form of DIF and item-level noninvariance often carried forward to the facet level at which scores are compared. After excluding DIF items, some facet scales were too short or unreliable for cross-cultural comparisons, and for some other facets, cultural mean differences were reduced or eliminated. The results indicate that considerable caution is warranted in cross-cultural comparisons of personality profiles. (PsycINFO Database Record (c) 2011 APA, all rights reserved)

11.
A psychometric analysis of 2 interview-based measures of cognitive deficits was conducted: the 21-item Clinical Global Impression of Cognition in Schizophrenia (CGI-CogS; Ventura et al., 2008), and the 20-item Schizophrenia Cognition Rating Scale (SCoRS; Keefe et al., 2006), which were administered on 2 occasions to a sample of people with schizophrenia. Traditional psychometrics, bifactor analysis, and item response theory methods were used to explore item functioning and dimensionality and to compare instruments. Despite containing similar item content, responses to the CGI-CogS demonstrated superior psychometric properties (e.g., higher item intercorrelations, better spread of ratings across response categories) relative to the SCoRS. The authors argue that these differences arise mainly from the differential use of prompts and how the items are phrased and scored. Bifactor analysis demonstrated that although both measures capture a broad range of cognitive functioning (e.g., working memory, social cognition), the common variance on each is overwhelmingly explained by a single general factor. Item response theory analyses of the combined pool of 41 items showed that measurement precision is peaked in the mild to moderate range of cognitive impairment. Finally, simulated adaptive testing revealed that only about 10 to 12 items are necessary to achieve latent trait level estimates with reasonably small standard errors for most individuals. This suggests that these interview-based measures of cognitive deficits could be shortened without loss of measurement precision. (PsycINFO Database Record (c) 2011 APA, all rights reserved)
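The link between item count and measurement precision behind the adaptive-testing result can be made concrete with the 2PL test-information function: the standard error of the ability estimate is the reciprocal square root of the summed item information. A small sketch (parameter values are illustrative, not estimated from the CGI-CogS or SCoRS):

```python
import numpy as np

def se_theta(theta, a, b):
    """Standard error of a 2PL ability estimate from test information:
    I(theta) = sum_j a_j^2 * p_j * (1 - p_j),  SE = 1 / sqrt(I)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))   # 2PL response probability
    info = np.sum(a**2 * p * (1.0 - p))          # test information at theta
    return 1.0 / np.sqrt(info)
```

With these illustrative parameters, ten on-target items with discriminations of 1.5 already give SE ≈ 0.42, which is in line with the abstract's claim that roughly 10 to 12 well-chosen items suffice.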

12.
PURPOSE: To describe the development of an instrument to measure women's knowledge of osteoporosis based on Orem's self-care theory and the latest clinical research on osteoporosis. SAMPLE: One hundred and four women from four groups including graduate and undergraduate nursing students, sociology students, and a community sample, completed the instrument. METHODS: Items for the instrument were developed from three objectives related to osteoporosis risk factors, known facts and preventive behaviors. There were 34 items on the original instrument. It was content validated by experts and subjected to item analysis. The report contains a copy of the instrument with the theoretical classification and item analysis. FINDINGS: The Facts on Osteoporosis Quiz had a content validity index of .92, a reliability of .83 and a reading level of sixth grade. Item difficulty and item discrimination were used to delete items. The final instrument contains 25 items. CONCLUSION: The quiz is a simple, inexpensive measure that can be used in various settings by nurses to assess women's knowledge of self-care in osteoporosis.
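The item difficulty and item discrimination indices used here to delete items are classical test theory quantities: the proportion answering correctly, and the corrected item-total correlation (item against the rest-score). A hedged sketch of such a screen (the cutoffs and simulated data are illustrative; the report does not state the criteria it used):

```python
import numpy as np

def item_analysis(responses, p_range=(0.2, 0.8), min_r=0.2):
    """Classical item analysis on a 0/1 response matrix (persons x items).

    difficulty: proportion answering correctly (the item p-value);
    discrimination: corrected item-total correlation (item vs. rest score).
    Items outside the difficulty range or below the discrimination
    cutoff are flagged for deletion.
    """
    r = np.asarray(responses, dtype=float)
    difficulty = r.mean(axis=0)
    total = r.sum(axis=1)
    disc = np.array([np.corrcoef(r[:, j], total - r[:, j])[0, 1]
                     for j in range(r.shape[1])])
    keep = (difficulty >= p_range[0]) & (difficulty <= p_range[1]) \
           & (disc >= min_r)
    return difficulty, disc, keep
```

An item everyone answers correctly, or one uncorrelated with the rest of the quiz, contributes little to distinguishing knowledge levels and is dropped.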

13.
This study used data from 3 sites to examine the invariance and psychometric characteristics of the Brief Symptom Inventory–18 across Black, Hispanic, and White mothers of 5th graders (N = 4,711; M = 38.07 years of age, SD = 7.16). Internal consistencies were satisfactory for all subscale scores of the instrument regardless of ethnic group membership. Mean and covariance structures analysis indicated that the hypothesized 3-factor structure of the instrument was not robust across ethnic groups. It provided a reasonable approximation to the data for Black and White women but not for Hispanic women. Tests for differential item functioning (DIF) were therefore conducted for only Black and White women. Analyses revealed no more than trivial instances of nonuniform DIF but more substantial evidence of uniform DIF for 3 of the 18 items. After having established partial strong factorial invariance of the instrument, latent factor means were found to be significantly higher for Black than for White women on all 3 subscales (somatization, depression, anxiety). In conclusion, the instrument may be used for mean comparisons between Black and White women. (PsycINFO Database Record (c) 2011 APA, all rights reserved)

14.
Proposes and empirically evaluates a general model of faking on self-report personality test items. The model predicts that differential test item response latencies should be faster for schema-congruent test answers than for noncongruent responses. Thus, individuals faking good should take relatively longer to endorse socially undesirable test item content than desirable test item content. Conversely, individuals faking bad should endorse socially desirable test item content relatively slower than undesirable test item content. Support for the model was found to generalize across personality inventories and across populations of university students and maximum security prisoners. Conflicting results from previous research are viewed in terms of the model. Further testing of the model's generality and practical relevance is discussed. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

15.
This study demonstrated the application of an innovative item response theory (IRT) based approach to evaluating measurement equivalence, comparing a newly developed Spanish version of the Posttraumatic Stress Disorder Checklist-Civilian Version (PCL-C) with the established English version. Basic principles and practical issues faced in the application of IRT methods for instrument evaluation are discussed. Data were derived from a study of the mental health consequences of community violence in both Spanish speakers (n = 102) and English speakers (n = 284). Results of differential item functioning (DIF) analyses revealed that the 2 versions were not fully equivalent on an item-by-item basis in that 6 of the 17 items displayed uniform DIF. No bias was observed, however, at the level of the composite PCL-C scale score, indicating that the 2 language versions can be combined for scale-level analyses. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

16.
Relationship between culture and responses to biodata employment items.
The relationship between Black–White cultural value differences and responses to biodata employment items was investigated. Black and White college students were found to differ in endorsement of cultural values pertaining to basic human nature, the relationship between the individual and nature, temporal focus, and interpersonal relations. Using the loglinear approach suggested by B. F. Green et al (see record 1990-02999-001), the researchers found that over one quarter of the biodata employment items they examined exhibited differential item functioning (DIF) between racial subgroups. Although cultural values of the respondent were related to biodata response option selection, only limited evidence was found for the hypothesis that cultural values are associated with the observed differences in Black–White response choices. Recommendations regarding the further investigation of cultural influences on DIF are discussed. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

17.
18.
Self-report measures require respondents to comprehend the inquiry and then engage the self. Two studies investigated how these 2 processes affect the answers produced. In Study 1, 480 participants completed a locus-of-control scale describing themselves, their best friend, or Bill Cosby. Item answers became more reliable as the items moved from the beginning to the end of the measure. The similar increase for self, friend, and Cosby suggested that exposure to the content, rather than self-engagement, was driving the reliability shift. Self-engagement did activate an actor–observer difference in scale means. Study 2 focused on the content engagement process. With more item experience, respondents were better able to distinguish that prototypic items belonged to the locus-of-control scale and that distractor items did not. These studies imply that early questions clarify the meaning of a measure and improve the reliability of later answers. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

19.
Personality test data from the California Psychological Inventory were collected on 99 pairs of identical and 99 pairs of fraternal adult male twins. Heritabilities were computed for all 18 scales and compared to the heritabilities for "pure" scales with overlapping items omitted. Two of the pure scales, Responsibility and Femininity, had zero heritabilities, whereas all of the full scales had moderate to high heritabilities. It was concluded that item overlap has contributed significantly to previous failures to find evidence for the differential heritability of personality traits as measured by the CPI. CPI items were classified into genetic or environmental categories and separate factor analyses of items in these categories revealed more differences than similarities in factor structure. The genetic personality factors included Conversational Poise, Compulsiveness, and Social Ease. Environmental factors included Confidence in Leadership, Impulse Control, Philosophical Attitudes, Intellectual Interest, and Exhibitionism. Compared to the genetic factors, each of the environmental factors accounted for only a very small percentage of the variance.

20.
The Spanish and English Neuropsychological Assessment Scales were devised to be a broad set of psychometrically matched measures with equivalent Spanish and English versions. Study 1 in this report used item response theory methods to refine scales. Results strongly supported psychometric matching across English and Spanish versions and, for most scales, within English and Spanish versions. Study 2 supported in both English and Spanish subsamples the 6-domain model of ability that guided scale construction. Study 3 examined differential item functioning (DIF) of one scale (Object Naming) in relation to education, ethnicity, gender, and age. Effects of DIF on scale-level ability scores were limited. Results demonstrate an empirically guided psychometric approach to test construction for multiethnic and multilingual test applications. (PsycINFO Database Record (c) 2010 APA, all rights reserved)


Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.)  京ICP备09084417号