Similar articles
20 similar articles found
1.
This study demonstrated the application of an innovative item response theory (IRT) based approach to evaluating measurement equivalence, comparing a newly developed Spanish version of the Posttraumatic Stress Disorder Checklist-Civilian Version (PCL-C) with the established English version. Basic principles and practical issues faced in the application of IRT methods for instrument evaluation are discussed. Data were derived from a study of the mental health consequences of community violence in both Spanish speakers (n = 102) and English speakers (n = 284). Results of differential item functioning (DIF) analyses revealed that the 2 versions were not fully equivalent on an item-by-item basis in that 6 of the 17 items displayed uniform DIF. No bias was observed, however, at the level of the composite PCL-C scale score, indicating that the 2 language versions can be combined for scale-level analyses. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

2.
Log-linear models are used to investigate contingency tables that cross-classify respondents according to item response, mental health status (MHS), and the background variables of ethnicity and gender. Specifically, log-linear models are used to examine item validity, defined as an item response by MHS interaction, and differential item functioning (DIF), defined as an interaction between item response and a background variable. The investigation focused on a set of items that measure subjective well-being and coping behavior. Female (n = 627) and male (n = 338) respondents represented 3 ethnic groups: African American, Anglo-American, and Hispanic/Latino. Strong evidence of item validity and some evidence of DIF were found. Most of the interaction between item response and either ethnicity or gender occurred among Ss with diminished mental health. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

3.
OBJECTIVES: In the past few years, the SF-36 Health Survey has drawn considerable attention from researchers in non-English-speaking countries. This report contributes to the growing body of literature on this instrument by reporting the results of a national study conducted in Israel. The study examined the psychometric properties of the Hebrew translation based on a sample of the adult population of Israel and evaluated the results from a cross-national perspective. METHODS: The sample included 2,030 adults drawn from the Jewish population, aged 45 to 75 years. The SF-36 Health Survey was administered in face-to-face interviews as part of a broader health study. RESULTS: The pattern of correlations among items and the internal consistency scores pointed to high reliability. Confirmatory factor analysis using the Amos 3.61 program supported the hypothesized factorial structure. Specifically, the items clustered around eight health dimensions, as was found in studies in other societies. Clear and statistically significant differences in the SF-36 Health Survey scores were found among age groups and population groups distinguished by the degree of chronic health problems. CONCLUSIONS: Results of the analysis indicate that the instrument provided an appropriate measure of general health status. The findings clearly indicate that the translation into the Hebrew language and the application of the instrument to a culturally heterogeneous population did not diminish the qualities of the instrument. They also point to certain items that might be modified to reduce problems of synonymity and embeddedness.

4.
The study of potential racial and gender bias in individual test items is a major research area today. The fact that research has established that total scores on ability and achievement tests are predictively unbiased raises the question of whether there is in fact any real bias at the item level. No theoretical rationale for expecting such bias has been advanced. It appears that findings of item bias (differential item functioning; DIF) can be explained by three factors: failure to control for measurement error in ability estimates, violations of the unidimensionality assumption required by DIF detection methods, and reliance on significance testing (causing tiny artifactual DIF effects to be statistically significant because sample sizes are very large). After taking into account these artifacts, there appears to be no evidence that items on currently used tests function differently in different racial and gender groups. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

5.
Drinking behavior in preadolescence is a significant predictor of both short- and long-term negative consequences. This study examined the psychometric properties of 1 known risk factor for drinking in this age group, alcohol expectancies, within an item response theory framework. In a sample of middle school youths (N = 1,273), the authors tested differential item functioning (DIF) in positive and negative alcohol expectancies across grade, gender, and ethnicity. Multiple-indicator multiple-cause model analyses tested differences in alcohol use as a potential explanation for observed DIF across groups. Results showed that most expectancy items did not exhibit DIF. For items where DIF was indicated, differences in alcohol use did not explain differences in item parameters. Positive and negative expectancies also systematically differed in the location parameter. Latent variable scale scores of both positive and negative expectancies were associated with drinking behavior cross-sectionally, while only positive expectancies predicted drinking prospectively. Improving the measurement of alcohol expectancies can help researchers better assess this important risk factor for drinking in this population, particularly the identification of those with either very high positive or very low negative alcohol expectancies. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

6.
In this article, the authors developed a common strategy for identifying differential item functioning (DIF) items that can be implemented in both the mean and covariance structures method (MACS) and item response theory (IRT). They proposed examining the loadings (discrimination) and the intercept (location) parameters simultaneously using the likelihood ratio test with a free-baseline model and Bonferroni-corrected critical p values. They compared the relative efficacy of this approach with alternative implementations for various types and amounts of DIF, sample sizes, numbers of response categories, and amounts of impact (latent mean differences). Results indicated that the proposed strategy was considerably more effective than an alternative approach involving a constrained-baseline model. Both MACS and IRT performed similarly well in the majority of experimental conditions. As expected, MACS performed slightly worse in dichotomous conditions but better than IRT in polytomous cases where sample sizes were small. Also, contrary to popular belief, MACS performed well in conditions where DIF was simulated on item thresholds (item means), and its accuracy was not affected by impact. (PsycINFO Database Record (c) 2010 APA, all rights reserved)
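
As a rough illustration of the likelihood ratio strategy described in this abstract, the Python sketch below compares a free-baseline model with a model that constrains one studied item, then applies a Bonferroni-corrected critical p value. The function name `lr_dif_test`, its log-likelihood inputs, and the assumption that exactly two parameters are constrained per item (one loading and one intercept, giving 2 degrees of freedom) are hypothetical choices for illustration and are not taken from the article.

```python
import math


def lr_dif_test(loglik_free, loglik_constrained, n_items_tested, alpha=0.05):
    """Likelihood ratio DIF test for one studied item (illustrative sketch).

    loglik_free: log-likelihood of the free-baseline model.
    loglik_constrained: log-likelihood after constraining the studied
        item's loading and intercept to be equal across groups.
    n_items_tested: number of items tested, used for the Bonferroni correction.
    Assumes 2 constrained parameters, so the statistic has 2 degrees of freedom.
    """
    # The constrained model is nested in the free model, so G^2 >= 0 in theory;
    # clamp at zero to absorb small numerical noise.
    g2 = max(2.0 * (loglik_free - loglik_constrained), 0.0)
    # For a chi-square variable with 2 degrees of freedom, P(X > x) = exp(-x / 2).
    p_value = math.exp(-g2 / 2.0)
    # Bonferroni-corrected critical p value.
    critical_p = alpha / n_items_tested
    return g2, p_value, p_value < critical_p


# Hypothetical log-likelihoods for two items out of 10 tested.
print(lr_dif_test(-1000.0, -1008.0, n_items_tested=10))
print(lr_dif_test(-1000.0, -1003.0, n_items_tested=10))
```

In the second call the uncorrected p value falls just under .05, but the item is not flagged once the critical value is divided by the number of items tested, which is the effect the Bonferroni correction is meant to have.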

7.
Controversy abounds over attributing group differences on tests to nature, nurture, or test bias. Limitations of correlational sampling from natural populations necessitate experimental methods to resolve underlying issues. In classical psychometrics, test items are selected from a larger item pool through analysis of item responses in a sample of subjects. Rats of six inbred strains (n = 366) were tested in multiple mazes to provide a large item pool. Six populations were created, each with differing proportions of each strain. Items selected through independent item analyses within each population yielded six tests. An independent cross-validation sample (n = 146) provided scores on all six items. This sample was also tested in another set of maze problems defined as the criterion to be predicted. Strain means and intrastrain predictive validities for the six tests varied with strain representation in the population used for item selection (p …

8.
Measurement invariance is a prerequisite for confident cross-cultural comparisons of personality profiles. Multigroup confirmatory factor analysis was used to detect differential item functioning (DIF) in factor loadings and intercepts for the Revised NEO Personality Inventory (P. T. Costa, Jr., & R. R. McCrae, 1992) in comparisons of college students in the United States (N = 261), Philippines (N = 268), and Mexico (N = 775). About 40%–50% of the items exhibited some form of DIF, and item-level noninvariance often carried forward to the facet level at which scores are compared. After excluding DIF items, some facet scales were too short or unreliable for cross-cultural comparisons, and for some other facets, cultural mean differences were reduced or eliminated. The results indicate that considerable caution is warranted in cross-cultural comparisons of personality profiles. (PsycINFO Database Record (c) 2011 APA, all rights reserved)

9.
Item response theory methods were used to study differential item functioning (DIF) between gender groups on a measure of stress reaction. Results revealed that women were more likely to endorse items describing emotional vulnerability and sensitivity, whereas men were more likely to endorse items describing tension, irritability, and being easily upset. Item factor analysis yielded 5 correlated factors, and the DIF analysis, in turn, revealed differential gender mean differences on these factors. This finding illustrates how even in an essentially unidimensional scale, comparison of group mean differences can be affected by multidimensionality caused by item clusters that share similar content. Results do not support arguments that measures of negative affective dispositions "artificially" produce gender mean differences by focusing on specific selected content areas.

10.
The purpose of this study was to determine if the Mini-Mental State Examination (MMS; M. E. Folstein, S. E. Folstein, & P. R. McHugh, 1975) demonstrates item bias with respect to measuring cognitive functioning of older Hispanics and non-Hispanics. Assessment of differential item functioning (DIF) of individual MMS items across 3 language/ethnicity groups (English test administration/non-Hispanic ethnicity, English test administration/Hispanic ethnicity, and Spanish test administration/Hispanic ethnicity) was performed by using a logistic regression procedure. Fifteen of the 26 MMS items were significantly related to total score and were shown to provide unbiased measurement across the 3 groups. Normative data are presented for older Hispanics (n = 365) and non-Hispanics (n = 388) on the raw MMS, a 15-item version in which items with significant DIF were eliminated, and a total score statistically adjusted for effects of education and age. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

11.
This article reports on the main developmental stages and on the preliminary psychometric assessment of the final French version of the SF-36. A standard forward/backward translation procedure was followed. When translating survey items, the emphasis was placed on conceptual equivalence. When translating response choices, we attempted to select a set of response choices that replicate the U.S. version. The distance between the response choices was checked using visual analogue scales (N = 30). The adaptation procedure also included formal ratings of the difficulty of the translation, of the quality of the translation, and of the equivalence between the American source version and the French target version. The face validity was checked during lay panel sessions at which the translated questionnaire was administered to subjects from the general public, hospital employees, and subjects with a low level of education. Standard psychometric techniques were used to evaluate the cultural adaptation of the SF-36, using data from a general population survey. The main objective of this analysis was to determine how well the scaling assumptions (summated rating or Likert-type scaling construction) of the SF-36 were satisfied. The results support the claim that the scaling properties of the French version of the SF-36 are adequate and that health outcomes may be reliably assessed using this version of the instrument.

12.
Data from general population samples in 11 countries (n = 1483 to 9151) were used to assess data quality and test the assumptions underlying the construction and scoring of multi-item scales from the SF-36 Health Survey. Across all countries, the rate of item-level missing data generally was low, although slightly higher for items printed in the grid format. In each country, item means generally were clustered as hypothesized within scales. Correlations between items and hypothesized scales were greater than 0.40 with one exception, supporting item internal consistency. Items generally correlated significantly higher with their own scale than with competing scales, supporting item discriminant validity. Scales could be constructed for 93–100% of respondents. Internal consistency reliability of the eight SF-36 scales was above 0.70 for all scales, with two exceptions. Floor effects were low for all except the two role functioning scales; ceiling effects were high for both role functioning scales and also were noteworthy for the Physical Functioning, Bodily Pain, and Social Functioning scales in some countries. These results support the construction and scoring of the SF-36 translations in these 11 countries using the method of summated ratings.
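
Two of the statistics reported in this abstract, internal consistency reliability and floor/ceiling effects, can be sketched in a few lines of Python. Cronbach's alpha is assumed here as the reliability coefficient, since the abstract does not name one; the function names and the complete-data input layout are likewise hypothetical.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for one scale.

    rows: one list of item scores per respondent, with no missing values.
    """
    k = len(rows[0])  # number of items in the scale
    # Sample variance of each item across respondents.
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of the summated (total) scale score.
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def floor_ceiling(scale_scores, minimum, maximum):
    """Proportion of respondents at the lowest and highest possible score."""
    n = len(scale_scores)
    floor = sum(1 for s in scale_scores if s == minimum) / n
    ceiling = sum(1 for s in scale_scores if s == maximum) / n
    return floor, ceiling
```

A scale would meet the 0.70 criterion mentioned above when `cronbach_alpha` returns a value above 0.70 for its items; `floor_ceiling` gives the shares of respondents at the two extremes of a scored scale (for example 0 and 100 on a transformed scale).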

13.
In this study, an item response theory-based differential functioning of items and tests (DFIT) framework (N. S. Raju, W. J. van der Linden, & P. F. Fleer, 1995) was applied to a Likert-type scale. Several differential item functioning (DIF) analyses compared the item characteristics of a 10-item satisfaction scale for Black and White examinees and for female and male examinees. F. M. Lord's (1980) chi-square and the extended signed area (SA) measures were also used. The results showed that the DFIT indices consistently performed in the expected manner. The results from Lord's chi-square and the SA procedures were somewhat varied across comparisons. A discussion of these results along with an illustration of an item with significant DIF and suggestions for future DIF research are presented. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

14.
Statistical methods based on item response theory (IRT) were used to bidirectionally evaluate the measurement equivalence of translated American and German intelligence tests. Items that displayed differential item functioning (DIF) were identified, and content analysis was used to determine probable sources of DIF, either cultural or linguistic. The benefits of using an IRT analysis in examining the fidelity of translated tests are described. In addition, the influence of cultural differences on test translations and the use of DIF items to elucidate cultural differences are discussed. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

15.
This article presents evaluative information on the use of the original Ontario Child Health Study scales to serve as ordinal-level measures of conduct disorder, hyperactivity and emotional disorder among children in the general (non-clinic) population. Problem checklist assessments were obtained from parents and teachers of children aged six to 16 and youth aged 12 to 16 drawn from a general population (n = 1,751) and a mental health clinic sample (n = 1,027) in the same industrialized, urban setting. The results showed that the original OCHS scales possess adequate psychometric properties to be used as ordinal-level measures of disorder. Correlations between individual items and their hypothesized scales were very strong, indicating convergent validity, while correlations between the same items and other (non-hypothesized) scales were lower, indicating discriminant validity. Item analyses indicated that individual scale items possess both convergent and discriminant validity. Although the scales were skewed to the positive end of the continuum, they demonstrated good internal consistency (all estimates ≥ 0.74) and test-retest (all estimates ≥ 0.65) reliability. Finally, three different validity analyses confirmed hypotheses about how the original OCHS scales should perform if they provide useful measures of disorder.

16.
This study used data from 3 sites to examine the invariance and psychometric characteristics of the Brief Symptom Inventory–18 across Black, Hispanic, and White mothers of 5th graders (N = 4,711; M = 38.07 years of age, SD = 7.16). Internal consistencies were satisfactory for all subscale scores of the instrument regardless of ethnic group membership. Mean and covariance structures analysis indicated that the hypothesized 3-factor structure of the instrument was not robust across ethnic groups. It provided a reasonable approximation to the data for Black and White women but not for Hispanic women. Tests for differential item functioning (DIF) were therefore conducted for only Black and White women. Analyses revealed no more than trivial instances of nonuniform DIF but more substantial evidence of uniform DIF for 3 of the 18 items. After having established partial strong factorial invariance of the instrument, latent factor means were found to be significantly higher for Black than for White women on all 3 subscales (somatization, depression, anxiety). In conclusion, the instrument may be used for mean comparisons between Black and White women. (PsycINFO Database Record (c) 2011 APA, all rights reserved)

17.
Objective: Health communication can help reduce the cancer burden by increasing processing of information about health interventions. Negative affect is associated with information processing and may be a barrier to successful health communication. Design and Main Outcome Measures: We examined associations between negative affect and information processing at the population level. Symptoms of depression (6 items) and cancer worry (1 item) operationalized negative affect; attention to health information (5 items) and cancer information-seeking experiences (6 items) operationalized information processing. Results: Higher cancer worry was associated with more attention to health information (p …

18.
We describe the application of a revised version of the Dubowitz neurologic examination of the newborn in 224 low-risk, term newborn infants. The method has been updated by eliminating less useful items and including new items evaluating general movements and patterns of distribution of tone. An optimality score is included to make the evaluation more quantitative and for comparison with sequential examinations with neurophysiologic and imaging findings. The score is based on the distribution of the scores for each item in the population of low-risk term infants. We defined not only the most common pattern for each item but also the variability of the findings by using 10th and 5th centiles. Because most of the items assessing tone and the Moro reflex varied with gestational age between 37 and 42 weeks, the changes were incorporated in the scoring system. The total optimality score was the sum of the optimality scores of individual items. Although the association of 4 or more deviant scores was found in less than 10% of our infants, deviant results on 1 or 2 single items could be observed in a third of this normal population, suggesting that isolated deviant signs have little diagnostic value. In contrast, an abnormal distribution of tone patterns, which we have commonly observed in infants with brain lesions, was not found in this cohort.

19.
The purpose of this study was to investigate health-related quality of life (HRQOL) and functional ability among the least dependent elderly in residential care, and to compare them with information on the general population. A stratified systematic sample (n = 1,587) was drawn from a one-day census of patients in all public residential homes in Finland on December 2, 1991. Sixty-nine per cent of residents in 1992 were able to participate (n = 1,097) and 86% of them returned the questionnaire (n = 948), of which n = 795 were acceptable, the response rate being 72%. A postal survey was used for data collection. The personnel of residential homes were allowed to help residents complete the questionnaire, and 90% of respondents received such help. HRQOL was measured by the Nottingham Health Profile (NHP) and functional ability by a 14-item questionnaire. Finnish studies among the general population were used for comparisons. According to the NHP, the HRQOL appeared lower in institutional care and this was associated with the dependency level. Similarly, for most ADL items the general population had fewer restrictions than the least dependent residential care patients. In general, women expressed more difficulties in physical mobility and lack of energy than men. The longest-stay elderly expressed better HRQOL. In multivariate models adjusted for age and gender, those with poor vision had worse HRQOL in almost every dimension of NHP. Difficulties in speech were connected with emotional reactions and social isolation. Chronic illness limiting normal daily life predicted more problems in energy, pain, physical mobility, and emotional reactions. The married or widowed experienced less social isolation than single elderly. Higher education was related to better HRQOL in all NHP dimensions. Poorer perceived health was associated with lack of energy, pain, and emotional reactions. We conclude from these results that there are only a few clients in residential care whose HRQOL or functional ability compare with the non-institutionalized population.

20.
The present study aimed to develop a short form of the Spanish version of the Nottingham Health Profile (NHP) by means of Rasch analysis. Data from several Spanish studies that included the NHP since 1987 were collected in a common database. Forty-five different studies were included, covering a total of 9,419 subjects both from the general population and with different clinical pathologies. The overall questionnaire (38 items) was simultaneously analyzed using the dichotomous response model. Parameter estimates, model-data fit and separation statistics were computed. The items of the NHP were additionally regrouped into two different scales: Physical (19 items) and Psychological (19 items). Separated Physical and Psychological parameter estimates were produced using the simultaneous item calibrations as anchor values. Misfitting items were deleted, resulting in a 22-item final short form (NHP22: 11 Physical and 11 Psychological items). The evaluation of the item hierarchies confirmed the construct validity of the new questionnaire. To demonstrate the invariance of the NHP22 item calibrations, Rasch analyses were performed separately for each study included in the sample and for several sociodemographic and health status variables. Results confirmed the validity of using the NHP22 item calibrations to measure different groups of people categorized by gender, clinical and health status.
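
For readers unfamiliar with the dichotomous Rasch model mentioned in this abstract, the Python sketch below shows its response function and the property that makes item calibrations comparable across groups: the log-odds gap between two items equals the difference in their calibrations, whatever the person's level on the trait. The function names and parameter values are hypothetical, and the sketch does not reproduce the estimation or fit procedures used in the study.

```python
import math


def rasch_probability(theta, b):
    """Probability of endorsing an item under the dichotomous Rasch model.

    theta: person location on the latent trait (logits).
    b: item calibration (difficulty or severity, in logits).
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def item_logit_gap(theta, b_first, b_second):
    """Log-odds difference between two items for the same person.

    Under the Rasch model this equals b_second - b_first for every theta.
    """
    return logit(rasch_probability(theta, b_first)) - logit(
        rasch_probability(theta, b_second)
    )


# Hypothetical calibrations: the gap is the same for a low and a high theta.
print(item_logit_gap(-1.0, -0.5, 1.0))
print(item_logit_gap(2.0, -0.5, 1.0))
```

Both calls print a value of about 1.5 (up to floating-point rounding), the difference between the two hypothetical calibrations.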
