Similar Literature
20 similar documents found (search time: 31 ms)
1.
The Spanish and English Neuropsychological Assessment Scales were devised to be a broad set of psychometrically matched measures with equivalent Spanish and English versions. Study 1 in this report used item response theory methods to refine scales. Results strongly supported psychometric matching across English and Spanish versions and, for most scales, within English and Spanish versions. Study 2 supported the 6-domain model of ability that guided scale construction in both English and Spanish subsamples. Study 3 examined differential item functioning (DIF) of one scale (Object Naming) in relation to education, ethnicity, gender, and age. Effects of DIF on scale-level ability scores were limited. Results demonstrate an empirically guided psychometric approach to test construction for multiethnic and multilingual test applications. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

2.
In this article, the authors developed a common strategy for identifying differential item functioning (DIF) items that can be implemented in both the mean and covariance structures method (MACS) and item response theory (IRT). They proposed examining the loadings (discrimination) and the intercept (location) parameters simultaneously using the likelihood ratio test with a free-baseline model and Bonferroni corrected critical p values. They compared the relative efficacy of this approach with alternative implementations for various types and amounts of DIF, sample sizes, numbers of response categories, and amounts of impact (latent mean differences). Results indicated that the proposed strategy was considerably more effective than an alternative approach involving a constrained-baseline model. Both MACS and IRT performed similarly well in the majority of experimental conditions. As expected, MACS performed slightly worse in dichotomous conditions but better than IRT in polytomous cases where sample sizes were small. Also, contrary to popular belief, MACS performed well in conditions where DIF was simulated on item thresholds (item means), and its accuracy was not affected by impact. (PsycINFO Database Record (c) 2010 APA, all rights reserved)
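The free-baseline likelihood-ratio strategy summarized above can be sketched in a few lines. The log-likelihood values and item count below are hypothetical; in practice they would come from fitting the free and constrained IRT (or MACS) models for each studied item. When the loading and location parameters are constrained together, the statistic has 2 degrees of freedom, and the chi-square(2) survival function has the closed form exp(-x/2).

```python
import math

def lr_dif_test(ll_free, ll_constrained, n_items, alpha=0.05):
    """Likelihood-ratio DIF test for one studied item (df = 2: loading and
    location constrained jointly), with a Bonferroni-corrected alpha level
    shared across all items on the scale."""
    lr = 2.0 * (ll_free - ll_constrained)  # LR statistic, chi-square(2) under no DIF
    p = math.exp(-lr / 2.0)                # exact chi-square survival function, df = 2
    return lr, p, p < alpha / n_items      # flag DIF if significant after correction

# Hypothetical log-likelihoods for one item on a 20-item scale
lr, p, flag = lr_dif_test(ll_free=-5321.4, ll_constrained=-5329.9, n_items=20)
```

The Bonferroni division by the number of items is what keeps the family-wise false-positive rate near the nominal alpha when every item is tested in turn.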

3.
Because of the practical, theoretical, and legal implications of differential item functioning (DIF) for organizational assessments, studies of measurement equivalence are a necessary first step before scores can be compared across individuals from different groups. However, commonly recommended criteria for evaluating results from these analyses have several important limitations. The present study proposes an effect size index for confirmatory factor analytic (CFA) studies of measurement equivalence to address 1 of these limitations. The application of this index is illustrated with personality data from American English, Greek, and Chinese samples. Results showed a range of nonequivalence across these samples, and these differences were linked to the observed effects of DIF on the outcomes of the assessment (i.e., group-level mean differences and adverse impact). (PsycINFO Database Record (c) 2011 APA, all rights reserved)

4.
Statistical methods based on item response theory (IRT) were used to bidirectionally evaluate the measurement equivalence of translated American and German intelligence tests. Items that displayed differential item functioning (DIF) were identified, and content analysis was used to determine probable sources of DIF, either cultural or linguistic. The benefits of using an IRT analysis in examining the fidelity of translated tests are described. In addition, the influence of cultural differences on test translations and the use of DIF items to elucidate cultural differences are discussed. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

5.
The purpose of this study was to determine if the Mini-Mental State Examination (MMS; M. F. Folstein, S. E. Folstein, & P. R. McHugh, 1975) demonstrates item bias with respect to measuring cognitive functioning of older Hispanics and non-Hispanics. Assessment of differential item functioning (DIF) of individual MMS items across 3 language/ethnicity groups (English test administration/non-Hispanic ethnicity, English test administration/Hispanic ethnicity, and Spanish test administration/Hispanic ethnicity) was performed by using a logistic regression procedure. Fifteen of the 26 MMS items were significantly related to total score and were shown to provide unbiased measurement across the 3 groups. Normative data are presented for older Hispanics (n = 365) and non-Hispanics (n = 388) on the raw MMS, a 15-item version in which items with significant DIF were eliminated, and a total score statistically adjusted for effects of education and age. (PsycINFO Database Record (c) 2010 APA, all rights reserved)
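The logistic regression DIF procedure compares nested models for each item: a base model predicting the item response from the total score, and an augmented model that also includes group membership. A minimal sketch of the decision step, shown for the two-group, 1-df case and assuming the model deviances have already been obtained from fitted logistic regressions (the deviance values below are hypothetical):

```python
import math

def uniform_dif_pvalue(dev_base, dev_augmented):
    """p value for the deviance drop when group membership is added to the
    model item ~ total_score. Under no DIF the drop is chi-square with 1 df,
    whose survival function is exactly erfc(sqrt(x / 2))."""
    x = dev_base - dev_augmented
    return math.erfc(math.sqrt(x / 2.0))

# Hypothetical deviances for one item, with and without the group term
p = uniform_dif_pvalue(dev_base=412.7, dev_augmented=401.3)
flag_dif = p < 0.05
```

With 3 groups, as in the MMS study, the group term contributes 2 df, and nonuniform DIF is tested by further adding a score-by-group interaction; the nested-model logic is the same.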

6.
Drinking behavior in preadolescence is a significant predictor of both short- and long-term negative consequences. This study examined the psychometric properties of 1 known risk factor for drinking in this age group, alcohol expectancies, within an item response theory framework. In a sample of middle school youths (N = 1,273), the authors tested differential item functioning (DIF) in positive and negative alcohol expectancies across grade, gender, and ethnicity. Multiple-indicator multiple-cause model analyses tested differences in alcohol use as a potential explanation for observed DIF across groups. Results showed that most expectancy items did not exhibit DIF. For items where DIF was indicated, differences in alcohol use did not explain differences in item parameters. Positive and negative expectancies also systematically differed in the location parameter. Latent variable scale scores of both positive and negative expectancies were associated with drinking behavior cross-sectionally, while only positive expectancies predicted drinking prospectively. Improving the measurement of alcohol expectancies can help researchers better assess this important risk factor for drinking in this population, particularly the identification of those with either very high positive or very low negative alcohol expectancies. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

7.
An item response theory (IRT) approach to test linking based on summed scores is presented and demonstrated by calibrating a modified 23-item version of the Center for Epidemiologic Studies Depression Scale (CES-D) to the standard 20-item CES-D. Data are from the Depression Patient Outcomes Research Team II, which used a modified CES-D to measure risk for depression. Responses (N = 1,120) to items on both the original and modified versions were calibrated simultaneously using F. Samejima's (1969, 1997) graded IRT model. The 2 scales were linked on the basis of derived summed-score-to-IRT-score translation tables. The established cut score of 16 on the standard CES-D corresponded most closely to a summed score of 20 on the modified version. The IRT summed-score approach to test linking is a straightforward, valid, and practical method that can be applied in a variety of situations. (PsycINFO Database Record (c) 2010 APA, all rights reserved)
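The summed-score linking idea rests on the test characteristic curve: under an IRT model, each trait level theta maps to an expected summed score on each form, so a cut score on one form can be carried to the other through a common theta. A minimal sketch using a 2PL model and hypothetical item parameters (the study itself used Samejima's graded model on the two CES-D forms):

```python
import math

def p_correct(theta, a, b):
    """2PL item response function (discrimination a, location b)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def expected_score(theta, items):
    """Test characteristic curve: expected summed score at trait level theta."""
    return sum(p_correct(theta, a, b) for a, b in items)

def linked_cut(cut, items_from, items_to, grid):
    """Find the grid theta whose expected score on the source form is closest
    to the cut, then read off the expected score on the target form there."""
    t = min(grid, key=lambda th: abs(expected_score(th, items_from) - cut))
    return expected_score(t, items_to)

# Hypothetical (a, b) parameters: a 4-item source form and a 5-item target form
form_a = [(1.0, -1.0), (1.2, 0.0), (0.9, 0.5), (1.1, 1.0)]
form_b = [(0.8, -0.5), (1.0, 0.2), (1.3, 0.8), (0.7, 1.2), (1.1, 0.0)]
grid = [i / 20.0 - 3.0 for i in range(121)]  # theta from -3 to +3
cut_b = linked_cut(2.0, form_a, form_b, grid)
```

Simultaneous calibration of both forms, as in the study, puts their parameters on one theta metric so the translation tables are directly comparable.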

8.
Statistical analyses of Differential Item Functioning (DIF) can be used for rigorous translation evaluations. DIF techniques test whether each item functions in the same way, irrespective of the country, language, or culture of the respondents. For a given level of health, the score on any item should be independent of nationality. This requirement can be tested through contingency-table methods, which are efficient for analyzing all types of items. We investigated DIF in the Danish translation of the SF-36 Health Survey, using two general population samples (USA, n = 1,506; Denmark, n = 3,950). DIF was identified for 12 out of 35 items. These results agreed with independent ratings of translation quality, but the statistical techniques were more sensitive. When included in scales, the items exhibiting DIF had little impact on conclusions about cross-national differences in health in the general population. However, if used as single items, the DIF items could seriously bias results from cross-national comparisons. Also, the DIF items might have a larger impact on cross-national comparisons of groups with poorer health status. We conclude that analysis of DIF is useful for evaluating questionnaire translations.

9.
The Rutgers Alcohol Problem Index (RAPI; H. R. White & E. W. Labouvie, 1989) is a frequently used measure of alcohol-related consequences in adolescents and college students, but psychometric evaluations of the RAPI are limited and it has not been validated with college students. This study used item response theory (IRT) to examine the RAPI on students (N = 895; 65% female, 35% male) assessed in both high school and college. A series of 2-parameter IRT models were computed, examining differential item functioning across gender and time points. A reduced 18-item measure demonstrating strong clinical utility is proposed, with scores of 8 or greater implying greater need for treatment. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

10.
Person-fit statistics have been proposed to investigate the fit of an item score pattern to an item response theory (IRT) model. The author investigated how these statistics can be used to detect different types of misfit. Intelligence test data were analyzed using person-fit statistics in the context of the G. Rasch (1960) model and R. J. Mokken's (1971, 1997) IRT models. The effect of the choice of an IRT model to detect misfitting item score patterns and the usefulness of person-fit statistics for diagnosis of misfit are discussed. Results showed that different types of person-fit statistics can be used to detect different kinds of person misfit. Parametric person-fit statistics had more power than nonparametric person-fit statistics. (PsycINFO Database Record (c) 2010 APA, all rights reserved)
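Nonparametric person-fit statistics of the kind used with Mokken-type models can be as simple as counting Guttman errors: item pairs in which a respondent fails an easier item yet passes a harder one. A minimal sketch (the item ordering and response patterns are hypothetical, not from the study's intelligence data):

```python
def guttman_errors(responses, easiest_to_hardest):
    """Count Guttman errors in a 0/1 response pattern: pairs where an
    easier item is wrong (0) but a harder item is right (1)."""
    ordered = [responses[i] for i in easiest_to_hardest]
    errors = 0
    for i in range(len(ordered)):
        for j in range(i + 1, len(ordered)):
            if ordered[i] == 0 and ordered[j] == 1:
                errors += 1
    return errors

# A conforming pattern (passes easy items, fails hard ones) vs. a reversed one
conforming = guttman_errors([1, 1, 1, 0, 0], [0, 1, 2, 3, 4])
misfitting = guttman_errors([0, 0, 1, 1, 1], [0, 1, 2, 3, 4])
```

Large error counts relative to what a person's total score allows flag potentially misfitting patterns; parametric alternatives evaluate the likelihood of the pattern under a fitted IRT model, which is one source of the power advantage noted above.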

11.
Measurement invariance is a prerequisite for confident cross-cultural comparisons of personality profiles. Multigroup confirmatory factor analysis was used to detect differential item functioning (DIF) in factor loadings and intercepts for the Revised NEO Personality Inventory (P. T. Costa, Jr., & R. R. McCrae, 1992) in comparisons of college students in the United States (N = 261), the Philippines (N = 268), and Mexico (N = 775). About 40%–50% of the items exhibited some form of DIF, and item-level noninvariance often carried forward to the facet level at which scores are compared. After excluding DIF items, some facet scales were too short or unreliable for cross-cultural comparisons, and for some other facets, cultural mean differences were reduced or eliminated. The results indicate that considerable caution is warranted in cross-cultural comparisons of personality profiles. (PsycINFO Database Record (c) 2011 APA, all rights reserved)

12.
The Psychopathy Checklist-Revised (PCL-R) is an important measure in both applied and research settings. Evidence for its validity is mostly derived from male Caucasian participants. PCL-R ratings of 359 Caucasian and 356 African American participants were compared using confirmatory factor analysis (CFA) and item response theory (IRT) analyses. Previous research has indicated that 13 items of the PCL-R can be described by a 3-factor hierarchical model. This model was replicated in this sample. No cross-group difference in factor structure could be found using CFA; the structure of psychopathy is the same in both groups. IRT methods indicated significant but small differences in the performance of 5 of the 20 PCL-R items. No significant differential test functioning was found, indicating that the item differences canceled each other out. It is concluded that the PCL-R can be used, in an unbiased way, with African American participants. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

13.
Objective: The purpose of this study was to examine whether neuropsychological tests translated into Spanish measure the same cognitive constructs as the original English versions. Method: Older adult participants without dementia (N = 2,664) from the Washington Heights-Inwood Columbia Aging Project (WHICAP), a community-based cohort in northern Manhattan, were evaluated with a comprehensive neuropsychological battery. The study cohort includes both English speakers (n = 1,800) and Spanish speakers (n = 864) evaluated in their language of preference. Invariance analyses were conducted across language groups on a structural equation model comprising four neuropsychological factors (memory, language, visual-spatial ability, and processing speed). Results: The results of the analyses indicated that the four-factor model exhibited partial measurement invariance, demonstrated by invariant factor structure and factor loadings but nonequivalent observed score intercepts. Conclusion: The finding of invariant factor structure and factor loadings provides empirical evidence to support the implicit assumption that scores on neuropsychological tests are measuring equivalent psychological traits across these two language groups. At the structural level, the model exhibited invariant factor variances and covariances. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

14.
Recent legal developments appear to sanction the use of psychometrically unsound procedures for examining differential item functioning (DIF) on standardized tests. More appropriate approaches involve the use of item response theory (IRT). However, many IRT-based DIF studies have used F. M. Lord's (see record 1987-17535-001) joint maximum likelihood procedure, which can lead to incorrect and misleading results. A Monte Carlo simulation was conducted to evaluate the effectiveness of two other methods of parameter estimation: marginal maximum likelihood estimation and Bayes modal estimation. Sample size and data dimensionality were manipulated in the simulation. Results indicated that both estimation methods (a) provided more accurate parameter estimates and less inflated Type I error rates than joint maximum likelihood, (b) were robust to multidimensionality, and (c) produced more accurate parameter estimates and higher rates of identifying DIF with larger samples. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

15.
Developing spelling skills in English is a particularly demanding task for Chinese speakers because, unlike many other bilinguals learning English as a second language, they must learn two languages with different orthography as well as phonology. To disentangle socioeconomic and pedagogical factors from the underlying cognitive–linguistic processes that predict the development of spelling, we used a 6-month longitudinal design and compared children with English as their first language (English-L1; n = 50) and children with Mandarin as their first language (Mandarin-L1; n = 50) from the same kindergarten. Both groups were tested on parallel versions of English and Mandarin tasks as predictors at Time 1, and their spelling sophistication scores were then computed from a 52-item experimental task administered at Time 2. After we controlled for nonverbal IQ, age, vocabulary, and spelling achievement on Wide Range Achievement Test 4 at Time 1, regression analyses showed that phoneme awareness was the strongest predictor of spelling sophistication for English-L1 children, but syllable awareness and letter-sound knowledge were also important for Mandarin-L1 children. The implications of these differences in the cognitive–linguistic processing of bilingual children learning two dissimilar languages are briefly discussed. (PsycINFO Database Record (c) 2011 APA, all rights reserved)

16.
17.
In this study, an item response theory-based differential functioning of items and tests (DFIT) framework (N. S. Raju, W. J. van der Linden, & P. F. Fleer, 1995) was applied to a Likert-type scale. Several differential item functioning (DIF) analyses compared the item characteristics of a 10-item satisfaction scale for Black and White examinees and for female and male examinees. F. M. Lord's (1980) chi-square and the extended signed area (SA) measures were also used. The results showed that the DFIT indices consistently performed in the expected manner. The results from Lord's chi-square and the SA procedures were somewhat varied across comparisons. A discussion of these results along with an illustration of an item with significant DIF and suggestions for future DIF research are presented. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

18.
This article analyzes latent variable models from a cognitive psychology perspective. We start by discussing work by Tuerlinckx and De Boeck (2005), who proved that a diffusion model for 2-choice response processes entails a 2-parameter logistic item response theory (IRT) model for individual differences in the response data. Following this line of reasoning, we discuss the appropriateness of IRT for measuring abilities and bipolar traits, such as pro versus contra attitudes. Surprisingly, if a diffusion model underlies the response processes, IRT models are appropriate for bipolar traits but not for ability tests. A reconsideration of the concept of ability that is appropriate for such situations leads to a new item response model for accuracy and speed based on the idea that ability has a natural zero point. The model implies fundamentally new ways to think about guessing, response speed, and person fit in IRT. We discuss the relation between this model and existing models as well as implications for psychology and psychometrics. (PsycINFO Database Record (c) 2011 APA, all rights reserved)

19.
The authors describe the initial development of the Wagner Assessment Test (WAT), an instrument designed to assess critical thinking, using the 5-faceted view popularized by the Watson-Glaser Critical Thinking Appraisal (WGCTA; G. B. Watson & E. M. Glaser, 1980). The WAT was designed to reduce the degree of successful guessing relative to the WGCTA by increasing the number of response alternatives (i.e., 80% of WGCTA items are 2-alternative, multiple-choice), a change that was hypothesized to result in more desirable test information and standard-error functions. Analyses using the 3-parameter logistic item response theory (IRT) model in a sample of undergraduates (N = 407) supported this prediction, even when the WAT item pool was shortened to match the length of the WGCTA. Convergent validity between full-pool IRT score estimates was r = .69. Implications for subsequent research on IRT-based measurement of critical thinking are discussed. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

20.
The main aim of this article is to explicate why a transition to ideal point methods of scale construction is needed to advance the field of personality assessment. The study empirically demonstrated the substantive benefits of ideal point methodology as compared with the dominance framework underlying traditional methods of scale construction. Specifically, using a large, heterogeneous pool of order items, the authors constructed scales using traditional classical test theory, dominance item response theory (IRT), and ideal point IRT methods. The merits of each method were examined in terms of item pool utilization, model-data fit, measurement precision, and construct and criterion-related validity. Results show that adoption of the ideal point approach provided a more flexible platform for creating future personality measures, and this transition did not adversely affect the validity of personality test scores. (PsycINFO Database Record (c) 2011 APA, all rights reserved)
