Similar Literature
20 similar documents found (search time: 31 ms)
1.
The Spanish and English Neuropsychological Assessment Scales were devised to be a broad set of psychometrically matched measures with equivalent Spanish and English versions. Study 1 in this report used item response theory methods to refine scales. Results strongly supported psychometric matching across English and Spanish versions and, for most scales, within English and Spanish versions. Study 2 supported the 6-domain model of ability that guided scale construction in both English and Spanish subsamples. Study 3 examined differential item functioning (DIF) of one scale (Object Naming) in relation to education, ethnicity, gender, and age. Effects of DIF on scale-level ability scores were limited. Results demonstrate an empirically guided psychometric approach to test construction for multiethnic and multilingual test applications. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

2.
In this article, the authors developed a common strategy for identifying differential item functioning (DIF) items that can be implemented in both the mean and covariance structures method (MACS) and item response theory (IRT). They proposed examining the loadings (discrimination) and the intercept (location) parameters simultaneously using the likelihood ratio test with a free-baseline model and Bonferroni corrected critical p values. They compared the relative efficacy of this approach with alternative implementations for various types and amounts of DIF, sample sizes, numbers of response categories, and amounts of impact (latent mean differences). Results indicated that the proposed strategy was considerably more effective than an alternative approach involving a constrained-baseline model. Both MACS and IRT performed similarly well in the majority of experimental conditions. As expected, MACS performed slightly worse in dichotomous conditions but better than IRT in polytomous cases where sample sizes were small. Also, contrary to popular belief, MACS performed well in conditions where DIF was simulated on item thresholds (item means), and its accuracy was not affected by impact. (PsycINFO Database Record (c) 2010 APA, all rights reserved)
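The free-baseline likelihood-ratio strategy summarized above can be sketched in a few lines. The log-likelihood values and item count below are hypothetical; in practice they would come from fitting the free and constrained IRT (or MACS) models for each studied item. When the loading and location parameters are constrained together, the statistic has 2 degrees of freedom, and the chi-square(2) survival function has the closed form exp(-x/2).

```python
import math

def lr_dif_test(ll_free, ll_constrained, n_items, alpha=0.05):
    """Likelihood-ratio DIF test for one studied item (df = 2: loading and
    location constrained jointly), with a Bonferroni-corrected alpha level
    shared across all items on the scale."""
    lr = 2.0 * (ll_free - ll_constrained)  # LR statistic, chi-square(2) under no DIF
    p = math.exp(-lr / 2.0)                # exact chi-square survival function, df = 2
    return lr, p, p < alpha / n_items      # flag DIF if significant after correction

# Hypothetical log-likelihoods for one item on a 20-item scale
lr, p, flag = lr_dif_test(ll_free=-5321.4, ll_constrained=-5329.9, n_items=20)
```

The Bonferroni division by the number of items is what keeps the family-wise false-positive rate near the nominal alpha when every item is tested in turn.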

3.
Because of the practical, theoretical, and legal implications of differential item functioning (DIF) for organizational assessments, studies of measurement equivalence are a necessary first step before scores can be compared across individuals from different groups. However, commonly recommended criteria for evaluating results from these analyses have several important limitations. The present study proposes an effect size index for confirmatory factor analytic (CFA) studies of measurement equivalence to address 1 of these limitations. The application of this index is illustrated with personality data from American English, Greek, and Chinese samples. Results showed a range of nonequivalence across these samples, and these differences were linked to the observed effects of DIF on the outcomes of the assessment (i.e., group-level mean differences and adverse impact). (PsycINFO Database Record (c) 2011 APA, all rights reserved)

4.
Statistical methods based on item response theory (IRT) were used to bidirectionally evaluate the measurement equivalence of translated American and German intelligence tests. Items that displayed differential item functioning (DIF) were identified, and content analysis was used to determine probable sources of DIF, either cultural or linguistic. The benefits of using an IRT analysis in examining the fidelity of translated tests are described. In addition, the influence of cultural differences on test translations and the use of DIF items to elucidate cultural differences are discussed. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

5.
The purpose of this study was to determine if the Mini-Mental State Examination (MMS; M. F. Folstein, S. E. Folstein, & P. R. McHugh, 1975) demonstrates item bias with respect to measuring cognitive functioning of older Hispanics and non-Hispanics. Assessment of differential item functioning (DIF) of individual MMS items across 3 language/ethnicity groups (English test administration/non-Hispanic ethnicity, English test administration/Hispanic ethnicity, and Spanish test administration/Hispanic ethnicity) was performed by using a logistic regression procedure. Fifteen of the 26 MMS items were significantly related to total score and were shown to provide unbiased measurement across the 3 groups. Normative data are presented for older Hispanics (n = 365) and non-Hispanics (n = 388) on the raw MMS, a 15-item version in which items with significant DIF were eliminated, and a total score statistically adjusted for effects of education and age. (PsycINFO Database Record (c) 2010 APA, all rights reserved)
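The logistic regression DIF procedure compares nested models for each item: a base model predicting the item response from the total score, and an augmented model that also includes group membership. A minimal sketch of the decision step, shown for the two-group, 1-df case and assuming the model deviances have already been obtained from fitted logistic regressions (the deviance values below are hypothetical):

```python
import math

def uniform_dif_pvalue(dev_base, dev_augmented):
    """p value for the deviance drop when group membership is added to the
    model item ~ total_score. Under no DIF the drop is chi-square with 1 df,
    whose survival function is exactly erfc(sqrt(x / 2))."""
    x = dev_base - dev_augmented
    return math.erfc(math.sqrt(x / 2.0))

# Hypothetical deviances for one item, with and without the group term
p = uniform_dif_pvalue(dev_base=412.7, dev_augmented=401.3)
flag_dif = p < 0.05
```

With 3 groups, as in the MMS study, the group term contributes 2 df, and nonuniform DIF is tested by further adding a score-by-group interaction; the nested-model logic is the same.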

6.
Drinking behavior in preadolescence is a significant predictor of both short- and long-term negative consequences. This study examined the psychometric properties of 1 known risk factor for drinking in this age group, alcohol expectancies, within an item response theory framework. In a sample of middle school youths (N = 1,273), the authors tested differential item functioning (DIF) in positive and negative alcohol expectancies across grade, gender, and ethnicity. Multiple-indicator multiple-cause model analyses tested differences in alcohol use as a potential explanation for observed DIF across groups. Results showed that most expectancy items did not exhibit DIF. For items where DIF was indicated, differences in alcohol use did not explain differences in item parameters. Positive and negative expectancies also systematically differed in the location parameter. Latent variable scale scores of both positive and negative expectancies were associated with drinking behavior cross-sectionally, while only positive expectancies predicted drinking prospectively. Improving the measurement of alcohol expectancies can help researchers better assess this important risk factor for drinking in this population, particularly the identification of those with either very high positive or very low negative alcohol expectancies. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

7.
An item response theory (IRT) approach to test linking based on summed scores is presented and demonstrated by calibrating a modified 23-item version of the Center for Epidemiologic Studies Depression Scale (CES-D) to the standard 20-item CES-D. Data are from the Depression Patient Outcomes Research Team II, which used a modified CES-D to measure risk for depression. Responses (N = 1,120) to items on both the original and modified versions were calibrated simultaneously using F. Samejima's (1969, 1997) graded IRT model. The 2 scales were linked on the basis of derived summed-score-to-IRT-score translation tables. The established cut score of 16 on the standard CES-D corresponded most closely to a summed score of 20 on the modified version. The IRT summed-score approach to test linking is a straightforward, valid, and practical method that can be applied in a variety of situations. (PsycINFO Database Record (c) 2010 APA, all rights reserved)
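The summed-score linking idea rests on the test characteristic curve: under an IRT model, each trait level theta maps to an expected summed score on each form, so a cut score on one form can be carried to the other through a common theta. A minimal sketch using a 2PL model and hypothetical item parameters (the study itself used Samejima's graded model on the two CES-D forms):

```python
import math

def p_correct(theta, a, b):
    """2PL item response function (discrimination a, location b)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def expected_score(theta, items):
    """Test characteristic curve: expected summed score at trait level theta."""
    return sum(p_correct(theta, a, b) for a, b in items)

def linked_cut(cut, items_from, items_to, grid):
    """Find the grid theta whose expected score on the source form is closest
    to the cut, then read off the expected score on the target form there."""
    t = min(grid, key=lambda th: abs(expected_score(th, items_from) - cut))
    return expected_score(t, items_to)

# Hypothetical (a, b) parameters: a 4-item source form and a 5-item target form
form_a = [(1.0, -1.0), (1.2, 0.0), (0.9, 0.5), (1.1, 1.0)]
form_b = [(0.8, -0.5), (1.0, 0.2), (1.3, 0.8), (0.7, 1.2), (1.1, 0.0)]
grid = [i / 20.0 - 3.0 for i in range(121)]  # theta from -3 to +3
cut_b = linked_cut(2.0, form_a, form_b, grid)
```

Simultaneous calibration of both forms, as in the study, puts their parameters on one theta metric so the translation tables are directly comparable.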

8.
Statistical analyses of Differential Item Functioning (DIF) can be used for rigorous translation evaluations. DIF techniques test whether each item functions in the same way, irrespective of the country, language, or culture of the respondents. For a given level of health, the score on any item should be independent of nationality. This requirement can be tested through contingency-table methods, which are efficient for analyzing all types of items. We investigated DIF in the Danish translation of the SF-36 Health Survey, using two general population samples (USA, n = 1,506; Denmark, n = 3,950). DIF was identified for 12 out of 35 items. These results agreed with independent ratings of translation quality, but the statistical techniques were more sensitive. When included in scales, the items exhibiting DIF had little impact on conclusions about cross-national differences in health in the general population. However, if used as single items, the DIF items could seriously bias results from cross-national comparisons. Also, the DIF items might have a larger impact on cross-national comparisons of groups with poorer health status. We conclude that analysis of DIF is useful for evaluating questionnaire translations.

9.
The Rutgers Alcohol Problem Index (RAPI; H. R. White & E. W. Labouvie, 1989) is a frequently used measure of alcohol-related consequences in adolescents and college students, but psychometric evaluations of the RAPI are limited and it has not been validated with college students. This study used item response theory (IRT) to examine the RAPI on students (N = 895; 65% female, 35% male) assessed in both high school and college. A series of 2-parameter IRT models were computed, examining differential item functioning across gender and time points. A reduced 18-item measure demonstrating strong clinical utility is proposed, with scores of 8 or greater implying greater need for treatment. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

10.
Person-fit statistics have been proposed to investigate the fit of an item score pattern to an item response theory (IRT) model. The author investigated how these statistics can be used to detect different types of misfit. Intelligence test data were analyzed using person-fit statistics in the context of the G. Rasch (1960) model and R. J. Mokken's (1971, 1997) IRT models. The effect of the choice of an IRT model to detect misfitting item score patterns and the usefulness of person-fit statistics for diagnosis of misfit are discussed. Results showed that different types of person-fit statistics can be used to detect different kinds of person misfit. Parametric person-fit statistics had more power than nonparametric person-fit statistics. (PsycINFO Database Record (c) 2010 APA, all rights reserved)
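Nonparametric person-fit statistics of the kind used with Mokken-type models can be as simple as counting Guttman errors: item pairs in which a respondent fails an easier item yet passes a harder one. A minimal sketch (the item ordering and response patterns are hypothetical, not from the study's intelligence data):

```python
def guttman_errors(responses, easiest_to_hardest):
    """Count Guttman errors in a 0/1 response pattern: pairs where an
    easier item is wrong (0) but a harder item is right (1)."""
    ordered = [responses[i] for i in easiest_to_hardest]
    errors = 0
    for i in range(len(ordered)):
        for j in range(i + 1, len(ordered)):
            if ordered[i] == 0 and ordered[j] == 1:
                errors += 1
    return errors

# A conforming pattern (passes easy items, fails hard ones) vs. a reversed one
conforming = guttman_errors([1, 1, 1, 0, 0], [0, 1, 2, 3, 4])
misfitting = guttman_errors([0, 0, 1, 1, 1], [0, 1, 2, 3, 4])
```

Large error counts relative to what a person's total score allows flag potentially misfitting patterns; parametric alternatives evaluate the likelihood of the pattern under a fitted IRT model, which is one source of the power advantage noted above.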

11.
Measurement invariance is a prerequisite for confident cross-cultural comparisons of personality profiles. Multigroup confirmatory factor analysis was used to detect differential item functioning (DIF) in factor loadings and intercepts for the Revised NEO Personality Inventory (P. T. Costa, Jr., & R. R. McCrae, 1992) in comparisons of college students in the United States (N = 261), the Philippines (N = 268), and Mexico (N = 775). About 40%–50% of the items exhibited some form of DIF, and item-level noninvariance often carried forward to the facet level at which scores are compared. After excluding DIF items, some facet scales were too short or unreliable for cross-cultural comparisons, and for some other facets, cultural mean differences were reduced or eliminated. The results indicate that considerable caution is warranted in cross-cultural comparisons of personality profiles. (PsycINFO Database Record (c) 2011 APA, all rights reserved)

12.
The Psychopathy Checklist-Revised (PCL-R) is an important measure in both applied and research settings. Evidence for its validity is mostly derived from male Caucasian participants. PCL-R ratings of 359 Caucasian and 356 African American participants were compared using confirmatory factor analysis (CFA) and item response theory (IRT) analyses. Previous research has indicated that 13 items of the PCL-R can be described by a 3-factor hierarchical model. This model was replicated in this sample. No cross-group difference in factor structure could be found using CFA; the structure of psychopathy is the same in both groups. IRT methods indicated significant but small differences in the performance of 5 of the 20 PCL-R items. No significant differential test functioning was found, indicating that the item differences canceled each other out. It is concluded that the PCL-R can be used, in an unbiased way, with African American participants. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

13.
Objective: The purpose of this study was to examine whether neuropsychological tests translated into Spanish measure the same cognitive constructs as the original English versions. Method: Older adult participants without dementia (N = 2,664) from the Washington Heights-Inwood Columbia Aging Project (WHICAP), a community-based cohort in northern Manhattan, were evaluated with a comprehensive neuropsychological battery. The study cohort includes both English speakers (n = 1,800) and Spanish speakers (n = 864) evaluated in their language of preference. Invariance analyses were conducted across language groups on a structural equation model comprising four neuropsychological factors (memory, language, visual-spatial ability, and processing speed). Results: The results of the analyses indicated that the four-factor model exhibited partial measurement invariance, demonstrated by invariant factor structure and factor loadings but nonequivalent observed score intercepts. Conclusion: The finding of invariant factor structure and factor loadings provides empirical evidence to support the implicit assumption that scores on neuropsychological tests are measuring equivalent psychological traits across these two language groups. At the structural level, the model exhibited invariant factor variances and covariances. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

14.
Recent legal developments appear to sanction the use of psychometrically unsound procedures for examining differential item functioning (DIF) on standardized tests. More appropriate approaches involve the use of item response theory (IRT). However, many IRT-based DIF studies have used F. M. Lord's (see record 1987-17535-001) joint maximum likelihood procedure, which can lead to incorrect and misleading results. A Monte Carlo simulation was conducted to evaluate the effectiveness of two other methods of parameter estimation: marginal maximum likelihood estimation and Bayes modal estimation. Sample size and data dimensionality were manipulated in the simulation. Results indicated that both estimation methods (a) provided more accurate parameter estimates and less inflated Type I error rates than joint maximum likelihood, (b) were robust to multidimensionality, and (c) produced more accurate parameter estimates and higher rates of identifying DIF with larger samples. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

15.
Developing spelling skills in English is a particularly demanding task for Chinese speakers because, unlike many other bilinguals learning English as a second language, they must learn two languages with different orthography as well as phonology. To disentangle socioeconomic and pedagogical factors from the underlying cognitive–linguistic processes that predict the development of spelling, we used a 6-month longitudinal design and compared children with English as their first language (English-L1; n = 50) and children with Mandarin as their first language (Mandarin-L1; n = 50) from the same kindergarten. Both groups were tested on parallel versions of English and Mandarin tasks as predictors at Time 1, and their spelling sophistication scores were then computed from a 52-item experimental task administered at Time 2. After we controlled for nonverbal IQ, age, vocabulary, and spelling achievement on Wide Range Achievement Test 4 at Time 1, regression analyses showed that phoneme awareness was the strongest predictor of spelling sophistication for English-L1 children, but syllable awareness and letter-sound knowledge were also important for Mandarin-L1 children. The implications of these differences in the cognitive–linguistic processing of bilingual children learning two dissimilar languages are briefly discussed. (PsycINFO Database Record (c) 2011 APA, all rights reserved)

16.
17.
In this study, an item response theory-based differential functioning of items and tests (DFIT) framework (N. S. Raju, W. J. van der Linden, & P. F. Fleer, 1995) was applied to a Likert-type scale. Several differential item functioning (DIF) analyses compared the item characteristics of a 10-item satisfaction scale for Black and White examinees and for female and male examinees. F. M. Lord's (1980) chi-square and the extended signed area (SA) measures were also used. The results showed that the DFIT indices consistently performed in the expected manner. The results from Lord's chi-square and the SA procedures were somewhat varied across comparisons. A discussion of these results along with an illustration of an item with significant DIF and suggestions for future DIF research are presented. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

18.
This article analyzes latent variable models from a cognitive psychology perspective. We start by discussing work by Tuerlinckx and De Boeck (2005), who proved that a diffusion model for 2-choice response processes entails a 2-parameter logistic item response theory (IRT) model for individual differences in the response data. Following this line of reasoning, we discuss the appropriateness of IRT for measuring abilities and bipolar traits, such as pro versus contra attitudes. Surprisingly, if a diffusion model underlies the response processes, IRT models are appropriate for bipolar traits but not for ability tests. A reconsideration of the concept of ability that is appropriate for such situations leads to a new item response model for accuracy and speed based on the idea that ability has a natural zero point. The model implies fundamentally new ways to think about guessing, response speed, and person fit in IRT. We discuss the relation between this model and existing models as well as implications for psychology and psychometrics. (PsycINFO Database Record (c) 2011 APA, all rights reserved)

19.
The authors describe the initial development of the Wagner Assessment Test (WAT), an instrument designed to assess critical thinking, using the 5-faceted view popularized by the Watson-Glaser Critical Thinking Appraisal (WGCTA; G. B. Watson & E. M. Glaser, 1980). The WAT was designed to reduce the degree of successful guessing relative to the WGCTA by increasing the number of response alternatives (i.e., 80% of WGCTA items are 2-alternative, multiple-choice), a change that was hypothesized to result in more desirable test information and standard-error functions. Analyses using the 3-parameter logistic item response theory (IRT) model in a sample of undergraduates (N = 407) supported this prediction, even when the WAT item pool was shortened to match the length of the WGCTA. Convergent validity between full-pool IRT score estimates was r = .69. Implications for subsequent research on IRT-based measurement of critical thinking are discussed. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

20.
The main aim of this article is to explicate why a transition to ideal point methods of scale construction is needed to advance the field of personality assessment. The study empirically demonstrated the substantive benefits of ideal point methodology as compared with the dominance framework underlying traditional methods of scale construction. Specifically, using a large, heterogeneous pool of order items, the authors constructed scales using traditional classical test theory, dominance item response theory (IRT), and ideal point IRT methods. The merits of each method were examined in terms of item pool utilization, model-data fit, measurement precision, and construct and criterion-related validity. Results show that adoption of the ideal point approach provided a more flexible platform for creating future personality measures, and this transition did not adversely affect the validity of personality test scores. (PsycINFO Database Record (c) 2011 APA, all rights reserved)
