Similar literature
1.
Full-information item bifactor analysis is an important statistical method in psychological and educational measurement. Current methods are limited to single-group analysis and inflexible in the types of item response models supported. We propose a flexible multiple-group item bifactor analysis framework that supports a variety of multidimensional item response theory models for an arbitrary mixing of dichotomous, ordinal, and nominal items. The extended item bifactor model also enables the estimation of latent variable means and variances when data from more than 1 group are present. Generalized user-defined parameter restrictions are permitted within or across groups. We derive an efficient full-information maximum marginal likelihood estimator. Our estimation method achieves substantial computational savings by extending Gibbons and Hedeker's (1992) bifactor dimension reduction method so that the optimization of the marginal log-likelihood requires only 2-dimensional integration regardless of the dimensionality of the latent variables. We use simulation studies to demonstrate the flexibility and accuracy of the proposed methods. We apply the model to study cross-country differences, including differential item functioning, using data from a large international education survey on mathematics literacy. (PsycINFO Database Record (c) 2011 APA, all rights reserved)

2.
Drinking behavior in preadolescence is a significant predictor of both short- and long-term negative consequences. This study examined the psychometric properties of 1 known risk factor for drinking in this age group, alcohol expectancies, within an item response theory framework. In a sample of middle school youths (N = 1,273), the authors tested differential item functioning (DIF) in positive and negative alcohol expectancies across grade, gender, and ethnicity. Multiple-indicator multiple-cause model analyses tested differences in alcohol use as a potential explanation for observed DIF across groups. Results showed that most expectancy items did not exhibit DIF. For items where DIF was indicated, differences in alcohol use did not explain differences in item parameters. Positive and negative expectancies also systematically differed in the location parameter. Latent variable scale scores of both positive and negative expectancies were associated with drinking behavior cross-sectionally, while only positive expectancies predicted drinking prospectively. Improving the measurement of alcohol expectancies can help researchers better assess this important risk factor for drinking in this population, particularly the identification of those with either very high positive or very low negative alcohol expectancies. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

3.
Questionnaire item responses were analyzed by testing the significance of the differences of percents in response categories, using a non-parametric test, and by testing the group differences in mean item scores, based on weights assigned the responses, using a critical ratio test. The two approaches gave equivalent results in almost all cases. Item mean scores correlated with percent-in-category generally .90 or higher. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

4.
Because of the practical, theoretical, and legal implications of differential item functioning (DIF) for organizational assessments, studies of measurement equivalence are a necessary first step before scores can be compared across individuals from different groups. However, commonly recommended criteria for evaluating results from these analyses have several important limitations. The present study proposes an effect size index for confirmatory factor analytic (CFA) studies of measurement equivalence to address 1 of these limitations. The application of this index is illustrated with personality data from American English, Greek, and Chinese samples. Results showed a range of nonequivalence across these samples, and these differences were linked to the observed effects of DIF on the outcomes of the assessment (i.e., group-level mean differences and adverse impact). (PsycINFO Database Record (c) 2011 APA, all rights reserved)

5.
A psychometric analysis of 2 interview-based measures of cognitive deficits was conducted: the 21-item Clinical Global Impression of Cognition in Schizophrenia (CGI-CogS; Ventura et al., 2008), and the 20-item Schizophrenia Cognition Rating Scale (SCoRS; Keefe et al., 2006), which were administered on 2 occasions to a sample of people with schizophrenia. Traditional psychometrics, bifactor analysis, and item response theory methods were used to explore item functioning and dimensionality and to compare instruments. Despite containing similar item content, responses to the CGI-CogS demonstrated superior psychometric properties (e.g., higher item intercorrelations, better spread of ratings across response categories) relative to the SCoRS. The authors argue that these differences arise mainly from the differential use of prompts and how the items are phrased and scored. Bifactor analysis demonstrated that although both measures capture a broad range of cognitive functioning (e.g., working memory, social cognition), the common variance on each is overwhelmingly explained by a single general factor. Item response theory analyses of the combined pool of 41 items showed that measurement precision is peaked in the mild to moderate range of cognitive impairment. Finally, simulated adaptive testing revealed that only about 10 to 12 items are necessary to achieve latent trait level estimates with reasonably small standard errors for most individuals. This suggests that these interview-based measures of cognitive deficits could be shortened without loss of measurement precision. (PsycINFO Database Record (c) 2011 APA, all rights reserved)

6.
In this article, the authors developed a common strategy for identifying differential item functioning (DIF) items that can be implemented in both the mean and covariance structures method (MACS) and item response theory (IRT). They proposed examining the loadings (discrimination) and the intercept (location) parameters simultaneously using the likelihood ratio test with a free-baseline model and Bonferroni corrected critical p values. They compared the relative efficacy of this approach with alternative implementations for various types and amounts of DIF, sample sizes, numbers of response categories, and amounts of impact (latent mean differences). Results indicated that the proposed strategy was considerably more effective than an alternative approach involving a constrained-baseline model. Both MACS and IRT performed similarly well in the majority of experimental conditions. As expected, MACS performed slightly worse in dichotomous conditions but better than IRT in polytomous cases where sample sizes were small. Also, contrary to popular belief, MACS performed well in conditions where DIF was simulated on item thresholds (item means), and its accuracy was not affected by impact. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

7.
While the popularity of an item is partially a function of its judged social desirability (SD), reliable item preferences also occur which are independent of a general SD variable and which in some cases may have greater predictive power. 4 analyses showed: (1) The proportion of true responses to MMPI items obtained from 10 disparate groups contained 40%-52% common variance on the average across items classed as very desirable, desirable, neutral, undesirable, and very undesirable. (2) With SD controlled, intergroup partial rs were all significant (

8.
Current interest in the assessment of measurement equivalence emphasizes 2 major methods of analysis. The authors offer a comparison of a linear method (confirmatory factor analysis) and a nonlinear method (differential item and test functioning using item response theory) with an emphasis on their methodological similarities and differences. The 2 approaches test for the equality of true scores (or expected raw scores) across 2 populations when the latent (or factor) score is held constant. Both approaches can provide information about when measurement nonequivalence exists and the extent to which it is a problem. An empirical example is used to illustrate the 2 approaches. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

9.
Pathological personality item responses have been shown to relate to the social desirability scale values of test items. It was hypothesized that both social desirability and pathological item-response frequency might vary as a function of the time permitted to answer test items. Two groups of Ss were administered the items of the Maslow, Birch, Honigman, McGrath, Plason, and Stein Security-Insecurity Inventory. Social desirability scale values for the items were established. Maximal reading time required for each item was also determined, and both groups were permitted to view each item for the same established length of time. 1 group was allowed 2 sec., the other group 10 sec. for each response. It was observed that time pressure reduced the number of pathological item responses, and that items scaled either high or low in social desirability tended to be answered in the socially desirable direction under time pressure. Females generally provided more critical or pathological item responses than did males. (24 ref.) (PsycINFO Database Record (c) 2010 APA, all rights reserved)

10.
Most item response theory models assume conditional independence, and it is known that interactions between items affect the estimated item discrimination. In this article, this effect is further investigated from a theoretical perspective and by means of simulation studies. To this end, a parametric model for item interactions is introduced. Next, it is shown that ignoring a positive interaction results in an overestimation of the discrimination parameter in the two-parameter logistic model (2PLM), whereas ignoring a negative interaction leads to an underestimation of the parameter. Furthermore, it is demonstrated that in some cases the item characteristic curves of the 2PLM and of an item involved in an interaction are quite similar, indicating that the 2PLM can provide a good fit to data with interactions. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

11.
Several hypotheses in family psychology involve comparisons of sociocultural groups. Yet the potential for cross-cultural inequivalence in widely used psychological measurement instruments threatens the validity of inferences about group differences. Methods for dealing with these issues have been developed via the framework of item response theory. These methods deal with an important type of measurement inequivalence, called differential item functioning (DIF). The authors introduce DIF analytic methods, linking them to a well-established framework for conceptualizing cross-cultural measurement equivalence in psychology (C.H. Hui & H.C. Triandis, 1985). They illustrate the use of DIF methods using data from the Project on Human Development in Chicago Neighborhoods (PHDCN). Focusing on the Caregiver Warmth and Environmental Organization scales from the PHDCN's adaptation of the Home Observation for Measurement of the Environment Inventory, the authors obtain results that exemplify the range of outcomes that may result when these methods are applied to psychological measurement instruments. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

12.
The authors conducted a meta-analysis to determine the magnitude of older and younger adults' preferences for emotional stimuli in studies of attention and memory. Analyses involved 1,085 older adults from 37 independent samples and 3,150 younger adults from 86 independent samples. Both age groups exhibited small to medium emotion salience effects (i.e., preference for emotionally valenced stimuli over neutral stimuli) as well as positivity preferences (i.e., preference for positively valenced stimuli over neutral stimuli) and negativity preferences (i.e., preference for negatively valenced stimuli over neutral stimuli). There were few age differences overall. Type of measurement appeared to influence the magnitude of effects; recognition studies indicated significant age effects, where older adults showed smaller effects for emotion salience and negativity preferences than younger adults. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

13.
The authors investigated the recognizability of recently studied word and nonword stimuli in relation to both experimentally controlled prior frequency of occurrence and, for words, normative frequency (assessed by counts of occurrences in printed English). The interaction between these variables was small and nonsignificant across all conditions of 2 experiments. Patterns of recognition measures in relation to controlled prior frequency, but not normative frequency, appeared interpretable in terms of response biases generated by long-term priming. Application of a global memory model and analyses of correlations among item categories yielded evidence for a lexicality dimension underlying normative-frequency effects and an implication that "word-frequency effects" on recognition are better termed lexicality effects. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

14.
M. Duncan & B. B. Murdock (2000) compared precued and postcued item recognition and serial recall, showing precued-postcued differences for item recognition but not for serial recall. Precuing and postcuing refer to 2 conditions in which the instructions as to the type of recall test following the presentation of short lists of items are given before or after the list presentation. This methodology was extended here to a paired-associate task. In 2 experiments, short lists of paired associates were presented followed by single-item, old-new, or intact-rearranged pair recognition tests; test type was precued or postcued. A fast or slow presentation rate was used to discourage or encourage mediators. TODAM2 (a theory of distributed associative memory) predicts that there should be little or no cuing differences regardless of whether subjects use mediators to remember the pairs. As predicted, the recognition data were essentially identical for the precued and postcued conditions. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

15.
The authors use multiple-sample longitudinal data from different test batteries to examine propositions about changes in constructs over the life span. The data come from 3 classic studies on intellectual abilities in which, in combination, 441 persons were repeatedly measured as many as 16 times over 70 years. They measured cognitive constructs of vocabulary and memory using 8 age-appropriate intelligence test batteries and explored possible linkage of these scales using item response theory (IRT). They simultaneously estimated the parameters of both IRT and latent curve models based on a joint model likelihood approach (i.e., NLMIXED and WINBUGS). They included group differences in the model to examine potential interindividual differences in levels and change. The resulting longitudinal invariant Rasch test analyses lead to a few new methodological suggestions for dealing with repeated constructs based on changing measurements in developmental studies. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

16.
The effects of faking on personality test scores have been studied previously by comparing (a) experimental groups instructed to fake or answer honestly, (b) subgroups created from a single sample of applicants or nonapplicants by using impression management scores, and (c) job applicants and nonapplicants. In this investigation, the latter 2 methods were used to study the effects of faking on the functioning of the items and scales of the Sixteen Personality Factor Questionnaire. A variety of item response theory methods were used to detect differential item/test functioning, interpreted as evidence of faking. The presence of differential item/test functioning across testing situations suggests that faking adversely affects the construct validity of personality scales and that it is problematic to study faking by comparing groups defined by impression management scores. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

17.
Accentuation theory states that the classification of stimuli produces encoding biases. Contrast effects enhance intercategory differences; assimilation effects enhance intracategory similarities. Do these biases affect the retrieval of stimuli distributed across many categories? The calendar superimposes arbitrary intermonth boundaries on day-to-day variations in temperature. In Exp 1, Ss estimated the average temperatures of 48 days. Differences between estimates for 2 days belonging to neighboring months were greater (contrast) and differences between estimates for 2 days belonging to the same month were smaller than actual differences (assimilation). Exp 2 showed that assimilation accounted for all categorization effects. When modified by assumptions from an exemplar model of category learning, accentuation theory accounts for the results. The relevance of these findings for social categorization is discussed. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

18.
Three experiments tested whether the relationship between age differences in temporal and item memory depends on the degree to which the item memory measure relies on memory for context. The authors predicted a stronger relationship of temporal memory to free recall than to recognition memory. Results showed that age differences in temporal memory could be eliminated after controlling for free recall but not recognition memory performance. Under some conditions recognition memory accounted for a significant portion of age-related variance in temporal memory. These results challenge past research that has interpreted age differences in temporal and item memory as independent and suggest that a generalized decline in context memory may underlie reduced performance in older adults on all types of memory tests. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

19.
The Psychopathy Checklist–Revised (PCL-R) is an important measure in both applied and research settings. Evidence for its validity is mostly derived from male Caucasian participants. PCL-R ratings of 359 Caucasian and 356 African American participants were compared using confirmatory factor analysis (CFA) and item response theory (IRT) analyses. Previous research has indicated that 13 items of the PCL-R can be described by a 3-factor hierarchical model. This model was replicated in this sample. No cross-group difference in factor structure could be found using CFA; the structure of psychopathy is the same in both groups. IRT methods indicated significant but small differences in the performance of 5 of the 20 PCL-R items. No significant differential test functioning was found, indicating that the item differences canceled each other out. It is concluded that the PCL-R can be used, in an unbiased way, with African American participants. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

20.
This study used data from 3 sites to examine the invariance and psychometric characteristics of the Brief Symptom Inventory–18 across Black, Hispanic, and White mothers of 5th graders (N = 4,711; M = 38.07 years of age, SD = 7.16). Internal consistencies were satisfactory for all subscale scores of the instrument regardless of ethnic group membership. Mean and covariance structures analysis indicated that the hypothesized 3-factor structure of the instrument was not robust across ethnic groups. It provided a reasonable approximation to the data for Black and White women but not for Hispanic women. Tests for differential item functioning (DIF) were therefore conducted for only Black and White women. Analyses revealed no more than trivial instances of nonuniform DIF but more substantial evidence of uniform DIF for 3 of the 18 items. After having established partial strong factorial invariance of the instrument, latent factor means were found to be significantly higher for Black than for White women on all 3 subscales (somatization, depression, anxiety). In conclusion, the instrument may be used for mean comparisons between Black and White women. (PsycINFO Database Record (c) 2011 APA, all rights reserved)
