期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Diagnosing item score patterns on a test using item response theory-based person-fit statistics.

Meijer Rob R. 《Canadian Metallurgical Quarterly》2003,8(1):72

Person-fit statistics have been proposed to investigate the fit of an item score pattern to an item response theory (IRT) model. The author investigated how these statistics can be used to detect different types of misfit. Intelligence test data were analyzed using person-fit statistics in the context of the G. Rasch (1960) model and R. J. Mokken's (1971, 1997) IRT models. The effect of the choice of an IRT model to detect misfitting item score patterns and the usefulness of person-fit statistics for diagnosis of misfit are discussed. Results showed that different types of person-fit statistics can be used to detect different kinds of person misfit. Parametric person-fit statistics had more power than nonparametric person-fit statistics. (PsycINFO Database Record (c) 2010 APA, all rights reserved) 相似文献

2.

Confirmatory factor analysis and item response theory: Two approaches for exploring measurement invariance.

Reise Steven P.; Widaman Keith F.; Pugh Robin H. 《Canadian Metallurgical Quarterly》1993,114(3):552

Investigated the utility of confirmatory factor analysis (CFA) and item response theory (IRT) models for testing the comparability of psychological measurements. Both procedures were used to investigate whether mood ratings collected in Minnesota and China were comparable. Several issues were addressed. The 1st issue was that of establishing a common measurement scale across groups, which involves full or partial measurement invariance of trait indicators. It is shown that using CFA or IRT models, test items that function differentially as trait indicators across groups need not interfere with comparing examinees on the same trait dimension. Second, the issue of model fit was addressed. It is proposed that person-fit statistics be used to judge the practical fit of IRT models. Finally, topics for future research are suggested. (PsycINFO Database Record (c) 2010 APA, all rights reserved) 相似文献

3.

Assessing the fit of measurement models at the individual level: A comparison of item response theory and covariance structure approaches.

Reise Steven P.; Widaman Keith F. 《Canadian Metallurgical Quarterly》1999,4(1):3

The goal of this study was to explore similarities and differences in person-fit assessment under item response theory (IRT) and covariance structure analysis (CSA) measurement models. The responses of 3,245 individuals who completed 3 personality scales were analyzed under an IRT model and a CSA model. The authors then computed person-fit statistics for individual examinees under both IRT and CSA models. To be specific, for each examinee, the authors computed a standardized person-fit index for the IRT models, called Zl; in addition, an individual's contribution to chi-square, called IND{chi}, was used as a person-fit indicator for CSA models. Findings indicated that these indices are relatively free of confounds with examinee trait level. However, the relationship between Zl, and IND{chi}, values was small, suggesting that the indices identify different examinees as not fitting a model. Implications of the results and directions for future inquiry are discussed. (PsycINFO Database Record (c) 2010 APA, all rights reserved) 相似文献

4.

Measurement equivalence: A comparison of methods based on confirmatory factor analysis and item response theory.

Raju Nambury S.; Laffitte Larry J.; Byrne Barbara M. 《Canadian Metallurgical Quarterly》2002,87(3):517

Current interest in the assessment of measurement equivalence emphasizes 2 major methods of analysis. The authors offer a comparison of a linear method (confirmatory factor analysis) and a nonlinear method (differential item and test functioning using item response theory) with an emphasis on their methodological similarities and differences. The 2 approaches test for the equality of true scores (or expected raw scores) across 2 populations when the latent (or factor) score is held constant. Both approaches can provide information about when measurrment nonequivalence exists and the extent to which it is a problem. An empirical example is used to illustrate the 2 approaches. (PsycINFO Database Record (c) 2010 APA, all rights reserved) 相似文献

5.

Cluster-branching methodology for adaptive testing and the development of the Adaptive Category Test.

Laatsch Linda; Choca James 《Canadian Metallurgical Quarterly》1994,6(4):345

Adaptive testing involves the adjustment of a set of test items, in accordance with an individual's characteristics, to minimize items that do not yield useful information. The best known methodology used to develop adaptive tests, item response theory (IRT), cannot be used with most psychological instruments. The authors propose using cluster analysis to develop a branching logic that would allow the adaptive administration of such instruments. The proposed methodology is described in detail and is used to develop an adaptive version of the Halstead Category Test (W. Halstead & P. Settlage, 1943) from archival data. Real-data simulations show the Adaptive Category Test to yield scores that are not significantly different from the scores actually obtained on the original version of the test. (PsycINFO Database Record (c) 2010 APA, all rights reserved) 相似文献

6.

Beyond group-mean differences: The concept of item bias.

Thissen David; Steinberg Lynne; Gerrard Meg 《Canadian Metallurgical Quarterly》1986,99(1):118

Describes item bias analysis in attitude and personality measurement using the techniques of item response theory (IRT). Data from 179 male and 119 female college students on the Mosher Forced-Choice Sex Guilt Inventory illustrate the procedures developed to distinguish true group differences in a psychologically meaningful construct from artifactual differences due to some aspect of the test construction process. This analysis suggests that the sex difference in scores on this inventory reflects the item composition of the measure rather than a true group difference on a global guilt continuum. Recommendations for the application of IRT item analysis are presented. (31 ref) (PsycINFO Database Record (c) 2010 APA, all rights reserved) 相似文献

7.

Psychometric equivalence of a translation of the Job Descriptive Index into Hebrew.

Hulin Charles L.; Mayer Laura J. 《Canadian Metallurgical Quarterly》1986,71(1):83

相似文献

8.

Does the rose still smell as sweet? Item variability across test forms and revisions.

Knowles Eric S.; Condon Christopher A. 《Canadian Metallurgical Quarterly》2000,12(3):245

This article examines item stability when the same item appears in different contexts. The 1st section considers the assumptions in classical test theory and item response theory concerning the relationship between the item and the trait it is presumed to measure. The 2nd section presents contextualist challenges to the measurement theory assumptions about item properties and shows the instability of item characteristics across different testing contexts. The 3rd section describes methods for checking the relationship between items and traits. Classical test methods, item response methods, and structural equation methods for assessing item stability are reviewed. The instability of item characteristics across contexts should caution researchers to assess, and not assume, that items operate the same way on different test versions. Item instability also indicates the need for a more detailed understanding of the psychological processes that occur between item and answer. (PsycINFO Database Record (c) 2010 APA, all rights reserved) 相似文献

9.

Generalized linear item response theory.

Mellenbergh Gideon J. 《Canadian Metallurgical Quarterly》1994,115(2):300

Generalized linear item response theory is discussed, which is based on the following assumptions: (1) A distribution of the response occurs according to given item format; (2) the item responses are explained by 1 continuous or nominal latent variable and p latent as well as observed variables that are continuous or nominal; (3) the responses to the different items of a test are independently distributed given the values of the explanatory variables; and (4) a monotone differentiable function g of the expected item response τ is needed such that a linear combination of the explanatory variables is a predictor of g(τ). It is shown that most of the well-known psychometric models are special cases of the generalized theory and that concepts such as differential item functioning, specific objectivity, reliability, and information can be subsumed under the generalized theory. (PsycINFO Database Record (c) 2011 APA, all rights reserved) 相似文献

10.

Using item response theory to study the convergent and discriminant validity of three questionnaires measuring cigarette dependence.

Courvoisier Delphine; Etter Jean-Fran?ois 《Canadian Metallurgical Quarterly》2008,22(3):391

To determine whether the Cigarette Dependence Scale, the Fagerstr?m Test for Nicotine Dependence, and the Nicotine Dependence Syndrome Scale (NDSS) reliably and correctly assessed both weakly and severely dependent individuals, the authors collected data via Internet from 2,435 current smokers, from 2004 to 2007. They used a 2-parameter item response model to determine the difficulty and discrimination of each question and used correlations between latent scores to assess convergent and discriminant validity. The reliability of all scales was close to or exceeded .70. Both the Cigarette Dependence Scale and the Fagerstr?m Test for Nicotine Dependence had 1 misfitting item. Each NDSS scale had at least 2 misfitting items. The information curve of each of the questionnaires peaked between -2 and 2 and was low at both extremes. All questionnaires had adequate reliability and were more informative for a medium level of the underlying cigarette dependence continuum than for both extremes of this continuum. The correlations between latent scores indicated good convergent validity between questionnaires and low discriminant validity between NDSS subscales, except for Tolerance. This result suggests that nicotine dependence may not be composed of 5 dimensions but may be unidimensional and distinct from reduced sensitivity to the effects of smoking (Tolerance). (PsycINFO Database Record (c) 2010 APA, all rights reserved) 相似文献

11.

Measurement precision in test score and item response models.

Mellenbergh Gideon J. 《Canadian Metallurgical Quarterly》1996,1(3):293

The population-dependent concept of reliability is used in test score models such as classical test theory and the binomial error model, whereas in item response models, the population-independent concept of information is used. Reliability and information apply to both test score and item response models. Information is a conditional definition of precision, that is, the precision for a given subject; reliability is an unconditional definition, that is, the precision for a population of subjects. Information and reliability do not distinguish test score and item response models. The main distinction is that the parameters are specific for the test and the subject in test score models, whereas in item response models, the item parameters are separated from the subject parameters. (PsycINFO Database Record (c) 2010 APA, all rights reserved) 相似文献

12.

Response latency detection of fakers on personnel tests.

Holden Ronald R. 《Canadian Metallurgical Quarterly》1995,27(3):343

Examined the ability of personnel test item response latencies to differentiate between individuals instructed to fake and those instructed to respond honestly with 64 undergraduates in Exp 1 and 100 unemployed Ss in Exp 2. Results supported a general model of lying derived from schema theory, demonstrating that fakers take relatively longer than honest respondents to admit to delinquent characteristics concerning themselves. Discriminant function analysis indicated that response latencies to items on standard personnel tests could significantly distinguish between fakers and honest test respondents in a personnel testing scenario. (French abstract) (PsycINFO Database Record (c) 2010 APA, all rights reserved) 相似文献

13.

A Piagetian test of general intelligence.

Humphreys Lloyd G.; Rich Susan A.; Davey Timothy C. 《Canadian Metallurgical Quarterly》1985,21(5):872

Investigated the possibility of shortening the Piagetian test by means of classical item analysis methodology. It is shown that from only 13 items, a Piagetian test can be formed that is an excellent measure of general intelligence in its own right but can also add to the information furnished by WISC Verbal and Performance IQs and academic achievement. Reasons for the high correlations reported are discussed, and the results are integrated into a theory of general intelligence. (5 ref) (PsycINFO Database Record (c) 2010 APA, all rights reserved) 相似文献

14.

Psychopathy and ethnicity: Structural, item, and test generalizability of the Psychopathy Checklist—Revised (PCL-R) in Caucasian and African American participants.

Cooke David J.; Kosson David S.; Michie Christine 《Canadian Metallurgical Quarterly》2001,13(4):531

The Psychopathy Checklist--Revised (PCL-R) is an important measure in both applied and research settings. Evidence for its validity is mostly derived from male Caucasian participants. PCL-R ratings of 359 Caucasian and 356 African American participants were compared using confirmatory factor analysis (CFA) and item response theory (IRT) analyses. Previous research has indicated that 13 items of the PCL-R can be described by a 3-factor hierarchical model. This model was replicated in this sample. No cross-group difference in factor structure could be found using CFA; the structure of psychopathy is the same in both groups. IRT methods indicated significant but small differences in the performance of 5 of the 20 PCL-R items. No significant differential test functioning was found, indicating that the item differences canceled each other out. It is concluded that the PCL-R can be used, in an unbiased way, with African American participants. (PsycINFO Database Record (c) 2010 APA, all rights reserved) 相似文献

15.

Constructing personality scales under the assumptions of an ideal point response process: Toward increasing the flexibility of personality measures.

Chernyshenko Oleksandr S.; Stark Stephen; Drasgow Fritz; Roberts Brent W. 《Canadian Metallurgical Quarterly》2007,19(1):88

The main aim of this article is to explicate why a transition to ideal point methods of scale construction is needed to advance the field of personality assessment. The study empirically demonstrated the substantive benefits of ideal point methodology as compared with the dominance framework underlying traditional methods of scale construction. Specifically, using a large, heterogeneous pool of order items, the authors constructed scales using traditional classical test theory, dominance item response theory (IRT), and ideal point IRT methods. The merits of each method were examined in terms of item pool utilization, model-data fit, measurement precision, and construct and criterion-related validity. Results show that adoption of the ideal point approach provided a more flexible platform for creating future personality measures, and this transition did not adversely affect the validity of personality test scores. (PsycINFO Database Record (c) 2011 APA, all rights reserved) 相似文献

16.

Relative efficacy of differential response latencies for detecting faking on a self-report measure of psychopathology.

Holden Ronald R.; Kroner Daryl G. 《Canadian Metallurgical Quarterly》1992,4(2):170

Investigated whether differential response latencies for items on a structured self-report test of psychopathology could be used to detect faking in a sample of maximum security prison inmates. Test item response times were statistically adjusted to reflect item latencies in relation both to the person and to the item; discriminant function analysis indicated that such times could significantly differentiate among standard responding, faking good responses, and faking bad responses. Furthermore, classification hit rates with differential response latencies compared favorably with those rates found with more traditional response dissimulation scales. Theoretical and clinical implications are discussed. (PsycINFO Database Record (c) 2010 APA, all rights reserved) 相似文献

17.

Vector analysis and process combinations in motion perception: A reply to Wallach, Becklen, and Nitzberg (1985).

Johansson Gunnar 《Canadian Metallurgical Quarterly》1985,11(3):367

Comments on a proposed alternative to the present author's theory of perceptual vector analysis in motion perception, which was advanced by H. Wallach et al (see record 1986-00251-001). It is argued that Wallach et al have not taken the qualitative difference between their theory and vector analysis into account and that they employed an early and outdated vector theory in their study. Therefore, the conclusion by Wallach et al that their theory is superior to vector analysis is unwarranted. (23 ref) (PsycINFO Database Record (c) 2010 APA, all rights reserved) 相似文献

18.

Deconstructing therapy outcome measurement with rasch analysis of a measure of general clinical distress: The Symptom Checklist-90-Revised.

Elliott Robert; Fox Christine M.; Beltyukova Svetlana A.; Stone Gregory E.; Gunderson Jennifer; Zhang Xi 《Canadian Metallurgical Quarterly》2006,18(4):359

Rasch analysis was used to illustrate the usefulness of item-level analyses for evaluating a common therapy outcome measure of general clinical distress, the Symptom Checklist-90-Revised (SCL-90-R; Derogatis, 1994). Using complementary therapy research samples, the instrument's 5-point rating scale was found to exceed clients' ability to make reliable discriminations and could be improved by collapsing it into a 3-point version (combining scale points 1 with 2 and 3 with 4). This revision, in addition to removing 3 misfitting items, increased person separation from 4.90 to 5.07 and item separation from 7.76 to 8.52 (resulting in alphas of .96 and .99, respectively). Some SCL-90-R subscales had low internal consistency reliabilities; SCL-90-R items can be used to define one factor of general clinical distress that is generally stable across both samples, with two small residual factors. (PsycINFO Database Record (c) 2010 APA, all rights reserved) 相似文献

19.

Social desirability, item-response time, and item significance.

Sutherland Beverly V. 《Canadian Metallurgical Quarterly》1964,28(5):447

Pathological personality item responses have been shown to relate to the social desirability scale values of test items. It was hypothesized that both social desirability and pathological item-response frequency might vary as a function of the time permitted to answer test items. Two groups of Ss were administered the items of the Maslow, Birch, Honigman, McGrath, Plason, and Stein Security-Insecurity Inventory. Social desirability scale values for the items were established. Maximal reading time required for each item was also determined, and both groups were permitted to view each item for the same established length of time. 1 group was allowed 2 sec., the other group 10 sec. for each response. It was observed that time pressure reduced the number of pathological item responses, and that items scaled either high or low in social desirability tended to be answered in the socially desirable direction under time pressure. Females generally provided more critical or pathological item responses than did males. (24 ref.) (PsycINFO Database Record (c) 2010 APA, all rights reserved) 相似文献

20.

Are performance appraisal ratings from different rating sources comparable?

Facteau Jeffrey D.; Craig S. Bartholomew 《Canadian Metallurgical Quarterly》2001,86(2):215

The purpose of this study was to test whether a multisource performance appraisal instrument exhibited measurement invariance across different groups of raters. Multiple-groups confirmatory factor analysis as well as item response theory (IRT) techniques were used to test for invariance of the rating instrument across self, peer, supervisor, and subordinate raters. The results of the confirmatory factor analysis indicated that the rating instrument was invariant across these rater groups. The IRT analysis yielded some evidence of differential item and test functioning, but it was limited to the effects of just 3 items and was trivial in magnitude. Taken together, the results suggest that the rating instrument could be regarded as invariant across the rater groups, thus supporting the practice of directly comparing their ratings. Implications for research and practice are discussed, as well as for understanding the meaning of between-source rating discrepancies. (PsycINFO Database Record (c) 2010 APA, all rights reserved) 相似文献