Similar Literature (20 related documents found)
1.
The Rasch family of models displays several well-documented properties that distinguish it from the general item response theory (IRT) family of measurement models. This paper describes an additional unique property of Rasch models, referred to as the property of item information constancy. This property asserts that the area under the information function for Rasch models is always equal to the number of response categories minus one, regardless of the values of the item location parameters. The implication of this property is that, for a given number of response categories, all items following a Rasch model contribute equally to the height of the test information function across the entire latent continuum.
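The constancy claim is easy to check numerically for the dichotomous case. The sketch below is an illustration, not code from the paper: it integrates the Rasch item information function P(θ)(1 − P(θ)) over a wide grid of θ values and shows that the area is approximately 1 (the number of categories minus one) whatever item location b is used; the b values are arbitrary.

```python
import numpy as np

def item_information(theta, b):
    """Fisher information of a dichotomous Rasch item with location b."""
    p = 1.0 / (1.0 + np.exp(-(theta - b)))
    return p * (1.0 - p)

theta, step = np.linspace(-30.0, 30.0, 60001, retstep=True)  # wide grid so the tails are negligible
for b in (-2.0, 0.0, 1.5):                                   # arbitrary illustrative item locations
    area = np.sum(item_information(theta, b)) * step         # simple Riemann-sum approximation
    print(f"b = {b:+.1f}: area under the information curve = {area:.4f}")
# Each line prints approximately 1.0000, regardless of b.
```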

2.
The Rating Scale Model (RSM) and the Partial Credit Model (PCM) are fairly well-known examples of Rasch models for polytomously scored items. In addition to a number of threshold parameters, both models contain two scalar parameters characterizing item and person location on a common interval-level scale. The rank order of items and persons defined by the Likert summative scores (i.e. the raw total scores) is compared with that obtained from the Rasch-based measures (i.e. the maximum likelihood estimates of the person and item parameters). It is proved that: 1) the property of comonotonicity between Likert summative scores and Rasch-based measures holds for both the person and item parameters of the RSM; 2) the property holds for the PCM only with reference to the person parameters; and 3) violations of comonotonicity are possible, for particular datasets, for the item parameters of the PCM.
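As an illustration of the person-side comonotonicity result, not the paper's proof, the sketch below uses arbitrary, assumed item locations and thresholds. Under the RSM the person maximum likelihood estimate solves "expected total score = observed raw score", and because the expected total is strictly increasing in θ, the estimates are ordered exactly as the Likert raw totals.

```python
import numpy as np
from scipy.optimize import brentq

taus = np.array([-1.0, 0.0, 1.0])          # shared RSM thresholds (4 categories per item), illustrative
item_locs = np.array([-0.5, 0.0, 0.8])     # illustrative item locations

def expected_total(theta):
    """Model-expected Likert total score at ability theta under the RSM."""
    total = 0.0
    for b in item_locs:
        cum = np.concatenate(([0.0], np.cumsum(theta - b - taus)))
        p = np.exp(cum - cum.max())
        p /= p.sum()
        total += np.dot(np.arange(len(p)), p)
    return total

max_score = len(item_locs) * len(taus)     # extreme scores 0 and max have no finite ML estimate
for r in range(1, max_score):
    theta_hat = brentq(lambda t, r=r: expected_total(t) - r, -10.0, 10.0)
    print(f"raw score {r}: theta_hat = {theta_hat:+.3f}")   # theta_hat increases with r
```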

3.
In the Rasch model for items with more than two ordered response categories, the thresholds that define the successive categories are an integral part of the structure of each item, in that the probability of a response in any category is a function of all thresholds, not just the thresholds between any two categories. This paper describes a method of estimation for the Rasch model that takes advantage of this structure. In particular, instead of estimating the thresholds directly, it estimates the principal components of the thresholds, from which threshold estimates are then recovered. The principal components are estimated using a pairwise maximum likelihood algorithm which specialises to the well-known algorithm for dichotomous items. The method of estimation has three advantageous properties. First, by considering items in all possible pairs, sufficiency in the Rasch model is exploited, with the person parameter conditioned out in estimating the item parameters; by analogy to the pairwise algorithm for dichotomous items, the estimates appear to be consistent, although, unlike in the dichotomous case, no formal proof has yet been provided. Second, the estimate of each item parameter is a function of the frequencies in all categories of the item, rather than just the frequencies of two adjacent categories. This stabilizes estimates in the presence of low-frequency data. Third, the procedure accounts readily for missing data. All of these properties are important when the model is used for constructing variables from large-scale data sets which must account for structurally missing data. A simulation study shows that the quality of the estimates is excellent.
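The dichotomous special case mentioned above is easy to sketch: conditioning on a person answering exactly one item of a pair correctly cancels the person parameter, so item locations can be estimated from pairwise counts alone. The code below is a minimal, assumed illustration of that pairwise conditional likelihood, not the authors' polytomous principal-components implementation; note how missing responses simply drop out of the relevant pair counts.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

def pairwise_negloglik(b_free, X):
    """Negative pairwise conditional log-likelihood for dichotomous Rasch items.
    The first item's location is fixed at 0 for identification."""
    b = np.concatenate(([0.0], b_free))
    nll = 0.0
    n_items = X.shape[1]
    for i in range(n_items):
        for j in range(i + 1, n_items):
            both = ~np.isnan(X[:, i]) & ~np.isnan(X[:, j])      # persons who took both items
            n_ij = np.sum((X[both, i] == 1) & (X[both, j] == 0))  # i right, j wrong
            n_ji = np.sum((X[both, i] == 0) & (X[both, j] == 1))  # j right, i wrong
            p_ij = expit(b[j] - b[i])   # P(item i correct | exactly one of the pair correct)
            nll -= n_ij * np.log(p_ij) + n_ji * np.log1p(-p_ij)
    return nll

# X: persons-by-items matrix of 0/1 responses, with np.nan marking missing data, e.g.
# X = np.array([[1, 0, np.nan], [1, 1, 0], [0, 0, 1], [1, np.nan, 0]], dtype=float)
# fit = minimize(pairwise_negloglik, x0=np.zeros(X.shape[1] - 1), args=(X,))
# b_hat = np.concatenate(([0.0], fit.x))
```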

4.
This article contains information on the Rasch measurement partial credit model: what it is, how it differs from other Rasch models, when to use it, and how to use it. The calibration of instruments with increasingly complex items is described, starting with dichotomous items and moving on to polytomous items using a single rating scale, mixed polytomous items using multiple rating scales, and instruments in which each item has its own rating scale. It also introduces a procedure for aligning rating scale categories, to be used when more than one rating scale appears in a single instrument. Pivot anchoring is defined and illustrated with the mental health scale of the SF-36, which contains positively and negatively worded items. Finally, the effect of pivot anchoring on step calibrations, the item hierarchy, and person measures is described.
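To make the "each item has its own rating scale" idea concrete, here is a minimal sketch (with made-up step values, not the SF-36 calibrations discussed in the article) of how partial credit category probabilities follow from an item's own step parameters:

```python
import numpy as np

def pcm_probs(theta, deltas):
    """Category probabilities P(X = 0, ..., m | theta) for one partial credit item.
    `deltas` holds this item's own step (threshold) parameters, one per step."""
    deltas = np.asarray(deltas, dtype=float)
    cum = np.concatenate(([0.0], np.cumsum(theta - deltas)))   # cumulative numerator terms
    w = np.exp(cum - cum.max())                                # subtract max for numerical stability
    return w / w.sum()

# An item whose own rating scale has 4 categories (3 steps); the values are illustrative.
print(pcm_probs(theta=0.5, deltas=[-1.0, 0.2, 1.4]))
```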

5.
The invariance of the estimated parameters across variation in the incidental parameters of a sample is one of the most important properties of Rasch measurement models. This is the property that allows the equating of test forms and the use of computer adaptive testing. It necessarily follows that in Rasch models, if the data fit the model, then the estimates of the parameters of interest must be invariant across sub-samples of the items or persons. This study investigates the degree to which the INFIT and OUTFIT item fit statistics in WINSTEPS detect violations of the invariance property of Rasch measurement models. The test in this study is an 80-item multiple-choice test used to assess mathematics competency. The WINSTEPS analysis of the dichotomous results, based on a sample of 2,000 drawn from the very large number of students who took the exam, indicated that only 7 of the 80 items misfit using the 1.3 mean square criterion advocated by Linacre and Wright. Subsequent calibration of separate samples of 1,000 students from the upper and lower thirds of the person raw score distribution, followed by a t-test comparison of the item calibrations, indicated that the item difficulties for 60 of the 80 items were more than 2 standard errors apart. The separate-calibration t-values ranged from +21.00 to -7.00, with the t-values for 41 of the 80 comparisons either larger than +5 or smaller than -5. Clearly these data do not exhibit the invariance of the item parameters expected if the data fit the model, yet the INFIT and OUTFIT mean squares are completely insensitive to this lack of invariance. If the OUTFIT ZSTD from WINSTEPS were used with a critical value of |t| > 2.0, then 56 of the 60 items identified by the separate-calibration t-test would be identified as misfitting. A fourth measure of misfit, the between-ability-group item fit statistic, identified 69 items as misfitting when a critical value of t > 2.0 was used. Clearly, relying solely on the INFIT and OUTFIT mean squares in WINSTEPS to assess the fit of the data to the model would cause one to miss one of the most important threats to the usefulness of the measurement model.
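For context on why these residual-based statistics can miss invariance problems, the sketch below shows the standard INFIT and OUTFIT mean-square formulas for dichotomous data (a generic illustration of the formulas, not WINSTEPS source code): both pool squared residuals over the whole sample rather than comparing item estimates across ability subgroups, which is why they can remain near 1.0 even when separate calibrations differ markedly.

```python
import numpy as np

def infit_outfit(X, theta, b):
    """INFIT and OUTFIT mean squares per item for a 0/1 response matrix X,
    given person estimates `theta` and item estimates `b`."""
    P = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))   # expected responses
    W = P * (1.0 - P)                                          # model variances
    R = X - P                                                  # raw residuals
    outfit = np.mean(R ** 2 / W, axis=0)                       # unweighted mean square
    infit = np.sum(R ** 2, axis=0) / np.sum(W, axis=0)         # information-weighted mean square
    return infit, outfit
```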

6.
This paper describes the development and validation of a democratic learning style scale intended to fill a gap in Sternberg's theory of mental self-government and the associated learning style inventory (Sternberg, 1988, 1997). The scale was constructed as an 8-item scale with a 7-category response scale and was developed following an adapted version of DeVellis' (2003) guidelines for scale development. The validity of the Democratic Learning Style Scale was assessed by item analysis using graphical loglinear Rasch models (Kreiner and Christensen, 2002, 2004, 2006). The item analysis confirmed that the full 8-item revised Democratic Learning Style Scale fitted a graphical loglinear Rasch model with no differential item functioning but with weak to moderate uniform local dependence between two items. In addition, a reduced 6-item version of the scale fitted the pure Rasch model with a rating scale parameterization. The revised Democratic Learning Style Scale can therefore be regarded as a sound measurement scale meeting the requirements of both construct validity and objectivity.

7.
This paper examines the impact of differential item functioning (DIF), missing item values, and different methods for handling missing item values on theta estimates, with data simulated from the partial credit model and Andrich's rating scale model. Both Rasch family models are commonly used when obtaining an estimate of a respondent's attitude. The degree of missing data, the DIF magnitude, and the percentage of DIF items were varied in MCAR data conditions in which the focal group was 10% of the total population. Four methods for handling missing data were compared: complete-case analysis, mean substitution, hot-decking, and multiple imputation. Bias, RMSE, means, and standard errors of the theta estimates for the focal group were adversely affected by the percentage and magnitude of DIF items. RMSE and fidelity coefficients for both the reference and focal groups were adversely impacted by the amount of missing data. While all methods of handling missing data performed fairly similarly, multiple imputation and hot-decking showed slightly better performance.
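As one concrete example of the methods compared, here is a minimal sketch of simple random hot-deck imputation for item responses. It is a generic within-item random hot deck for illustration only; the study's exact donor-selection rules are not reproduced here.

```python
import numpy as np

def hot_deck_impute(X, seed=None):
    """Fill each missing entry of a persons-by-items response matrix with a value
    drawn at random from the observed responses to the same item.
    Assumes every item has at least one observed response."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float).copy()
    for j in range(X.shape[1]):
        missing = np.isnan(X[:, j])
        if missing.any():
            donors = X[~missing, j]                              # observed responses to item j
            X[missing, j] = rng.choice(donors, size=missing.sum(), replace=True)
    return X
```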

8.
The Standardized Letter of Recommendation (SLR), a 28-item form, was created by ETS to supplement the qualitative rating of graduate school applicants' nonacademic qualities with a quantitative approach. The purpose of this study was to evaluate the following psychometric properties of the SLR using the Rasch rating scale model: dimensionality, reliability, item quality, and rating category effectiveness. Principal component and factor analyses were also conducted to examine the dimensionality of the SLR. Results revealed that (a) two secondary factors underlay the data, along with a strong higher-order factor, (b) item and person separation reliabilities were high, (c) noncognitive items tended to elicit higher endorsements than did cognitive items, and (d) a 5-point Likert scale functioned effectively. The psychometric properties of the SLR support the use of a composite score when reporting SLR scores and the utility of the SLR in higher education and in admissions.

9.
In the present paper, the Rasch measurement model is used in the validation and analysis of data from the satisfaction section of the first national survey of the social services sector carried out in Italy. A comparison between two Rasch models for polytomous data, the Rating Scale Model (RSM) and the Partial Credit Model (PCM), is discussed. Given that the two models provide similar estimates of the item difficulties and of worker satisfaction, that for almost all items the response probabilities computed using the RSM and the PCM are very close, and that the analysis of bootstrap confidence intervals shows the RSM estimates to be more stable than the PCM estimates, it can be concluded that, for the present data, the RSM is more appropriate than the PCM.

10.
The purpose of this research was to use Rasch measurement to study the psychometric properties of data obtained from a newly developed Diabetes Questionnaire designed to measure diabetes knowledge, attitudes, and self-care. Specifically, a methodology using principles of Rasch measurement was employed to investigate the cross-form equivalence of the English and Spanish versions of the Diabetes Questionnaire. A total of fifty diabetes patients responded to the questionnaire, with 26 participants completing the English version. Analyses detected problems with the attitude items; we attributed these scaling problems to the use of negatively worded items with participants having generally low educational backgrounds. Analysis of the knowledge and self-care items yielded unidimensional variables with clinically meaningful item hierarchies that may have relevance to treatment protocols. Furthermore, the knowledge and self-care items from the two versions of the Diabetes Questionnaire met our criteria for establishing cross-form equivalence, thus allowing quantitative comparisons of person measures across versions. Limitations of the study and suggested refinements of the Diabetes Questionnaire are discussed.

11.
Teachers' knowledge is usually categorised into subject matter knowledge (SMK) and pedagogical content knowledge (PCK). Previously, measurement instruments and consequent cognitive scales have been developed to assess students' and teachers' subject knowledge, and a number of qualitative studies have explored teachers' pedagogical content knowledge. This study developed a means to investigate one aspect of PCK, teachers' awareness of their students' knowledge, using a combination of measurement and qualitative interpretation. We asked teachers to estimate on a Likert scale (and also describe qualitatively) the difficulty their pupils would have with test items that we had already scaled using data from those pupils. We then constructed, using various models, a "Teacher's collective Perception of Item Difficulty" (TPID) scale and contrasted this with the students' ability scale by comparing the two sets of item-difficulty parameters. The results were triangulated with qualitative data. We suggest the methodology is best supported by an Inverse Partial Credit Model (IPCM), but we also compare the results across alternative Rasch models.

12.
Local independence in the Rasch model can be violated in two generic ways that are generally not distinguished clearly in the literature. In this paper we distinguish between a violation of unidimensionality, which we call trait dependence, and a specific violation of statistical independence, which we call response dependence, both of which violate local independence. Distinct algebraic formulations for trait and response dependence are developed as violations of the dichotomous Rasch model; data are simulated with varying degrees of dependence according to these formulations and then analysed according to the Rasch model assuming no violations. Relative to the case of no violation, it is shown that trait and response dependence have opposite effects on the unit of scale, as manifested in the range and standard deviation of the scale and the standard deviation of person locations: with trait dependence the scale is reduced; with response dependence it is increased. Again relative to the case of no violation, the two violations also have opposite effects on the person separation index (analogous in value and construction to Cronbach's alpha reliability index of traditional test theory): it decreases for data with trait dependence and increases for data with response dependence. A standard way of accounting for dependence is to combine the dependent items into a higher-order polytomous item. This typically results in a decreased person separation index and Cronbach's alpha compared with analysing the items as discrete, independent items. This occurs irrespective of the kind of dependence in the data, and so further contributes to the two violations not being distinguished clearly. In an attempt to begin to distinguish between them statistically, this paper articulates the opposite effects of these two violations in the dichotomous Rasch model.
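A minimal simulation sketch of the two violations, using arbitrary illustrative values rather than the paper's simulation design: trait dependence replaces the single latent variable with two correlated traits, while response dependence keeps a single trait but lets the response to one item shift the difficulty of the next.

```python
import numpy as np

rng = np.random.default_rng(1)

def rasch_response(theta, b):
    """Simulate 0/1 responses from the dichotomous Rasch model."""
    p = 1.0 / (1.0 + np.exp(-(theta - b)))
    return (rng.random(p.shape) < p).astype(int)

n_persons, b1, b2, d = 5000, 0.0, 0.5, 1.0        # illustrative item locations and dependence size

# Trait dependence: the two items are driven by distinct, imperfectly correlated
# traits (a violation of unidimensionality).
t = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=n_persons)
x1 = rasch_response(t[:, 0], b1)
x2 = rasch_response(t[:, 1], b2)

# Response dependence: a single trait, but a correct response to item 1 makes
# item 2 easier by d logits (a violation of statistical independence).
theta = rng.normal(size=n_persons)
y1 = rasch_response(theta, b1)
y2 = rasch_response(theta, b2 - d * y1)
```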

13.
In 2005, PISA published trend indicators that compared the results of PISA 2000 and PISA 2003. In this paper we explore the extent to which the outcomes of these trend analyses are sensitive to the choice of test equating methodologies, the choice of regression models, and the choice of linking items. To establish trends, PISA equated its 2000 and 2003 tests using a methodology based on Rasch modelling that involved estimating linear transformations mapping 2003 Rasch-scaled scores onto the previously established PISA 2000 Rasch-scaled scores. We compare the outcomes of this approach with an alternative, which involves the joint Rasch scaling of the PISA 2000 and PISA 2003 data separately for each country. Note that under this approach the item parameters are estimated separately for each country, whereas the linear transformation approach used a common set of item parameter estimates for all countries. Further, as its primary trend indicators, PISA reported changes in mean scores between 2000 and 2003. These means are not adjusted for changes in the background characteristics of the PISA 2000 and PISA 2003 samples; that is, they are marginal rather than conditional means. The use of conditional rather than marginal means results in some differing conclusions regarding trends at both the country and within-country level.
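The core of the linear-transformation approach is easy to sketch for the Rasch case, where the slope is fixed at one and only a shift needs to be estimated from the common link items. The snippet below is an illustrative mean-mean linking sketch, not PISA's actual operational procedure, which involves additional steps and a separate conversion to the reporting metric.

```python
import numpy as np

def rasch_link_shift(b_link_2000, b_link_2003):
    """Shift that places 2003 Rasch logits on the 2000 logit scale, computed from
    the difficulty estimates of the common link items (mean-mean linking)."""
    return np.mean(np.asarray(b_link_2000)) - np.mean(np.asarray(b_link_2003))

def map_to_2000_scale(theta_2003, shift):
    """Apply the linking shift to 2003 person estimates (logit metric only; the
    further conversion to a reporting metric is not reproduced here)."""
    return np.asarray(theta_2003) + shift
```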

14.
A major challenge in conducting assessments in ethnically and culturally diverse populations, especially using translated instruments, is the possibility that measures developed for a given construct in one particular group may not be assessing the same construct in other groups. Using a Rasch analysis, this study examined the item equivalence of two psychiatric measures, the Harvard Trauma Questionnaire (HTQ), measuring traumatic experience, and the Hopkins Symptom Checklist (HSCL), assessing depression symptoms, across Vietnamese American and Cambodian American mothers, using data from the Cross-Cultural Families (CCF) Project. The majority of items were equivalent across the two groups, particularly on the HTQ. However, some items were endorsed differently by the two groups and thus are not equivalent, suggesting that Cambodian and Vietnamese immigrants may manifest certain aspects of trauma and depression differently. Implications of these similarities and differences for practice and for the use of IRT in this arena are discussed.

15.
The aim is to show that it is possible to parameterize discrimination for sets of items, rather than individual items, without destroying the conditions for sufficiency in a form of the Rasch model. The form of the model is obtained by formalizing the relationship between discrimination and the unit of a metric. The raw score vector across item sets is the sufficient statistic for the person parameter. Simulation studies are used to show the implementation of conditional estimation solution equations based on the relevant form of the Rasch model. The model was also applied to two numeracy tests attempted by a group of common persons in a large-scale testing program. The results show improved fit compared with the Rasch model in its standard form, and they also show that the units of the scales were more accurately equated. The paper discusses implications for applied measurement using Rasch models and contrasts the approach with the application of the two-parameter logistic (2PL) model.

16.
The purpose of this research is twofold. The first aim is to extend the work of Smith (1992, 1996) and Smith and Miao (1991, 1994) in comparing item fit statistics and principal component analysis as tools for assessing the unidimensionality requirement of Rasch models. The second is to demonstrate methods for exploring how violations of the unidimensionality requirement influence person measurement. For the first study, rating scale data were simulated to represent varying degrees of multidimensionality and varying proportions of items contributing to each component. The second study used responses to a 24-item Attention Deficit Hyperactivity Disorder scale obtained from 317 college undergraduates. The simulation study reveals that both an iterative item fit approach and a principal component analysis of standardized residuals are effective in detecting items simulated to contribute to multidimensionality. The methods presented in Study 2 demonstrate the potential impact of multidimensionality on norm- and criterion-referenced interpretations of person measures. The results provide researchers with quantitative information to assist with the qualitative judgment as to whether the impact of multidimensionality is severe enough to warrant removing items from the analysis.
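The second of the two checks, a principal component analysis of standardized residuals, can be sketched in a few lines for dichotomous data (a generic illustration, not the authors' software); a first residual eigenvalue well above the others is commonly read as evidence of a secondary dimension. The simulated data in the study were polytomous rating scale responses, for which the expectations and variances would instead be computed from the rating scale model.

```python
import numpy as np

def residual_pca_eigenvalues(X, theta, b):
    """Eigenvalues (largest first) of the correlation matrix of standardized Rasch
    residuals for a 0/1 response matrix X, given person estimates `theta` and
    item estimates `b`."""
    P = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))   # model expectations
    Z = (X - P) / np.sqrt(P * (1.0 - P))                       # standardized residuals
    R = np.corrcoef(Z, rowvar=False)                           # item-by-item residual correlations
    return np.linalg.eigvalsh(R)[::-1]
```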

17.
This research describes some of the similarities and differences between additive conjoint measurement (a type of fundamental measurement) and the Rasch model. There are many similarities between the two frameworks; however, their differences are nontrivial. For instance, while conjoint measurement specifies measurement scales using a data-free, non-numerical axiomatic frame of reference, the Rasch model specifies measurement scales using a numerical frame of reference that is, by definition, data dependent. In order to circumvent difficulties that can realistically be imposed by this data dependence, this research formalizes new non-parametric item response models. These models are probabilistic measurement theory models in the sense that they explicitly integrate the axiomatic ideas of measurement theory with the statistical ideas of order-restricted inference and Markov chain Monte Carlo. The specifications of these models are rather flexible, as they can represent any one of several models used in psychometrics, such as Mokken's (1971) monotone homogeneity model, Scheiblechner's (1995) isotonic ordinal probabilistic model, or the Rasch (1960) model. The proposed non-parametric item response models are applied to analyze both real and simulated data sets.

18.
Traditionally, women and minorities have not been fully represented in science and engineering. Numerous studies have attributed these differences to gaps in science achievement as measured by various standardized tests. Rather than describe mean group differences in science achievement across multiple cultures, this study focused on an in-depth item-level analysis across two countries, Spain and the United States, investigating eighth-grade gender differences on science items. A secondary purpose of the study was to explore the nature of gender differences using the many-faceted Rasch model as a way to estimate gender DIF. A secondary analysis of data from the Third International Mathematics and Science Study (TIMSS) was used to address three questions: 1) Does gender DIF in science achievement exist? 2) Is there a relationship between gender DIF and characteristics of the science items? 3) Do the relationships between item characteristics and gender DIF in science items replicate across countries? Participants included 7,087 eighth-grade students from the United States and 3,855 students from Spain who participated in TIMSS. The Facets program (Linacre and Wright, 1992) was used to estimate gender DIF. The results indicate that the content of an item seemed to be related to gender DIF, and the analysis also suggests a relationship between gender DIF and item format; no pattern of gender DIF related to cognitive demand was found. The general pattern of gender DIF was similar across the two countries. The strength of item-level analysis, as opposed to group mean difference analysis, is that gender differences can be detected at the item level even when no mean differences can be detected at the group level.

19.
The rating scale model (Andrich, 1978) was applied to data from a survey that directed students to rate their satisfaction with college services on a five-point Likert scale. Because students used different services and were directed to rate only the services they used, the items were differentially exposed to a person factor that we call "pleasability." Differential exposure to pleasability makes items' average ratings a biased measure of their performance. In contrast, item parameter estimates in the rating scale model corrected for differential exposure to pleasability. Compared with items' average ratings, the item parameter estimates in the rating scale model did a better job of predicting which item received the higher rating when any two items were rated by the same rater.

20.
An application of the Rasch measurement model with dichotomous scoring to item response data from a newly created Mobility Scale administered to elderly, independently living individuals is presented. The dichotomous scoring model, item calibration, person calibration, logit scale, normative scale score, reliability, and validity are explained. Results indicated that additional activity statements need to be written and tested to improve the Mobility Scale instrument.
