Similar Documents
20 similar documents found (search time: 31 ms)
1.
This research describes some of the similarities and differences between additive conjoint measurement (a type of fundamental measurement) and the Rasch model. There are many similarities between the two frameworks; however, their differences are nontrivial. For instance, while conjoint measurement specifies measurement scales using a data-free, non-numerical axiomatic frame of reference, the Rasch model specifies measurement scales using a numerical frame of reference that is, by definition, data dependent. In order to circumvent difficulties that can realistically be imposed by this data dependence, this research formalizes new non-parametric item response models. These models are probabilistic measurement theory models in the sense that they explicitly integrate the axiomatic ideas of measurement theory with the statistical ideas of order-restricted inference and Markov chain Monte Carlo. The specifications of these models are rather flexible, as they can represent any one of several models used in psychometrics, such as Mokken's (1971) monotone homogeneity model, Scheiblechner's (1995) isotonic ordinal probabilistic model, or the Rasch (1960) model. The proposed non-parametric item response models are applied to analyze both real and simulated data sets.

2.
In test analysis involving the Rasch model, a large degree of importance is placed on the "objective" measurement of individual abilities and item difficulties. The degree to which the objectivity properties are attained depends, of course, on the degree to which the data fit the Rasch model. It is therefore important to use fit statistics that accurately and reliably detect the person-item response inconsistencies that threaten the measurement objectivity of persons and items. Given this argument, it is somewhat surprising that far more emphasis is placed on the objective measurement of persons and items than on the measurement quality of Rasch fit statistics. This paper provides a critical analysis of the residual fit statistics of the Rasch model, arguably the most often used fit statistics, in an effort to illustrate that the task of Rasch fit analysis is not as simple and straightforward as it appears to be. The faulty statistical properties of the residual fit statistics allow neither a convenient nor a straightforward approach to Rasch fit analysis. For instance, given a residual fit statistic, using a single minimum critical value for misfit diagnosis across different testing situations, where the situations vary in sample and test properties, leads to both the overdetection and the underdetection of misfit. To improve this situation, it is argued that psychometricians need to implement residual-free Rasch fit statistics based on the number of Guttman response errors, or use indices that are statistically optimal in detecting measurement disturbances.
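The residual fit statistics discussed above have a simple computational core. As a hedged sketch (not the paper's own code; the function name is illustrative), the conventional OUTFIT and INFIT mean squares for dichotomous Rasch data can be computed from standardized residuals as follows:

```python
import numpy as np

def residual_fit_statistics(X, theta, b):
    """OUTFIT and INFIT mean squares per item for dichotomous Rasch data.
    X: 0/1 responses (persons x items); theta: person measures; b: item difficulties."""
    P = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))  # expected scores
    W = P * (1.0 - P)                                          # model variance of each response
    R2 = (X - P) ** 2                                          # squared residuals
    outfit = (R2 / W).mean(axis=0)                             # unweighted mean square
    infit = R2.sum(axis=0) / W.sum(axis=0)                     # information-weighted mean square
    return outfit, infit

# With data simulated to fit the model, both statistics hover near 1.
rng = np.random.default_rng(42)
theta = rng.normal(0.0, 1.0, 2000)
b = np.array([-1.0, 0.0, 1.0])
P = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))
X = (rng.random((2000, 3)) < P).astype(float)
outfit, infit = residual_fit_statistics(X, theta, b)
```

The sketch illustrates the paper's point indirectly: the sampling variance of these mean squares depends heavily on sample size and targeting, which is why a single critical value cannot work across testing situations.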

3.
The item parameters of a polytomous Rasch model can be estimated using marginal and conditional approaches. This paper describes how this can be done in SAS (V8.2) for three item parameter estimation procedures: marginal maximum likelihood estimation, conditional maximum likelihood estimation, and pairwise conditional estimation. The use of the procedures for extensions of the Rasch model is also discussed. The accuracy of the methods is evaluated using a simulation study.
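Of the three procedures, pairwise conditional estimation is the simplest to sketch outside SAS. For the dichotomous case, the log-odds of the counts n_ij (item i right, item j wrong) and n_ji estimate the difficulty difference b_j − b_i, independently of the person distribution. A minimal illustration (not the paper's SAS code; the helper name is invented):

```python
import numpy as np

def pairwise_difficulties(X):
    """Pairwise conditional estimates of dichotomous Rasch item difficulties.
    For each item pair, log(n_ij / n_ji) estimates b_j - b_i; averaging over
    reference items i yields difficulties centred at zero."""
    n, k = X.shape
    B = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            if i != j:
                nij = np.sum((X[:, i] == 1) & (X[:, j] == 0))
                nji = np.sum((X[:, j] == 1) & (X[:, i] == 0))
                B[i, j] = np.log(nij / nji)   # log-odds difficulty difference
    b = B.mean(axis=0)
    return b - b.mean()

# Recover known difficulties from simulated Rasch data.
rng = np.random.default_rng(7)
true_b = np.array([-1.0, 0.0, 1.0])
theta = rng.normal(0.0, 1.0, 4000)
P = 1.0 / (1.0 + np.exp(-(theta[:, None] - true_b[None, :])))
X = (rng.random((4000, 3)) < P).astype(int)
est_b = pairwise_difficulties(X)
```

Because the pairwise counts condition out the person parameters, this approach shares the person-distribution-free character of conditional maximum likelihood while being far easier to implement.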

4.
The Rasch family of models displays several well-documented properties that distinguish it from the general item response theory (IRT) family of measurement models. This paper describes an additional unique property of Rasch models, referred to as the property of item information constancy. This property asserts that the area under the information function for Rasch models is always equal to the number of response categories minus one, regardless of the values of the item location parameters. The implication of the property of item information constancy is that, for a given number of response categories, all items following a Rasch model contribute equally to the height of the test information function across the entire latent continuum.
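The property is easy to verify numerically for the dichotomous case (two response categories, so the area should be 1): the item information function is P(1 − P), and its integral over the latent continuum is the same whatever the item's difficulty. A minimal check:

```python
import numpy as np

# Item information for a dichotomous Rasch item is I(theta) = P(1 - P).
# Its area over the latent continuum equals (categories - 1) = 1,
# regardless of the difficulty b.
theta = np.linspace(-30.0, 30.0, 200001)
dtheta = theta[1] - theta[0]
for b in (-2.0, 0.0, 3.0):
    p = 1.0 / (1.0 + np.exp(-(theta - b)))
    area = np.sum(p * (1.0 - p)) * dtheta   # Riemann-sum approximation
    print(b, round(area, 4))
```

The printed areas are all essentially 1, regardless of b, which is exactly the information-constancy claim for the dichotomous case.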

5.
This article contains information on the Rasch partial credit model: what it is, how it differs from other Rasch models, when to use it, and how to use it. The calibration of instruments with increasingly complex items is described, starting with dichotomous items, then polytomous items using a single rating scale, then mixed polytomous items using multiple rating scales, and finally instruments in which each item has its own rating scale. The article also introduces a procedure for aligning rating scale categories when more than one rating scale is used in a single instrument. Pivot anchoring is defined, and its use is illustrated with the mental health scale of the SF-36, which contains positively and negatively worded items. Finally, the effect of pivot anchoring on step calibrations, the item hierarchy, and person measures is described.

6.
A large number of papers and technical reports are published every year describing research in which Rasch models are used. It has been observed, however, that not all authors describe the application of Rasch measurement with the same thoroughness. Some authors omit important information: for example, they may fail to investigate person or item fit, or even fail to discuss the reliability of measurement. As a result, editorial guidelines have been published to suggest a minimum standard of thoroughness with which authors should describe the application of Rasch measurement in their papers. This study presents the stages in the development of a scale for investigating how comprehensively individual papers describe the application of Rasch models in practical settings. The scale is then used to evaluate how comprehensively papers published in the Journal of Applied Measurement present the application of Rasch models.

7.
The current study investigates the performance of two Rasch measurement programs and their parameter estimation on the linear logistic test model (LLTM; Fischer, 1973). The two programs, LinLog (Whitely & Nieh, 1981) and FACETS (Linacre, 2002), are used to investigate within-item complexity factors in a spatial memory measure. LinLog, an LLTM-specific program, uses conditional maximum likelihood to estimate person and item parameters. FACETS is usually reserved for the many-facet Rasch model (MFRM; Linacre, 1989); however, in the case of specifically designed within-item solution processes, a multifaceted approach makes good sense: each dimension within the item can be treated as a separate facet, just as if there were multiple raters for each item. Simulations of 500 and 1000 persons expand the original data set (114 persons) to better examine each estimation technique. The LinLog and FACETS analyses show strikingly similar results in both the simulated and original data conditions, indicating that the FACETS program produces accurate LLTM parameter estimates.

8.
Building on Wright and Masters (1982), several Rasch estimation methods are briefly described, including Marginal Maximum Likelihood Estimation (MMLE) and minimum chi-square methods. General attributes of Rasch estimation algorithms are discussed, including the handling of missing data, precision and accuracy, estimate consistency, bias and symmetry. Reasons for, and the implications of, measure misestimation are explained, including the effect of loose convergence criteria, and failure of Newton-Raphson iteration to converge. Alternative parameterizations of rating scales broaden the scope of Rasch measurement methodology.
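The Newton-Raphson iteration and convergence criteria mentioned above can be illustrated in the simplest setting: estimating one person's measure by maximum likelihood given known item difficulties. This is a hedged sketch with illustrative names, not any program's actual algorithm:

```python
import numpy as np

def ml_person_measure(x, b, tol=1e-8, max_iter=50):
    """Newton-Raphson ML estimate of a person measure under the dichotomous
    Rasch model, given item difficulties b and a non-extreme response vector x."""
    r = x.sum()                              # raw score (the sufficient statistic)
    theta = 0.0
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(theta - b)))
        g = r - p.sum()                      # first derivative of the log-likelihood
        h = -(p * (1.0 - p)).sum()           # second derivative (always negative)
        step = g / h
        theta -= step
        if abs(step) < tol:                  # convergence criterion on the step size
            break
    return theta

# Score of 2 on items at -1, 0, +1 logits.
theta_hat = ml_person_measure(np.array([1.0, 1.0, 0.0]), np.array([-1.0, 0.0, 1.0]))
```

Loosening `tol` here shows directly how a loose convergence criterion yields misestimated measures: iteration stops while the expected score still differs from the observed raw score.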

9.
An extension to the Rasch model for fundamental measurement is described in which there is parameterization not only for examinee ability and item difficulty but also for judge severity. Variants of this model are discussed and judging plans reviewed. Its use and characteristics are explained by an application of the model to an empirical testing situation. A comparison with Generalizability Theory using a common data set is presented as a contrast in approaches to resolving judge indeterminacy.

10.
There has been some discussion among researchers as to the benefits of one calibration process over another during equating. Although the literature is rife with the pros and cons of the different methods, hardly any research has been done on anchoring (i.e., fixing item parameters to their predetermined values on an established scale), a method commonly used by psychometricians in large-scale assessments. This simulation study compares the fixed form of calibration with the concurrent method (in which the different forms are calibrated on the same scale in a single run of the calibration process, treating all non-included items on the forms as missing or not reached), using the dichotomous Rasch (Rasch, 1960) and Rasch partial credit (Masters, 1982) models and the WINSTEPS (Linacre, 2003) computer program. Contrary to some researchers' contention that a concurrent run, with its larger n-counts for the common items, would provide greater accuracy in the estimation of item parameters, the results indicate that the relative accuracy of the two methods is confounded with sample size, the number of common items, and other factors, and that there is no real benefit to using one method over the other in the calibration and equating of parallel test forms.

11.
Although post-equating (PE) has proven to be an acceptable method for the scaling and equating of items and forms, there are times when the turnaround period for equating and converting raw scores to scale scores is so short that PE cannot be undertaken within the prescribed time frame. In such cases, pre-equating (PrE) can be considered an acceptable alternative. The feasibility of using item calibrations from the item bank (as in PrE) depends on the equivalency of those calibrations, and the errors associated with them, vis-à-vis the results obtained via PE. This paper creates item banks over three periods of item introduction and uses the Rasch model to examine the data with respect to the recovery of item parameters, the measurement error, and the effect cut-points have on examinee placement in both the PrE and PE situations. Results indicate that PrE is a viable alternative to PE, provided the stability of the item calibrations is enhanced by using large sample sizes (perhaps as large as the full population) in populating the item bank.

12.
The invariance of the estimated parameters across variation in the incidental parameters of a sample is one of the most important properties of Rasch measurement models. This is the property that allows the equating of test forms and the use of computer adaptive testing. It necessarily follows that in Rasch models, if the data fit the model, then the estimates of the parameters of interest must be invariant across sub-samples of the items or persons. This study investigates the degree to which the INFIT and OUTFIT item fit statistics in WINSTEPS detect violations of the invariance property of Rasch measurement models. The test in this study is an 80-item multiple-choice test used to assess mathematics competency. The WINSTEPS analysis of the dichotomous results, based on a sample of 2,000 from a very large number of students who took the exam, indicated that only 7 of the 80 items misfit using the 1.3 mean square criterion advocated by Linacre and Wright. Subsequent calibration of separate samples of 1,000 students from the upper and lower thirds of the person raw score distribution, followed by a t-test comparison of the item calibrations, indicated that the item difficulties for 60 of the 80 items were more than 2 standard errors apart. The separate-calibration t-values ranged from +21.00 to -7.00, with 41 of the 80 t-values either larger than +5 or smaller than -5. Clearly these data do not exhibit the invariance of the item parameters expected if the data fit the model, yet the INFIT and OUTFIT mean squares are completely insensitive to this lack of invariance. If the OUTFIT ZSTD from WINSTEPS were used with a critical value of |t| > 2.0, then 56 of the 60 items identified by the separate-calibration t-test would be identified as misfitting. A fourth measure of misfit, the between-ability-group item fit statistic, identified 69 items as misfitting when a critical value of t > 2.0 was used. Clearly, relying solely on the INFIT and OUTFIT mean squares in WINSTEPS to assess the fit of the data to the model would cause one to miss one of the most important threats to the usefulness of the measurement model.
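The separate-calibration check described above reduces to a simple statistic: each item's two subsample difficulties are compared relative to their pooled standard error. A minimal sketch (illustrative names, not WINSTEPS output):

```python
import numpy as np

def separate_calibration_t(d1, se1, d2, se2):
    """t statistic for the invariance of one item's difficulty across two
    independent calibrations (e.g., upper- and lower-scoring subsamples)."""
    return (d1 - d2) / np.sqrt(se1 ** 2 + se2 ** 2)

# A hypothetical item calibrated at 0.90 logits (SE 0.08) in one group and
# 0.40 logits (SE 0.07) in the other shows a clear invariance violation:
t = separate_calibration_t(0.90, 0.08, 0.40, 0.07)
```

Values of |t| well beyond 2 flag items whose difficulty is not invariant across subsamples, which (as the study shows) the INFIT/OUTFIT mean squares alone can miss entirely.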

13.
The aim is to show that it is possible to parameterize discrimination for sets of items, rather than individual items, without destroying the conditions for sufficiency in a form of the Rasch model. The form of the model is obtained by formalizing the relationship between discrimination and the unit of a metric. The raw score vector across item sets is the sufficient statistic for the person parameter. Simulation studies are used to show the implementation of conditional estimation solution equations based on the relevant form of the Rasch model. The model was also applied to two numeracy tests attempted by a group of common persons in a large-scale testing program. The results show improved fit compared with the Rasch model in its standard form; they also show that the units of the scales were more accurately equated. The paper discusses implications for applied measurement using Rasch models and contrasts the approach with the application of the two-parameter logistic (2PL) model.

14.
The purpose of this research is twofold: first, to extend the work of Smith (1992, 1996) and Smith and Miao (1991, 1994) in comparing item fit statistics and principal component analysis as tools for assessing the unidimensionality requirement of Rasch models; second, to demonstrate methods for exploring how violations of the unidimensionality requirement influence person measurement. For the first study, rating scale data were simulated to represent varying degrees of multidimensionality and varying proportions of items contributing to each component. The second study used responses to a 24-item Attention Deficit Hyperactivity Disorder scale obtained from 317 college undergraduates. The simulation study reveals that both an iterative item fit approach and principal component analysis of standardized residuals are effective in detecting items simulated to contribute to multidimensionality. The methods presented in Study 2 demonstrate the potential impact of multidimensionality on norm- and criterion-referenced interpretations of person measures. The results provide researchers with quantitative information to assist with the qualitative judgment as to whether the impact of multidimensionality is severe enough to warrant removing items from the analysis.

15.
This paper reports the use of a Rasch measurement model, the Extended Logistic Model of Rasch (Andrich, 1988), to explore the construct of general motor ability in young children. Data were collected from 332 five- and six-year-old children performing 24 motor skills, including running, hopping, balance, and ball skills. The data were categorised based on threshold estimates provided by the measurement model. Gender differences in performance on the items were hypothesised to contribute to the initial item and person misfit for the total sample. The data for boys and girls were separated and independently analysed, resulting in improved item and person fit. Two different unidimensional scales, one for boys and one for girls, were created.

16.
A measure of the tendency to mismanage money was developed in an evaluation of a representative payee program for individuals with serious mental illnesses. A conceptual model was composed to guide item development, and items were tested, revised, added, and rejected over three waves of data collection. Rasch analyses were used to examine measurement properties. The resulting Money Mismanagement Measure (M3) consisted of 28 items with a Rasch person reliability of .72; restriction of range was likely responsible for this low reliability. Validity analyses supported the construct validity of the M3. Subsequently, a cross-validation study was conducted on an untreated sample less susceptible to range restriction, in which the M3 produced a Rasch person reliability of .85 with good validity. The M3 fills a gap and can facilitate research in the understudied area of money mismanagement.

17.
Past research on Computer Adaptive Testing (CAT) has focused almost exclusively on the use of binary items and on minimizing the number of items to be administered. To address this situation, extensive computer simulations were performed using partial credit items with two, three, four, and five response categories. Other variables manipulated included the number of available items, the number of respondents used to calibrate the items, and various manipulations of respondents' true locations. Three item selection strategies were used: the theoretically optimal Maximum Information method was compared to random item selection and a Bayesian Maximum Falsification approach. The Rasch partial credit model proved to be quite robust to various imperfections, and systematic distortions occurred mainly in the absence of sufficient numbers of items located near the trait or performance levels of interest. The findings further indicate that having small numbers of items is more problematic in practice than having small numbers of respondents to calibrate those items. Most importantly, increasing the number of response categories consistently improved CAT's efficiency as well as the general quality of the results; in fact, it proved to have a greater positive impact than the choice of item selection method, as the Maximum Information approach performed only slightly better than the Maximum Falsification approach. Accordingly, issues related to the efficiency of item selection methods are far less important than is commonly suggested in the literature. However, being based on computer simulations only, these conclusions presume that actual respondents behave according to the Rasch model. CAT research could thus benefit from empirical studies aimed at determining whether, and if so how, selection strategies impact performance.
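For dichotomous items, the Maximum Information strategy compared above amounts to administering the remaining item whose Fisher information P(1 − P) is largest at the current ability estimate. A minimal sketch (the function name is illustrative, not from the study):

```python
import numpy as np

def next_item_max_info(theta_hat, difficulties, administered):
    """Select the unadministered dichotomous item with maximum Fisher
    information I(theta) = P(1 - P) at the current ability estimate."""
    p = 1.0 / (1.0 + np.exp(-(theta_hat - difficulties)))
    info = p * (1.0 - p)
    info[list(administered)] = -np.inf        # never re-administer an item
    return int(np.argmax(info))

# Information peaks where difficulty matches ability, so at theta_hat = 0.1
# the closest remaining item (index 2, difficulty 2.0) is chosen once the
# on-target item (index 1) has been administered.
choice = next_item_max_info(0.1, np.array([-2.0, 0.0, 2.0]), administered={1})
```

This also makes the study's finding concrete: when no remaining item sits near the examinee's level, even the optimal rule must settle for low-information items, which is where the simulations located most distortion.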

18.
This overview of Rasch measurement models begins with a conceptualization of our continuous experiences, which are often captured as discrete observations. It goes on to discuss the properties that are required of measures if they are to transcend the occasion on which they were collected, and then takes up the spiral of inferential development. This is followed by a discussion of the mathematical properties of the Rasch family of models that allow the transformation of discrete deterministic counts into the continuous probabilistic abstractions on which science is based. The overview concludes with a discussion of six members of the Rasch family of models, including the Binomial Trials, Poisson Counts, Rating Scale, Partial Credit, and Ranks models, and the types of data for which these models are appropriate.

19.
The San Francisco Unified School District (SFUSD) uses the Language and Literacy Assessment Rubric (LALAR) as the secondary measurement required by the No Child Left Behind (NCLB) Act to measure the English proficiency of English language learners (ELLs). In this analysis, the Rasch model is used to determine whether the LALAR is a valid instrument and scale for measuring the "English proficiency" of ELLs. The analysis investigates the relationship between student ability (θ) and the probability that the student will respond correctly to an item on the LALAR. Controlling for this relationship, the characteristics of each item, the ability of each student, and the measurement error associated with each score were mathematically derived. This allows validity and reliability tests to be conducted, which will help determine whether the LALAR is a useful accountability measure for ELLs.

20.
Optimizing rating scale category effectiveness
Rating scales are employed as a means of extracting more information from an item than would be obtained from a mere "yes/no", "right/wrong", or other dichotomy. But does this additional information increase measurement accuracy and precision? Eight guidelines are suggested to aid the analyst in optimizing the manner in which rating scale categories cooperate, in order to improve the utility of the resultant measures. Though these guidelines are presented within the context of Rasch analysis, they reflect aspects of rating scale functioning that impact all methods of analysis. The guidelines feature rating-scale-based data such as category frequency and ordering, rating-to-measure inferential coherence, and the quality of the scale from measurement and statistical perspectives. The manner in which the guidelines prompt recategorization or reconceptualization of the rating scale is indicated. Use of the guidelines is illustrated through their application to two published data sets.
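Two of the guideline checks mentioned above — adequate category frequencies, and average person measures that advance with category — have simple empirical versions. A hedged sketch (illustrative names, not the paper's own procedure):

```python
import numpy as np

def category_diagnostics(responses, measures):
    """For each rating category: observed frequency, and the average measure
    of the persons who chose it (which should increase with the category)."""
    cats = np.unique(responses)
    freq = {int(c): int(np.sum(responses == c)) for c in cats}
    avg = {int(c): float(measures[responses == c].mean()) for c in cats}
    return freq, avg

# Toy data: categories 0-2 chosen by persons with known measures.
responses = np.array([0, 0, 1, 1, 1, 2, 2])
measures = np.array([-1.2, -0.8, -0.1, 0.2, 0.4, 0.9, 1.3])
freq, avg = category_diagnostics(responses, measures)
# Average measures advancing monotonically (here -1.0 < ~0.17 < 1.1)
# support the current categorization; a reversal would prompt collapsing
# adjacent categories.
```

Sparse categories or disordered average measures are the empirical signals that, per the guidelines, prompt recategorization before the measures are trusted.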
