Similar literature
1.
In state assessment programs that employ Rasch-based common item linking procedures, the linking constant is usually estimated with only those common items not identified as exhibiting item difficulty parameter drift. Since state assessments typically contain a fixed number of items, an item classified as exhibiting parameter drift during the linking process remains on the exam as a scorable item even if it is removed from the common item set. Under the assumption that item parameter drift has occurred for one or more of the common items, the expected effect of including or excluding the "affected" item(s) in the estimation of the linking constant is derived in this article. If the item parameter drift is due solely to factors not associated with a change in examinee achievement, no linking error will (be expected to) occur given that the linking constant is estimated only with the items not identified as "affected"; linking error will (be expected to) occur if the linking constant is estimated with all common items. However, if the item parameter drift is due solely to change in examinee achievement, the opposite is true: no linking error will (be expected to) occur if the linking constant is estimated with all common items; linking error will (be expected to) occur if the linking constant is estimated only with the items not identified as "affected".
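The effect described in this abstract can be illustrated with a small sketch. It assumes the linking constant is a simple mean shift, that is, the average difference between old-form and new-form Rasch difficulties over the retained common items. The function name `linking_constant`, the difficulty values, and the 0.5-logit drift are hypothetical and are not taken from the article.

```python
from statistics import mean


def linking_constant(old_b, new_b, exclude=()):
    """Mean-shift linking constant (an assumed form): the average of
    (old difficulty - new difficulty) over the retained common items."""
    dropped = set(exclude)
    kept = [i for i in range(len(old_b)) if i not in dropped]
    return mean(old_b[i] - new_b[i] for i in kept)


# Hypothetical common-item difficulties (logits) from the old calibration.
old_b = [-1.0, -0.5, 0.0, 0.5, 1.0]

# New calibration: every item is shifted by 0.2 logits (scale difference),
# and item 4 additionally drifts by 0.5 logits.
new_b = [b - 0.2 for b in old_b]
new_b[4] += 0.5

const_all = linking_constant(old_b, new_b)                 # all common items
const_clean = linking_constant(old_b, new_b, exclude=[4])  # flagged item removed

print(f"all items: {const_all:.3f}, flagged item excluded: {const_clean:.3f}")
```

The two constants differ only because of the single drifting item. Which of them is free of linking error depends, as the abstract states, on whether the drift reflects a real change in examinee achievement.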

2.
This paper used real and simulated data sets to compare three screening approaches often used in state-wide equating programs utilizing the Rasch model: Wright and Stone's t-statistic, the robust z-statistic, and the displace measure. Analyses of real data sets supported the superiority of the robust z-statistic and the displace measure relative to Wright and Stone's t-statistic. The simulation component did not support the contention that indiscriminate use of the +/-0.3 logits criterion inflates rates of Type I error for the robust z-statistic and the displace measure, although this contention was supported for Wright and Stone's t-statistic. However, Type II error rates were largest for the displace measure, followed by the robust z-statistic, then the t-statistic. The paper discusses the importance of a priori selection of a criterion for screening linking items and its effects on the stability and accuracy of the Rasch equating constant.
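For readers unfamiliar with the robust z-statistic, the following sketch shows one commonly cited form: each linking item's difficulty difference is centred on the median difference and scaled by 0.74 times the interquartile range. The abstract does not state the formula, so this form, the 1.645 cutoff, the function name `robust_z`, and all data values are assumptions made for illustration.

```python
from statistics import median, quantiles


def robust_z(old_b, new_b):
    """Robust z per linking item (assumed form):
    (d_i - median(d)) / (0.74 * IQR(d)), with d_i = new_b[i] - old_b[i]."""
    d = [n - o for o, n in zip(old_b, new_b)]
    q1, _, q3 = quantiles(d, n=4, method="inclusive")
    scale = 0.74 * (q3 - q1)
    centre = median(d)
    return [(x - centre) / scale for x in d]


# Hypothetical difficulties (logits); the last item shifts by a full logit.
old_b = [-1.5, -1.0, -0.5, 0.0, 0.3, 0.6, 1.0, 1.5]
shifts = [0.0, 0.05, -0.05, 0.08, -0.08, 0.02, -0.02, 1.0]
new_b = [o + s for o, s in zip(old_b, shifts)]

zs = robust_z(old_b, new_b)
CUTOFF = 1.645  # illustrative screening cutoff, not taken from the paper
flagged = [i for i, z in enumerate(zs) if abs(z) > CUTOFF]
print("flagged linking items:", flagged)
```

Because the median and interquartile range are barely affected by the one large shift, the unstable item stands out clearly while the remaining items stay below the cutoff.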

3.
The stable tail dependence function gives a full characterisation of the extremal dependence between two or more random variables. In this paper, we propose an estimator for this function which is robust against outliers in the sample. The estimator is derived from a bivariate second-order tail model together with a proper transformation of the bivariate observations, and its asymptotic properties are studied under some suitable regularity conditions. Our estimation procedure depends on two parameters: \(\alpha \), which controls the trade-off between efficiency and robustness of the estimator, and a second-order parameter \(\tau \), which can be replaced by a fixed value or by an estimate. When \(\tau \) has been replaced by the true value or by an external consistent estimator, our robust estimator is asymptotically unbiased, whereas when \(\tau \) is mis-specified, this property is lost, but our estimator still performs quite well with respect to bias. The finite sample performance of our robust and bias-corrected estimator of the stable tail dependence function is examined in a simulation study involving uncontaminated and contaminated samples. In particular, its behavior is illustrated for different values of the pair \((\alpha , \tau )\) and is compared with alternative estimators from the extreme value literature.

4.
Cause‐selecting control charts are believed to be invaluable for monitoring and diagnosing multistage processes where the output quality of some stages is significantly impacted by the output quality of preceding stages. To establish a relationship between input and output variables, a standard procedure uses historical data, which are often prone to contain outliers. The presence of outliers tends to decrease the effectiveness of monitoring procedures because the regression model is distorted and the control limits become stretched. To dampen the negative repercussions of outliers, robust fitting techniques based on M‐estimators are implemented instead of the ordinary least‐squares method and two robust monitoring approaches are presented. An example is given to illustrate the application and performance of the proposed control charts. Furthermore, a simulation‐based study is included to investigate and compare the average run length of robust and non‐robust schemes. The results reveal that the robust procedure far outperforms the non‐robust counterpart due to its prompt detection of out‐of‐control conditions when outliers exist. Copyright © 2009 John Wiley & Sons, Ltd.

5.
In the Rasch model for items with more than two ordered response categories, the thresholds that define the successive categories are an integral part of the structure of each item in that the probability of the response in any category is a function of all thresholds, not just the thresholds between any two categories. This paper describes a method of estimation for the Rasch model that takes advantage of this structure. In particular, instead of estimating the thresholds directly, it estimates the principal components of the thresholds, from which threshold estimates are then recovered. The principal components are estimated using a pairwise maximum likelihood algorithm which specialises to the well-known algorithm for dichotomous items. The method of estimation has three advantageous properties. First, by considering items in all possible pairs, sufficiency in the Rasch model is exploited with the person parameter conditioned out in estimating the item parameters, and by analogy to the pairwise algorithm for dichotomous items, the estimates appear to be consistent, though unlike for the dichotomous case, no formal proof has yet been provided. Second, the estimate of each item parameter is a function of frequencies in all categories of the item rather than just a function of frequencies of two adjacent categories. This stabilizes estimates in the presence of low frequency data. Third, the procedure accounts readily for missing data. All of these properties are important when the model is used for constructing variables from large scale data sets which must account for structurally missing data. A simulation study shows that the quality of the estimates is excellent.

6.
Model-based process-monitoring procedures are extremely useful in situations where an output variable of interest is impacted by one or more inputs to the process, and where there are multistage processes with multiple inputs and outputs. To build the model relating input and output variables, the procedure uses historical data, which often contain outliers. To accommodate the presence of these outliers, a robust fitting scheme is introduced for the Generalized Linear Model in process monitoring. Robust deviance residuals are defined and used as the basis of the monitoring procedure. An example and a simulation study for a gamma-distributed response are included. The average run length performance reveals that the procedure is effective for detecting small process shifts when outliers are present.

7.
Scientists, especially environmental scientists, often encounter trace level concentrations that are typically reported as less than a certain limit of detection, L. Type I left-censored data arise when certain low values lying below L are ignored or unknown as they cannot be measured accurately. In many environmental quality assurance and quality control (QA/QC), and groundwater monitoring applications of the United States Environmental Protection Agency (USEPA), values smaller than L are not required to be reported. However, practitioners still need to obtain reliable estimates of the population mean μ, and the standard deviation (S.D.) σ. The problem becomes more complex when a small number of high concentrations are observed with a substantial number of concentrations below the detection limit. The high-outlying values contaminate the underlying censored sample, leading to distorted estimates of μ and σ. The USEPA, through the National Exposure Research Laboratory-Las Vegas (NERL-LV), under the Office of Research and Development (ORD), has research interests in developing statistically rigorous robust estimation procedures for contaminated left-censored data sets. Robust estimation procedures based upon a proposed (PROP) influence function are shown to result in reliable estimates of population parameters of mean and S.D. using contaminated left-censored samples. It is also observed that the robust estimates thus obtained with or without the outliers are in close agreement with the corresponding classical estimates after the removal of outliers. Several classical and robust methods for the estimation of μ and σ using left-censored (truncated) data sets with potential outliers have been reviewed and evaluated.

8.
A multidimensional Rasch model was applied to two instruments measuring abilities in two related areas of a university general education curriculum. Grades from related courses were also calibrated using the Rasch model. Thus, course grades, test items, and persons were all placed on the same metric. Incorporating grades within the metric provided additional meaning to the measures; instructors could see which items were matched to students in a particular grade range for a course. This could help not only in interpreting items but also in interpreting grades. Test items and grades fit the model reasonably well, with adequate person separation reliability.

9.
This article contains information on the Rasch measurement partial credit model: what it is, how it differs from other Rasch models, when to use it, and how to use it. The calibration of instruments with increasingly complex items is described, starting with dichotomous items and moving on to polychotomous items using a single rating scale, mixed polychotomous items using multiple rating scales, and instruments in which each item has its own rating scale. It also introduces a procedure for aligning rating scale categories to be used when more than one rating scale is used in a single instrument. Pivot anchoring is defined and an illustration of its use with the mental health scale of the SF-36, which contains positively and negatively worded items, is provided. It finally describes the effect of pivot anchoring on step calibrations, the item hierarchy, and person measures.

10.
The performances of three procedures for treatment of outliers in normal samples are evaluated. The first procedure is the sequential application of the usual maximum residual test. The largest observation is declared an outlier if the largest studentized residual exceeds a predetermined value. If one outlier is detected, the test is repeated on the remaining observations, the process continuing until no further outliers are detected. In the second procedure the two largest observations are declared outliers if the sum of the two largest studentized residuals exceeds a predetermined value. In the third procedure the two largest observations are considered outliers if the ratio of the corrected sum of squares omitting these values to the total corrected sum of squares is less than a critical ratio. The performances of these procedures are evaluated for samples in which two of the observations have means different from the common mean of the remainder of the sample.
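The first procedure in this abstract, the sequential maximum residual test, can be sketched as follows. The sketch assumes the studentized residual of the largest observation is (max - mean) / s, with s the sample standard deviation of the observations still under test. The fixed critical value of 2.5, the function name `sequential_max_residual`, and the data are hypothetical; in practice the critical value would depend on the sample size and the chosen significance level.

```python
from statistics import mean, stdev


def sequential_max_residual(sample, critical):
    """Repeatedly test the largest observation; remove it as an outlier while
    its studentized residual exceeds `critical` (a hypothetical fixed cutoff)."""
    remaining = sorted(sample)
    outliers = []
    while len(remaining) > 2:
        s = stdev(remaining)
        if s == 0:
            break  # all remaining values are identical; nothing to test
        largest = remaining[-1]
        if (largest - mean(remaining)) / s <= critical:
            break  # largest observation is not extreme enough; stop
        outliers.append(remaining.pop())
    return outliers, remaining


# Hypothetical sample: ten values near 10 plus two shifted observations.
data = [10.0, 10.1, 9.9, 10.2, 9.8, 10.0, 10.1, 9.9, 10.0, 10.0, 13.0, 16.0]
outliers, remaining = sequential_max_residual(data, critical=2.5)
print("outliers:", outliers)
```

With two shifted observations, the second one inflates the standard deviation in the first pass, which is the kind of situation in which the abstract compares the sequential test against the two-observation procedures.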

11.
In the present paper, the Rasch measurement model is used in the validation and analysis of data coming from the satisfaction section of the first national survey concerning the social services sector carried out in Italy. A comparison between two Rasch models for polytomous data, that is, the Rating Scale Model (RSM) and the Partial Credit Model (PCM), is discussed. Given that the two models provide similar estimates of the item difficulties and worker satisfaction, that for almost all the items the response probabilities computed using the RSM and the PCM are very close, and that the analysis of the bootstrap confidence intervals shows that the estimates obtained applying the RSM are more stable than the ones obtained using the PCM, it can be concluded that, for the present data, the RSM is more appropriate than the PCM.

12.
The Rasch family of models displays several well-documented properties that distinguish them from the general item response theory (IRT) family of measurement models. This paper describes an additional unique property of Rasch models, referred to as the property of item information constancy. This property asserts that the area under the information function for Rasch models is always equal to the number of response categories minus one, regardless of the values of the item location parameters. The implication of the property of item information constancy is that, for a given number of response categories, all items following a Rasch model contribute equally to the height of the test information function across the entire latent continuum.
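The constancy property stated in this abstract can be checked numerically. The sketch below assumes the partial credit parameterisation, in which category probabilities are proportional to the exponential of the cumulative sum of (θ − τ_k), and takes the item information to be the variance of the item score at θ. The function names, the threshold values, and the integration range are illustrative choices, not details from the paper.

```python
import math


def category_probs(theta, thresholds):
    """Category probabilities for a Rasch partial credit item (assumed form)."""
    logits = [0.0]
    for tau in thresholds:
        logits.append(logits[-1] + theta - tau)
    top = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


def item_information(theta, thresholds):
    """Item information at theta, computed as the variance of the item score."""
    p = category_probs(theta, thresholds)
    m1 = sum(x * px for x, px in enumerate(p))
    m2 = sum(x * x * px for x, px in enumerate(p))
    return m2 - m1 * m1


def information_area(thresholds, half_width=40.0, step=0.01):
    """Trapezoidal approximation of the area under the information function."""
    centre = sum(thresholds) / len(thresholds)
    n = int(round(2 * half_width / step))
    lo = centre - half_width
    total = 0.0
    for i in range(n + 1):
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * item_information(lo + i * step, thresholds)
    return total * step


# Dichotomous items at two locations, and a four-category item.
for taus in ([0.0], [2.5], [-1.0, 0.5, 2.0]):
    print(taus, round(information_area(taus), 4))
```

Under these assumptions the area should come out close to the number of thresholds (categories minus one) wherever the item is located, since the information function is the derivative of the expected score, which rises from 0 to the maximum score.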

13.
Past research on Computer Adaptive Testing (CAT) has focused almost exclusively on the use of binary items and minimizing the number of items to be administered. To address this situation, extensive computer simulations were performed using partial credit items with two, three, four, and five response categories. Other variables manipulated include the number of available items, the number of respondents used to calibrate the items, and various manipulations of respondents' true locations. Three item selection strategies were used, and the theoretically optimal Maximum Information method was compared to random item selection and Bayesian Maximum Falsification approaches. The Rasch partial credit model proved to be quite robust to various imperfections, and systematic distortions did occur mainly in the absence of sufficient numbers of items located near the trait or performance levels of interest. The findings further indicate that having small numbers of items is more problematic in practice than having small numbers of respondents to calibrate these items. Most importantly, increasing the number of response categories consistently improved CAT's efficiency as well as the general quality of the results. In fact, increasing the number of response categories proved to have a greater positive impact than did the choice of item selection method, as the Maximum Information approach performed only slightly better than the Maximum Falsification approach. Accordingly, issues related to the efficiency of item selection methods are far less important than is commonly suggested in the literature. However, being based on computer simulations only, the preceding presumes that actual respondents behave according to the Rasch model. CAT research could thus benefit from empirical studies aimed at determining whether, and if so, how, selection strategies impact performance.

14.
The item parameters of a polytomous Rasch model can be estimated using marginal and conditional approaches. This paper describes how this can be done in SAS (V8.2) for three item parameter estimation procedures: marginal maximum likelihood estimation, conditional maximum likelihood estimation, and pairwise conditional estimation. The use of the procedures for extensions of the Rasch model is also discussed. The accuracy of the methods is evaluated using a simulation study.

15.
The standard scoring structure of the revised Minnesota Multiphasic Personality Inventory (MMPI-2) Social Introversion (Si) scale was reexamined with Rasch Measurement. The 69-item Si scale split into two distinct dimensions when their standardized residuals were factor analyzed. Items keyed "true" to Si defined one dimension and items keyed "false" defined another. Relationships between Lexile values (an index of reading difficulty and comprehension) and item difficulties were also explored. The article shows how to use Rasch Measurement to understand and improve personality assessment.

16.
In 2005 PISA published trend indicators that compared the results of PISA 2000 and PISA 2003. In this paper we explore the extent to which the outcomes of these trend analyses are sensitive to the choice of test equating methodologies, the choice of regression models and the choice of linking items. To establish trends PISA equated its 2000 and 2003 tests using a methodology based on Rasch Modelling that involved estimating linear transformations that mapped 2003 Rasch-scaled scores to the previously established PISA 2000 Rasch-scaled scores. In this paper we compare the outcomes of this approach with an alternative, which involves the joint Rasch scaling of the PISA 2000 and PISA 2003 data separately for each country. Note that under this approach the item parameters are estimated separately for each country, whereas the linear transformation approach used a common set of item parameter estimates for all countries. Further, as its primary trend indicators, PISA reported changes in mean scores between 2000 and 2003. These means are not adjusted for changes in the background characteristics of the PISA 2000 and PISA 2003 samples - that is, they are marginal rather than conditional means. The use of conditional rather than marginal means results in some differing conclusions regarding trends at both the country and within-country level.

17.
One often hears the question asked, "For questionnaire data measuring a variable, what difference does it make to use factor analysis/principal components analysis (true-score theory) or Rasch measurement in testing for dimensionality?" This paper reports both factor analysis and Rasch measurement analysis for two sets of data. One set of data measures social anxiety for primary school students (N=436, I=10) and the second measures attitude to mathematics for primary-aged students (N=774, I=10). For both sets of data, the factor analysis suggests that the scores are reliable, and that inferences can be made that are valid for measuring school anxiety and attitude to mathematics. For both sets of data analyzed with Rasch measurement techniques, the reliability of the measures, the dimensionality of the measures, and the initial conceptualisation of the items are called into question. It suggests that one cannot make valid inferences from the measures that were initially set up for true-score theory. The Rasch analysis suggests that items intended to measure a variable should be initially developed on a conceptualized scale from easy to hard, and that students should answer the items from this perspective, so that the Rasch analysis of the data tests this conceptualisation, and a linear scale can be created based on a mathematical measurement model with consistent units (logits).

18.
In Scottish High School mathematics examinations partial credit is normally awarded for answers which are not totally correct but nevertheless contain some of the correct working. As a way of incorporating partial credit in the marking of ICT versions of these examinations, "steps" have been introduced. The use of "steps" also allows for a Rasch analysis that measures the inaccessibility of items and the confidence of candidates in addition to the traditional difficulty of items and ability of candidates. Two Rasch models can be fitted and jointly assessed for fit. The resulting measures can then be investigated for any relationship between ability and confidence and between difficulty and inaccessibility. A small data set has been used to illustrate these ideas.

19.
Two frequently used parametric statistics of person-fit with the dichotomous Rasch model (RM) are adjusted and compared to each other and to their original counterparts in terms of power to detect aberrant response patterns in short tests (10, 20, and 30 items). Specifically, the cube root transformation of the mean square for the unweighted person-fit statistic, t, and the standardized likelihood-based person-fit statistic Z3 were adjusted by estimating the probability for correct item response through the use of symmetric functions in the dichotomous Rasch model. The results for simulated unidimensional Rasch data indicate that t and Z3 are consistently, yet not greatly, outperformed by their adjusted counterparts, denoted t* and Z3*, respectively. The four parametric statistics, t, Z3, t*, and Z3*, were also compared to a non-parametric statistic, HT, identified in recent research as outperforming numerous parametric and non-parametric person-fit statistics. The results show that HT substantially outperforms t, Z3, t*, and Z3* in detecting aberrant response patterns for 20-item and 30-item tests, but not for very short tests of 10 items. The detection power of t, Z3, t*, and Z3*, and HT at two specific levels of Type I error, .10 and .05 (i.e., up to 10% and 5% false alarm rate, respectively), is also reported.

20.
Mixed models take the dependency between observations based on the same person into account by introducing one or more random effects. After introducing the mixed model framework, it is explained, by taking the Rasch model as a generic example, how item response models can be conceptualized as generalized linear and nonlinear mixed models. Common estimation methods for generalized linear and nonlinear models are discussed. In a simulation study, the performance of four estimation methods is assessed for the Rasch model under different conditions regarding the number of items and persons, and the degree of interindividual differences. The estimation methods included in the study are: an approximation of the integral over the random effect by means of Gaussian quadrature; direct maximization with a sixth-order Laplace approximation to the integrand; a linearized approximation of the nonlinear model employing PQL2; and finally a Bayesian MCMC method. It is concluded that the estimation methods perform almost equally well, except for a slightly worse recovery of the variance parameter for PQL2 and MCMC.
