Similar literature
20 similar documents found; search took 78 ms
1.
The measurement complexities emerging from vertical equating in an educational experiment aiming at an advance in the curriculum are addressed, when calibrating an 'integer ability' scale for year 5 students from Greater Manchester based both on primary (years 5 and 6) and high school (years 7 and 8) data. The need for such a calibration resulted from experimental teaching of 'high school content' in primary school. Substantial Rasch differential item functioning (DIF) arose in the vertical equating between primary and high school in our initial 'all-on-all' 'concurrent' calibration. A second 'Primary anchored-and-extended' calibration which substantially overcame DIF problems is shown to be preferable for our teaching experiment. The relevant methodological challenges and the techniques adopted are discussed. The solution provided might be useful to researchers for educational experiments targeting an advance in the curriculum.

2.
BACKGROUND: In the development of health outcome measures, the pool of candidate items may be divided into multiple forms, thus "spreading" response burden over two or more study samples. Item responses collected using this approach result in two or more forms whose scores are not equivalent. Therefore, the item responses must be equated (adjusted) to a common mathematical metric. OBJECTIVES: The purpose of this study was to examine the effect of sample size, test size, and selection of item response theory (IRT) model in equating three forms of a health status measure. Each of the forms comprised a set of items unique to it and a set of anchor items common across forms. RESEARCH DESIGN: The study was a secondary data analysis of patients' responses to the developmental item pool for the Health of Seniors Survey. A completely crossed design was used with 25 replications per study cell. RESULTS: We found that the quality of equatings was affected greatly by sample size. Its effect was far more substantial than choice of IRT model. Little or no advantage was observed for equatings based on 60 or 72 items versus those based on 48 items. CONCLUSIONS: We concluded that samples of fewer than 300 are clearly unacceptable for equating multiple forms. Additional sample size guidelines are offered based on our results.

3.
Traditionally, women and minorities have not been fully represented in science and engineering. Numerous studies have attributed these differences to gaps in science achievement as measured by various standardized tests. Rather than describe mean group differences in science achievement across multiple cultures, this study focused on an in-depth item-level analysis across two countries: Spain and the United States. This study investigated eighth-grade gender differences on science items across the two countries. A secondary purpose of the study was to explore the nature of gender differences using the many-faceted Rasch Model as a way to estimate gender DIF. A secondary analysis of data from the Third International Mathematics and Science Study (TIMSS) was used to address three questions: 1) Does gender DIF in science achievement exist? 2) Is there a relationship between gender DIF and characteristics of the science items? 3) Do the relationships between item characteristics and gender DIF in science items replicate across countries? Participants included 7,087 eighth-grade students from the United States and 3,855 students from Spain who participated in TIMSS. The Facets program (Linacre and Wright, 1992) was used to estimate gender DIF. The results of the analysis indicate that the content of the item seemed to be related to gender DIF. The analysis also suggests that there is a relationship between gender DIF and item format. No pattern of gender DIF related to cognitive demand was found. The general pattern of gender DIF was similar across the two countries used in the analysis. The strength of item-level analysis as opposed to group mean difference analysis is that gender differences can be detected at the item level, even when no mean differences can be detected at the group level.

4.
Conventional two-group DIF analysis for dichotomous items is extended to factorial DIF analysis for polytomous items where multiple grouping factors with multiple groups in each are jointly analyzed. By adopting the formulation of general linear models, item parameters across all possible groups are treated as a dependent variable and the grouping factors as independent variables. These item parameters are then reparameterized as a set of grand item parameters and sets of DIF parameters representing main and interaction effects of the factors on the items. Results of simulation studies show that the parameters of the proposed modeling could be satisfactorily recovered. A real data set of 10 polytomous items and 1924 subjects was analyzed. Applications and implications of the proposed modeling are addressed.

5.
The study investigated five factors which can affect the equating of scores from two tests onto a common score scale. The five factors were: (a) item distribution type (i.e., normal versus uniform); (b) standard deviation of item difficulty (i.e., .68, .95, .99); (c) number of items or test length (i.e., 50, 100, 200); (d) number of common items (i.e., 10, 20, 30); and (e) sample size (i.e., 100, 300, 500). SIMTEST and BIGSTEPS programs were used for the simulation and equating of 4,860 item data sets, respectively. Results from the five-way fixed effects factorial analysis of variance indicated three statistically significant two-way interaction effects. Simple effects for the interaction between common item length and test length only were interpreted given Type I error rate considerations. The eta-squared values for number of common items and test length were small, indicating the effects had little practical importance. The Rasch approach to equating is robust with as few as 10 common items and a test length of 100 items.

6.
This paper examines the impact of differential item functioning (DIF), missing item values, and different methods for handling missing item values on theta estimates with data simulated from the partial credit model and Andrich's rating scale model. Both Rasch family models are commonly used when obtaining an estimate of a respondent's attitude. The degree of missing data, DIF magnitude, and the percentage of DIF items were varied in MCAR data conditions in which the focal group was 10% of the total population. Four methods for handling missing data were compared: complete-case analysis, mean substitution, hot-decking, and multiple imputation. Bias, RMSE, means, and standard errors of the theta estimates for the focal group were adversely affected by the amount and magnitude of DIF items. RMSE and fidelity coefficients for both the reference and focal group were adversely impacted by the amount of missing data. While all methods of handling missing data performed fairly similarly, multiple imputation and hot-decking showed slightly better performance.

7.
Although post-equating (PE) has proven to be an acceptable method in the scaling and equating of items and forms, there are times when the turn-around period for equating and converting raw scores to scale scores is so short that PE cannot be undertaken within the prescribed time frame. In such cases, pre-equating (PrE) could be considered as an acceptable alternative. Assessing the feasibility of using item calibrations from the item bank (as in PrE) is conditioned on the equivalency of the calibrations and the errors associated with them vis-à-vis the results obtained via PE. This paper creates item banks over three periods of item introduction into the banks and uses the Rasch model in examining data with respect to the recovery of item parameters, the measurement error, and the effect cut-points have on examinee placement in both the PrE and PE situations. Results indicate that PrE is a viable alternative to PE provided the stability of the item calibrations is enhanced by using large sample sizes (perhaps as large as full-population) in populating the item bank.

8.
Functional Caregiving (FC) is a construct about mothers caring for children (both old and young) with intellectual disabilities, which is operationally defined by two nonequivalent survey forms, urban and suburban, respectively. The first purpose of this research is to generalize school-based achievement test principles to survey methods by equating two nonequivalent survey forms. A second purpose is to expand FC foundations by a) establishing linear measurement properties for new caregiving items, b) replicating a hierarchical item structure across an urban, school-based population, c) consolidating survey forms to establish a calibrated item bank, and d) collecting more external construct validity data. Results supported invariant item parameters of a fixed item form (96 items) for two urban samples (N = 186). FC measures also showed expected construct relationships with age, mental depression, and health status. However, only five common items between urban and suburban forms were statistically stable because suburban mothers' age and child's age appear to interact with medical information and social activities.

9.
Feature-based geometric reasoning for process planning   Total citations: 2 (self-citations: 0; citations by others: 2)
We present a framework based on Domain Independent Form (DIF) features for automatic evaluation of manufacturability and process planning for machining. The framework enables interpretation of a common product model with respect to each task in the transition from design to manufacture. A key idea here is to generate the interpretation suitable for each task in two steps. In the first step, DIF features that are defined through feature enumeration are automatically extracted from the geometric model. The extracted DIF features are then mapped into features meaningful for individual tasks through geometric reasoning based on domain dependent knowledge. The formal approach to feature definitions and separation of the domain specific reasoning from the general geometric reasoning enable us to overcome the bottlenecks reported in features technology. Work reported in this paper has been funded in part by grants from Aeronautical Development Agency and the Department of Science and Technology.

10.
The purpose of this study is to explore criteria for common element test equating for performance examinations. Using the multi-facet Rasch model, each element of each facet is calibrated or placed in a relative position on a Benchmark or reference scale. Common elements from each facet, included on the examinations being equated, are used to anchor the facet elements to the Benchmark Scale. This places all examinations on the same scale so that the same criterion standard can be used. Performance examinations typically have three to four facets including examinees, raters, items and tasks. Raters rate examinees on tasks related to the items included in the test. The initial anchoring of a current test administration to the Benchmark Scale is evaluated for invariance and fit. If there is too much variance or lack of fit for particular facet elements, it may be necessary to unanchor those elements, which means they are not used in the equating. The equating process was applied to an exam with four facets and another with five facets. Results found few common facet elements that could not be used in the test equating process and that differences in the difficulty of the equated exams were identified so that the criterion standard on the Benchmark Scale could be used. It was necessary to use careful quality control for anchoring the common elements in each facet. The common elements should be unaltered from their original use. Strict criteria for displacement and fit must be established and used consistently. Unanchoring inconsistent and/or misfitting facet elements improves the quality of the test equating.

11.
A series of tests was developed to assess the proficiency of Australian Year 5 and Year 8 students in Asian Studies. This paper presents results of analyses that involved calibrating items distributed over 14 overlapping subtests, developed to cater for state and territory curricula and two year-levels. This allowed for state and year-level preferences to be selected from a common pool of 105 items. The project used common item anchoring to map all students and items onto a single, underpinning scale that was identified and interpreted using concurrent equating procedures and a skills audit of items.

12.
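The dynamic response of rotating machinery to earthquakes or underwater non-contact explosions is generally studied through the base shock response of the rotor system. Because of gyroscopic effects and rotor-bearing interaction, the coefficient matrices of the rotor system's equations of motion are asymmetric, so the equations cannot be decoupled in modal coordinates and cannot be solved by conventional modal superposition. Previous studies have therefore generally relied on iterative numerical integration, such as the Newmark method, which consumes more computational resources than modal superposition. This paper proposes a method for computing rotor-system shock response in the complex domain that requires no coordinate decoupling yet still obtains the response by linear superposition. First, the excitation and response are expanded as Fourier series in complex form, including forward-rotating and backward-rotating terms. Equating the coefficients of like-frequency terms on the two sides of the equation yields the characteristic equation, which is written as the eigen-equation of a simple matrix pencil. The eigenvalues and eigenvectors of the pencil are computed and the eigenvectors normalized, which gives the inverse of the matrix pencil; the elements of this inverse are named "frequency response factors". Multiplying the inverse by the excitation gives the frequency response amplitudes, and superposing all frequency components gives the system response. An engineering example compares the proposed method with numerical integration; the comparison shows that the proposed method meets engineering requirements and can serve as a general method for computing the base shock response and transient response of rotor systems.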

13.
Two independent projects are described in which drop-hammer techniques are used to investigate the dynamic increase factor (DIF) under both flexural and shear high-speed loading of a new ultra high performance fibre reinforced blast-resistant concrete. The results from both studies correlate well. The results show that a DIF of the flexural tensile strength rising from 1.0 at 1 s⁻¹ on a slope of 1/3 on a log (strain rate) versus log (DIF) plot can be used for design purposes. The results also show that no DIF should be used to increase the shear strength at high loading rates.
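
The design rule stated in this abstract (a flexural DIF of 1.0 at a strain rate of 1 s⁻¹, rising with slope 1/3 on a log-log plot) can be sketched as a power law. This is only an illustrative reading of the abstract: the function name is hypothetical, and holding the DIF at 1.0 below 1 s⁻¹ is an assumption the abstract does not state.

```python
def flexural_dif(strain_rate: float) -> float:
    """Flexural tensile DIF for a strain rate given in 1/s.

    A slope of 1/3 on a log(strain rate) vs log(DIF) plot through the
    point (1 1/s, 1.0) corresponds to DIF = strain_rate ** (1/3).
    Assumption (not in the abstract): DIF stays at 1.0 below 1 1/s.
    """
    if strain_rate <= 0:
        raise ValueError("strain rate must be positive")
    return max(1.0, strain_rate ** (1.0 / 3.0))


def shear_dif(strain_rate: float) -> float:
    """The abstract recommends no increase of shear strength with rate."""
    if strain_rate <= 0:
        raise ValueError("strain rate must be positive")
    return 1.0
```

Under this reading, a strain rate of 8 s⁻¹ gives a flexural DIF of about 2, while the shear DIF stays at 1 for every rate.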

14.
梁杰  李庆超 《计量学报》2019,40(6):1140-1145
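To address the low measurement accuracy of vehicle-mounted gas sensors, two correction algorithms, forward and backward, are applied to post-correct the measurement results, thereby improving the credibility of the measured data. The two algorithms were validated by comparing the measurements of the vehicle-mounted gas sensors with those of analyzers at fixed monitoring sites; after correction, the measurement error was reduced by 50%. In addition, a multi-stage correction model based on the two algorithms is proposed and evaluated; in testing, the results of both algorithms on this model basically met requirements.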

15.
The invariance of the estimated parameters across variation in the incidental parameters of a sample is one of the most important properties of Rasch measurement models. This is the property that allows the equating of test forms and the use of computer adaptive testing. It necessarily follows that in Rasch models, if the data fit the model, then the estimation of the parameter of interest must be invariant across sub-samples of the items or persons. This study investigates the degree to which the INFIT and OUTFIT item fit statistics in WINSTEPS detect violations of the invariance property of Rasch measurement models. The test in this study is an 80-item multiple-choice test used to assess mathematics competency. The WINSTEPS analysis of the dichotomous results, based on a sample of 2,000 from a very large number of students who took the exam, indicated that only 7 of the 80 items misfit using the 1.3 mean square criterion advocated by Linacre and Wright. Subsequent calibration of separate samples of 1,000 students from the upper and lower third of the person raw score distribution, followed by a t-test comparison of the item calibrations, indicated that the item difficulties for 60 of the 80 items were more than 2 standard errors apart. The separate calibration t-values ranged from +21.00 to -7.00, with the t-test value of 41 of the 80 comparisons either larger than +5 or smaller than -5. Clearly these data do not exhibit the invariance of the item parameters expected if the data fit the model. Yet the INFIT and OUTFIT mean squares are completely insensitive to the lack of invariance in the item parameters. If the OUTFIT ZSTD from WINSTEPS were used with a critical value of | t | > 2.0, then 56 of the 60 items identified by the separate calibration t-test would be identified as misfitting. A fourth measure of misfit, the between ability-group item fit statistic, identified 69 items as misfitting when a critical value of t > 2.0 was used. Clearly, relying solely on the INFIT and OUTFIT mean squares in WINSTEPS to assess the fit of the data to the model would cause one to miss one of the most important threats to the usefulness of the measurement model.
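
The separate-calibration comparison described in this abstract can be sketched as follows. The abstract does not give its formula, so this uses a commonly used form as an assumption: the difference between the two difficulty estimates divided by the square root of the sum of their squared standard errors. The function names and the example numbers are hypothetical, and both calibrations are assumed to be expressed on the same logit origin.

```python
import math


def calibration_t(d_upper: float, se_upper: float,
                  d_lower: float, se_lower: float) -> float:
    """t-like statistic comparing one item's difficulty (in logits)
    from two separate calibrations. Assumed form, not taken from
    the abstract: difference over the joint standard error."""
    return (d_upper - d_lower) / math.sqrt(se_upper ** 2 + se_lower ** 2)


def flag_non_invariant(items: dict, critical: float = 2.0) -> list:
    """Return ids of items whose |t| exceeds the critical value.

    `items` maps item id -> (d_upper, se_upper, d_lower, se_lower).
    """
    return [
        item_id
        for item_id, (du, su, dl, sl) in items.items()
        if abs(calibration_t(du, su, dl, sl)) > critical
    ]


# Hypothetical illustration: two made-up items.
example = {
    "item01": (0.50, 0.10, 0.10, 0.10),  # 0.40 logits apart, flagged
    "item02": (0.20, 0.10, 0.15, 0.10),  # 0.05 logits apart, not flagged
}
flagged = flag_non_invariant(example)
```

A check of this kind targets the invariance property directly, which is the contrast the abstract draws with the INFIT and OUTFIT mean squares.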

16.
Colleges and universities conduct student satisfaction studies for many important policy-making reasons. However, differences in instrumentation and the use of students' self-reported ratings of satisfaction make such decisions sample-, instrument-, and institution-dependent. A common metric of student satisfaction would assist decision makers by providing a richness of information not typically obtained. The present study investigated the extent to which two nationally known instruments of student satisfaction could be scaled on the same quantitative metric. Pseudo-common item equating (Fisher, 1997) based on five link items of low and high endorsability enabled comparisons of "similar, but not identical items, from different instruments, calibrated on different samples" (p. 87). Results suggest that both instruments measured similar constructs and could be reasonably used to create a single, common metric. While samples used in the experiment were less than ideal, results clearly demonstrated the usefulness and reasonability of the pseudo-common item equating process.

17.
A number of state assessment programs that employ Rasch-based common item equating procedures estimate the equating constant with only those common items for which the two tests' Rasch item difficulty parameter estimates differ by less than 0.3 logits. The results of this study present evidence that this practice results in an inflated probability of incorrectly dropping an item from the common item set if the number of examinees is small (e.g., 500 or fewer) and the reverse if the number of examinees is large (e.g., 5000 or more). An asymptotic experiment-wise error rate criterion was algebraically derived. This same criterion can also be applied to the Mantel-Haenszel statistic. Bonferroni test statistics were found to provide excellent approximations to the (asymptotically) exact test statistics.
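
The 0.3-logit screening practice that this abstract critiques can be sketched as below. This is one possible reading, labeled as an assumption: differences are compared after removing the mean shift between forms, and the screen is applied in a single pass. The function name and the data are hypothetical, and the sketch does not implement the error-rate criterion derived in the study.

```python
from statistics import mean


def equating_constant(form_a: dict, form_b: dict, tolerance: float = 0.3):
    """Mean-shift equating constant from screened common items.

    form_a / form_b map item id -> Rasch difficulty (logits) from
    each form's own calibration. Assumed reading of the rule: an
    item is kept when its difference, after removing the mean
    shift over all common items, is below `tolerance` logits.
    """
    common = sorted(set(form_a) & set(form_b))
    if not common:
        raise ValueError("no common items")
    diffs = {i: form_b[i] - form_a[i] for i in common}
    shift = mean(diffs.values())
    kept = [i for i in common if abs(diffs[i] - shift) < tolerance]
    if not kept:
        raise ValueError("no common items survive the screen")
    return mean(diffs[i] for i in kept), kept


# Hypothetical difficulties: item "i4" drifts by a full logit.
form_a = {"i1": 0.0, "i2": 1.0, "i3": -1.0, "i4": 0.5}
form_b = {"i1": 0.2, "i2": 1.2, "i3": -0.8, "i4": 1.5}
constant, kept = equating_constant(form_a, form_b)
```

Because the tolerance is a fixed number of logits, it ignores the standard errors of the difficulty estimates, which is why, as the abstract reports, its error rate depends on the number of examinees.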

18.
Svetina, Dubravka; Dai, Shenghai; Wang, Xiaolin. Behaviormetrika, 2017, 44(2): 313-349

This study explored potential sources of differential item functioning (DIF) among accommodated and nonaccommodated groups by examining skills and cognitive processes hypothesized to underlie student performance on the National Assessment of Educational Progress (NAEP). Out of 53 released NAEP items in 2007 for grade 8, a total of 25 items were flagged as DIF among the four studied groups (nonaccommodated, accommodated with extra time, accommodated with read aloud, and accommodated with small groups) by a generalized logistic regression method. The Reparameterized Unified Model was fit to the same data using a Q-matrix containing 25 skills that included content-, process-, and item-type attributes. The nonaccommodated group yielded the highest averages of attribute mastery probabilities as well as the largest proportion of mastery examinees among all the groups. The three accommodated groups tended to have similar attribute mastery means, with the group accommodated with small groups yielding a larger proportion of mastery examinees when compared to the other two accommodated groups.


19.
李潇  方秦  孔祥振  吴昊 《工程力学》2018,35(7):187-193
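Split Hopkinson pressure bar (SHPB) experiments were conducted on mortar at different strain rates, and the experimental data were fitted to obtain the curve relating the dynamic increase factor (DIF) of strength to strain rate. Based on the measured strain-rate time histories, the experiments were simulated numerically using a simplified finite element model. The paper discusses the causes of the inertial effect in dynamic compression tests and, based on the numerical simulations, separates out the influence of the inertial effect on the material's dynamic strength in these tests, yielding an intrinsic curve of the mortar's DIF versus strain rate. With this intrinsic curve used as the strain-rate-effect input to the numerical simulation, the computed results agree well with the experimentally measured stress-strain curves. Comparison with simulations that use the CEB-recommended curve and existing semi-empirical formulas as input further confirms the superiority of the proposed intrinsic DIF curve for mortar.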

20.
When a new set of mixed-format items is added to an existing multiple-choice (MC) test, those mixed-format items should be linked to the existing MC test. This study used simulation to investigate the effect of sample size on the recovery of known item parameters from concurrent calibration in the context of horizontal equating, where the new mixed-format tests are equated to the existing MC test, which acts as the set of common linking items. In the partial credit model following the Andrich-style parameterization, item location and item step parameters were differentially affected by the sample size. Item location parameters were recovered better than item step parameters at the individual item, the sub-test, and the total test level. This study also shows the outward bias for the item location parameter estimated by the maximum likelihood estimator.
