Similar Literature
1.
BACKGROUND: In the development of health outcome measures, the pool of candidate items may be divided into multiple forms, thus "spreading" response burden over two or more study samples. Item responses collected using this approach result in two or more forms whose scores are not equivalent. Therefore, the item responses must be equated (adjusted) to a common mathematical metric. OBJECTIVES: The purpose of this study was to examine the effect of sample size, test size, and selection of item response theory (IRT) model in equating three forms of a health status measure. Each form comprised a set of items unique to it and a set of anchor items common across forms. RESEARCH DESIGN: The study was a secondary data analysis of patients' responses to the developmental item pool for the Health of Seniors Survey. A completely crossed design was used with 25 replications per study cell. RESULTS: We found that the quality of equatings was affected greatly by sample size; its effect was far more substantial than the choice of IRT model. Little or no advantage was observed for equatings based on 60 or 72 items versus those based on 48 items. CONCLUSIONS: We concluded that samples of fewer than 300 are clearly unacceptable for equating multiple forms. Additional sample size guidelines are offered based on our results.
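For readers unfamiliar with anchor-item equating, the sketch below illustrates one simple common-item approach, Rasch mean/mean equating, in which the shift between two forms' metrics is estimated from the anchor items shared by both forms. The item values and anchor layout are hypothetical, and this is not necessarily the procedure used in the study.

```python
import numpy as np

def mean_mean_equating(b_new, b_base, anchor_new, anchor_base):
    """Shift the new form's Rasch item difficulties onto the base form's metric.

    The equating constant is the mean difference of the anchor items'
    difficulty estimates across the two forms (mean/mean method).
    """
    constant = b_base[anchor_base].mean() - b_new[anchor_new].mean()
    return b_new + constant, constant

rng = np.random.default_rng(0)
anchor = np.arange(10)                                    # 10 shared anchor items
b_base = rng.normal(0.0, 1.0, size=48)                    # base-form difficulties
b_new = np.concatenate([
    b_base[anchor] + 0.4 + rng.normal(0.0, 0.05, 10),     # anchors on a shifted metric
    rng.normal(0.4, 1.0, size=38),                        # items unique to the new form
])

b_new_on_base, c = mean_mean_equating(b_new, b_base, anchor, anchor)
print(f"estimated equating constant: {c:.3f} logits")     # about -0.4
```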

2.
Functional Caregiving (FC) is a construct about mothers caring for children (both old and young) with intellectual disabilities, operationally defined by two nonequivalent survey forms, urban and suburban. The first purpose of this research is to generalize school-based achievement test principles to survey methods by equating the two nonequivalent survey forms. A second purpose is to expand FC foundations by a) establishing linear measurement properties for new caregiving items, b) replicating a hierarchical item structure across an urban, school-based population, c) consolidating survey forms to establish a calibrated item bank, and d) collecting more external construct validity data. Results supported invariant item parameters of a fixed item form (96 items) for two urban samples (N = 186). FC measures also showed expected construct relationships with age, mental depression, and health status. However, only five common items between the urban and suburban forms were statistically stable, because suburban mothers' age and child's age appear to interact with medical information and social activities.

3.
The purpose of this study is to explore criteria for common-element test equating for performance examinations. Using the multi-facet Rasch model, each element of each facet is calibrated, or placed in a relative position, on a Benchmark (reference) scale. Common elements from each facet, included on the examinations being equated, are used to anchor the facet elements to the Benchmark Scale. This places all examinations on the same scale so that the same criterion standard can be used. Performance examinations typically have three to four facets, including examinees, raters, items, and tasks. Raters rate examinees on tasks related to the items included in the test. The initial anchoring of a current test administration to the Benchmark Scale is evaluated for invariance and fit. If there is too much variance or lack of fit for particular facet elements, it may be necessary to unanchor those elements, which means they are not used in the equating. The equating process was applied to one exam with four facets and another with five facets. Results showed that only a few common facet elements could not be used in the test equating process, and that differences in the difficulty of the equated exams were identified so that the criterion standard on the Benchmark Scale could be applied. Careful quality control was necessary when anchoring the common elements in each facet: the common elements should be unaltered from their original use, and strict criteria for displacement and fit must be established and used consistently. Unanchoring inconsistent and/or misfitting facet elements improves the quality of the test equating.
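A minimal sketch of the quality-control step described above: each common element is anchored at its Benchmark Scale value, re-estimated, and unanchored if its displacement or fit statistic exceeds the chosen criteria. The threshold values and element names below are illustrative assumptions, not the study's actual criteria.

```python
# Illustrative displacement and fit thresholds (assumed, not from the study).
DISPLACEMENT_MAX = 0.5   # logits
INFIT_MAX = 1.5          # mean-square fit

def screen_anchors(anchored, re_estimated, infit):
    """Split common elements into those kept as anchors and those unanchored."""
    keep, drop = [], []
    for name in anchored:
        displaced = abs(re_estimated[name] - anchored[name]) > DISPLACEMENT_MAX
        misfit = infit[name] > INFIT_MAX
        (drop if displaced or misfit else keep).append(name)
    return keep, drop

# Hypothetical common elements from three facets (rater, item, task).
anchored = {"rater_07": 0.82, "item_12": -0.35, "task_3": 1.10}
re_estimated = {"rater_07": 0.91, "item_12": 0.40, "task_3": 1.05}
infit = {"rater_07": 1.1, "item_12": 1.0, "task_3": 1.8}

keep, drop = screen_anchors(anchored, re_estimated, infit)
print("keep for equating:", keep, "| unanchor:", drop)
```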

4.
The study investigated five factors which can affect the equating of scores from two tests onto a common score scale. The five factors were: (a) item distribution type (i.e., normal versus uniform); (b) standard deviation of item difficulty (i.e., 0.68, 0.95, 0.99); (c) number of items, or test length (i.e., 50, 100, 200); (d) number of common items (i.e., 10, 20, 30); and (e) sample size (i.e., 100, 300, 500). The SIMTEST and BIGSTEPS programs were used for the simulation and equating, respectively, of 4,860 item data sets. Results from the five-way fixed-effects factorial analysis of variance indicated three statistically significant two-way interaction effects. Given Type I error rate considerations, only the simple effects for the interaction between number of common items and test length were interpreted. The eta-squared values for number of common items and test length were small, indicating that the effects had little practical importance. The Rasch approach to equating is robust with as few as 10 common items and a test length of 100 items.
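As a quick check on the scale of the simulation, the fully crossed design described above can be enumerated as follows. The 30 replications per cell are inferred from 4,860 data sets divided over 162 cells and are an assumption, not stated in the abstract.

```python
from itertools import product

# The five crossed factors listed in the abstract.
dist_type     = ["normal", "uniform"]
sd_difficulty = [0.68, 0.95, 0.99]
test_length   = [50, 100, 200]
n_common      = [10, 20, 30]
sample_size   = [100, 300, 500]

cells = list(product(dist_type, sd_difficulty, test_length, n_common, sample_size))
print(len(cells), "design cells")                 # 162
print(len(cells) * 30, "simulated data sets")     # 4,860 (assuming 30 replications)
```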

5.
There has been some discussion among researchers as to the benefits of using one calibration process over the other during equating. Although the literature is rife with the pros and cons of the different methods, hardly any research has been done on anchoring (i.e., fixing item parameters to their pre-determined values on an established scale) as a method that is commonly used by psychometricians in large-scale assessments. This simulation research compares the fixed form of calibration with the concurrent method (in which calibration of the different forms on the same scale is accomplished by a single run of the calibration process, treating all non-included items on the forms as missing or not reached), using the dichotomous Rasch (Rasch, 1960) and Rasch partial credit (Masters, 1982) models and the WINSTEPS (Linacre, 2003) computer program. Contrary to the belief, and some researchers' contention, that a concurrent run with larger n-counts for the common items would provide greater accuracy in the estimation of item parameters, the results of this paper indicate that the greater accuracy of one method over the other is confounded by the sample size, the number of common items, and other factors, and that there is no real benefit in using one method over the other in the calibration and equating of parallel test forms.
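A minimal sketch of the data layout behind a concurrent run, as described above: responses to both forms are stacked into one matrix, and items that do not appear on an examinee's form are coded as missing (not reached). Form sizes and the random responses are illustrative assumptions.

```python
import numpy as np

N_A, N_B = 400, 400                       # examinees taking Form A and Form B
UNIQUE_A, COMMON, UNIQUE_B = 20, 10, 20   # unique and common item counts
n_items = UNIQUE_A + COMMON + UNIQUE_B

rng = np.random.default_rng(1)
resp = np.full((N_A + N_B, n_items), np.nan)   # missing (not reached) by default
resp[:N_A, :UNIQUE_A + COMMON] = rng.integers(0, 2, (N_A, UNIQUE_A + COMMON))
resp[N_A:, UNIQUE_A:] = rng.integers(0, 2, (N_B, COMMON + UNIQUE_B))

# A single calibration of `resp` places all items on one scale (concurrent method);
# in the fixed/anchored approach, the common items' difficulties would instead be
# fixed to their previously established values.
print(resp.shape, "matrix;", round(float(np.isnan(resp).mean()), 2), "proportion missing")
```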

6.
Although post-equating (PE) has proven to be an acceptable method in the scaling and equating of items and forms, there are times when the turn-around period for equating and converting raw scores to scale scores is so short that PE cannot be undertaken within the prescribed time frame. In such cases, pre-equating (PrE) could be considered as an acceptable alternative. Assessing the feasibility of using item calibrations from the item bank (as in PrE) is conditioned on the equivalency of those calibrations, and of their associated errors, vis-à-vis the results obtained via PE. This paper creates item banks over three periods of item introduction into the banks and uses the Rasch model in examining data with respect to the recovery of item parameters, the measurement error, and the effect cut-points have on examinee placement in both the PrE and PE situations. Results indicate that PrE is a viable alternative to PE provided the stability of the item calibrations is enhanced by using large sample sizes (perhaps as large as the full population) in populating the item bank.
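The sketch below illustrates, with hypothetical numbers, the kind of comparison implied above: how far item-bank (PrE) difficulties drift from post-equated (PE) values, and whether that drift would move the raw cut score. It is not the paper's design; the drift magnitude and the cut ability are assumptions.

```python
import numpy as np

rng = np.random.default_rng(11)
n_items = 60
b_pe = rng.normal(0.0, 1.0, n_items)              # post-equated (PE) difficulties
b_bank = b_pe + rng.normal(0.0, 0.08, n_items)    # item-bank (PrE) values with calibration error

rmse = np.sqrt(np.mean((b_bank - b_pe) ** 2))     # item-parameter recovery

def expected_raw(theta, b):
    """Rasch expected raw score at ability theta (test characteristic curve)."""
    return float((1.0 / (1.0 + np.exp(-(theta - b)))).sum())

theta_cut = 0.0                                   # assumed passing standard, logits
print(f"RMSE(bank vs PE) = {rmse:.3f} logits")
print(f"raw cut: PE = {expected_raw(theta_cut, b_pe):.1f}, "
      f"PrE = {expected_raw(theta_cut, b_bank):.1f}")
```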

7.
This study describes and demonstrates a set of processes for developing new forms of examinations which are intended to have equivalent cut scores in the raw score metric. This approach goes beyond the traditional Rasch-based approach, which develops forms whose cut scores are equated in the logit metric. The methods described in this study can be used to create multiple forms of an assessment, all of which have the same raw score cut score (i.e., the number correct required to pass each examination form represents the same amount of the underlying construct). This paper provides an overview of equating standards, the research related specifically to pre-equating procedures, and three guidelines that can be used to achieve equal raw score cut scores. Three examples of how to use the guidelines as part of an iterative form-development process are provided using simulated data sets.
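The sketch below shows one way a logit-metric standard maps into each form's raw-score metric, via the Rasch test characteristic curve (the expected raw score at the cut ability). The item difficulties and the logit cut are hypothetical, and the paper's iterative form-assembly guidelines are not reproduced here.

```python
import math

def expected_raw_score(theta, difficulties):
    """Rasch expected number-correct score at ability theta."""
    return sum(1.0 / (1.0 + math.exp(-(theta - b))) for b in difficulties)

theta_cut = 0.25                                     # passing standard in logits (assumed)
form_a = [-1.2, -0.8, -0.3, 0.0, 0.1, 0.4, 0.7, 1.1, 1.5, 2.0]
form_b = [-1.5, -0.9, -0.4, -0.1, 0.2, 0.3, 0.8, 1.0, 1.4, 1.9]

for name, form in [("Form A", form_a), ("Form B", form_b)]:
    raw = expected_raw_score(theta_cut, form)
    print(f"{name}: raw score at the logit cut = {raw:.2f}")
# Forms would be assembled/adjusted until these raw cut scores coincide.
```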

8.
A mesh-size insensitive structural stress definition is presented in this paper. The structural stress definition is consistent with elementary structural mechanics theory and provides an effective measure of a stress state that pertains to the fatigue behavior of welded joints, in the form of both membrane and bending components. Numerical procedures for both solid models and shell or plate element models are presented to demonstrate the mesh-size insensitivity in extracting the structural stress parameter. Conventional finite element models can be used directly, with the structural stress calculation performed as a post-processing procedure. To further illustrate the effectiveness of the present structural stress procedures, a collection of existing weld S-N data for various joint types was processed using the current structural stress procedures. The results strongly suggest that weld-classification-based S-N curves can be collapsed into possibly a single master S-N curve, in which the slope of the S-N curve is determined by the relative composition of the membrane and bending components of the structural stress parameter. The effects of membrane and bending on S-N behavior can be addressed by introducing an equivalent stress-intensity-factor-based parameter using the structural stress components. Among other things, the two major implications are: (a) structural stresses pertaining to weld fatigue behavior can be consistently calculated in a mesh-insensitive manner regardless of the type of finite element model; (b) transferability of weld S-N test data, regardless of welded joint type and loading mode, can be established using the structural stress based parameters.
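As background to the membrane and bending components mentioned above, the sketch below shows the elementary-mechanics decomposition of a through-thickness normal stress distribution. It is a generic illustration with made-up numbers, not the paper's mesh-insensitive post-processing procedure.

```python
import numpy as np

def membrane_bending(z, sigma, t):
    """Decompose sigma(z), sampled from one surface (z=0) to the other (z=t),
    into equilibrium-equivalent membrane and bending components."""
    sigma_m = np.trapz(sigma, z) / t                            # membrane component
    sigma_b = 6.0 / t**2 * np.trapz(sigma * (t / 2.0 - z), z)   # bending component
    return sigma_m, sigma_b

t = 10.0                                # plate thickness, mm (illustrative)
z = np.linspace(0.0, t, 101)
# Hypothetical distribution: membrane + linear bending + a notch-like peak at z = 0.
sigma = 80.0 + 40.0 * (1.0 - 2.0 * z / t) + 15.0 * np.exp(-z)

sm, sb = membrane_bending(z, sigma, t)
print(f"membrane = {sm:.1f} MPa, bending = {sb:.1f} MPa")
```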

9.
This paper presents three quasi on-line scheduling procedures for FMSs consisting of work stations, transport devices, and operators. In the scheduling, different types of decisions are taken to perform a particular operation, i.e. the selection of (a) a work station, (b) a transport device, and (c) an operator. Further, (d) the scheduling sequence of the operations has to be determined. The three developed procedures differ in the way these four decision problems are solved hierarchically. Several dispatching rules (SPT, SPT.TOT, SPT/TOT and EFTA) are available to solve the last-mentioned decision problem. Limited buffer capacities in an FMS may cause deadlock, in the procedures as well as in practice, so the scheduling procedures incorporate a buffer handling method to avoid deadlock. A case study is presented to demonstrate the three procedures and to show some of their properties. Based on simulation tests, some conclusions are drawn about the performance of the scheduling procedures and the various dispatching rules.
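A minimal sketch of the simplest of the dispatching rules listed above (SPT, shortest processing time first), with hypothetical job data; the other rules (SPT.TOT, SPT/TOT, EFTA) combine processing time with other job attributes in a similar spirit.

```python
# Hypothetical operations waiting in a work station's queue.
jobs = [
    {"id": "J1", "processing_time": 12.0},
    {"id": "J2", "processing_time": 4.5},
    {"id": "J3", "processing_time": 7.0},
]

def spt_sequence(queue):
    """Order the waiting operations by processing time (SPT rule)."""
    return sorted(queue, key=lambda job: job["processing_time"])

print([job["id"] for job in spt_sequence(jobs)])   # ['J2', 'J3', 'J1']
```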

10.
In reliability engineering, component failures are generally classified in one of three ways: (1) early life failures; (2) failures having random onset times; and (3) late life or ‘wear out’ failures. When the time-distribution of failures of a population of components is analysed in terms of a Weibull distribution, these failure types may be associated with shape parameters β having values <1, 1, and >1 respectively. Early life failures are frequently attributed to poor design (e.g. poor materials selection) or problems associated with manufacturing or assembly processes.

We describe a methodology for the implementation of physics-of-failure models of component lifetimes in the presence of parameter and model uncertainties. This treats uncertain parameters as random variables described by some appropriate statistical distribution, which may be sampled using Monte Carlo methods. The number of simulations required depends upon the desired accuracy of the predicted lifetime. Provided that the number of sampled variables is relatively small, an accuracy of 1–2% can be obtained using typically 1000 simulations.

The resulting collection of times-to-failure is then sorted into ascending order and fitted to a Weibull distribution to obtain a shape factor β and a characteristic lifetime η.

Examples are given of the results obtained using three different models: (1) the Eyring–Peck (EP) model for corrosion of printed circuit boards; (2) a power-law corrosion growth (PCG) model which represents the progressive deterioration of oil and gas pipelines; and (3) a random shock-loading model of mechanical failure. It is shown that for any specific model the values of the Weibull shape parameters obtained may be strongly dependent on the degree of uncertainty of the underlying input parameters. Both the EP and PCG models can yield a wide range of values of β, from β>1, characteristic of wear-out behaviour, to β<1, characteristic of early-life failure, depending on the degree of dispersion of the uncertain parameters. If there is no uncertainty, a single, sharp value of the component lifetime is predicted, corresponding to the limit β=∞. In contrast, the shock-loading model is inherently random, and its predictions correspond closely to those of a constant hazard rate model, characterized by a value of β close to 1 for all finite degrees of parameter uncertainty.

The results are discussed in the context of traditional methods for reliability analysis and conventional views on the nature of early-life failures.
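A minimal sketch of the Monte Carlo workflow described above, using a deliberately simple, hypothetical wear-out model (time to failure = critical wall loss / corrosion rate, with the rate as an uncertain lognormal parameter) rather than the EP, PCG, or shock-loading models of the paper. The fitted Weibull shape β then summarises how parameter uncertainty spreads the predicted lifetimes.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
N_SIM = 1000                                  # roughly the 1-2 % accuracy regime noted above

d_crit = 3.0                                  # allowable metal loss, mm (illustrative)
rate = rng.lognormal(mean=np.log(0.2), sigma=0.4, size=N_SIM)   # corrosion rate, mm/year
t_fail = np.sort(d_crit / rate)               # times-to-failure, years, sorted ascending

# Fit a two-parameter Weibull (location fixed at zero).
beta, loc, eta = stats.weibull_min.fit(t_fail, floc=0.0)
print(f"Weibull shape beta = {beta:.2f}, characteristic lifetime eta = {eta:.1f} years")
```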


11.
This paper proposes a multilevel measurement model that controls for DIF effects in test equating. The accuracy and stability of item and ability parameter estimates under the proposed multilevel measurement model were examined using randomly simulated data. Estimates from the proposed model were compared with those resulting from two multiple-group concurrent equating designs: 1) a design that replaced DIF items with items showing no DIF; and 2) a design that retained DIF items with no attempt to control for DIF. In most of the investigated conditions, the results indicated that the proposed multilevel measurement model performed better than the two comparison models.

12.
Helical anchors have been widely used during the last 15 years to support electrical transmission line towers in Brazil; however, predicting the uplift capacity of these anchors is still a very difficult task. Typically, costly and time-consuming static load tests, complemented with torque–capacity correlations, are used to determine anchor capacity. One reason for the shortcomings of the existing design methods is the degree of soil disturbance during installation, which varies with the soil type, number of helices, anchor configuration, and quality of installation. Therefore, an extensive database of field load tests and installation monitoring of helical foundations at different Brazilian sites would be valuable to guide engineers toward better and more economical designs, reducing project cost. The current paper presents a helical anchor database that includes data on soil characterization, anchor geometry, installation torque, and load–displacement response for 107 anchor cases at 40 different sites in Brazil. Additionally, an analysis of the data presented here indicated that the torque-correlation method, used frequently for the quality control of helical anchors, needs to be improved by considering the influence of parameters such as relative embedment depth and soil type. Moreover, this database can contribute to a better understanding of helical anchor behaviour and be used for: (i) calibration of uplift capacity estimation methods and of resistance factors for ULS and SLS design; (ii) development of torque–capacity correlation models; (iii) correlations between ultimate helix bearing pressure and SPT blow count; (iv) characterization of the uncertainties of design methods; (v) improvement of numerical models; etc.
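For context, the torque-correlation method referred to above estimates uplift capacity from the final installation torque through an empirical factor, roughly Qu = Kt · T. The sketch below uses an illustrative Kt, not a value recommended by the paper, and ignores the embedment-depth and soil-type effects the database is intended to capture.

```python
def uplift_capacity_from_torque(torque_kNm, kt_per_m=9.0):
    """Estimate uplift capacity Qu (kN) from installation torque T (kN·m).

    kt_per_m is an empirical torque-correlation factor (1/m); the value here
    is illustrative only.
    """
    return kt_per_m * torque_kNm

print(uplift_capacity_from_torque(torque_kNm=15.0))   # 135.0 kN (illustrative)
```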

13.
Three automotive corporations have developed and sanctioned the recently revised reference manual entitled Measurement Systems Analysis. This "standard" contains a procedure, called the "analytic method," whose purpose is to estimate the gage bias and gage repeatability of an attribute gage. An improved estimation procedure for this standard is presented. The improved procedure yields more accurate estimates than those obtained using the procedures currently presented in the standard, and it allows more flexibility in data collection than the current test protocol. A simulation study that evaluates the estimation procedure of the current standard and compares it with the improved estimation procedure is presented. Errors contained in the present standard are also noted.
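A generic sketch of the idea underlying an attribute-gage "analytic method": fit a normal ogive to the proportion of acceptances observed for reference parts of known size, then read a bias-type offset and a repeatability-type spread from the fitted curve. The data, the probit-style fit, and the interpretation of the fitted parameters are illustrative assumptions, not the MSA manual's exact protocol or the authors' improved procedure.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm

# Hypothetical study: reference parts near a tolerance limit, each gaged 20 times.
ref = np.array([-0.05, -0.04, -0.03, -0.02, -0.01, 0.00, 0.01, 0.02])  # part size minus limit
accepted = np.array([0, 1, 3, 8, 14, 18, 20, 20])                       # acceptances out of 20
p_accept = accepted / 20.0

# Fit a normal CDF (probit model) to the acceptance proportions.
popt, _ = curve_fit(lambda x, mu, s: norm.cdf(x, mu, s), ref, p_accept, p0=[-0.02, 0.01])
mu, sigma = popt
print(f"bias-type offset of the 50% acceptance point ≈ {mu:.3f}")
print(f"repeatability-type spread ≈ {sigma:.3f}")
```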

14.
The majority of epidemiological studies investigating correlations between long-term low-level radiofrequency (RF) exposure from mobile phones and health endpoints have followed a case-control design, requiring reconstruction of individual RF exposure. To date, these have employed 'time of use' as an exposure surrogate, based on questionnaire information or billing records. The present study demonstrates that such an approach may not account for variability in mobile phone transmit power, which can be roughly correlated with RF exposure. This variability exists (a) during a single call, (b) between separate calls, (c) between averaged values from individuals within a local study group, and (d) between average values from groups in different geographical locations. The present data also suggest an age-related influence on talk time, as well as significant inaccuracy (45-60%) in recalling 'time of use'. Evolving technology and changing use behaviours may add additional complexities. Collectively, these data suggest that efforts to identify dose response and statistical correlations between mobile phone use and subtle health endpoints may be significantly challenged.

15.
A number of state assessment programs that employ Rasch-based common-item equating procedures estimate the equating constant using only those common items for which the two tests' Rasch item difficulty parameter estimates differ by less than 0.3 logits. This study presents evidence that this practice results in an inflated probability of incorrectly dropping an item from the common-item set when the number of examinees is small (e.g., 500 or fewer), and the reverse when the number of examinees is large (e.g., 5,000 or more). An asymptotic experiment-wise error rate criterion was algebraically derived. This same criterion can also be applied to the Mantel-Haenszel statistic. Bonferroni test statistics were found to provide excellent approximations to the (asymptotically) exact test statistics.
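The sketch below shows a Bonferroni-style screen for common-item stability in the spirit of the criterion discussed above: each anchor item's difficulty difference, net of the overall form shift, is compared with its combined standard error at an experiment-wise alpha. It is an illustration of the general approach, not the paper's algebraic derivation; the item values and standard errors are hypothetical.

```python
import math
from scipy import stats

def unstable_items(b1, b2, se1, se2, alpha=0.05):
    """Flag common items whose difficulty difference is too large to keep as anchors."""
    k = len(b1)
    shift = sum(x - y for x, y in zip(b1, b2)) / k           # overall form-to-form shift
    z_crit = stats.norm.ppf(1.0 - alpha / (2.0 * k))         # Bonferroni-adjusted critical value
    flagged = []
    for i, (x, y, s1, s2) in enumerate(zip(b1, b2, se1, se2)):
        z = abs((x - y) - shift) / math.sqrt(s1**2 + s2**2)
        if z > z_crit:
            flagged.append(i)
    return flagged

b_form1 = [-0.8, 0.1, 0.5, 1.2, -0.3]
b_form2 = [-0.7, 0.2, 1.1, 1.3, -0.2]          # item 2 has drifted between forms
se = [0.08] * 5
print("drop from common-item set:", unstable_items(b_form1, b_form2, se, se))   # [2]
```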

16.
A comprehensive finite element (FE) analytical tool to predict the effect of defects and damage in composite structures was developed for rapid and accurate damage assessment. The structures under consideration were curved, T-stiffened, multi-rib composite panels representative of those widely used in aerospace primary structures. The damage assessment focussed on skin-to-stiffener debonding, a common defect that can critically reduce the performance of composite structures with integral or secondary bonded stiffeners. The analytical tool was validated using experimental data obtained from the structural test of a large stiffened panel that contained an artificial skin-to-stiffener debond. Excellent agreement between the FE analysis and test results was obtained, and the predicted onset of crack growth also compared well with the test observations. Since the general damage tolerance philosophy for composite structures follows the "no-growth" principle, the critical parameters were established based on the onset of crack growth determined using fracture mechanics calculations. Parametric studies were conducted using the analytical tool in order to understand the structural behaviour in the postbuckling range and to determine the critical parameters. Parameters considered included debond size, debond location, debond type, multiple debonds, and laminate lay-up.

17.

While randomized controlled experiments are often considered the gold standard for establishing causal relationships between variables, they are expensive if one is interested in understanding the complete set of causal relationships governing a large set of variables, and it may not be possible to manipulate certain variables due to ethical or practical constraints. To address these scenarios, procedures have been developed which use conditional independence relationships among passively observed variables to predict which variables may or may not be causally related to other variables. Until recently, most of these procedures assumed that the data consisted of a single i.i.d. dataset of observations, but in practice researchers often have access to multiple similar datasets, e.g. from multiple labs studying the same problem, which measure slightly different variable sets and whose recording conventions and procedures may vary. This paper discusses recent state-of-the-art approaches for predicting causal relationships using multiple observational and experimental datasets in these contexts.
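As a concrete illustration of the building block such procedures rely on, the sketch below tests conditional independence between two observed variables given a conditioning set, using partial correlation and the Fisher z transform, one common choice for continuous data (not the only test used in this literature, and not tied to any specific algorithm the paper surveys); the data are synthetic.

```python
import math
import numpy as np
from scipy import stats

def cond_independent(data, i, j, cond, alpha=0.05):
    """Return True if columns i and j look independent given the columns in `cond`."""
    idx = [i, j] + list(cond)
    prec = np.linalg.inv(np.corrcoef(data[:, idx], rowvar=False))
    r = -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])      # partial correlation
    z = 0.5 * math.log((1 + r) / (1 - r))                     # Fisher z transform
    stat = math.sqrt(len(data) - len(cond) - 3) * abs(z)
    return stat < stats.norm.ppf(1 - alpha / 2)

rng = np.random.default_rng(5)
x = rng.normal(size=2000)
y = 0.8 * x + rng.normal(size=2000)
w = 0.8 * y + rng.normal(size=2000)            # causal chain x -> y -> w
data = np.column_stack([x, y, w])

print(cond_independent(data, 0, 2, cond=[]))    # False: x and w are marginally dependent
print(cond_independent(data, 0, 2, cond=[1]))   # True: x and w are independent given y
```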


18.
The best available design data for vent relief of dust deflagrations are contained in the nomographs presented by NFPA and VDI. In order to use these data, experimental measurements must be made to characterize the dust in the enclosure to be protected. In the absence of a single standard test for such measurements, various test vessels of 20 liters or greater volume are in use, following the demonstration that the 1.2 liter Hartmann apparatus yields data which are incompatible with the nomograph method. A 26-liter test facility is described and the effects of the test variables are detailed, showing how these variables may be standardized. The overall philosophy of vent relief design is outlined, and it is shown that various approximations exist at every stage of the design process, which is a compromise rather than a "worst case" solution.
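For context, vessel-to-vessel comparison of dust test data conventionally relies on the cube-root scaling of the maximum rate of pressure rise (the K_St value used with such nomographs); the Hartmann apparatus's incompatibility is often described as a failure of this scaling. The sketch below applies the scaling with illustrative numbers and is not a summary of the paper's test facility.

```python
def k_st(dp_dt_max_bar_per_s, vessel_volume_m3):
    """Volume-normalised maximum rate of pressure rise (cube-root law), bar·m/s."""
    return dp_dt_max_bar_per_s * vessel_volume_m3 ** (1.0 / 3.0)

# e.g. a 26-liter vessel measuring (dP/dt)max = 680 bar/s (illustrative values)
print(f"K_St ≈ {k_st(680.0, 0.026):.0f} bar·m/s")
```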

19.
This article presents a methodology to optimise the design of a realistic mechanical test to characterise the elastic stiffness parameters of an orthotropic PVC foam material in a single test. Two main experimental techniques were used in this study: Digital Image Correlation (DIC) and the Virtual Fields Method (VFM). The actual image recording process was mimicked by numerically generating a series of deformed synthetic images. Subsequently, the entire measurement and data processing procedure was simulated by processing the synthetic images using DIC and VFM algorithms. This procedure was used to estimate the uncertainty of the measurements (systematic and random errors) by including the most significant parameters of actual experiments, e.g. the geometric test configuration, the parameters of the DIC process, and the noise. By using these parameters as design variables and by defining different error functions as objective functions, an optimisation study was performed to minimise the uncertainty of the material parameter identification and to select the optimal test parameters. The confidence intervals of the identified parameters were predicted based on the systematic and random errors obtained from the simulations. The simulated experiments showed that averaging multiple images can lead to a significant reduction of the random error. An experimental determination of the elastic coefficients of the PVC foam material was conducted using the optimised test parameters obtained from the numerical study. The identified stiffness values matched well with data from previous tests; even more interestingly, the experimental uncertainty intervals matched reasonably well with the predictions of the simulations, which is a highly original result and probably the main outcome of the present paper.
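A minimal sketch of the image-averaging effect reported above: averaging N nominally identical images reduces the standard deviation of additive random noise by roughly 1/√N, which in turn reduces the random error of the DIC/VFM identification. The grey levels, noise level, and image count below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
true_image = np.full((64, 64), 128.0)            # uniform grey level (illustrative)
NOISE_STD, N_IMAGES = 4.0, 16

stack = true_image + rng.normal(0.0, NOISE_STD, size=(N_IMAGES, 64, 64))
averaged = stack.mean(axis=0)

print(f"single-image noise ≈ {stack[0].std():.2f} grey levels")
print(f"averaged ({N_IMAGES} images) ≈ {averaged.std():.2f} "
      f"(theory: {NOISE_STD / np.sqrt(N_IMAGES):.2f})")
```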

20.
The Western European market for automatic test equipment (ATE) purchases is currently estimated to be of the order of US $200 000 000 per annum. The case for installing ATE stations in the production line and in maintenance workshops is usually based primarily on the requirement for a high rate of test capability (to match the product volume flow), on the need to speed up testing because the time for manual test methods is too high relative to the product MTBF, or on a shortage of test personnel of the requisite skill level. There are other benefits, and certain disadvantages, as listed in this paper.

It is the contention of the present authors that the enormous flow of test data offers an unusual opportunity to management which, if properly exploited, will lead to more efficient industrial production. The paper shows, via a particular industrial application, that selective data processing can be used:

(a) to assess the information content of the current test schedules, so that redundant measurements may be identified and eliminated;

(b) to reduce the number of test variables needing to be monitored;

(c) to forecast the onset of unacceptable quality levels, thereby permitting corrective action to be taken;

(d) to highlight changes in quality levels, and by indicating the likely time of onset, to assist in detecting those events causing the change.

Thus the paper regards the provision of ATE as an opportunity to provide an extra, highly effective feedback link within the management control system.
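A minimal sketch of point (a) above: flag test measurements that are so highly correlated with another measurement that they carry little additional information, making them candidates for elimination from the test schedule. The measurement names, data, and correlation threshold are illustrative assumptions, not the paper's industrial application.

```python
import numpy as np

rng = np.random.default_rng(3)
n_units = 500
v1 = rng.normal(5.0, 0.1, n_units)                 # e.g. a supply-rail voltage
v2 = 2.0 * v1 + rng.normal(0.0, 0.005, n_units)    # nearly redundant with v1
v3 = rng.normal(1.0, 0.05, n_units)                # an independent measurement
data = np.column_stack([v1, v2, v3])

corr = np.corrcoef(data, rowvar=False)
THRESHOLD = 0.95                                   # illustrative redundancy criterion
redundant = [(i, j) for i in range(corr.shape[0])
             for j in range(i + 1, corr.shape[1])
             if abs(corr[i, j]) > THRESHOLD]
print("candidate redundant measurement pairs:", redundant)   # [(0, 1)]
```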

