期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Test scores,measurement, and the use of analysis of variance: an historical overview

Romanoski J Douglas G 《Journal of applied measurement》2002,3(3):232-242

In order to establish a firmer statistical foundation from which to draw inferences from factorial design study data, transformations of raw scores are occasionally employed in order to make their distributions more generally normal or to provide linearity. To date, few studies have been conducted to determine whether or not raw scores transformed or otherwise constitute measures for the purposes of statistical analysis. In this article, the historical development of the understanding of the term "measurement" by researchers in the social sciences is traced, and the development and use of One and Two-way ANOVA by researchers in the social sciences are presented and evaluated. 相似文献

2.

A Rasch analysis of three of the Wisconsin Scales of Psychosis Proneness: measurement of schizotypy

Graves RE Weinstein S 《Journal of applied measurement》2004,5(2):160-171

Rasch analyses were conducted with data from 90 university students on three of the Wisconsin Scales of Psychosis Proneness--the Magical Ideation (Eckblad & Chapman, 1983), Perceptual Aberration (Chapman, Chapman, & Raulin, 1978), and Revised Social Anhedonia Scales (Eckblad, Chapman, Chapman, & Mishlove, 1982). All of the items for each of the individual scales, plus all of the items from the combined Perceptual Aberration/Magical Ideation (Per-Mag) Scale, showed satisfactory fit to the Rasch model. These results show that personality traits including these psychosis proneness, or schizotypy, traits can be measured on a theoretically sound quantitative interval scale. Rasch scale equivalents for raw scores are provided. Possible improvements to the Magical Ideation, Perceptual Aberration, and Per-Mag scales are suggested by the item analysis. Advantages of Rasch scaling for clinical applications include detection of invalid test protocols, more meaningful interpretations of test scores, and direct comparison of scores from different tests of the same construct. 相似文献

3.

Level of Activity in Profound/Severe Mental Retardation (LAPMER): a Rasch-derived scale of disability

Tesio L Valsecchi MR Sala M Guzzon P Battaglia MA 《Journal of applied measurement》2002,3(1):50-84

相似文献

4.

Comparing traditional and Rasch analyses of the Mississippi PTSD Scale: revealing limitations of reverse-scored items

Conrad KJ Wright BD McKnight P McFall M Fontana A Rosenheck R 《Journal of applied measurement》2004,5(1):15-30

This study examined whether Rasch analysis could provide more information than true score theory (TST) in determining the usefulness of reverse-scored items in the Mississippi Scale for Posttraumatic Stress Disorder (M-PTSD). Subjects were 803 individuals in inpatient PTSD units at 10 VA sites. TST indicated that the M-PTSD performed well and could be improved slightly by deleting one item. Factor analysis using raw scores indicated that the reverse-scored items formed the second factor and had poor relationships with normally scored items. However, since item-total correlations supported their usefulness, they were kept. The subsequent Rasch analysis indicated that five of the seven worst fitting items were reverse-scored items. We concluded that using reversed items with disturbed patients can cause confusion that reduces reliability. Deleting them improved validity without loss of reliability. The study supports the use of Rasch analysis over TST in health research since it indicated ways to reduce respondent burden while maintaining reliability and improving validity. 相似文献

5.

The influence of equating methodology on reported trends in PISA

Gebhardt E Adams RJ 《Journal of applied measurement》2007,8(3):305-322

In 2005 PISA published trend indicators that compared the results of PISA 2000 and PISA 2003. In this paper we explore the extent to which the outcomes of these trend analyses are sensitive to the choice of test equating methodologies, the choice of regression models and the choice of linking items. To establish trends PISA equated its 2000 and 2003 tests using a methodology based on Rasch Modelling that involved estimating linear transformations that mapped 2003 Rasch-scaled scores to the previously established PISA 2000 Rasch-scaled scores. In this paper we compare the outcomes of this approach with an alternative, which involves the joint Rasch scaling of the PISA 2000 and PISA 2003 data separately for each country. Note that under this approach the item parameters are estimated separately for each country, whereas the linear transformation approach used a common set of item parameter estimates for all countries. Further, as its primary trend indicators, PISA reported changes in mean scores between 2000 and 2003. These means are not adjusted for changes in the background characteristics of the PISA 2000 and PISA 2003 samples - that is, they are marginal rather than conditional means. The use of conditional rather than marginal means results in some differing conclusions regarding trends at both the country and within-country level. 相似文献

6.

Measurement uncertainty in the determination of total petroleum hydrocarbons (TPH) in soil by GC-FID

Eija Saari Paavo Permki Jorma Jalonen 《Chemometrics and Intelligent Laboratory Systems》2008,92(1):3-12

In this investigation two methods were used for estimating the measurement uncertainty due to sampling and analysis of petroleum hydrocarbon contaminated soil. Analysis of variance (ANOVA) was used for type A evaluation of the measurement uncertainty. The results showed that the statistical evaluation of measurement uncertainty can be complicated by the log-normality and heteroscedasticity of the data. Although mathematical transformation of raw data is widely suggested for overcoming the discrepancy between data and ANOVA assumptions, its use results in problems with the interpretation of the ANOVA results at the original scale.

The measurement uncertainty was also estimated from the calculated precision equations for sampling and analysis. Comparison of measurement uncertainty values with the equivalent values obtained with ANOVA revealed that ANOVA overestimates the expanded uncertainty at both low and high TPH concentrations. Consequently, correct selection of the statistical analysis method needs comprehensive knowledge of the assumptions and limitations of statistical methods and careful consideration of the special characteristics (distribution, constancy of measurement variance) of the raw data as these may affect the validity of the estimated uncertainty. The expanded uncertainty obtained in this study for the results of TPH determinations with linear measurement precision modelling was moderate, ranging from 21% at a TPH concentration of 895 mg/kg to 9% at a TPH concentration of 10 019 mg/kg. If a single sample taken in a survey is analyzed only once, then the analytical variance contributes the most to the measurement variance, ranging from 68– 80% at a TPH concentration of 100–10 000 mg/kg. 相似文献

7.

An analysis of dimensionality using factor analysis (true-score theory) and Rasch measurement: what is the difference? Which method is better?

Waugh RF Chapman ES 《Journal of applied measurement》2005,6(1):80-99

One often hears the question asked, "For questionnaire data measuring a variable, what difference does it make to use factor analysis/principal components analysis (true-score theory) or Rasch measurement in testing for dimensionality?" This paper reports both factor analysis and Rasch measurement analysis for two sets of data. One set of data measures social anxiety for primary school students (N=436, I=10) and the second measures attitude to mathematics for primary-aged students (N=774, I=10). For both sets of data, the factor analysis suggests that the scores are reliable, and that inferences can be made that are valid for measuring school anxiety and attitude to mathematics. For both sets of data analyzed with Rasch measurement techniques, the reliability of the measures, the dimensionality of the measures, and the initial conceptualisation of the items, are called into question. It suggests that one cannot make valid inferences from the measures that were initially set up for true-score theory. The Rasch analysis suggests that items intended to measure a variable should be initially developed on a conceptualized scale from easy to hard, and that students should answer the items from this perspective, so that the Rasch analysis of the data tests this conceptualisation, and a linear scale can be created based on a mathematical measurement model with consistent units (logits). 相似文献

8.

Pre-equating: a simulation study based on a large scale assessment model

Taherbhai HM Young MJ 《Journal of applied measurement》2004,5(3):301-318

Although post-equating (PE) has proven to be an acceptable method in the scaling and equating of items and forms, there are times when the turn-around period for equating and converting raw scores to scale scores is so small that PE cannot be undertaken within the prescribed time frame. In such cases, pre-equating (PrE) could be considered as an acceptable alternative. Assessing the feasibility of using item calibrations from the item bank (as in PrE) is conditioned on the equivalency of the calibrations and the errors associated with it vis a vis the results obtained via PE. This paper creates item banks over three periods of item introduction into the banks and uses the Rasch model in examining data with respect to the recovery of item parameters, the measurement error, and the effect cut-points have on examinee placement in both the PrE and PE situations. Results indicate that PrE is a viable solution to PE provided the stability of the item calibrations are enhanced by using large sample sizes (perhaps as large as full-population) in populating the item bank. 相似文献

9.

The recovery of the density scale using a stochastic quasi-realization of additive conjoint measurement

Pelton TW Bunderson CV 《Journal of applied measurement》2003,4(3):269-281

This paper attempts to illuminate some of the practical limitations that the Rasch model (and by extension, Item Response Theory models) may have by focusing on the recovery of the density scale. Five simulation trials were conducted: the first four to recover the density scale with different deviations from the assumptions implicit in the use of the Rasch model and the fifth trial with an almost ideal data set. Results demonstrate that when error distributions are insufficient the results may be ordinal at best, and when error distributions are non-symmetrical, the positions of items may be biased with respect to the positions of persons. Results also confirm that errors of estimation, and test and sample information functions are sample dependent. 相似文献

10.

Some links between classical and modern test theory via the two-level hierarchical generalized linear model

Miyazaki Y 《Journal of applied measurement》2005,6(3):289-310

This article considers some links between classical test theory (CTT) and modern test theory (MTT) such as item response theory (IRT) and the Rasch model in the context of the two-level hierarchical generalized linear model (HGLM). Conceptualizing items as nested within subjects, both the CTT model and the MTT model can be reformulated as an HGLM where item difficulty parameters are represented by fixed effects and subjects' abilities are represented by random effects. In this HGLM framework, the CTT and MTT models differ only in the level 1 sampling model and the associated link function. This article also contrasts the Rasch and two-parameter IRT models by considering the property of specific objectivity in the context of CTT. It is found that the essentially tau-equivalent model exhibits specific objectivity if the data fit the model, but the congeneric measures model does not. Data from English composition scores on essay writing used by J?reskog (1971) are reanalyzed for illustration. 相似文献

11.

Rasch modeling of the structure of health risk behavior in South african adolescents

Mpofu E Caldwell L Smith E Flisher AJ Mathews C Wegner L Vergnani T 《Journal of applied measurement》2006,7(3):323-334

The study used Rasch analysis to investigate the presence of a syndrome of health risk behavior in South African adolescents. A total of 2186 in-school adolescents participated in the study (males = 1077; females = 1119; age range = 12-16 years; median = 13 years). The data are baseline from a longitudinal study of a leisure-based drug abuse and HIV/AIDS prevention program at Mitchell's Plain in Cape Town, South Africa. The adolescents completed a self-report measure on various health risk vulnerabilities, including use of alcohol, tobacco and other drugs (ATOD), co-occurrence of penetrative sex with use of ATOD, health related self-efficacy, personal beliefs about health, peer perceptions, and use of contraceptives. The Rasch analysis calibrated data on 50 items from the aforesaid conceptually distinct health risk domains. Infit and Outfit mean square statistics and principal components analysis of the standardized residuals suggested a fit of the data to the unidimensional Rasch measurement model. The findings support a syndrome view of health risk in teenagers as proposed by problem behavior theory. 相似文献

12.

Rasch fit statistics as a test of the invariance of item parameter estimates

Smith RM Suh KK 《Journal of applied measurement》2003,4(2):153-163

The invariance of the estimated parameters across variation in the incidental parameters of a sample is one of the most important properties of Rasch measurement models. This is the property that allows the equating of test forms and the use of computer adaptive testing. It necessarily follows that in Rasch models if the data fit the model, than the estimation of the parameter of interest must be invariant across sub-samples of the items or persons. This study investigates the degree to which the INFIT and OUTFIT item fit statistics in WINSTEPS detect violations of the invariance property of Rasch measurement models. The test in this study is a 80 item multiple-choice test used to assess mathematics competency. The WINSTEPS analysis of the dichotomous results, based on a sample of 2000 from a very large number of students who took the exam, indicated that only 7 of the 80 items misfit using the 1.3 mean square criteria advocated by Linacre and Wright. Subsequent calibration of separate samples of 1,000 students from the upper and lower third of the person raw score distribution, followed by a t-test comparison of the item calibrations, indicated that the item difficulties for 60 of the 80 items were more than 2 standard errors apart. The separate calibration t-values ranged from +21.00 to -7.00 with the t-test value of 41 of the 80 comparisons either larger than +5 or smaller than -5. Clearly these data do not exhibit the invariance of the item parameters expected if the data fit the model. Yet the INFIT and OUTFIT mean squares are completely insensitive to the lack of invariance in the item parameters. If the OUTFIT ZSTD from WINSTEPS was used with a critical value of | t | > 2.0, then 56 of the 60 items identified by the separate calibration t-test would be identified as misfitting. A fourth measure of misfit, the between ability-group item fit statistic identified 69 items as misfitting when a critical value of t > 2.0 was used. Clearly relying solely on the INFIT and OUTFIT mean squares in WINSETPS to assess the fit of the data to the model would cause one to miss one of the most important threats to the usefulness of the measurement model. 相似文献

13.

Modelling Insurance Losses with a New Family of Heavy-Tailed Distributions

Muhammad Arif Dost Muhammad Khan Saima Khan Khosa Muhammad Aamir Adnan Aslam Zubair Ahmad Wei Gao 《计算机、材料和连续体（英文）》2021,66(1):537-551

The actuaries always look for heavy-tailed distributions to model data relevant to business and actuarial risk issues. In this article, we introduce a new class of heavy-tailed distributions useful for modeling data in financial sciences. A specific sub-model form of our suggested family, named as a new extended heavy-tailed Weibull distribution is examined in detail. Some basic characterizations, including quantile function and raw moments have been derived. The estimates of the unknown parameters of the new model are obtained via the maximum likelihood estimation method. To judge the performance of the maximum likelihood estimators, a simulation analysis is performed in detail. Furthermore, some important actuarial measures such as value at risk and tail value at risk are also computed. A simulation study based on these actuarial measures is conducted to exhibit empirically that the proposed model is heavy-tailed. The usefulness of the proposed family is illustrated by means of an application to a heavy-tailed insurance loss data set. The practical application shows that the proposed model is more flexible and efficient than the other six competing models including (i) the two-parameter models Weibull, Lomax and Burr-XII distributions (ii) the three-parameter distributions Marshall-Olkin Weibull and exponentiated Weibull distributions, and (iii) a well-known four-parameter Kumaraswamy Weibull distribution. 相似文献

14.

Construct validity of scores/measures from a developmental assessment in mathematics using classical and many-facet Rasch measurement

Banerji M 《Journal of applied measurement》2000,1(2):177-198

相似文献

15.

Using Rasch scaled stage scores to validate orders of hierarchical complexity of balance beam task sequences

Commons ML Goodheart EA Pekker A Dawson TL Draney K Adams KM 《Journal of applied measurement》2008,9(2):182-199

These studies examine the relationship between the analytic basis underlying the hierarchies produced by the Model of Hierarchical Complexity and the probabilistic Rasch scales that places both participants and problems along a single hierarchically ordered dimension. A Rasch analysis was performed on data from the balance-beam task series. This yielded scaled stage of performance for each of the items. The items formed a series of clusters along this same dimension, according to their order of hierarchical complexity. We sought to ascertain whether there was a significant relationship between the order of hierarchical complexity (a task property variable) of the tasks and the corresponding Rasch scaled difficulty of those same items (a performance variable). It was found that The Model of Hierarchical Complexity was highly accurate in predicting the Rasch Stage scores of the performed tasks, therefore providing an analytic and developmental basis for the Rasch scaled stages. 相似文献

16.

Using the Mixed Rasch Model to analyze data from the beliefs and attitudes about memory survey

Smith EV Ying Y Brown SW 《Journal of applied measurement》2012,13(1):23-40

In this study, we used the Mixed Rasch Model (MRM) to analyze data from the Beliefs and Attitudes About Memory Survey (BAMS; Brown, Garry, Silver, and Loftus, 1997). We used the original 5-point BAMS data to investigate the functioning of the "Neutral" category via threshold analysis under a 2-class MRM solution. The "Neutral" category was identified as not eliciting the model expected responses and observations in the "Neutral" category were subsequently treated as missing data. For the BAMS data without the "Neutral" category, exploratory MRM analyses specifying up to 5 latent classes were conducted to evaluate data-model fit using the consistent Akaike information criterion (CAIC). For each of three BAMS subscales, a two latent class solution was identified as fitting the mixed Rasch rating scale model the best. Results regarding threshold analysis, person parameters, and item fit based on the final models are presented and discussed as well as the implications of this study. 相似文献

17.

A confirmatory study of Rasch-based optimal categorization of a rating scale

Zhu W 《Journal of applied measurement》2002,3(1):1-15

The purpose of this study was to determine if the characteristics of the optimal categorization identified by the Rasch analysis in a previous study can be maintained when the revised scale is applied to the same population. Based on the results of the previous Rasch analysis, a 23-item exercise barrier scale was modified from its original five-category structure ("Very often" = 1, "Often" = 2, "Sometimes" = 3, "Rarely" = 4, and "Never" = 5) to a three-category structure ("Very Often" = 1, "Sometimes" = 2, and "Never" = 3). The modified scale was then mailed to the original sample (N = 381), of which 206 returned the survey; a return rate 57.5%. The data was again analyzed using the Rasch Rating Scale model. Overall, the Rasch model fit data well and similar change patterns were observed in two category statistics provided by the Rasch analysis. The order of item severity was also well kept and the correlation of item severities generated from two studies was very high, with r = .98. In addition, similar results were also found in respondents' ability estimations, and the correlation between the two studies was moderately high, with r = .68. These results verified that the characteristics of the optimal categorization identified by the Rasch post-hoc analysis can be maintained after the original scale was modified based on such an analysis. 相似文献

18.

Is the partial credit model a Rasch model?

Massof RW 《Journal of applied measurement》2012,13(2):114-131

相似文献

19.

Validation of scores from self-learning scales for primary students using true-score and Rasch measurement methods

Mok MM 《Journal of applied measurement》2004,5(3):258-286

The validation of scores from the Self-learning Scales for primary pupils is presented in this study. The sample for the study comprised 1253 pupils from 20 Year-3 and 20 Year-5 classes from ten primary schools in Hong Kong. The 10-item Usefulness Scale is designed to measure primary pupils' attitudes toward the usefulness of self-learning strategies situated in ten learning contexts. The 10-item Deployment Scale is designed to measure pupils' frequency in using the self-learning strategies. Both scales use 3-point Likert response scale. Construct validity of scores from the scales for use with primary pupils is supported by confirmatory factor analysis and Rasch measurement. Gender and year level differences were identified on the Rasch person measures. Generalizability of the scores from the two scales across gender and year level needs to be undertaken with caution. 相似文献

20.

Mapping spatiotemporal molecular distributions using a microfluidic array

Lynn NS Tobet S Henry CS Dandy DS 《Analytical chemistry》2012,84(3):1360-1366

The spatial and temporal distributions of an extensive number of diffusible molecules drive a variety of complex functions. These molecular distributions often possess length scales on the order of a millimeter or less; therefore, microfluidic devices have become a powerful tool to study the effects of these molecular distributions in both chemical and biological systems. Although there exist a number of studies utilizing microdevices for the creation of molecular gradients, there are few, if any, studies focusing on the measurement of spatial and temporal distributions of molecular species created within the study system itself. Here we present a microfluidic device capable of sampling multiple chemical messengers in a spatiotemporally resolved manner. This device operates through spatial segregation of nanoliter-sized volumes of liquid from a primary sample reservoir into a series of analysis microchannels, where fluid pumping is accomplished via a system of passive microfluidic pumps. Subsequent chemical analysis within each microchannel, achieved via optical or bioanalytical methods, yields quantitative data on the spatial and temporal information for any analytes of interest existing within the sample reservoir. These techniques provide a simple, cost-effective route to measure the spatiotemporal distributions of molecular analytes, where the system can be tailored to study both chemical and biological systems. 相似文献