Similar articles
1.
This article describes the Rasch partial credit model: what it is, how it differs from other Rasch models, when to use it, and how to use it. The calibration of instruments with increasingly complex items is described, starting with dichotomous items and moving on to polychotomous items using a single rating scale, mixed polychotomous items using multiple rating scales, and instruments in which each item has its own rating scale. The article also introduces a procedure for aligning rating scale categories when more than one rating scale is used in a single instrument. Pivot anchoring is defined, and its use is illustrated with the mental health scale of the SF-36, which contains positively and negatively worded items. Finally, the article describes the effect of pivot anchoring on step calibrations, the item hierarchy, and person measures.
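As a rough illustration of the model this abstract refers to, the sketch below computes category probabilities for a single item under the standard partial credit formulation, in which each item has its own step calibrations. The function name `pcm_probabilities` and its arguments are hypothetical and are not taken from the article; pivot anchoring is not implemented here.

```python
import math


def pcm_probabilities(theta, step_difficulties):
    """Category probabilities for one item under the partial credit model.

    theta: person measure in logits.
    step_difficulties: step calibrations delta_1..delta_m for this item
        (hypothetical example values, one list per item).
    Returns a list of m + 1 probabilities for categories 0..m.
    """
    # Category k has log-numerator sum_{j<=k} (theta - delta_j);
    # category 0 corresponds to the empty sum, which is 0.
    log_numerators = [0.0]
    for delta in step_difficulties:
        log_numerators.append(log_numerators[-1] + (theta - delta))

    # Subtract the largest value before exponentiating for numerical stability.
    peak = max(log_numerators)
    weights = [math.exp(value - peak) for value in log_numerators]
    total = sum(weights)
    return [weight / total for weight in weights]


if __name__ == "__main__":
    # A three-category item with steps at -1 and +1 logits, person at 0 logits.
    print(pcm_probabilities(0.0, [-1.0, 1.0]))
```

With a single step the function reduces to the dichotomous Rasch model, and passing the same step list for every item would mimic a shared rating scale up to an item location shift.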

2.
A 28-item inventory was developed to measure the clinical problem-solving abilities of 3rd- and 4th-year dental students. Fifty-seven expert raters (dental-school faculty) from four dental schools used the inventory to evaluate 183 dental students on a 5-point rating scale. The Rasch measurement model was employed to examine the psychometric properties and construct validity of this inventory. In this study, fit statistics identified the "noise" in the data, and residual analysis assisted in extracting a meaningful structure. The results indicate that the Rasch measurement model was a useful method for producing a unidimensional instrument. All five rating categories were used in a coherent manner, and four discernible levels of clinical problem-solving ability were identified. After removal of four repetitious items, a version of the Clinical Problem-Solving Inventory was finalized that could serve as a criterion measure for validating the use of a critical thinking test on the Dental Admission Test.

3.
The objective of this article is to illustrate incremental item banking using health-related quality of life data collected from two samples of patients receiving cancer treatment. The kinds of decisions one faces in establishing an item bank for computerized adaptive testing are also illustrated. Pre-calibration procedures include: identifying common items across databases; creating a new database with data from each pool; reverse-scoring "negative" items; identifying rating scales used in items; identifying pivot points in each rating scale; pivot anchoring items at comparable rating scale categories; and identifying items in each instrument that measure the construct of interest. A series of calibrations was conducted in which a small proportion of new items was added to the common core and misfitting items were identified and deleted, until an initial item bank had been developed.

4.
Rating scales are used to elicit data about qualitative entities (e.g., research collaboration). This study presents an innovative method for reducing the number of rating scale items without loss of predictability. The "area under the receiver operator curve method" (AUC ROC) is used. The presented method reduced the number of rating scale items (variables) to 28.57% of the original (from 21 to 6), making over 70% of the collected data unnecessary. The results were verified by two methods of analysis: the Graded Response Model (GRM) and Confirmatory Factor Analysis (CFA). GRM revealed that the new method differentiates observations with high and middle scores. CFA showed that the reliability of the rating scale did not deteriorate after the item reduction. Both statistical analyses supported the usefulness of the AUC ROC reduction method.
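The abstract does not spell out how AUC is used to select items, so the following is only a minimal sketch of one plausible reading: compute the area under the ROC curve for each item against a binary outcome and rank items by it. The names `auc` and `rank_items_by_auc`, the binary `labels`, and the example data are all assumptions made for illustration, not the authors' procedure.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case has a
    higher score than a randomly chosen negative case (ties count half).
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def rank_items_by_auc(item_responses, labels):
    """Return item names sorted from most to least discriminating.

    item_responses: dict mapping an item name to its list of ratings
        (hypothetical layout, one rating per respondent).
    labels: 0/1 outcome per respondent (assumed external criterion).
    """
    return sorted(
        item_responses,
        key=lambda name: auc(item_responses[name], labels),
        reverse=True,
    )


if __name__ == "__main__":
    responses = {"item_a": [1, 2, 3, 4], "item_b": [2, 2, 2, 2]}
    print(rank_items_by_auc(responses, [0, 0, 1, 1]))
```

Keeping only the top-ranked items would then shorten the scale; how many to keep, and against which criterion, are choices the abstract leaves open.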

5.
Occupational therapists do not have a comprehensive, objective method for measuring how persons with tetraplegia perform activities of daily living (ADL) in their homes and communities, because SCI ADL performance is usually determined in rehabilitation. The ADL Habits Survey (ADLHS) is designed specifically to address this knowledge gap by surveying performance on relevant and meaningful activities in homes and communities. After a comprehensive task analysis and pilot development, 30 activities were selected that emphasize a broad range of hand and wrist, reaching, and grasping movements in compound activities. A sample of 49 persons with cervical spinal cord injuries responded to the items. The sample was predominantly male, the median age was 41 years, and ASIA motor classification levels ranged from C2 through C8/T1, with the majority (68%) at C4, C5, or C6. Each participant report was rated by an occupational therapist using a seven-category rating scale, and the item-by-participant response matrix (30 × 49) was analyzed with a Rasch model for rating scales. Results showed excellent participant separation (>4) and very high reliability (>.95), and both item and participant fit values were adequate (absolute standardized infit less than 3). With only two exceptions, all participants fit the Rasch rating scale model, and only one item, "Light housekeeping", presented significant fit issues. A principal components analysis of item residuals did not reveal serious threats to unidimensionality. A between-group fit comparison of participants with more versus less movement found invariant item calibrations, and ANOVA of participant measures found statistically significant differences across ASIA motor classification levels. These ADLHS results offer occupational therapists a new method for measuring ADL that is potentially more sensitive to functional changes in tetraplegia than most instruments in common use. Accommodation of step disorder with a three-category rating scale did not diminish measurement properties.

6.
Historically, job analysis has played a fundamental role in developing and validating licensure and certification examinations. Still, research on what constitutes reliable and valid job analysis data is lacking. This paper illustrates several ways to examine the reliability and validity of job analysis survey results. Generalizability theory and the many-facets Rasch model are applied to investigate consistency and generalizability in task importance measures, to suggest a reliable sample size, and to justify the number and use of rating scales. Using random samples from job analysis data for two professions with divergent job activities, this study finds that a representative sample as small as 400 respondents produces reliable estimates of task importance to the same degree of generalizability as obtained from a larger sample of job analysis respondents. Analyses of rating scales suggest that the effectiveness of using different numbers and types of rating scales depends on the nature of a profession.

7.
The purpose of this research is twofold. The first aim is to extend the work of Smith (1992, 1996) and Smith and Miao (1991, 1994) in comparing item fit statistics and principal component analysis as tools for assessing the unidimensionality requirement of Rasch models. The second is to demonstrate methods for exploring how violations of the unidimensionality requirement influence person measurement. For the first study, rating scale data were simulated to represent varying degrees of multidimensionality and varying proportions of items contributing to each component. The second study used responses to a 24-item Attention Deficit Hyperactivity Disorder scale obtained from 317 college undergraduates. The simulation study reveals that both an iterative item fit approach and principal component analysis of standardized residuals are effective in detecting items simulated to contribute to multidimensionality. The methods presented in Study 2 demonstrate the potential impact of multidimensionality on norm- and criterion-referenced person measure interpretations. The results provide researchers with quantitative information to assist the qualitative judgment as to whether the impact of multidimensionality is severe enough to warrant removing items from the analysis.

8.
Measurement methods for product evaluation
Among the many tasks designers must perform, evaluation of product options based on performance criteria is fundamental. Yet I have found that the methods commonly used remain controversial and uncertain among those who apply them. In this paper, I apply mathematical measurement theory to analyze and clarify common design methods. The methods can be analyzed to determine the level of information required and the quality of the answer provided. At the simplest level, a method using an ordinal scale only arranges options based on a performance objective. More complex, an interval scale also indicates the difference in performance provided. To construct an interval scale, the designer must provide two basic a priori items of information. First, a base-point design is required, from which the remaining designs are relatively measured. Second, the deviation of each remaining design from the base-point design is compared using a metric datum design. Given these two datums, any other design can be evaluated numerically. I show that concept selection charts operate with interval scales. After an interval scale, the next more complex scale is a ratio scale, where the objective has a well-defined zero value. I show that QFD methods operate with ratio scales. Of all measurement scales, the most complex are extensively measurable scales. Extensively measurable scales have a well-defined base value, a metric value, and a concatenation operation for adding values. I show that standard optimization methods operate with extensively measurable scales. Finally, it is also possible to make evaluations with non-numeric scales. These may be more convenient, but are no more general.

9.
In this study, we used the Mixed Rasch Model (MRM) to analyze data from the Beliefs and Attitudes About Memory Survey (BAMS; Brown, Garry, Silver, and Loftus, 1997). We used the original 5-point BAMS data to investigate the functioning of the "Neutral" category via threshold analysis under a 2-class MRM solution. The "Neutral" category was identified as not eliciting the model-expected responses, and observations in the "Neutral" category were subsequently treated as missing data. For the BAMS data without the "Neutral" category, exploratory MRM analyses specifying up to 5 latent classes were conducted to evaluate data-model fit using the consistent Akaike information criterion (CAIC). For each of the three BAMS subscales, a two-latent-class solution was identified as fitting the mixed Rasch rating scale model best. Results regarding threshold analysis, person parameters, and item fit based on the final models are presented and discussed, as are the implications of this study.

10.
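Multi-scale parametric Kalman filter method for identifying structural physical parameters
After an orthogonal wavelet transform, the signal-to-noise ratio of the measured signal is higher at the lower scales. The wavelet transform is applied to decompose the excitation and response signals of a structure onto different scales, yielding the state and measurement equations of the structure at each scale. Combined with the parametric Kalman filter method for dynamic system identification, a multi-scale parametric Kalman filter method for identifying structural physical parameters is proposed. Theoretical analysis and numerical examples show that identifying structural parameters on multiple scales achieves higher accuracy than identification on a single scale.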

11.
This paper describes the development and validation of a democratic learning style scale intended to fill a gap in Sternberg's theory of mental self-government and the associated learning style inventory (Sternberg, 1988, 1997). The scale was constructed as an 8-item scale with a 7-category response scale. The scale was developed following an adapted version of DeVellis' (2003) guidelines for scale development. The validity of the Democratic Learning Style Scale was assessed by item analysis using graphical loglinear Rasch models (Kreiner and Christensen, 2002, 2004, 2006). The item analysis confirmed that the full 8-item revised Democratic Learning Style Scale fitted a graphical loglinear Rasch model with no differential item functioning but with weak to moderate uniform local dependence between two items. In addition, a reduced 6-item version of the scale fitted the pure Rasch model with a rating scale parameterization. The revised Democratic Learning Style Scale can therefore be regarded as a sound measurement scale meeting the requirements of both construct validity and objectivity.

12.
The theory of measurement scales is used to show that there is no foundation for attempting to extend the SI system to measurements of quantities and properties which are described by ordering and naming scales or by absolute scales. It is proposed that the units of planar and solid angles should be considered to be outside the system. Dimensionless quantities are conditionally classified. An analysis is made of specified order scales in which the concept of a "unit of measurement" is not applicable and for which it makes no sense to attribute dimensionality to the numbers and scale points used in them. Translated from Izmeritel'naya Tekhnika, No. 9, pp. 3–10, September, 1999.

13.
This study examines validity of data generated by the School Readiness for Reforms: Leader Questionnaire (SRR-LQ) using an iterative procedure that combines classical and Rasch rating scale analysis. Following content-validation and pilot-testing, principal axis factor extraction and promax rotation of factors yielded a five factor structure consistent with the content-validated subscales of the original instrument. Factors were identified based on inspection of pattern and structure coefficients. The rotated factor pattern, inter-factor correlations, convergent validity coefficients, and Cronbach's alpha reliability estimates supported the hypothesized construct properties. To further examine unidimensionality and efficacy of the rating scale structures, item-level data from each factor-defined subscale were subjected to analysis with the Rasch rating scale model. Data-to-model fit statistics and separation reliability for items and persons met acceptable criteria. Rating scale results suggested consistency of expected and observed step difficulties in rating categories, and correspondence of step calibrations with increases in the underlying variables. The combined approach yielded more comprehensive diagnostic information on the quality of the five SRR-LQ subscales; further research is continuing.

14.
Teachers' knowledge is usually categorised into subject matter knowledge (SMK) and pedagogical content knowledge (PCK). Previously, measurement instruments and consequent cognitive scales have been developed to assess students' and teachers' subject knowledge. A number of qualitative studies have explored teachers' pedagogical content knowledge. This study developed a means to investigate one aspect of PCK, teachers' awareness of their students' knowledge, using a combination of measurement and qualitative interpretation. We asked teachers to estimate on a Likert scale (and also describe qualitatively) the difficulty their pupils would have with test items which we had already scaled using data from their pupils. We then constructed, using various models, a "Teacher's collective Perception of Item Difficulty" (TPID) scale and contrasted this with the students' ability scale by comparing the two sets of item-difficulty parameters. The results were triangulated with qualitative data. We suggest the methodology is best supported by an Inverse Partial Credit Model (IPCM), but we compare the results across alternative Rasch models.

15.
A large number of papers and technical reports are published every year describing research in which Rasch models are used. It has been observed, however, that not all authors describe the application of Rasch measurement with the same thoroughness. Some authors may leave out important information; e.g., they may fail to investigate person or item fit, or may even fail to discuss the reliability of measurement. As a result, editorial guidelines have been published to suggest an informal minimum of thoroughness with which authors should describe the application of Rasch measurement in their papers. This study presents stages in the development of a scale to investigate the comprehensiveness with which individual papers describe the application of Rasch models in practical settings. The scale is used to evaluate how comprehensively the papers published by the Journal of Applied Measurement present the application of Rasch models.

16.
This paper revisits a half-century-long theoretical controversy associated with the use of magnitude estimation scaling (MES) and category rating scaling (CRS) procedures in measurement. The MES procedure in this study involved instructing participants to write a number that matched their impression of the difficulty of a test item. Participants were not restricted in the range of numbers they could choose for their scale. They also had the choice of disclosing their individual scale. After the MES task was completed, participants were given a blank copy of the test to rate the perceived difficulty of each item using a researcher-imposed categorical rating scale from 1 (very easy) to 6 (very difficult). The MES and CRS data were both analyzed using the Rasch rating scale model. Additionally, the MES data were examined with the Rasch partial credit model. Results indicate that knowing each person's scale is associated with smaller errors of measurement.

17.
A sequential modelling approach consisting of passing information across length scales is presented to simulate macroscopic behavior of composite materials. The modeling procedure utilizes a proper flow of information from molecular scale to macroscopic scale including material characteristics at different length scales. Both molecular dynamics and analytical/numerical methods were used in the multiscale analysis together with some experimental observations obtained from Raman microspectroscopy and X-ray microtomography. The multiscale procedure is systematically applied to short glass fibre polypropylene composite material.

18.
The rating scale model (Andrich, 1978) was applied to data from a survey that directed students to rate their satisfaction with college services on a five point Likert scale. Because students used different services, and students were directed to rate only the services they used, the items were differentially exposed to a person factor that we call "pleasability." Differential exposure to pleasability makes items' average rating a biased measure of their performance. In contrast, item parameter estimates in the rating scale model corrected for differential exposure to pleasability. Compared to items' average ratings, item parameter estimates in the rating scale model did a better job of predicting which item received the higher rating when any two items were rated by the same rater.

19.
The paper presents a general process that utilizes wavelet analysis in order to link information on material properties at several scales. In the particular application addressed analytically and numerically, multiscale porosity is the source of material structure or heterogeneity, and the wavelet-based analysis of multiscale information shows clearly its role in properties such as resistance to mechanical failure. Furthermore, through the statistical properties of the heterogeneity at a hierarchy of scales, the process clearly identifies a dominant scale or range of scales. Special attention is paid to porosity appearing at two distinct scales far apart from each other, since this demonstrates the process in a lucid fashion. Finally, the paper suggests ways to extend the process to general multiscale phenomena, including time scaling.

20.
Fisher's information function is reviewed with respect to an example he used for explication. A contemporary example continues the discussion with application to a rating scale instrument. The relationship of information to precision and measurement error is presented and discussed with respect to the analysis of fit. Targeting the instrument and the best test design for measuring a person with respect to information and item-person fit is discussed. The idealization of information and precision for making measures appears most effectively realized when computerized assisted testing can be employed to implement a best test design.
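The link between information, precision, and targeting mentioned in this abstract can be sketched for the simplest case. The example below assumes dichotomous Rasch items rather than the rating scale instrument the article discusses, and the function names are hypothetical. Each item contributes p(1 - p) to the information at a person measure, and the standard error of the measure is the reciprocal square root of the total.

```python
import math


def rasch_probability(theta, difficulty):
    """Probability of success on a dichotomous Rasch item."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def instrument_information(theta, difficulties):
    """Fisher information about theta from a set of dichotomous items.

    Each item contributes p * (1 - p), which peaks at 0.25 when the item
    difficulty equals the person measure (a well-targeted item).
    """
    total = 0.0
    for difficulty in difficulties:
        p = rasch_probability(theta, difficulty)
        total += p * (1.0 - p)
    return total


def standard_error(theta, difficulties):
    """Model standard error of the person measure: 1 / sqrt(information)."""
    return 1.0 / math.sqrt(instrument_information(theta, difficulties))


if __name__ == "__main__":
    on_target = [0.0, 0.0, 0.0, 0.0]   # items matched to the person
    off_target = [3.0, 3.0, 3.0, 3.0]  # items far too hard for the person
    print(standard_error(0.0, on_target))
    print(standard_error(0.0, off_target))
```

The off-target item set yields less information and therefore a larger standard error, which is one way to read the abstract's point that adaptive item selection helps realize a best test design.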
