Similar literature
 Found 20 similar documents (search time: 15 ms)
1.
Data from general population samples in 11 countries (n = 1483 to 9151) were used to assess data quality and test the assumptions underlying the construction and scoring of multi-item scales from the SF-36 Health Survey. Across all countries, the rate of item-level missing data generally was low, although slightly higher for items printed in the grid format. In each country, item means generally were clustered as hypothesized within scales. Correlations between items and hypothesized scales were greater than 0.40 with one exception, supporting item internal consistency. Items generally correlated significantly higher with their own scale than with competing scales, supporting item discriminant validity. Scales could be constructed for 93-100% of respondents. Internal consistency reliability of the eight SF-36 scales was above 0.70 for all scales, with two exceptions. Floor effects were low for all except the two role functioning scales; ceiling effects were high for both role functioning scales and also were noteworthy for the Physical Functioning, Bodily Pain, and Social Functioning scales in some countries. These results support the construction and scoring of the SF-36 translations in these 11 countries using the method of summated ratings.
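
The item-level checks this abstract describes (item-scale correlations above 0.40, floor and ceiling effects) can be sketched roughly as follows. This is only an illustration, not the study's procedure: the response matrix is invented, the function names are hypothetical, and the correlation is "corrected" by removing the item from its own scale total, which is one common convention and an assumption here.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def corrected_item_scale_correlations(rows):
    """Correlate each item with the sum of the other items in its scale.

    rows: one list of item scores per respondent (no missing values).
    """
    n_items = len(rows[0])
    result = []
    for j in range(n_items):
        item = [r[j] for r in rows]
        rest = [sum(r) - r[j] for r in rows]  # scale total without item j
        result.append(pearson(item, rest))
    return result


def floor_ceiling(scale_scores, lowest, highest):
    """Share of respondents at the lowest and highest possible scale score."""
    n = len(scale_scores)
    floor = sum(s == lowest for s in scale_scores) / n
    ceiling = sum(s == highest for s in scale_scores) / n
    return floor, ceiling


# Invented data: 6 respondents answering a 3-item scale scored 1 to 5.
responses = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 3, 3],
    [4, 5, 4],
    [5, 4, 5],
    [5, 5, 5],
]
item_scale_r = corrected_item_scale_correlations(responses)
floor_share, ceiling_share = floor_ceiling(
    [sum(r) for r in responses], lowest=3, highest=15
)
```

In this toy data every item clears the 0.40 criterion, no respondent is at the floor, and one of six is at the ceiling.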

2.
The following study proposes a Rasch method to measure variables of nonadditive conjoint structures, where dichotomous response combinations are evaluated. In this framework, both the number of endorsed items and their latent positions are considered. This is different from the cumulative response process (measurable by the Rasch model), where the probability of a positive response to an item with measure δ_ι is considered a monotonic increasing function of the person's measure β_ν. This is also unlike the unfolding framework, where the probability of a positive response is maximum when β_ν = δ_ι, and monotonically decreases as the magnitude of β_ν − δ_ι approaches infinity. The method involves four steps. In Step 1, items are scaled by the Rasch model for paired comparisons to produce a variable definition. These scale values serve as a basis for Steps 2 and 4. In Step 2, the nonadditive conjoint system is restructured to additive. The quantitative hypothesis of the restructured data is tested by the axioms of conjoint measurement theory in Step 3. These data are then analyzed by the Rasch rating scale model in Step 4 to evaluate individual response combinations, using the Step 1 item calibrations as anchors. The method was applied to simulated person responses of the Schedule of Recent Events (Holmes and Rahe, 1967). The results suggest that the method is useful and effective. It scales items with a robust method of paired comparisons, ensures additivity and quantification of the conjoint person-item matrix, produces a reasonable ordering of person measures from the perspective of individual response combinations, and provides satisfactory person and item separation (i.e., reliability). Furthermore, the restructured data reproduce SRE item scale values obtained by paired comparisons in Step 1.

3.
Several methods of scoring an interest inventory so as to maximize the separation of workers in an occupation from workers in general were applied to samples of electricians (compared with civilian workers) and aviation machinists' mates (compared with Navy men-in-general). Criteria of a good key were (a) its ability to separate groups (per cent overlap), and (b) its test-retest reliability. It was found that (1) using unit item weights, an optimum number of items can be found for scoring, (2) unit weights with an optimum number of items yielded more discriminating keys than Strong scoring weights, and (3) when items are selected by a method designed to increase item heterogeneity, the validity of the key is increased but test-retest reliability is somewhat decreased. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

4.
This article urges counseling psychology researchers to recognize and report how missing data are handled, because consumers of research cannot accurately interpret findings without knowing the amount and pattern of missing data or the strategies that were used to handle those data. Patterns of missing data are reviewed, and some of the common strategies for dealing with them are described. The authors provide an illustration in which data were simulated and evaluate 3 methods of handling missing data: mean substitution, multiple imputation, and full information maximum likelihood. Results suggest that mean substitution is a poor method for handling missing data, whereas both multiple imputation and full information maximum likelihood are recommended alternatives to this approach. The authors suggest that researchers fully consider and report the amount and pattern of missing data and the strategy for handling those data in counseling psychology research and that editors advise researchers of this expectation. (PsycINFO Database Record (c) 2010 APA, all rights reserved)
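
One well-known weakness of mean substitution is that it shrinks the variance of the imputed variable, because every filled-in value sits exactly at the mean. The sketch below only illustrates that effect on invented scores. It is not the authors' simulation, and `mean_substitute` is a hypothetical helper name.

```python
from statistics import mean, variance


def mean_substitute(values):
    """Replace each missing value (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Invented scores with two missing responses.
scores = [2, 4, None, 8, None, 6]

observed_only = [v for v in scores if v is not None]
imputed = mean_substitute(scores)

var_observed = variance(observed_only)  # sample variance of the 4 observed scores
var_imputed = variance(imputed)         # sample variance after mean substitution
```

The sum of squared deviations is unchanged by the imputed values while the sample size grows, so the variance estimate drops (here from about 6.67 to 4.0).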

5.
A 2-step approach for obtaining internal consistency reliability estimates with item-level missing data is outlined. In the 1st step, a covariance matrix and mean vector are obtained using the expectation maximization (EM) algorithm. In the 2nd step, reliability analyses are carried out in the usual fashion using the EM covariance matrix as input. A Monte Carlo simulation examined the impact of 6 variables (scale length, response categories, item correlations, sample size, missing data, and missing data technique) on 3 different outcomes: estimation bias, mean errors, and confidence interval coverage. The 2-step approach using EM consistently yielded the most accurate reliability estimates and produced coverage rates close to the advertised 95% rate. An easy method of implementing the procedure is outlined. (PsycINFO Database Record (c) 2010 APA, all rights reserved)
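
The second step can be sketched as computing coefficient alpha directly from an item covariance matrix. The code below assumes the matrix has already been estimated (for example by an EM routine, which is not shown). The function name and the example matrix are illustrative, and alpha is assumed to be the reliability coefficient of interest.

```python
def cronbach_alpha_from_cov(cov):
    """Coefficient alpha from a k x k item covariance matrix.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of the sum score),
    where the variance of the sum score is the sum of all matrix entries.
    """
    k = len(cov)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_variances = sum(cov[i][i] for i in range(k))
    total_variance = sum(sum(row) for row in cov)
    return k / (k - 1) * (1 - item_variances / total_variance)


# Illustrative matrix (assumed to come from step 1): three items with
# unit variance and covariances of 0.5.
em_cov = [
    [1.0, 0.5, 0.5],
    [0.5, 1.0, 0.5],
    [0.5, 0.5, 1.0],
]
alpha = cronbach_alpha_from_cov(em_cov)  # 1.5 * (1 - 3 / 6) = 0.75
```

Because the function takes only the covariance matrix, it never touches the incomplete raw data, which is the point of separating the two steps.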

6.
Self-report measures require respondents to comprehend the inquiry and then engage the self. Two studies investigated how these 2 processes affect the answers produced. In Study 1, 480 participants completed a locus-of-control scale describing themselves, their best friend, or Bill Cosby. Item answers became more reliable as the items moved from the beginning to the end of the measure. The similar increase for self, friend, and Cosby suggested that exposure to the content, rather than self-engagement, was driving the reliability shift. Self-engagement did activate an actor–observer difference in scale means. Study 2 focused on the content engagement process. With more item experience, respondents were better able to distinguish that prototypic items belonged to the locus-of-control scale and that distractor items did not. These studies imply that early questions clarify the meaning of a measure and improve the reliability of later answers. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

7.
Used 2 methods of item selection against an external criterion to build 2 short tests for selecting programing and computer-maintenance students. The methods were (a) sequential accretion of items so that at each iteration the item selected was the one leading to the largest increase in the correlation between the test and the criterion, and (b) accretion of items in order of their declining item point biserial correlations with the criterion. The items were given to development and cross-validation samples consisting of 99 computer-maintenance and 315 programing students. There was no significant difference in the validity of tests built using either method. Both methods produced tests with cross-valid coefficients higher than the validity of the item pool and both were reasonably resistant to shrinkage. (PsycINFO Database Record (c) 2010 APA, all rights reserved)
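
The two selection methods can be sketched as a greedy forward search (a) and a simple ranking (b). This is a minimal illustration under stated assumptions: items are scored 0/1, so the point-biserial correlation equals the Pearson correlation, the test score is the unweighted sum of selected items, and the data and function names are invented rather than taken from the study.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either sequence has no variance."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def sequential_accretion(items, criterion, n_select):
    """Method (a): at each step add the item that gives the highest
    correlation between the running test score and the criterion."""
    selected = []
    total = [0] * len(criterion)
    remaining = list(items)
    for _ in range(min(n_select, len(remaining))):
        best_name, best_r = None, None
        for name in remaining:
            candidate = [t + s for t, s in zip(total, items[name])]
            r = pearson(candidate, criterion)
            if best_r is None or r > best_r:
                best_name, best_r = name, r
        selected.append(best_name)
        total = [t + s for t, s in zip(total, items[best_name])]
        remaining.remove(best_name)
    return selected


def rank_by_item_validity(items, criterion, n_select):
    """Method (b): take items in declining order of their own
    correlation with the criterion."""
    ranked = sorted(
        items, key=lambda name: pearson(items[name], criterion), reverse=True
    )
    return ranked[:n_select]


# Invented data: 6 examinees, 3 dichotomous items, one criterion score.
criterion = [1, 2, 3, 4, 5, 6]
items = {
    "a": [0, 0, 0, 1, 1, 1],
    "b": [0, 0, 0, 0, 1, 1],  # valid, but largely redundant with "a"
    "c": [0, 0, 1, 0, 1, 1],  # less valid alone, complements "a"
}
picked_sequential = sequential_accretion(items, criterion, 2)
picked_ranked = rank_by_item_validity(items, criterion, 2)
```

On this toy data the two methods diverge: ranking keeps the two individually strongest items ("a", "b"), while sequential accretion prefers "c" as the second item because it adds information "a" lacks. The abstract reports that in the actual samples the resulting tests did not differ significantly in validity.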

8.
Reverse-scored items on assessment scales increase cognitive processing demands and may therefore lead to measurement problems for older adult respondents. In this study, the objective was to examine possible psychometric inadequacies of reverse-scored items on the Center for Epidemiologic Studies Depression Scale (CES-D) when used to assess ethnically diverse older adults. Using baseline data from a gerontologic clinical trial (n = 460), we tested the hypotheses that the reversed items on the CES-D (a) are less reliable than nonreversed items, (b) disproportionately lead to intraindividually atypical responses that are psychometrically problematic, and (c) evidence improved measurement properties when an imputation procedure based on the scale mean is used to replace atypical responses. In general, the results supported the hypotheses. Relative to nonreversed CES-D items, the 4 reversed items were less internally consistent, were associated with lower item-scale correlations, and were more often answered atypically at an intraindividual level. Further, the atypical responses were negatively correlated with responses to psychometrically sound nonreversed items that had similar content. The use of imputation to replace atypical responses enhanced the predictive validity of the set of reverse-scored items. Among older adult respondents, reverse-scored items are associated with measurement difficulties. It is recommended that appropriate correction procedures such as item readministration or statistical imputation be applied to reduce the difficulties. (PsycINFO Database Record (c) 2011 APA, all rights reserved)

9.
The extensive use of the Unified Parkinson's Disease Rating Scale (UPDRS) has revealed low interrater reliability in some items and redundancy in others. In view of these shortcomings, we have structured a new scale that includes a zero- to three-point scale for each item in the evaluation of PD. The mental axis includes memory, thought disorders, and depression. Activities of daily living (ADL) includes eight items: speech, eating, feeding, dressing, hygiene, handwriting, walking, and turning in bed. The motor examination includes eight items: speech, tremor, rest and posture, rigidity, finger tapping, arising from chair, gait, and postural stability. Complications of therapy were also included: dyskinesias, dystonia, motor fluctuations, and freezing episodes, collected by history. In addition, a global scoring for motor fluctuations that should complement the Hoehn and Yahr Scale was incorporated. In this report, we present a statistical analysis of the ADL, motor evaluation, and complications of therapy sections. Concerning interrater reliability, mean Kendall's W values were >0.9 for most of the items in the Short Parkinson's Evaluation Scale (SPES). Kendall's W <0.8 (motor evaluation) was found for two items of the SPES and nine items of the UPDRS. A comparison of the mean interrater reliability of the two scales across all seven centers (seven Kendall's W values, one per center; Mann-Whitney test) showed no statistical differences between the scales. Spearman's correlations between items of both scales were significant. Factor analysis of the SPES and UPDRS data revealed a four-factor solution that explained approximately 60% of the data. All participating centers found the SPES easier to apply and quicker to complete, when compared with the UPDRS. The results obtained strongly favor the introduction of SPES for clinical practice.
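
For readers unfamiliar with Kendall's W, the coefficient of concordance can be sketched as below. The sketch assumes complete rankings without ties. Ratings on a zero- to three-point scale would normally produce ties and need a tie correction, which is omitted here. The function name and the example rankings are illustrative only.

```python
def kendalls_w(rankings):
    """Kendall's coefficient of concordance for untied ranks.

    rankings: one list per rater, each a ranking (1..n) of the same n subjects.
    W = 12 * S / (m^2 * (n^3 - n)), where S is the sum of squared deviations
    of the subjects' rank totals from their mean, m * (n + 1) / 2.
    """
    m = len(rankings)
    n = len(rankings[0])
    rank_totals = [sum(r[i] for r in rankings) for i in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in rank_totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Three raters who rank four patients identically: perfect concordance.
w_perfect = kendalls_w([[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]])

# Two raters with exactly opposite rankings: no concordance.
w_opposed = kendalls_w([[1, 2, 3], [3, 2, 1]])
```

W ranges from 0 (no agreement) to 1 (complete agreement), which is the scale on which the >0.9 and <0.8 values in the abstract should be read.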

10.
Reported norms of rated subjective frequency of use and imagery on 7-point scales for 1,916 French nouns in 454 17–29-yr-olds. Interjudge reliability was assessed by calculating the correlation between the mean ratings of items repeated in the booklet, between the mean ratings obtained from odd-numbered and even-numbered respondents, and by computing the Cronbach alpha statistic for each page of the booklet. Results indicate that although the estimates provided by female and male participants were highly correlated, the former gave a slightly higher frequency rating to the word sample but a slightly lower imagery rating than the latter did. Moreover, female respondents gave slightly more extreme ratings on the frequency and imagery scales. An analysis of the absolute difference between female and male ratings revealed a discrepancy of one half point or more on 20% of the word sample for frequency and 13% for imagery. On both scales, the mean absolute difference between male and female ratings was larger than that obtained by chance alone. The mean, standard deviation, and percentile rank of the frequency and imagery ratings for each item are appended together with their objective frequency of occurrence in Baudot's (1992) dictionary. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

11.
Discusses the contradictions and confusion in the literature on determining the optimal number of scale points in a rating scale, and suggests a mathematical model that allows for the simulation of the rating situation. The model involves generating data with different item variance-covariance structures and with different numbers of scale points. Such data were generated and used to calculate 3 reliability measures. The effects of different numbers of scale points and different covariance structures upon these reliability measures are examined, and the results help explain a large number of empirical studies exploring the "optimal number of scale points" problem. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

12.
The averaging model of information integration generally predicts a decelerating set-size effect (SSE), because it assumes that an "initial neutral impression" is always averaged with the information items that are presented. However, a modified averaging model, called the path-analytic averaging (PAA) model, predicts that the SSE will not always occur. What is considered the "initial impression" in the averaging model is reconceptualized as "inferred missing information" in the PAA model. When the number of presented items equals the number of items deemed important and both increase together, the PAA model predicts that there will be no SSE because there is no missing information. When the number of presented items increases, so that the added items provide information previously missing, an SSE should occur. These predictions of the PAA model were tested in an experiment in which 36 undergraduates rated the desirability of candidates for secretarial positions based on 1, 2, or 3 items of information. For most Ss, the PAA predictions were confirmed; for some Ss, however, the results are inconsistent with both the PAA model and the usual averaging model predictions. The latter Ss were distinguished from the others by their apparent use of scores less than the scale midpoint when they inferred information that was considered important but missing. (6 ref) (PsycINFO Database Record (c) 2010 APA, all rights reserved)

13.
Across many areas of psychology, concordance is commonly used to measure the (intragroup) agreement in ranking a number of items by a group of judges. Sometimes, however, the judges come from multiple groups, and in those situations, the interest is to measure the concordance between groups, under the assumption that there is some within-group concordance. In this investigation, existing methods are compared under a variety of scenarios. Permutation theory is used to calculate the error rates and the power of the methods. Missing data situations are also studied. The results indicate that the performance of the methods depends on (a) the number of items to be ranked, (b) the level of within-group agreement, and (c) the level of between-group agreement. Overall, using the actual ranks of the items gives better results than using the pairwise comparison of rankings. Missing data lead to loss in statistical power, and in some cases, the loss is substantial. The degree of power loss depends on the missing mechanism and the method of imputing the missing data, among other factors. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

14.
We developed a new questionnaire in the surgical area based on a core quality of life (QOL) questionnaire for patients with gastrointestinal cancer. In this study, we investigated the validity and reliability of a QOL questionnaire (Tokyo Yamabuki Forum Version) for patients with colorectal cancer. The questionnaire was composed of 17 items including 5 scales (basic sensory scale, psychological scale, physiological scale, defecation-related scale and active scale) and a face scale as a global scale. The time needed to answer the questionnaire was expected to be around 7 minutes, and the questionnaire should basically be answered by the patients themselves every day in the hospital. The study was performed in 10 hospitals in the Tokyo area, and 394 samples collected from 21 patients with rectal and colonic cancers were analyzed. A number of respondents failed to answer the question "Do you feel your foods tasty?", so we judged this item inappropriate and deleted it from the analysis. Fifteen items, including 5 scales, showed satisfactory internal consistency and construct validity in correlation and factor analyses. Performance status showed a low correlation with each item, each scale and the global scale, while SDS and STAI showed an inordinately negative correlation with the fundamental and physical scales. In particular, SDS revealed an extremely close correlation with the active scale, and STAI showed an excessive correlation with the psychological scale. In the time course of QOL under chemotherapy, reductions (aggravations) were observed in both the total score of 15 items and the global scale within one week postoperatively, but scores then recovered to preoperative levels at 2 weeks postoperatively. A tendency to QOL improvement was observed 2 weeks after starting chemotherapy or chemoimmunotherapy. QOL of 13 patients was measured over 3 months, and the longest term was 8 months. The results suggested that this QOL questionnaire has sufficient reliability and validity to be usable for patients with colorectal cancer in the surgical area and that this model is applicable for long-term QOL surveys and frequent measurement.

15.
This two-experiment study examined the efficiency and sensitivity of five accuracy-based phonological awareness tasks for monitoring the development of these skills in kindergarten and Grade 1 students. The first experiment examined responses to different numbers and types of items included in each phonological awareness task for their correspondence to responses obtained from a larger, more inclusive item pool. Results suggested that an internally consistent and valid measure of each skill included 10 items per task, each representing a different linguistic combination. The second experiment examined the interscorer reliability and concurrent validity of the 5 measures, and compared their sensitivity to growth. Sensitivity was examined by administering 12 alternate forms of the tasks once per week to 32 kindergarten and 35 Grade 1 students. Mean slopes computed for each task suggested positive growth across all tasks and grades. Mean kindergarten slopes were significantly steeper than mean Grade 1 slopes for each of the 5 tasks, whereas the most sensitive task for both kindergarten and Grade 1 students was Segmentation. (PsycINFO Database Record (c) 2011 APA, all rights reserved)

16.
A study of item bias in standard cognitive screening measures was conducted in a sample of Afro-American, Hispanic and non-Hispanic white elderly respondents who were part of a dementia case registry study. The methods of item-response theory were applied to identify biased items. Both cross-cultural and high and low education groups were examined to determine which items were biased. Out of 50 cognitive items examined from six widely used cognitive screening measures, 16 were identified as biased for either high and low education groups or ethnic/racial group membership.

17.
Using 2 separate large samples of children (N1 = 957 and N2 = 3,885) and 1 smaller sample of adolescents and adults (N3 = 416), 3 studies of item selection for measurement of anxiety were conducted to determine if item selection differed across gender when traditional psychometric methods were applied. Applying a common set of item selection rules for males and for females, the same items were selected for inclusion on various measures of anxiety with differing item-response formats with comparable internal consistency reliability obtained using separate gender and combined gender samples. Standard psychometric methods indicate anxiety is measured in males and females about equally well and by essentially the same items.

18.
In a study of unconstrained recall, 18 undergraduates named as many acquaintances as possible in 10 min. One month later, Ss sorted these acquaintances into person types and into naturally occurring social groups. Timing results indicate that the Ss generated person memories in discrete bursts: After naming several acquaintances, Ss paused before naming several more. The temporal bursts were usually social groups. The process of unconstrained recall can be simulated by a simple model that samples items and traverses networks in a cognitive domain. After reproducing Ss' memory protocols with a computerized version of this sampling/traversal model, alternative models and the structure of naturally acquired person memories are discussed. It is suggested that pauses between clusters rise over a recall session because of an increase in the number of trials needed to locate a new item when sampling from the domain at large. (24 ref) (PsycINFO Database Record (c) 2010 APA, all rights reserved)

19.
In this article, the authors developed a common strategy for identifying differential item functioning (DIF) items that can be implemented in both the mean and covariance structures method (MACS) and item response theory (IRT). They proposed examining the loadings (discrimination) and the intercept (location) parameters simultaneously using the likelihood ratio test with a free-baseline model and Bonferroni corrected critical p values. They compared the relative efficacy of this approach with alternative implementations for various types and amounts of DIF, sample sizes, numbers of response categories, and amounts of impact (latent mean differences). Results indicated that the proposed strategy was considerably more effective than an alternative approach involving a constrained-baseline model. Both MACS and IRT performed similarly well in the majority of experimental conditions. As expected, MACS performed slightly worse in dichotomous conditions but better than IRT in polytomous cases where sample sizes were small. Also, contrary to popular belief, MACS performed well in conditions where DIF was simulated on item thresholds (item means), and its accuracy was not affected by impact. (PsycINFO Database Record (c) 2010 APA, all rights reserved)
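
The decision rule described here, a likelihood ratio test per item judged against a Bonferroni-corrected critical p value, might look roughly like this. The sketch assumes each test constrains exactly two parameters (one loading and one intercept), so the statistic is referred to a chi-square distribution with 2 degrees of freedom, whose upper-tail probability is exp(-x/2). Polytomous items would have more intercept or threshold parameters and need a different df. The log-likelihood values and function names are invented, and fitting the models themselves is not shown.

```python
import math


def lr_p_value_df2(loglik_free, loglik_constrained):
    """p value of the likelihood ratio statistic with 2 degrees of freedom.

    G2 = 2 * (loglik_free - loglik_constrained); for df = 2 the chi-square
    survival function is exactly exp(-G2 / 2).
    """
    g2 = max(2 * (loglik_free - loglik_constrained), 0.0)
    return math.exp(-g2 / 2)


def flag_dif_items(loglik_free, constrained_logliks, alpha=0.05):
    """Flag items whose p value falls below alpha / (number of items tested)."""
    critical_p = alpha / len(constrained_logliks)
    flagged = []
    for item, loglik_constrained in constrained_logliks.items():
        if lr_p_value_df2(loglik_free, loglik_constrained) < critical_p:
            flagged.append(item)
    return flagged


# Invented log-likelihoods: the free-baseline model, and the same model with
# one item's loading and intercept constrained equal across groups.
baseline_loglik = -1000.0
constrained = {
    "item1": -1000.8,  # G2 = 1.6,  p about 0.45
    "item2": -1007.5,  # G2 = 15.0, p about 0.0006
    "item3": -1003.0,  # G2 = 6.0,  p about 0.0498
}
dif_items = flag_dif_items(baseline_loglik, constrained)
```

Here "item3" would be significant at an uncorrected 0.05 level but is not flagged at the corrected level of 0.05 / 3, which is the kind of false alarm the Bonferroni correction is meant to limit.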

20.
The number of life events reported by study participants is sensitive to the method of data collection and time intervals under consideration. Individual characteristics also influence reporting; respondents with poor mental health report more life events. Much current research on life events is cross-sectional. Data from a longitudinal study of women's health from 4 waves over a decade suggest that over time additional systematic biases in reporting life events occur. Inconsistency over time is due to both fall-off of reporting and telescoping. Intracategory variability and ambiguity of items, as well as respondent characteristics, also potentially contribute to response biases. Although some factors (e.g., item wording) are controllable, others (e.g., respondents' mental health) are not and must be factored into data analysis and interpretation. (PsycINFO Database Record (c) 2011 APA, all rights reserved)
