20 similar records found.
1.
Indexes of interrater reliability and agreement are reviewed, and suggestions are made regarding their use in counseling psychology research. The distinction between agreement and reliability is clarified, and the relationships between these indexes and the level of measurement and type of replication are discussed. Indexes of interrater reliability appropriate for use with ordinal and interval scales are considered. The intraclass correlation as a measure of interrater reliability is discussed in terms of the treatment of between-raters variance and the appropriateness of reliability estimates based on composite or individual ratings. The advisability of optimal weighting schemes for calculating composite ratings is also considered. Measures of interrater agreement for ordinal and interval scales are described, as are measures of interrater agreement for data at the nominal level of measurement. (54 ref) (PsycINFO Database Record (c) 2010 APA, all rights reserved)
2.
Ss were 549 illiterate Iranian truck drivers rated for intelligence and given an individual intelligence test. To relate unreliability of ratings to validity, correlations were computed between intelligence test scores and 4 groups of criterion ratings differing in reliability. The authors conclude that, in the construction of rating scales, weighting ratings by their agreement is better than weighting them by their disagreement.
3.
Three psychologists who regularly conduct individual assessments were asked to assess 3 individuals posing as job candidates for the same position. The materials from these 9 assessments (test scores, biographical information, and audiotapes of interviews) served as protocols for 50 industrial/organizational psychologists who rated the candidates and assessors. Comparisons of the approaches and conclusions of the assessors indicated variability in the job/organizational information obtained, test instruments used, personal history information gathered, interviews conducted, reports generated, and conclusions regarding the candidates. On average, only one third of the raters agreed with the conclusions of the assessor whose protocol they were reviewing. Significant differences were found in the raters' evaluations of 2 of the assessors, depending on which candidate the assessor had evaluated. The study's design limits generalizability; however, the low interrater agreement is disturbing.
4.
5.
6.
Using a round-robin design in which every subject served both as judge and target, subjects made liking judgments, trait ratings, and physical attractiveness ratings of each other on each of 4 days. Although there was some agreement in the liking judgments, most of the variance was due to idiosyncratic preferences for different targets. Differences in evaluations were due to at least 2 factors: disagreements in how targets were perceived (is this person honest?) and disagreements in how to weight the trait attributes that predicted liking (is honesty more important than friendliness?). When evaluating the targets in specific roles (e.g., as a study partner), judgments showed much greater agreement, as did the weights of the trait attributes. A 2nd study confirmed the differential weighting of trait attributes when rating liking in general and the increased agreement when rating specific roles.
7.
8.
Thompson Eileen G.; Gard James W.; Phillips James L. 《Canadian Metallurgical Quarterly》1980,38(1):57
Examined the effect of trait dimensionality on subject–verb–object (SVO) judgments. In Exp I, 26 undergraduates made likelihood judgments for sentences in which the S and O were described by traits from either the same or different (idiographically determined) trait dimensions. The SVO effect was found to be greater for unidimensional than for bidimensional sentences. Exp II used a concept-identification task with 126 undergraduates to examine the salience of biases in relational triad sentences and SVO sentences involving either social or intellectual traits. The SVO bias improved learning, relative to 2 nonsubstantive rules, whereas the 2-element biases did not. The SV bias had a greater effect for social sentences than for intellectual traits or triadic relations, suggesting a schema unique to the content of an empirically determinable trait word category. Together, the results of both studies show that the SVO or balance bias plays a greater role in SVO judgments than was previously believed, if the traits are from the same evaluative dimension. (25 ref)
9.
Based on the contention that clarifying the psychometric foundations of family instruments is essential to the field's future progress, the current study pursued 3 major aims: examining issues of instrument dimensionality; determining the generalizability of dimensional structures across whole-family, marital, and parent–child forms; and assessing the degree to which there is correspondence across different members' reports. Drawing on a community sample of intact families (N = 192) and making use of a latent-variable approach, results provided support for a 3-dimensional framework (Affect, Activities, and Control) in accounting for score variance on whole-family, marital, and parent–child forms. Results indicated a significant degree of correspondence across different members' reports of these constructs for each family subsystem. Implications of these findings are discussed, and topics in need of further research attention are identified.
10.
Compared counselor intake judgments about White and Black clients at a university counseling center. 1,078 White and 42 Black clients were randomly assigned to 1 of 13 counselors (11 White, 2 Black), who rated the clients on 11 variables. Counselors gave Black clients significantly higher ratings than White clients only on judged potential for change. Ratings of the type and severity of presenting problem, client anxiety level, ease of expression, motivation, realism of goals, and physical appearance were not significantly different. Counselors' feelings about clients and the predicted number of counseling sessions were also similar for Black and White clients. (25 ref)
11.
Gaugler Barbara B.; Rosenthal Douglas B.; Thornton George C.; Bentson Cynthia 《Canadian Metallurgical Quarterly》1987,72(3):493
Meta-analysis (Hunter, Schmidt, & Jackson, 1982) of 50 assessment center studies containing 107 validity coefficients revealed a corrected mean and variance of .37 and .017, respectively. Validities were sorted into five categories of criteria and four categories of assessment purpose. Higher validities were found in studies in which potential ratings were the criterion, and lower validities were found in promotion studies. Sufficient variance remained after correcting for artifacts to justify searching for moderators. Validities were higher when the percentage of female assessees was high, when several evaluation devices were used, when assessors were psychologists rather than managers, when peer evaluation was used, and when the study was methodologically sound. Age of assessees, whether feedback was given, days of assessor training, days of observation, percentages of minority assessees, and criterion contamination did not moderate assessment center validities. The findings suggest that assessment centers show both validity generalization and situational specificity.
12.
This study considered whether assessments of violence risk in which 2 clinicians reach similar conclusions are more accurate than the conclusions of either clinician alone when their assessments disagree. One hundred ten physicians and 44 nurses estimated the probability of physical assault of 478 patients admitted to a short-term locked psychiatric inpatient unit. The level of assessed risk showed a substantial correspondence with the likelihood of later violence when the physician and nurse ratings were highly concordant. As the extent of agreement between the physician and nurse ratings decreased, the strength of the association between the risk assessments and the occurrence of violence decreased accordingly.
13.
Compared the validity of individuals' self-assessments with that of other assessment procedures commonly used in psychological evaluation. Comparisons are made in the prediction of all criteria investigated: intellectual achievement, vocational choice, job performance, therapy outcome, adjustment following hospitalization, and peer ratings. Self-assessments were at least as predictive of these criteria as were the other assessment methods against which they have been pitted. Limitations of this conclusion and its implications for current psychological evaluation procedures are examined. It is argued that greater attention should be given to self-assessments and to the evaluation procedures that may enhance their predictive validity. Steps are outlined for deciding when self-assessment should be used, and suggestions are offered as to how the validity of self-judgments might be maximized. (129 ref)
14.
P. R. Sackett and G. F. Dreher (see record 1982-31460-001) examined the internal construct validity for 3 different sets of assessment center data. Based on the failure of the exercise ratings to satisfy construct validity requirements, Sackett and Dreher concluded that assessment centers should not rely on a content validation strategy. The present authors question the finality of Sackett and Dreher's contention. In addition, clarification of the psychometric bases for multiple exercises in assessment centers is presented. The results of several studies that have examined the convergent and discriminant validities of internal assessment center ratings are discussed with respect to the expanded explanation of the purposes of multiple exercises. The causes and the impact of assessee variance across exercises are also considered. (12 ref)
15.
Analyzed individual ratings made by 46 assessors working in an assessment center for the selection of entry-level managers. 10 Ss' ratings (each of whom had rated more than 200 assessees) were individually subjected to confirmatory factor analyses (using linear structural relations) and examined within a lens model framework. Support was found for both a formal and an informal method of arriving at an overall assessment rating. Subgroup analyses suggest that assessee sex had little effect on the way Ss arrived at a rating. (18 ref)
16.
Although the assessment center is an ideal method for the study of lives, it has been used for that purpose only in a few landmark investigations. One of these, the Management Progress Study, has evaluated participants in repeated assessment centers over a period of years. Data from that study include analysis of individual factors in managerial success; several of these are personality–motivational in nature (the EPPS, Guilford-Martin Inventory, California F Scale, TAT, and Rotter Incomplete Sentences Blank). In addition, personality characteristics are strongly related to key managerial abilities. Recommendations made for enhancing the contribution of assessment centers to the study of lives include methodological improvements, new participant populations, and focusing on life roles other than occupational. (14 ref)
17.
18.
Bishop John B.; Sharf Richard S.; Adkins Deborah M. 《Canadian Metallurgical Quarterly》1975,22(6):557
Conducted a study to investigate the variables used by intake counselors at a university counseling center in estimating the number of interviews a client will attend and to assess the accuracy of those estimates. Data collected from 448 cases indicate that counselors relied most heavily on their judgment of the severity of personal problems in estimating the number of interviews clients would attend. The variables investigated accounted for a relatively small amount of the total variance in the actual number of counseling sessions held. An unexpected finding was that the judged severity of vocational problems was negatively correlated with both the estimated and actual number of interviews.
19.
Ottenbacher KJ; Msall ME; Lyon NR; Duffy LC; Granger CV; Braun S 《Canadian Metallurgical Quarterly》1997,78(12):1309-1315
OBJECTIVE: Examination of the interrater agreement and stability of ratings obtained using the Functional Independence Measure for Children (WeeFIM) in a sample of children with developmental disabilities. DESIGN: A relational design was used in which two sets of WeeFIM scores were collected under four conditions: same rater-short interval; same rater-long interval; different rater-short interval; and different rater-long interval. SETTING: WeeFIM scores were collected in outpatient developmental rehabilitation centers, school programs, and the children's homes. PARTICIPANTS: Data were collected for 205 children ranging in age from 11 to 87 months. All children had a medical diagnosis of disability and were receiving habilitative-educational intervention or follow-along services including neurodevelopmental surveillance. INSTRUMENT: The WeeFIM instrument examines basic daily living and functional skills in children from birth to 7 years of age. The WeeFIM is modeled after the Functional Independence Measure (FIM) for adults and includes 18 items in the following subscales: self-care, sphincter control, transfers, locomotion, communication, and social cognition. RESULTS: Kappa values for items ranged from .44 to .82. Intraclass correlation coefficients (ICC) for the six subscales ranged from .73 to .98. Total WeeFIM ICC values were greater than .95 for all analyses. CONCLUSIONS: The WeeFIM ratings for the 205 children with developmental disabilities participating in this investigation were consistent across raters and time.
20.
47 individuals evaluated in a management assessment center in 1967 in the marketing organization of a large manufacturing company were followed up 8 yrs later. Both the overall assessment center rating and a general management evaluation of potential derived from personnel files were significantly related to position level attained after 8 yrs for 30 individuals still with the company. The shrunken multiple correlation of these 2 predictors with level attained was .58. Characteristics of aggressiveness, persuasiveness, oral communications, and self-confidence plus test scales of ascendency and self-assurance were most strongly related to level attained 8 yrs later. It is commented that while the validity of this specific assessment center for predicting advancement appears adequate, additional research is desirable to evaluate the ability to predict performance in management. (7 ref)
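Record 1 separates reliability estimates based on individual versus composite ratings and notes that the result depends on how between-raters variance is treated. That abstract gives no formulas. The sketch below assumes the two-way intraclass correlations of Shrout and Fleiss (1979): ICC(2,·) counts differences between rater means as error (absolute agreement), ICC(3,·) removes them (consistency), and the "k" forms give the reliability of the mean of k raters. The function name and the list-of-lists input layout are illustrative choices, not taken from any of the records.

```python
from statistics import fmean


def icc_shrout_fleiss(ratings):
    """Two-way ICCs for an n-targets x k-raters matrix given as a list of lists.

    Sketch assuming Shrout & Fleiss (1979) forms; no missing data allowed.
    """
    n = len(ratings)
    k = len(ratings[0])
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete matrix with at least 2 targets and 2 raters")

    grand = fmean(x for row in ratings for x in row)
    row_means = [fmean(row) for row in ratings]                      # per target
    col_means = [fmean(row[j] for row in ratings) for j in range(k)]  # per rater

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between targets
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between raters
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols                  # residual

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_error / ((n - 1) * (k - 1))

    return {
        # Absolute agreement: rater mean differences (msc) enter the denominator.
        "ICC(2,1)": (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n),
        "ICC(2,k)": (msr - mse) / (msr + (msc - mse) / n),
        # Consistency: rater mean differences are ignored.
        "ICC(3,1)": (msr - mse) / (msr + (k - 1) * mse),
        "ICC(3,k)": (msr - mse) / msr,
    }
```

For ratings [[1, 2], [2, 3], [3, 4], [4, 5]], where the second rater is always one point higher than the first, ICC(3,1) is 1.0 but ICC(2,1) is 10/13 (about .77). The gap shows how the treatment of between-raters variance changes the estimate.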
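Record 11 reports a corrected mean validity and variance from a Hunter, Schmidt, and Jackson (1982) style meta-analysis. As a rough illustration only, the bare-bones first step of that approach can be sketched as follows: a sample-size-weighted mean r, the observed variance of r across studies, and the variance expected from sampling error alone. The sketch assumes the common sampling-error formula (1 - mean_r²)² / (mean_n - 1). It omits the corrections for criterion unreliability and range restriction that a "corrected" estimate like the one in record 11 would also apply. The function name and any example inputs are hypothetical, not the study's data.

```python
def bare_bones_meta(studies):
    """Bare-bones meta-analysis of correlations.

    studies: list of (n, r) pairs, one per validity coefficient.
    Returns the weighted mean r, observed variance, expected sampling-error
    variance, and residual variance (floored at zero).
    """
    if not studies:
        raise ValueError("need at least one study")
    total_n = sum(n for n, _ in studies)
    # Sample-size-weighted mean correlation.
    mean_r = sum(n * r for n, r in studies) / total_n
    # Sample-size-weighted observed variance of r across studies.
    var_obs = sum(n * (r - mean_r) ** 2 for n, r in studies) / total_n
    # Variance expected from sampling error alone (assumed formula).
    mean_n = total_n / len(studies)
    var_err = (1 - mean_r ** 2) ** 2 / (mean_n - 1)
    # Variance left over for moderators or other artifacts.
    var_res = max(var_obs - var_err, 0.0)
    return {"mean_r": mean_r, "var_obs": var_obs,
            "var_err": var_err, "var_res": var_res}
```

A residual variance well above zero is what would motivate the moderator search described in record 11. In this sketch it is only the uncorrected residual, so it overstates the variance that would remain after full artifact correction.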
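Record 19 reports item-level kappa values for the WeeFIM. The abstract does not say which kappa variant was used; for ordinal WeeFIM levels a weighted kappa would be plausible. The sketch below therefore shows only unweighted Cohen's kappa for two raters, kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is the chance agreement implied by each rater's marginal category frequencies. The function name is hypothetical.

```python
import math
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters' nominal codes of the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(rater_a)
    # Observed proportion of items on which the raters agree.
    p_obs = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from the product of each rater's marginal proportions.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    p_exp = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_exp == 1.0:
        # Both raters used a single identical category: kappa is undefined.
        return math.nan
    return (p_obs - p_exp) / (1 - p_exp)
```

For codes y, y, n, n, y versus y, n, n, n, y, observed agreement is .80 and chance agreement is .48, so kappa is .32 / .52 (about .62). Kappa can therefore sit well below raw percentage agreement.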
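Record 20 reports a shrunken multiple correlation of .58 for 2 predictors and 30 individuals, but it does not name the shrinkage formula. One common choice is the standard adjusted R² formula (often attributed to Wherry), R²_adj = 1 - (1 - R²)(n - 1) / (n - k - 1). The sketch below assumes that formula. It should not be read as reproducing the study's own computation, and the function name is hypothetical.

```python
import math


def shrunken_r(r, n, k):
    """Shrunken multiple correlation via the adjusted-R^2 formula (assumed).

    r: sample multiple correlation; n: sample size; k: number of predictors.
    """
    if n <= k + 1:
        raise ValueError("need n > k + 1")
    r2_adj = 1 - (1 - r ** 2) * (n - 1) / (n - k - 1)
    # Adjusted R^2 can go negative for weak effects; treat that as zero.
    return math.sqrt(max(r2_adj, 0.0))
```

For a hypothetical unshrunken R of .60 with n = 30 and k = 2, this formula gives about .56. With a small sample, even two predictors cost a noticeable amount of apparent validity.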