Similar articles
20 similar articles found.
1.
We argue that the multisection validation design is the strongest design for addressing the degree to which student ratings predict teacher-produced learning. Results of several dozen multisection validity studies appear inconsistent. Unfortunately, prior quantitative reviews did not answer questions about the diversity of findings. The authors explore sensitivity of the prior analyses to identify true explanatory characteristics, generalizability of the findings across dimensions of teaching, and adequacy of the analyses to identify potential explanatory characteristics. They conclude that prior analyses lack adequate statistical power, explanatory characteristics vary with the dimension of teaching being validated, and a host of other study features remain to be investigated. Those features are identified through nomological coding of 43 validity studies. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

2.
Validity concerns and usefulness of student ratings of instruction. (Total citations: 6; self-citations: 0; citations by others: 6)
The validity of student rating measures of instructional quality was severely questioned in the 1970s. By the early 1980s, however, most expert opinion viewed student rating measures as valid and as worthy of widespread use. In retrospect, older discriminant-validity concerns were not so much resolved as they were displaced from research attention by accumulating evidence for convergent validity. This article introduces a Current Issues section that gives new attention to validity concerns associated with student ratings. The section's 4 articles deal, respectively, with (a) conceptual structure (are student ratings unidimensional or multidimensional?), (b) convergent validity (how well do ratings correlate with other indicators of effective teaching?), (c) discriminant validity (are ratings influenced by factors other than teaching effectiveness?), and (d) consequential validity (are ratings used effectively in personnel development and evaluation?). Although all 4 articles favor the use of ratings, they disagree on controversial points associated with interpretation and use of ratings data. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

3.
4.
Examined the generalizability of social-skills and social-anxiety global ratings made by judges trained in 6 experimental laboratories. Stimulus material consisted of videotapes of 20 psychiatric patients and a control sample of 20 National Guardsmen interacting in 2 types of social simulation typically employed in social-skills assessment: brief role plays and more extended interactions. Moderate degrees of generalizability across laboratories were found for the social-skills and social-anxiety ratings based on the brief role plays. For the more extended interaction, moderate generalizability was obtained for anxiety ratings, with some differences found among laboratories for the skills ratings. Results are viewed as encouraging, since numerous mitigating factors worked against establishing strong generalizability across laboratories. (9 ref) (PsycINFO Database Record (c) 2010 APA, all rights reserved)

5.
The construct validity of measures in teaching, research, and service was determined using a multitrait–multimethod matrix. The measures in research (research publications, research grants, research awards, and peer ratings in research) and three of the measures in teaching (student evaluations, teaching awards, and peer ratings in teaching) supported convergent and divergent validity. However, the three-pronged model in academia was not validated, and exploratory factor analysis identified 5 and possibly 6 domains of behavior (research, classroom teaching, writing about teaching, community service, writing about service, and internal service). For one measure, peer ratings, reliability varied across academic areas and was moderately high only for high-confidence ratings. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

6.
The performance of 79 male undergraduates in 2 heterosexual social situations was rated by questionnaires (including the Social Anxiety and Skill Questionnaire), self-ratings in role plays, self-ratings of videotapes of role plays, ratings by confederates, and ratings of videotaped role plays by judges. These ratings were characterized with respect to mode, method, and situation facets; the consistency of ratings was obtained under different conditions of these facets, which were investigated by use of a generalizability approach in which estimates of variance components and generalizability coefficients were calculated. Self-report and judges' ratings of anxiety and social skill based on a fairly large number of observations were found to be at best moderately generalizable across the various facets investigated; the relationship between anxiety and skill was found to vary considerably among the various methods; substantial proportions of variance, indicative of level differences, were found within judges for anxiety ratings and within judges and methods for skill ratings; and substantial proportions of variance were influenced by mode of measurement. The implications of these findings for obtaining reliable ratings of heterosexual social anxiety and skill are discussed. (24 ref) (PsycINFO Database Record (c) 2010 APA, all rights reserved)

7.
College instructors in 329 classes evaluated their own teaching effectiveness with the same 35-item rating form used by their students. There was student–instructor agreement in courses taught by teaching assistants (r = .46), undergraduate courses taught by faculty (r = .41), and graduate level courses (r = .39). Separate factor analyses of the student and instructor ratings demonstrated that the same 9 evaluation factors (e.g., work load, organization, interaction) underlay both sets of ratings. A multitrait–multimethod analysis supported convergent and divergent validity of these rating factors. Not only were correlations between student and instructor ratings on the same factors statistically significant for each of 9 factors, but correlations between their ratings on different factors were low. Findings demonstrate student–instructor agreement on evaluation of teaching effectiveness, support the validity of student ratings for both graduate and undergraduate courses, and emphasize the importance of using multifactor rating scales derived through the application of factor analysis. (28 ref) (PsycINFO Database Record (c) 2010 APA, all rights reserved)

8.
Six to 8 trained observers visited classes (3 classes/lecturer) taught by 54 university lecturers receiving either low, medium, or high student ratings. The observers, using the Teacher Behaviors Inventory, estimated the frequency of occurrence of 60 specific, low-inference teaching behaviors. Significant differences among low-, medium-, and high-rated Ss were found for 26 individual behaviors divided among 7 categories of teaching. Group differences were largest for attention-getting behaviors such as speaking expressively, moving about while lecturing, using humor, and showing enthusiasm for the subject. Factor analysis of individual teaching behaviors yielded 9 interpretable factors, of which three (Clarity, Enthusiasm, and Rapport) differed significantly across groups, and all but one showed correlations with various teacher and course characteristics. Results are discussed with reference to the pivotal role of attention-getting behavior in classroom teaching, the validity of student instructional ratings, and the design of teaching improvement programs in higher education. (18 ref) (PsycINFO Database Record (c) 2010 APA, all rights reserved)

9.
Investigated the reliability and concurrent validity of an inpatient memory impairment scale (IMIS), a 10-item behavior rating scale for use with hospitalized amnesiac patients, using 13 male and 7 female memory-impaired chronic alcoholic and Korsakoff patients (average age 58.12 yrs). A generalizability analysis revealed that the IMIS had a high degree of internal consistency. When 2 or more raters were used, interrater reliability was also high. Scores on the IMIS correlated .77 with the mean score of 7 practical and experimental cognitive memory tasks, and .86 with the mean score of 3 questionnaires evaluating orientation and memory for recent events. It is concluded that the IMIS has encouraging psychometric characteristics and that behavior ratings can be used to assess degree of amnesiac deficit accurately. (13 ref) (PsycINFO Database Record (c) 2010 APA, all rights reserved)

10.
The construct validity of developmental ratings of managerial performance was assessed by using 2 data sets, each based on a different 360° rating instrument. Specifically, the authors investigated the nature of the constructs measured by developmental ratings, the structural relationships among those constructs, and the generalizability of results across 4 rater perspectives (boss, peer, subordinate, and self). A structure with 4 lower order factors (Technical Skills, Administrative Skills, Human Skills, and Citizenship Behaviors) and 2 higher order factors (Task Performance and Contextual Performance) was tested against competing models. Results consistently supported the lower order constructs, but the higher order structure was problematic, indicating that the structure of ratings is not yet well understood. Multisample analyses indicated few practically significant differences in factor structures across perspectives. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

11.
Bias in observer ratings compromises generalizability of measurement, typically resulting in attenuation of observed associations between variables. This quantitative review of 79 generalizability studies including raters as a facet examines bias in observer ratings in published psychological research and identifies properties of rating systems likely to place them at risk for problems with rater bias. For the rating systems studied, an average of 37% of score variance was attributable to 2 types of rater bias: (a) raters' differential interpretations of the rating scale and (b) their differential evaluations of the same targets. Ratings of explicit attributes (e.g., frequency counts) contained negligible bias variance, whereas ratings of attributes requiring rater inference contained substantial bias variance. Rater training ameliorated but did not solve the problem of bias in inferential rating scales. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

12.
This study investigated within-source interrater reliability of supervisor, peer, and subordinate feedback ratings made for managerial development. Raters provided 360-degree feedback ratings on a sample of 153 managers. Using generalizability theory, results indicated that little within-source agreement exists; a large portion of the error variance is attributable to the combined rater main effect and Rater × Ratee effect; more raters are needed than currently used to reach acceptable levels of reliability; supervisors are the most reliable with trivial differences between peers and subordinates when the numbers of raters and items are held constant; and peers are the most reliable, followed by subordinates, followed by supervisors, under conditions commonly encountered in practice. Implications for the validity, design, and maintenance of 360-degree feedback systems are discussed along with directions for future research in this area. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

13.
14.
Divergent thinking is central to the study of individual differences in creativity, but the traditional scoring systems (assigning points for infrequent responses and summing the points) face well-known problems. After critically reviewing past scoring methods, this article describes a new approach to assessing divergent thinking and appraises its reliability and validity. In our new Top 2 scoring method, participants complete a divergent thinking task and then circle the 2 responses that they think are their most creative responses. Raters then evaluate the responses on a 5-point scale. Regarding reliability, a generalizability analysis showed that subjective ratings of unusual-uses tasks and instances tasks yield dependable scores with only 2 or 3 raters. Regarding validity, a latent-variable study (n=226) predicted divergent thinking from the Big Five factors and their higher-order traits (Plasticity and Stability). Over half of the variance in divergent thinking could be explained by dimensions of personality. The article presents instructions for measuring divergent thinking with the new method. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

15.
Tested the hypothesis that classroom teaching behavior mediates the relation typically found between personality and college teaching effectiveness. Colleagues rated 37 full-time college instructors on 29 personality traits, and trained observers assessed the frequency with which the same instructors exhibited 95 specific classroom teaching behaviors. Instructional effectiveness was measured by global end-of-term student ratings averaged over a 5-yr period. Path analyses revealed that approximately 50% of the relation between personality and teaching effectiveness was mediated by classroom behavior. The highly rated teacher was found to exhibit 2 types of personality traits: achievement orientation and interpersonal orientation. Results are discussed in terms of the validity of student ratings of teaching and in relation to M. J. Dunkin and B. J. Biddle's (1974) model of classroom teaching. (31 ref) (PsycINFO Database Record (c) 2010 APA, all rights reserved)

16.
A new approach to the study of memory has emerged recently, characterized by a preoccupation with natural settings and with the immediate applicability of research findings. In contrast, the laboratory study of memory relies on experimental techniques for theory testing and is concerned with the discovery of generalizable principles. Although both approaches share the goal of generalizability, they differ sharply in the evaluation of how that goal is best accomplished. In this article, we criticize the everyday memory approach, arguing that ecologically valid methods do not ensure generalizability of findings. We discuss studies high in ecological validity of method but low in generalizability, and others low in ecological validity of method but high in generalizability. We solidly endorse the latter approach, believing that an obsession with ecological validity of method can compromise genuine accomplishments. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

17.
This study of married couples investigated the short-term predictive validity of the partner-report and self-report scales of the Conflict Communication Inventory and compared the validity of these scales with the validity of observer ratings. A sample of 83 married couples completed two problem-solving conversations. Self-report, partner-report, and observer ratings from Conversation 1 were used to predict behavior in Conversation 2, as rated by a separate panel of observers. The short-term predictive validity of partner-report ratings was extremely high and indistinguishable from the validity of observer ratings. Self-report ratings also demonstrated good validity, albeit slightly lower than other methods. Both partner-report and self-report scores explained a substantial amount of variance in concurrent observer ratings of communication after controlling for relationship satisfaction. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

18.
Attempted to expand the construct validity of the standard profile and the recently developed factor scales for the PIC by determining the relationship of these scales to empirically derived dimensions of problem behaviors in children and adolescents. Extensive behavioral ratings of 398 2–12 yr old children and 293 13–18 yr old adolescents were obtained on 3 criterion checklists completed by parents, teachers, and clinicians. Ratings on these 3 forms were submitted to iterative common factor analysis with varimax rotation and yielded 5, 7, and 7 problem-behavior dimensions, respectively. T scores on the 16 clinical profile and the 4 PIC factor scales were correlated with the problem behavior dimensions separately for male children, male adolescents, female children, and female adolescents. The resulting correlation matrices permitted identification of scale correlates and estimates of their generalizability across age and sex. Results provide evidence of convergent and discriminant validity for both broad-band factor and narrow-band profile scales. The increased utility of the PIC accruing from the availability of both broad- and narrow-band measures of psychopathology is discussed, and suggestions for clinical and research application are noted. (19 ref) (PsycINFO Database Record (c) 2010 APA, all rights reserved)

19.
Replies to articles in the November 1997 issue of American Psychologist concerning student evaluations of teaching (SETs) (see records 1997-43129-002, 1997-43129-003, 1997-43129-004, and 1997-43129-005) and comments by J. S. Armstrong (see record 1998-11971-007) regarding these articles. The current authors note that the arguments made concerning student ratings present a range of conclusions, from endorsing student ratings as largely valid and broadly useful to assailing ratings as invalid and harmful to instruction. Gillmore and Greenwald state that their position is intermediate between these poles. Although they recognize some validity of student ratings and acknowledge their useful role in giving students voice in the evaluation of instruction, they stress the possibility of improving their validity by statistically removing identifiable biases. They describe steps taken by the University of Washington toward achieving just such improvement in the system of student ratings. Gillmore and Greenwald do not rule out contributions of other variables to the correlations between grades and ratings and conclude that leniency differences among instructors result in rating differences that should not be interpreted as indicating that more lenient graders are better teachers. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

20.
Construct validity of measures of college teaching effectiveness. (Total citations: 1; self-citations: 0; citations by others: 1)
Compared evaluation form, student, colleague, trained observer, former student, and self-rating teacher assessments of 43 university instructors. Data show that student and former student ratings displayed substantially greater validity coefficients of teaching effectiveness than self-report, colleague, and trained observer ratings. Advantages of student rating techniques (i.e., greater exposure to instructor's teaching), various teaching assessment methods, and problems in the literature due to limitations of research approaches typically used are discussed. (27 ref) (PsycINFO Database Record (c) 2010 APA, all rights reserved)
