Similar literature
Found 20 similar documents; search time 890 ms
1.
Some personality test scores improve on retest. Three studies investigated the role of meaning change in producing this phenomenon. In Study 1 multiple versions of a Manifest Anxiety Scale were administered, with counterbalanced item orders. It was found that measurement-induced improvement (a) occurred within a test as well as between test and retest, (b) was unaffected by participants' anxiety scores, and (c) occurred even when the retest contained different items than the first test. Studies 2 and 3 found that as respondents experience more of a test, they are better able to discern its meaning and to use that meaning to interpret an item. These findings indicate that mean shifts in answers from test to retest also occur within a test along with context-induced shifts in meaning. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

2.
Tested the assumption that individuals' scores and criterion group characteristics for nonprofessional vocations remain stable over long periods of time. Navy Vocational Interest Inventory scores received by 208 Navy enlisted men in a variety of occupational specialties were compared with their retest scores obtained 5 yr. later. Results show substantial reliability of individual scores, paralleling reliabilities obtained with the SVIB. Comparisons of interest profiles of criterion groups tested in 1951 with those of men entering the same specialties 13 yr. later also showed considerable stability. Findings should be generalizable to the Minnesota Vocational Interest Inventory, a revision of the original inventory. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

3.
Controversy abounds over attributing group differences on tests to nature, nurture, or test bias. Limitations of correlational sampling from natural populations necessitate experimental methods to resolve underlying issues. In classical psychometrics, test items are selected from a larger item pool through analysis of item responses in a sample of subjects. Rats of six inbred strains (n = 366) were tested in multiple mazes to provide a large item pool. Six populations were created, each with differing proportions of each strain. Items selected through independent item analyses within each population yielded six tests. An independent cross-validation sample (n = 146) provided scores on all six items. This sample was also tested in another set of maze problems defined as the criterion to be predicted. Strain means and intrastrain predictive validities for the six tests varied with strain representation in the population used for item selection (p?

4.
Fourth through 6th graders (n = 418) completed the Children's Depression Inventory (CDI; M. Kovacs, 1980). Each teacher (n = 31) rated 6 students with high, low, or medium CDI scores (n = 181) using the CDI items (teacher-CDI) and a single global rating. Remaining students received the global rating only. 16 teachers were randomly assigned to receive instruction on childhood depression. Contrary to earlier studies, moderate correspondence was found for both measures. Familiarity was related to correspondence, whereas confidence and student gender were unrelated to correspondence. Instruction improved knowledge, but not correspondence. School-related behaviors yielded the highest correspondence. The teacher-CDI displayed high test–retest reliability. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

5.
Administered the Cornell Medical Index (CMI) to 630 Navy psychiatric patients and 454 healthy controls. Patient and control samples were split into 2 groups for cross-validation purposes, and 2 methods, regression analysis and a new item selection technique called SEQUIN, were applied to the problem of selecting the most discriminating set of CMI items. The percentages correctly classified "sick" or "well" when results from Sample 1 were used to predict Sample 2 and vice versa were 82 and 85% by the regression method and 86 and 86% by the SEQUIN method. 7 items, perhaps representing general attributes defining mental illness in the Navy culture, contributed significantly to the predictive scales regardless of particular item selection method or sample. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

6.
The Addiction Severity Index—Multimedia Version (ASI–MV) is a CD-ROM-based simulation of the interview-administered Addiction Severity Index (ASI). Clients in treatment (N = 202) self-administered the ASI–MV to examine the test–retest reliability, criterion validity, and convergent–discriminant validity of the ASI–MV. Excellent test–retest reliability was observed for composite scores and severity ratings. Criterion validity, tested against the interviewer-administered ASI, was good for the composite scores. For severity ratings, variable agreement was observed between the ASI–MV and each interviewer, suggesting poor interrater reliability among interviewers. This conclusion was bolstered by a finding of superior convergent–discriminant validity for both composite scores and severity ratings compared to the standard ASI. The ASI–MV is a viable alternative to the expensive and potentially unreliable interviewer-administered version. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

7.
Despite recent interest in the practice of allowing job applicants to retest, surprisingly little is known about how retesting affects 2 of the most critical factors on which staffing procedures are evaluated: subgroup differences and criterion-related validity. We examined these important issues in a sample of internal candidates who completed a job-knowledge test for a within-job promotion. This was a useful context for these questions because we had job-performance data on all candidates (N = 403), regardless of whether they passed or failed the promotion test (i.e., there was no direct range restriction). We found that retest effects varied by subgroup, such that females and younger candidates improved more upon retesting than did males and older candidates. There also was some evidence that Black candidates did not improve as much as did candidates from other racial groups. In addition, among candidates who retested, their retest scores were somewhat better predictors of subsequent job performance than were their initial test scores (rs = .38 vs. .27). The overall results suggest that retesting does not negatively affect criterion-related validity and may even enhance it. Furthermore, retesting may reduce the likelihood of adverse impact against some subgroups (e.g., female candidates) but increase the likelihood of adverse impact against other subgroups (e.g., older candidates). (PsycINFO Database Record (c) 2011 APA, all rights reserved)

8.
A self-report measure for the Type A subcomponent of time urgency and perpetual activation (TUPA) was developed. An original pool of 137 items for measuring TUPA was created with the help of 10 Type A patients with coronary heart disease (CHD). An attempt was then made to validate these items by correlating them with interview-based ratings for TUPA while using 48 additional CHD patients. On the basis of these latter ratings, 47 items were retained for the final scale. That instrument was then cross-validated on an additional sample of 40 non-CHD Ss. The internal consistency and test–retest reliabilities for this scale, combined with validity and cross-validation data from this and previous studies, indicate the scale to be sufficiently refined for initial clinical and research use. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

9.
"Examination of mechanical test data for male and female recruits suggested that a more valid test for use with enlisted women in the Navy might be constructed with items concerned with 'male' mechanical activities, but at a difficulty level appropriate for female recruits. By selecting on the basis of item characteristics, 52 items were chosen from the 100 in the Basic Test Battery Mechanical Test. Using as a criterion the scores from the Breech Block Performance Test (a measure of ability to learn mechanical-motor skills), the validity of the new 'easier' 52-item test was found to be .47 as compared with .39 for the original 100-item test." (PsycINFO Database Record (c) 2010 APA, all rights reserved)

10.
The Hand Skills Test, a device which measures "persistence beyond minimum standards on tiring tasks," was used to predict school grades and job performance evaluations for higher and lower aptitude Navy personnel. 3 enlisted samples and 1 officer candidate sample were employed. Within each sample men were divided into higher and lower aptitude groups at the median of their aptitude test scores. Principal findings were: (a) the Hand Skills Test significantly predicted school grades of the 2 lower aptitude enlisted samples (grades were not available for the 3rd enlisted sample) but did not predict for higher aptitude enlisted men or for officer candidates, and (b) the Hand Skills Test significantly predicted job performance evaluations among lower aptitude men in all 4 samples, but again validities were not significantly different from zero among the 4 higher aptitude samples. From Psyc Abstracts 36:05:5LD76K. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

11.
Suggests that there are problems associated with assessments of psychopathy in prison populations that use self-report inventories and global diagnostic procedures. In response to these problems, the authors developed a behavioral checklist for psychopathy. The psychometric qualities of the checklist were evaluated using generalizability theory and classical test score indices of reliability. In each of 5 yrs, 2 raters (usually different each year) rated prison inmates (N = 301; mean age 26.9 yrs) on 22 items. The generalizability coefficients were .85, .86, and .89 for the years 1977–1981, respectively. The generalizability coefficient for a test–retest study was .89. Classical indices of reliability (alpha coefficients and inter- and intrarater reliability) ranged from .82 to .93. Results indicate that the checklist is a highly reliable and generalizable instrument when used with prison populations. It is highly correlated with global ratings of psychopathy and criteria from the DSM-III for antisocial personality disorder. (14 ref) (PsycINFO Database Record (c) 2010 APA, all rights reserved)

12.
Shows that differential response latencies are a meaningful indicator of the presence of a trait. A total of 92 subjects responded to a series of microcomputerized personality test items reflecting 4 different traits on each of 4 occasions. Estimates of internal consistency, parallel forms reliability, and test–retest stability suggested that the reliability of the response latencies was modest. Differential response latencies showed excellent convergent validity for corresponding trait level measures and excellent discriminant validity for irrelevant trait level measures. Moreover, as predicted, the latencies for endorsing trait relevant items were negatively related to trait level measures whereas the latencies for rejecting items were positively related. Differential response latencies had no tendency to group together as a method factor. Rather, the pattern of convergent and discriminant relationships generalized across all 4 retest sessions. (French abstract) (PsycINFO Database Record (c) 2010 APA, all rights reserved)

13.
Two easy-to-administer Wechsler Adult Intelligence Scale—Revised (WAIS—R) short forms for test–retest situations were developed. One sample of 90 22–67 yr old male psychiatric inpatients and a cross-validation sample of 30 22–79 yr old psychiatric inpatients completed the full WAIS—R. Test results were scored in the standard fashion and for 2 short forms developed by an odd–even split on 9 of the 11 subtests. Verbal, Performance, and Full Scale IQs derived from both short form scales closely approximated standard-form WAIS—R IQs. Short-form subtest-scaled scores, however, were more discrepant from the standard-form subtest-scaled scores. The short forms' practical advantages are discussed. (8 ref) (PsycINFO Database Record (c) 2010 APA, all rights reserved)

14.
40 3rd and 5th graders were administered the 1st 3 dilemmas of Kohlberg's Moral Judgment Scale. One group received the scale 2 wks after first administration, while another group received a multiple-choice variant of the scale. Data analyses revealed low test–retest reliability for scores attained on the 3 dilemmas together as well as individually. Scores on items within each dilemma were found to be low and generally nonsignificant. Ss who received the multiple-choice variant of the scale scored at significantly higher moral levels than did those who received the typical verbal production version of the scale. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

15.
Analyzed test–retest reliability data gathered from 106 sources (89 independent samples), using a multiple-regression method in an attempt to estimate the effects of several factors on questionnaire stability. We examined 8 self-report inventories: the High School Personality Questionnaire, the 16PF, the MMPI, the Myers–Briggs Type Indicator, the CPI, the Guilford–Zimmerman Temperament Survey, the EPPS, and the OPI. Samples ranged in size and encompassed a wide range of Ss divergent on status and age. We found S's age and status, number of test items, test interitem correlation, and test–retest interval to be significant predictors of reliability. Variables representing general adjustment were found to be less predictable than extraversion variables, and short-term reliability was more predictable than long-term reliability. S's sex and specific questionnaires were not found to have a significant effect on reliability. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

16.
This study examines long-term health and physical readiness trends in the U.S. Navy. We mailed lifestyle questionnaires to all participants in baseline studies between 1983 and 1989 who were still on active duty in 1994. Commands provided body composition and physical readiness test scores for the participants. Two longitudinal cohorts were created: an 8-year sample (N = 640) with matched data from 1986, 1989, and 1994; and an 11-year sample (N = 1,576), with data from 1983 and 1994. Analyses of both cohorts revealed significant improvements in cardiovascular fitness, muscle strength, exercise, lean body mass, dietary habits, and sleep, as well as significant decreases in tobacco and alcohol use and job stress. However, hypertension rates, percentage of body fat, and body mass index increased over time. Women's scores were significantly better than men's on a number of factors. Overall, these findings suggest that the Navy's health promotion efforts have had a significant positive effect on the fitness and health behaviors of career Navy men and women.

17.
Several investigators have proposed item-selection methods which construct a 1st-stage test consisting of the most valid items, then a 2nd-stage test by adding to the 1st-stage test items which are moderately valid yet which correlate low with the 1st-stage test. Several proposed indices for selecting 2nd-stage items were compared, and some were found noticeably better than others; a 3rd-stage test was noticeably better than a 2nd-stage test, but a 4th-stage test was no better than the 3rd-stage test. A method which adds several items to form each new stage was found superior to a method which adds only 1 item. The best method constructed tests substantially better on cross-validation than methods which ignore interitem correlations. (19 ref.) (PsycINFO Database Record (c) 2010 APA, all rights reserved)
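
The staged procedure summarized in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not name the selection indices that were compared, so the index used here (an item's validity discounted by its absolute correlation with the current test's total score) is a hypothetical stand-in, and the names `pearson`, `total_scores`, and `staged_selection` are invented for the example.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length score lists (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def total_scores(items, chosen):
    """Total score of each examinee on the items chosen so far."""
    n_examinees = len(items[chosen[0]])
    return [sum(items[name][i] for name in chosen) for i in range(n_examinees)]


def staged_selection(items, criterion, first_stage_size, add_per_stage, n_stages):
    """Build a test in stages.

    items: dict mapping item name -> list of scores, one per examinee.
    criterion: list of criterion scores, one per examinee.
    Stage 1 keeps the most valid items. Each later stage adds the items with
    the highest value of a HYPOTHETICAL index: validity * (1 - |r with current test|).
    """
    validity = {name: pearson(scores, criterion) for name, scores in items.items()}
    ranked = sorted(items, key=lambda name: validity[name], reverse=True)
    chosen = ranked[:first_stage_size]
    for _ in range(n_stages - 1):
        remaining = [name for name in items if name not in chosen]
        if not remaining:
            break
        current = total_scores(items, chosen)
        index = {
            name: validity[name] * (1 - abs(pearson(items[name], current)))
            for name in remaining
        }
        best = sorted(remaining, key=lambda name: index[name], reverse=True)
        chosen = chosen + best[:add_per_stage]
    return chosen
```

Setting `add_per_stage` above 1 corresponds to the variant that adds several items per stage; whether that variant does better on any given data set would have to be checked on a separate cross-validation sample, as the abstract describes.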

18.
Data gathered during the development of the Navy Arithmetic Test were analyzed to evaluate the utility of using distracters derived by administering the items in open-ended (OE) form. For both Computation items and Reasoning items moderately high correlations (about .50–.60) were found between frequency with which responses were written in by Ss in OE format and frequency with which the same responses were chosen in multiple-choice format. That OE-derived responses tend to retain their relative popularity in multiple-choice format appears to provide some support for the use of the relatively expensive OE technique in arithmetic test construction. Caution is urged in applying these findings to other test types. From Psyc Abstracts 36:05:5KJ31R. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

19.
Four studies examined the Conflict in Adolescent Dating Relationships Inventory (CADRI), a measure of abusive behavior among adolescent dating partners. Exploratory factor analysis was used to refine items based on high school participants with dating experience (N = 393; 49% female). Confirmatory factor analysis was used to derive and cross-validate the factor structure with participants from 10 high schools (N = 1,019, 55% female; ages 14–16). The model structure fit for all grades and both sexes, with physical abuse, verbal abuse, and threatening behavior most representative of the underlying "abuse" factor. In Studies 3 and 4, the second-order abuse factor showed acceptable test–retest reliability, partner agreement, and correlation (significant for males only) between observer ratings of dating partners' interactions and youths' CADRI scores. Results support the CADRI as a measure of abusive behavior in adolescent dating relationships. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

20.
Examined retest reliability of the Group Embedded Figures Test (GEFT) under 3 different intervals between test and retest, all with an interpolated cognitive task. 159 18–51 yr old undergraduates served as Ss in 3 experiments. Despite a significant increase in the group mean score from test to retest in all 3 experiments, retest reliability coefficients were high, .78–.92. There was also a suggestion that reliability increased with duration of delay. Examination of individual patterns of test–retest score change revealed 4 patterns: consistent field dependent (FD), consistent field independent (FI), unclassifiable, and latent field independent, whose retest scores took them from the FD range to the FI range. The latter 2 patterns accounted for the significant retest improvement. Relative frequencies of Ss in each pattern were relatively constant over the 3 experiments. Ss in the pattern categories also differed with respect to score on number series completion tests, suggesting that the 4 patterns reflect more general individual differences in analytic ability. It is concluded that the GEFT is a reliable test, but suggestions for a more error-free classification procedure based on a test–retest score pattern are proposed. (15 ref) (PsycINFO Database Record (c) 2010 APA, all rights reserved)
