Similar articles
20 similar articles found.
1.
In an attempt to evaluate the often stated rule that test items should be arranged in an increasing order of difficulty, the effect of item difficulty order on total test reliability, difficulty, and discrimination was investigated in a series of 4 experiments. Each experiment involved a comparison of 2 or more tests, containing the same 40 items and differing only with respect to the order of those items. The differently ordered forms did not, in any of the experiments, differ significantly in test difficulty or test reliability. The results with respect to discrimination were not as clear-cut. However, the results tend to lead to the conclusion that item difficulty order on a power test of facts and principles given in the normal college classroom will not significantly affect these 3 test statistics. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

2.
Pathological personality item responses have been shown to relate to the social desirability scale values of test items. It was hypothesized that both social desirability and pathological item-response frequency might vary as a function of the time permitted to answer test items. Two groups of Ss were administered the items of the Maslow, Birch, Honigman, McGrath, Plason, and Stein Security-Insecurity Inventory. Social desirability scale values for the items were established. Maximal reading time required for each item was also determined, and both groups were permitted to view each item for the same established length of time. 1 group was allowed 2 sec., the other group 10 sec. for each response. It was observed that time pressure reduced the number of pathological item responses, and that items scaled either high or low in social desirability tended to be answered in the socially desirable direction under time pressure. Females generally provided more critical or pathological item responses than did males. (24 ref.) (PsycINFO Database Record (c) 2010 APA, all rights reserved)

3.
Formulas and procedures are given for calculating item and test means, variances, intercorrelations, and item-selection indices on the basis of an 'F-matrix' that shows the number passing every pair of items. If IBM equipment is not available to develop the F-matrix, alternate procedures based on hand sorting are recommended and described. (PsycINFO Database Record (c) 2010 APA, all rights reserved)
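The abstract does not reproduce the formulas, so the following is only a sketch of how item and test statistics can be recovered from such a matrix using standard identities for dichotomous (pass/fail) items. It assumes `f[i][j]` holds the number of examinees passing both items i and j, that the diagonal `f[i][i]` holds the number passing item i, and that no item is passed by everyone or by no one. The function name and data layout are hypothetical, and the phi coefficient is used here as the inter-item correlation.

```python
import math

def item_stats_from_f_matrix(f, n):
    """Item and test statistics from joint pass counts.

    f[i][j]: examinees passing both item i and item j (assumed layout);
    f[i][i]: examinees passing item i; n: total number of examinees.
    """
    k = len(f)
    # Item means: proportion of examinees passing each item.
    p = [f[i][i] / n for i in range(k)]
    # Item variances for 0/1 scores (population form).
    var = [p[i] * (1 - p[i]) for i in range(k)]
    # Inter-item covariances: joint pass proportion minus product of means.
    cov = [[f[i][j] / n - p[i] * p[j] for j in range(k)] for i in range(k)]
    # Phi coefficients (Pearson correlation between two 0/1 items).
    phi = [[cov[i][j] / math.sqrt(var[i] * var[j]) for j in range(k)]
           for i in range(k)]
    # Total-score mean and variance follow from the item-level values.
    test_mean = sum(p)
    test_var = sum(cov[i][j] for i in range(k) for j in range(k))
    return p, var, phi, test_mean, test_var

# Illustrative counts for 3 items and 10 examinees (made-up data).
F = [[6, 4, 3],
     [4, 5, 2],
     [3, 2, 4]]
p, var, phi, test_mean, test_var = item_stats_from_f_matrix(F, 10)
```

Because the total-score variance is the sum of all entries of the item covariance matrix, the whole analysis needs only the pairwise counts, never the individual answer sheets.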

4.
"Examination of mechanical test data for male and female recruits suggested that a more valid test for use with enlisted women in the Navy might be constructed with items concerned with 'male' mechanical activities, but at a difficulty level appropriate for female recruits. By selecting on the basis of item characteristics, 52 items were chosen from the 100 in the Basic Test Battery Mechanical Test. Using as a criterion the scores from the Breech Block Performance Test (a measure of ability to learn mechanical-motor skills), the validity of the new 'easier' 52-item test was found to be .47 as compared with .39 for the original 100-item test." (PsycINFO Database Record (c) 2010 APA, all rights reserved)

5.
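To analyze how proficiency testing promotes improvement in laboratory testing capability, and to quantify how effective it is, 3,133 evaluation results from 305 laboratories that took part in diesel fuel proficiency testing between 2010 and 2015 were analyzed, together with the trend in the performance statistic |Z|. The study shows that laboratory testing capability improves significantly as the number of proficiency testing rounds increases, but that a single round of proficiency testing cannot effectively guarantee sustained testing capability. After 3 rounds of proficiency testing, laboratories can largely eliminate critical errors in the testing process. An examination of the interaction between proficiency tests for different test items shows that by participating in proficiency testing for one item, laboratories raised their technical level for other items, indicating that proficiency testing promotes overall laboratory capability rather than being confined to a single test item. Proficiency testing acts on laboratory testing capability in 2 ways: one directly raises the technical level for the items covered by the proficiency test by eliminating critical errors in testing technique; the other indirectly raises the technical level for other test items by improving the laboratory's overall management capability.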
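The abstract tracks |Z| without defining it. A minimal sketch follows, assuming the z score convention commonly used in proficiency testing (as in ISO 13528): z is the participant's result minus the assigned value, divided by the standard deviation for proficiency assessment, with |z| ≤ 2 rated satisfactory, 2 < |z| < 3 questionable, and |z| ≥ 3 unsatisfactory. Whether the study used exactly these cutoffs is an assumption; the function names and sample numbers are hypothetical.

```python
def z_score(result, assigned_value, sigma_pt):
    """z = (x - X) / sigma, where sigma is the standard deviation
    for proficiency assessment (assumed convention)."""
    return (result - assigned_value) / sigma_pt

def classify(z):
    """Conventional three-band rating of a z score (assumed cutoffs)."""
    magnitude = abs(z)
    if magnitude <= 2.0:
        return "satisfactory"
    if magnitude < 3.0:
        return "questionable"
    return "unsatisfactory"

# Made-up results from one laboratory over three rounds:
# each tuple is (reported result, assigned value, sigma).
rounds = [(47.0, 50.0, 0.8), (52.0, 50.0, 0.8), (50.5, 50.0, 0.8)]
ratings = [classify(z_score(x, assigned, s)) for x, assigned, s in rounds]
```

A shrinking |z| across successive rounds is the kind of trend the abstract describes as evidence of improving capability.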

6.
The widely used Social Interaction Anxiety Scale (SIAS; R. P. Mattick & J. C. Clarke, 1998) possesses favorable psychometric properties, but questions remain concerning its factor structure and item properties. Analyses included 445 people with social anxiety disorder and 1,689 undergraduates. Simple unifactorial models fit poorly, and models that accounted for differences due to item wording (i.e., reverse scoring) provided superior fit. It was further found that clients and undergraduates approached some items differently, and the SIAS may be somewhat overly conservative in selecting analogue participants from an undergraduate sample. Overall, this study provides support for the excellent properties of the SIAS's straightforwardly worded items, although questions remain regarding its reverse-scored items. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

7.
A conventional way to analyze item responses in multiple tests is to apply unidimensional item response models separately, one test at a time. This unidimensional approach, which ignores the correlations between latent traits, yields imprecise measures when tests are short. To resolve this problem, one can use multidimensional item response models that use correlations between latent traits to improve measurement precision of individual latent traits. The improvements are demonstrated using 2 empirical examples. It appears that the multidimensional approach improves measurement precision substantially, especially when tests are short and the number of tests is large. To achieve the same measurement precision, the multidimensional approach needs less than half of the comparable items required for the unidimensional approach. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

8.
The main aim of this article is to explicate why a transition to ideal point methods of scale construction is needed to advance the field of personality assessment. The study empirically demonstrated the substantive benefits of ideal point methodology as compared with the dominance framework underlying traditional methods of scale construction. Specifically, using a large, heterogeneous pool of order items, the authors constructed scales using traditional classical test theory, dominance item response theory (IRT), and ideal point IRT methods. The merits of each method were examined in terms of item pool utilization, model-data fit, measurement precision, and construct and criterion-related validity. Results show that adoption of the ideal point approach provided a more flexible platform for creating future personality measures, and this transition did not adversely affect the validity of personality test scores. (PsycINFO Database Record (c) 2011 APA, all rights reserved)

9.
The necessity of cross-validating the results of an item analysis has been cogently and humorously demonstrated by Cureton (Educ. psychol. Measmt., 1950, 10, 94-96; see record 1951-00682-001). By using more predictor items than subjects and by computing his validity coefficient on the original group, Cureton obtained a validity coefficient of .82. The writer undertook an experiment which was designed to be equally illustrative of the importance of cross-validation, but which used a somewhat different design. The "test" in this case consisted of a checklist of 81 adjectives describing personality and 22 items relating to personal characteristics, habits, and preferences. The subjects were 59 students in an introductory psychology course at Cornell University. Twenty-nine subjects' tests were chosen at random from this group for preliminary analysis, and the remaining 30 were put aside as the cross-validation group. The answers to the group of 22 items were examined to find an item which split the group of 29 nearly in half. The "number of letters in last name" was chosen as the criterion solely on this basis, leaving a total of 102 predictor items. The criterion was dichotomized between six or less, and seven or more letters. Tetrachoric correlations were computed between each item and the criterion for the group of 29. Discriminant weights were arbitrarily assigned to each item that correlated .36 or better with the criterion. Using this scoring key, 27 out of 29 correct "predictions" were made as to the number of letters in each subject's last name. The over-all tetrachoric correlation was .97, although the split-half reliability coefficient was only .67. Those subjects with long (seven or more letters) last names tended to be less: charming, impatient, stimulating, gay, happy-go-lucky, and impulsive than those with short (six or fewer letters) last names. 
The subjects with long last names also tended to be more: cautious, persistent, forgiving, quiet, kind, persuasive, talented, direct, humane, conservative, precise, and God-fearing. The results suggested that if the procedure were reversed, and the criterion were used as a predictor, it might have great promise as a quick, economical, and unfakeable personality test. (PsycINFO Database Record (c) 2010 APA, all rights reserved)
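The point of the demonstration can be reproduced with pure noise. The sketch below is not the author's procedure: it uses random 0/1 data, the phi coefficient in place of tetrachoric correlations, and unit weights of +1 or -1 for every item whose correlation with the criterion reaches .36 in the derivation group. Group sizes (29 and 30) and the 102 predictor items follow the abstract; all function names and the seed are hypothetical. A key built this way will typically look accurate on the group it was derived from while performing near chance on the held-out group.

```python
import random

def phi(x, y):
    """Pearson correlation between two 0/1 lists (phi coefficient)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx, vy = mx * (1 - mx), my * (1 - my)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no information
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return cov / (vx * vy) ** 0.5

def build_key(rows, criterion, threshold=0.36):
    """Keep items correlating at least `threshold` with the criterion."""
    key = {}
    for j in range(len(rows[0])):
        r = phi([row[j] for row in rows], criterion)
        if abs(r) >= threshold:
            key[j] = 1 if r > 0 else -1  # unit weight, signed
    return key

def score(row, key):
    return sum(w * row[j] for j, w in key.items())

def hit_rate(rows, criterion, key, cutoff):
    """Share of subjects whose score falls on the correct side of cutoff."""
    hits = sum((score(r, key) > cutoff) == (c == 1)
               for r, c in zip(rows, criterion))
    return hits / len(rows)

def simulate(seed=0, n_items=102, n_derive=29, n_holdout=30):
    rng = random.Random(seed)
    n = n_derive + n_holdout
    rows = [[rng.randint(0, 1) for _ in range(n_items)] for _ in range(n)]
    crit = [rng.randint(0, 1) for _ in range(n)]
    d_rows, d_crit = rows[:n_derive], crit[:n_derive]
    h_rows, h_crit = rows[n_derive:], crit[n_derive:]
    key = build_key(d_rows, d_crit)
    # Cutoff is the mean keyed score in the derivation group.
    cutoff = sum(score(r, key) for r in d_rows) / n_derive
    return (len(key),
            hit_rate(d_rows, d_crit, key, cutoff),
            hit_rate(h_rows, h_crit, key, cutoff))

n_keyed, derivation_hits, holdout_hits = simulate()
```

Comparing `derivation_hits` with `holdout_hits` over several seeds shows why a validity coefficient computed on the original group says little about how the key will perform on new subjects.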

10.
The Psychopathy Checklist–Revised (PCL-R) is an important measure in both applied and research settings. Evidence for its validity is mostly derived from male Caucasian participants. PCL-R ratings of 359 Caucasian and 356 African American participants were compared using confirmatory factor analysis (CFA) and item response theory (IRT) analyses. Previous research has indicated that 13 items of the PCL-R can be described by a 3-factor hierarchical model. This model was replicated in this sample. No cross-group difference in factor structure could be found using CFA; the structure of psychopathy is the same in both groups. IRT methods indicated significant but small differences in the performance of 5 of the 20 PCL-R items. No significant differential test functioning was found, indicating that the item differences canceled each other out. It is concluded that the PCL-R can be used, in an unbiased way, with African American participants. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

11.
Keying-related factors in psychological scales are variously interpreted substantively or as products of violations of the assumptions underlying item keying. The present study investigated whether the extremity of the wording of items may contribute to the emergence of item-keying factors in a commonly used psychological scale. Respondents (N=277) completed the Life Orientation Test (M. F. Scheier & C. S. Carver, 1985) in either its original or modified, more moderately worded form. Results indicate that the interaction of item extremity and item keying significantly affected subscale means and, more important, that the more moderately worded scale was substantially more unidimensional. Results are explained partially through the association of lesser and greater extremity with the tendency for some respondents to agree or disagree with items irrespective of keying direction. These results, although demonstrated in only 1 scale, have potential relevance to any scale comprising positive and negative items. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

12.
The "undecided" response in attitude inventory items may be selected because of actual neutrality, item ambiguity, lack of information, antagonism to the test procedure, or a need to "straddle." The major determiners are probably actual neutrality, "fence-straddling" attitudes, or lack of information. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

13.
The effects of faking on personality test scores have been studied previously by comparing (a) experimental groups instructed to fake or answer honestly, (b) subgroups created from a single sample of applicants or nonapplicants by using impression management scores, and (c) job applicants and nonapplicants. In this investigation, the latter 2 methods were used to study the effects of faking on the functioning of the items and scales of the Sixteen Personality Factor Questionnaire. A variety of item response theory methods were used to detect differential item/test functioning, interpreted as evidence of faking. The presence of differential item/test functioning across testing situations suggests that faking adversely affects the construct validity of personality scales and that it is problematic to study faking by comparing groups defined by impression management scores. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

14.
M. Duncan & B. B. Murdock (2000) compared precued and postcued item recognition and serial recall, showing precued-postcued differences for item recognition but not for serial recall. Precuing and postcuing refer to 2 conditions in which the instructions as to the type of recall test following the presentation of short lists of items are given before or after the list presentation. This methodology was extended here to a paired-associate task. In 2 experiments, short lists of paired associates were presented followed by single-item, old-new, or intact-rearranged pair recognition tests; test type was precued or postcued. A fast or slow presentation rate was used to discourage or encourage mediators. TODAM2 (a theory of distributed associative memory) predicts that there should be little or no cuing differences regardless of whether subjects use mediators to remember the pairs. As predicted, the recognition data were essentially identical for the precued and postcued conditions. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

15.
7 SVIB scales were developed and cross validated on 461 managers from 13 varied Minnesota companies. Questions studied were (a) Which item weighting method results in the highest scale validity? (b) Are shorter scales as valid as longer scales? (c) How much may scales be shortened? (d) Why may they be shortened? Controls for scale length, content, validity, and for item weighting method were introduced. Results indicated (a) there was no practical difference in validities between simple unit versus variably weighted scales, (b) shorter scales were as valid as longer scales, (c) Clark's "40 to 60 item optimum scale length" hypothesis was supported, (d) although not conclusive, shorter scales appeared superior partly because their average item validities were greater and thus they perhaps should not be used where developmental item pools are rich in valid items. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

16.
17.
Four experiments compare the effect of familiarity on item, associative, and plurality recognition on self-paced and speeded tests. The familiarity of test items was enhanced by presenting a prime that matched the subsequent test item. On item and plurality recognition tests, participants were more likely to respond "old" to primed than to unprimed test items. In associative recognition, priming increased the proportion of old responses on a speeded test, but not on a self-paced test. This suggests that familiarity plays a larger role in item and plurality recognition than in associative recognition on self-paced tests. On speeded tests, priming has a similar effect on item, associative, and plurality recognition. Results suggest that item and associative recognition rely differentially on familiarity and recollection. They are also consistent with recent evidence suggesting that different processes underlie plurality and associative recognition. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

18.
Two chimpanzees (Pan troglodytes) made numerousness judgments of nonvisible sets of items. In Experiment 1, 1-10 items were dropped 1 at a time into an opaque cup, and then an additional 1-10 items were dropped 1 at a time into another opaque cup. The chimpanzees' performance levels were high and were more dependent on factors indicative of an analogue-magnitude mechanism for representation of set size than on an object file mechanism. In Experiment 2, a 3rd visible set was made available after the sequential presentation of the first 2 sets. The chimpanzees again performed at high levels in selecting the largest of the 3 sets. In Experiment 3, 1 of the 2 initially presented sets was reduced in number by the sequential removal of 1, 2, or 3 items. Both chimpanzees performed above chance levels for the removal of 1, but not more than 1, item. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

19.
The authors describe the initial development of the Wagner Assessment Test (WAT), an instrument designed to assess critical thinking, using the 5-faceted view popularized by the Watson-Glaser Critical Thinking Appraisal (WGCTA; G. B. Watson & E. M. Glaser, 1980). The WAT was designed to reduce the degree of successful guessing relative to the WGCTA by increasing the number of response alternatives (i.e., 80% of WGCTA items are 2-alternative, multiple-choice), a change that was hypothesized to result in more desirable test information and standard-error functions. Analyses using the 3-parameter logistic item response theory (IRT) model in a sample of undergraduates (N = 407) supported this prediction, even when the WAT item pool was shortened to match the length of the WGCTA. Convergent validity between full-pool IRT score estimates was r = .69. Implications for subsequent research on IRT-based measurement of critical thinking are discussed. (PsycINFO Database Record (c) 2010 APA, all rights reserved)
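Why fewer chances to guess should improve the information and standard-error functions can be seen from the 3-parameter logistic model itself. The sketch below uses the standard 3PL response function, P(θ) = c + (1 - c) / (1 + exp(-a(θ - b))), and its item information, I(θ) = a²(Q/P)((P - c)/(1 - c))² with Q = 1 - P, without the optional 1.7 scaling constant. The item parameters are invented, and equating the guessing parameter c with 1 divided by the number of alternatives is a simplifying assumption, not something the abstract states.

```python
import math

def p_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model.
    a: discrimination, b: difficulty, c: lower asymptote (guessing)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b, c):
    """Fisher information of one 3PL item at ability theta."""
    p = p_3pl(theta, a, b, c)
    q = 1.0 - p
    return a * a * (q / p) * ((p - c) / (1.0 - c)) ** 2

def standard_error(theta, items):
    """Standard error of the ability estimate: 1 / sqrt(test information)."""
    info = sum(item_information(theta, a, b, c) for a, b, c in items)
    return 1.0 / math.sqrt(info)

# Two hypothetical 20-item pools that differ only in the guessing parameter:
# c = 0.5 mimics 2-alternative items, c = 0.2 mimics 5-alternative items.
two_choice = [(1.0, 0.0, 0.5)] * 20
five_choice = [(1.0, 0.0, 0.2)] * 20
se_two = standard_error(0.0, two_choice)
se_five = standard_error(0.0, five_choice)
```

With everything else held equal, the pool with the lower guessing parameter yields more information at θ = 0 and therefore a smaller standard error, which is the direction of the effect the authors hypothesized.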

20.
Although the dangers associated with factoring test scales containing overlapping items (items used on more than 1 scale) have been pointed out by Guilford (1952), factor studies of scales embodying item overlap continue. The present study explores the possibility that the neurotic triad and the psychotic triad or tetrad factors found in 4 studies derive from the existence of a methodological artifact associated with item overlap. To test this possibility, MMPI interscale common-element correlations (produced solely by item overlap) were factor analyzed. 2 of 3 factors extracted are highly similar to neurotic triad and psychotic triad or tetrad factors found for 4 samples. These 2 factors do not appear in a factor analysis of truncated (overlap items removed) MMPI scale scores. Since the overlap factors are based solely on the scale intercorrelations due to overlap items, these results appear to support Guilford's warning and open to question the legitimacy of these MMPI factors. (PsycINFO Database Record (c) 2010 APA, all rights reserved)
