Similar Articles
20 similar articles retrieved.
1.
Speech comprehension is resistant to acoustic distortion in the input, reflecting listeners' ability to adjust perceptual processes to match the speech input. This adjustment is reflected in improved comprehension of distorted speech with experience. For noise vocoding, a manipulation that removes spectral detail from speech, listeners' word report showed a significantly greater improvement over trials for listeners that heard clear speech presentations before rather than after hearing distorted speech (clear-then-distorted compared with distorted-then-clear feedback, in Experiment 1). This perceptual learning generalized to untrained words suggesting a sublexical locus for learning and was equivalent for word and nonword training stimuli (Experiment 2). These findings point to the crucial involvement of phonological short-term memory and top-down processes in the perceptual learning of noise-vocoded speech. Similar processes may facilitate comprehension of speech in an unfamiliar accent or following cochlear implantation. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

2.
Speech comprehension is resistant to acoustic distortion in the input, reflecting listeners' ability to adjust perceptual processes to match the speech input. For noise-vocoded sentences, a manipulation that removes spectral detail from speech, listeners' reporting improved from near 0% to 70% correct over 30 sentences (Experiment 1). Learning was enhanced if listeners heard distorted sentences while they knew the identity of the undistorted target (Experiments 2 and 3). Learning was absent when listeners were trained with nonword sentences (Experiments 4 and 5), although the meaning of the training sentences did not affect learning (Experiment 5). Perceptual learning of noise-vocoded speech depends on higher level information, consistent with top-down, lexically driven learning. Similar processes may facilitate comprehension of speech in an unfamiliar accent or following cochlear implantation. (PsycINFO Database Record (c) 2010 APA, all rights reserved)
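The noise-vocoding manipulation described in these first two abstracts divides speech into a small number of frequency bands, extracts each band's slow amplitude envelope, and uses those envelopes to modulate band-limited noise, discarding spectral fine structure. The following Python/SciPy sketch is a minimal illustration of that manipulation only; the band count, envelope cutoff, and filter orders are illustrative assumptions, not the parameters used in the studies.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def noise_vocode(signal, fs, n_bands=6, env_cutoff=30.0, f_lo=100.0, f_hi=4000.0):
    """Minimal noise vocoder: keep per-band amplitude envelopes, replace fine structure with noise."""
    signal = np.asarray(signal, dtype=float)
    edges = np.geomspace(f_lo, f_hi, n_bands + 1)      # log-spaced band edges (illustrative choice)
    noise = np.random.default_rng(0).standard_normal(len(signal))  # broadband noise carrier
    b_env, a_env = butter(2, env_cutoff / (fs / 2))    # low-pass filter for envelope extraction
    out = np.zeros_like(signal)
    for lo, hi in zip(edges[:-1], edges[1:]):
        b, a = butter(3, [lo / (fs / 2), hi / (fs / 2)], btype="band")
        band = filtfilt(b, a, signal)                  # speech restricted to this band
        env = filtfilt(b_env, a_env, np.abs(band))     # slow amplitude envelope (rectify + low-pass)
        env = np.maximum(env, 0.0)
        carrier = filtfilt(b, a, noise)                # noise restricted to the same band
        out += env * carrier                           # envelope-modulated noise band
    return out / (np.max(np.abs(out)) + 1e-12)         # rough normalization
```

With few bands the output is initially hard to understand, which is what makes the reported improvement from near 0% to 70% word report over 30 sentences a measure of perceptual learning rather than of baseline intelligibility.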

3.
The judgment of annoyance of distorted speech differs radically for different language groups. The results show that those who comprehend a spoken language base their annoyance judgments on the informational content extracted, while those who do not base them on the perceptual characteristics of meaningless sound (particularly loudness). A series of distorted German speech sounds was presented to two subject groups consisting of native Swedish and English speakers, and the results were compared with earlier results from groups of native German and Polish subjects. The 50 stimuli were generated from the very same speech signal distorted in two principal ways, either with repeated silent gaps or with superimposed noise impulses. The perceived annoyance of the distorted speech was judged by category scaling for all subject groups and, as a control for "ceiling" effects, also by magnitude estimation for the Swedish and the English subjects. There is a pronounced tendency for German subjects to judge the German speech distorted with silent gaps as more annoying than that distorted with superimposed noise impulses. In contrast, the Swedish, English, and Polish subjects judged the two German-speech distortions in the reverse order with regard to annoyance. Thus, for noncomprehending listeners, noise-distorted speech is more annoying, but for comprehending listeners it is speech distorted by gaps. This means that the intrusiveness of impaired communication, rather than loudness, predominates in annoyance judgments from comprehending listeners.

4.
5.
Dutch listeners were exposed to the English theta sound (as in bath), which replaced [f] in /f/-final Dutch words or, for another group, [s] in /s/-final words. A subsequent identity-priming task showed that participants had learned to interpret theta as, respectively, /f/ or /s/. Priming effects were equally strong when the exposure sound was an ambiguous [fs]-mixture and when primes contained unambiguous fricatives. When the exposure sound was signal-correlated noise, listeners interpreted it as the spectrally similar /f/, irrespective of lexical bias during exposure. Perceptual learning about speech is thus constrained by spectral similarity between the input and established phonological categories, but within those limits, adjustments are thorough enough that even nonnative sounds can be treated fully as native sounds. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

6.
Models of compensation for phonological variation in spoken word recognition differ in their ability to accommodate complete assimilatory alternations (such as run assimilating fully to rum in the context of a quick run picks you up). Two experiments addressed whether such complete changes can be observed in casual speech, and if so, whether they trigger perceptual compensation. Experiment 1 used recordings of naive speakers and found that the presence of following context supporting place assimilation led to an increase in miscommunication rate when listeners were asked to identify the potentially assimilated words. This result was also obtained when trained phoneticians gave their considered judgments of a subset of the stimuli. Experiment 2 examined the extent to which words articulated correctly by naive speakers (e.g., rum) would be perceived as assimilated and found that compensation for assimilation in these stimuli depended on the type of following phonemic context and the semantic fit with the preceding sentence. These results suggest that place assimilation does involve complete alternations and that the perceptual system can compensate for them in certain circumstances. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

7.
8.
Listeners rapidly adapt to many forms of degraded speech. What level of information drives this adaptation, however, remains unresolved. The current study exposed listeners to sinewave-vocoded speech in one of three languages, which manipulated the type of information shared between the training languages (German, Mandarin, or English) and the testing language (English) in an audio-visual (AV) or an audio plus still frames modality (A + Stills). Three control groups were included to assess procedural learning effects. After training, listeners' perception of novel sinewave-vocoded English sentences was tested. Listeners exposed to German-AV materials performed equivalently to listeners exposed to English AV or A + Stills materials and significantly better than two control groups. The Mandarin groups and German-A + Stills group showed an intermediate level of performance. These results suggest that full lexical access is not absolutely necessary for adaptation to degraded speech, but providing AV-training in a language that is similar phonetically to the testing language can facilitate adaptation. (PsycINFO Database Record (c) 2011 APA, all rights reserved)

9.
In 3 experiments with a total of 16 Ss, we explored how pigeons learn to classify diverse pictures of cats, flowers, cars, and chairs and later how they accurately categorize brand-new pictures from these classes. Using a 4-key forced-choice procedure, Ss in Exp 1 discriminated individual examples within each of the categories from one another (subcategory training); nevertheless, errors were disproportionately conceptual in nature, with Ss more likely to confuse examples within a given category than between different categories. Ss in Exp 2 trained to classify pictures into human language categories (category training) learned far faster and more completely than Ss trained to sort the same pictures into totally arbitrary groupings (pseudocategory training). Finally, in Exp 3, category-trained and subcategory-trained Ss were tested on normally oriented pictures, on left–right reversals, and on top–bottom reversals. Subcategory-trained Ss responded less accurately on both kinds of reversed pictures and less accurately on top–bottom than on left–right reversals; category-trained Ss were less affected by both types of picture reversals, only top–bottom reversals decrementing their performance. Results suggest that many words in our language denote clusters of related visual stimuli, which pigeons also see as highly similar. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

10.
90 words connoting mood were scaled for degree of elation or depression. 70 of the words were selected as the stimuli in a study of mediated stimulus generalization. For 2 groups of men and 2 groups of women the training stimuli were extremely elated words, and for 2 different groups of men and women, the training stimuli were extremely depressed words. In each of the 2 groups for both sexes, one group was reinforced for whispering and the other for shouting. After training all groups received a generalization series consisting of words varying in degree of elation or depression. Ss trained to shout elated and whisper depressed stimuli produced steeper mediated stimulus generalization gradients than Ss trained to whisper elated and shout depressed stimuli. The Shout-Depressed group produced partially inverted gradients. The results were consistent with an asymmetrical Matching Principle: with connotative stimuli there is a strong tendency to make an intense response to an intense stimulus and a moderate tendency to make a weak response to a weak stimulus. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

11.
Most theories of categorization emphasize how continuous perceptual information is mapped to categories. However, equally important are the informational assumptions of a model, the type of information subserving this mapping. This is crucial in speech perception where the signal is variable and context dependent. This study assessed the informational assumptions of several models of speech categorization, in particular, the number of cues that are the basis of categorization and whether these cues represent the input veridically or have undergone compensation. We collected a corpus of 2,880 fricative productions (Jongman, Wayland, & Wong, 2000) spanning many talker and vowel contexts and measured 24 cues for each. A subset was also presented to listeners in an 8AFC phoneme categorization task. We then trained a common classification model based on logistic regression to categorize the fricative from the cue values and manipulated the information in the training set to contrast (a) models based on a small number of invariant cues, (b) models using all cues without compensation, and (c) models in which cues underwent compensation for contextual factors. Compensation was modeled by computing cues relative to expectations (C-CuRE), a new approach to compensation that preserves fine-grained detail in the signal. Only the compensation model achieved a similar accuracy to listeners and showed the same effects of context. Thus, even simple categorization metrics can overcome the variability in speech when sufficient information is available and compensation schemes like C-CuRE are employed. (PsycINFO Database Record (c) 2011 APA, all rights reserved)
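The modelling approach summarized above can be paraphrased as two steps: first re-express each acoustic cue relative to what would be expected given contextual factors such as talker and vowel (C-CuRE), then feed the resulting residuals to a simple classifier. The sketch below, using scikit-learn, is a hedged illustration of that idea with hypothetical array names and random placeholder data; it is not the authors' code, and the specific cues and context predictors are assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

def ccure_residuals(cues, context):
    """C-CuRE-style compensation (illustrative): replace each cue with its residual
    after regressing out contextual expectations (e.g., talker and vowel identity)."""
    residuals = np.empty_like(cues, dtype=float)
    for j in range(cues.shape[1]):
        expected = LinearRegression().fit(context, cues[:, j]).predict(context)
        residuals[:, j] = cues[:, j] - expected   # deviation from expectation carries the category information
    return residuals

# Hypothetical stand-in data: 24 cues per fricative token, binary-coded context
# predictors (random placeholders for talker/vowel identity), and 8 category labels (8AFC).
rng = np.random.default_rng(1)
cues = rng.standard_normal((2880, 24))
context = rng.integers(0, 2, size=(2880, 25)).astype(float)
labels = rng.integers(0, 8, size=2880)

compensated = ccure_residuals(cues, context)
clf = LogisticRegression(max_iter=1000).fit(compensated, labels)
print("training accuracy:", clf.score(compensated, labels))
```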

12.
Recent work demonstrates that learning to understand noise-vocoded (NV) speech alters sublexical perceptual processes but is enhanced by the simultaneous provision of higher-level, phonological, but not lexical content (Hervais-Adelman, Davis, Johnsrude, & Carlyon, 2008), consistent with top-down learning (Davis, Johnsrude, Hervais-Adelman, Taylor, & McGettigan, 2005; Hervais-Adelman et al., 2008). Here, we investigate whether training listeners with specific types of NV speech improves intelligibility of vocoded speech with different acoustic characteristics. Transfer of perceptual learning would provide evidence for abstraction from variable properties of the speech input. In Experiment 1, we demonstrate that learning of NV speech in one frequency region generalizes to an untrained frequency region. In Experiment 2, we assessed generalization among three carrier signals used to create NV speech: noise bands, pulse trains, and sine waves. Stimuli created using these three carriers possess the same slow, time-varying amplitude information and are equated for naïve intelligibility but differ in their temporal fine structure. Perceptual learning generalized partially, but not completely, among different carrier signals. These results delimit the functional and neural locus of perceptual learning of vocoded speech. Generalization across frequency regions suggests that learning occurs at a stage of processing at which some abstraction from the physical signal has occurred, while incomplete transfer across carriers indicates that learning occurs at a stage of processing that is sensitive to acoustic features critical for speech perception (e.g., noise, periodicity). (PsycINFO Database Record (c) 2010 APA, all rights reserved)
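The carrier contrast in Experiment 2 can be illustrated by generating, within a single frequency band, three carriers that are modulated by the same slow envelope: band-limited noise, a pulse train, and a sine wave at the band's centre frequency. The short Python sketch below is an assumed illustration of that contrast with arbitrary parameter values, not the stimulus-generation code from the study.

```python
import numpy as np
from scipy.signal import butter, filtfilt

fs = 16000
t = np.arange(int(0.5 * fs)) / fs                      # half a second of carrier
lo, hi = 500.0, 1000.0                                 # one illustrative analysis band
fc = np.sqrt(lo * hi)                                  # geometric centre frequency of the band

b, a = butter(3, [lo / (fs / 2), hi / (fs / 2)], btype="band")
noise_carrier = filtfilt(b, a, np.random.default_rng(0).standard_normal(len(t)))
pulse_carrier = np.zeros_like(t)
pulse_carrier[::int(fs / 100)] = 1.0                   # 100 Hz pulse train: periodic fine structure
pulse_carrier = filtfilt(b, a, pulse_carrier)          # restrict the pulse train to the same band
sine_carrier = np.sin(2 * np.pi * fc * t)              # sine wave at the band centre

envelope = 0.5 * (1 + np.sin(2 * np.pi * 4 * t))       # stand-in for a slow speech envelope (4 Hz)
stimuli = {name: envelope * c for name, c in
           {"noise": noise_carrier, "pulse": pulse_carrier, "sine": sine_carrier}.items()}
```

All three outputs share the same amplitude envelope, so any difference in how well learning transfers among them reflects sensitivity to the temporal fine structure of the carrier.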

13.
To compare the properties of inner and overt speech, Oppenheim and Dell (2008) counted participants' self-reported speech errors when reciting tongue twisters either overtly or silently and found a bias toward substituting phonemes that resulted in words in both conditions, but a bias toward substituting similar phonemes only when speech was overt. Here, we report 3 experiments revisiting their conclusion that inner speech remains underspecified at the subphonemic level, which they simulated within an activation-feedback framework. In 2 experiments, participants recited tongue twisters that could result in the errorful substitutions of similar or dissimilar phonemes to form real words or nonwords. Both experiments included an auditory masking condition, to gauge the possible impact of loss of auditory feedback on the accuracy of self-reporting of speech errors. In Experiment 1, the stimuli were composed entirely from real words, whereas, in Experiment 2, half the tokens used were nonwords. Although masking did not have any effects, participants were more likely to report substitutions of similar phonemes in both experiments, in inner as well as overt speech. This pattern of results was confirmed in a 3rd experiment using the real-word materials from Oppenheim and Dell (in press). In addition to these findings, a lexical bias effect found in Experiments 1 and 3 disappeared in Experiment 2. Our findings support a view in which plans for inner speech are indeed specified at the feature level, even when there is no intention to articulate words overtly, and in which editing of the plan for errors is implicated. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

14.
Much of speech perception research has focused on brief spectro-temporal properties in the signal, but some studies have shown that adults can recover linguistic form when those properties are absent. In this experiment, 7-year-old English-speaking children demonstrated adultlike abilities to understand speech when only sine waves (SWs) replicating the 3 lowest resonances of the vocal tract were presented, but they failed to demonstrate comparable abilities when noise bands amplitude-modulated with envelopes derived from the same signals were presented. In contrast, adults who were not native English speakers but who were competent 2nd-language learners were worse at understanding both kinds of stimuli than native English-speaking adults. Results showed that children learn to extract linguistic form from signals that preserve some spectral structure, even if degraded, before they learn to do so for signals that preserve only amplitude structure. The authors hypothesize that children’s early sensitivity to global spectral structure reflects the role that it may play in language learning. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

15.
In 5 experiments, the authors investigated how listeners learn to recognize unfamiliar talkers and how experience with specific utterances generalizes to novel instances. Listeners were trained over several days to identify 10 talkers from natural, sinewave, or reversed speech sentences. The sinewave signals preserved phonetic and some suprasegmental properties while eliminating natural vocal quality. In contrast, the reversed speech signals preserved vocal quality while distorting temporally based phonetic properties. The training results indicate that listeners learned to identify talkers even from acoustic signals lacking natural vocal quality. Generalization performance varied across the different signals and depended on the salience of phonetic information. The results suggest similarities in the phonetic attributes underlying talker recognition and phonetic perception. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

16.
C. T. Best, M. Studdert-Kennedy, S. Manuel, and J. Rubin-Spitz (1989) reported that listeners given speech labels showed categorical-like perception of a series of complex tone analogs to a /la/-/ra/ speech series, whereas nonspeech listeners were unable to classify the stimuli consistently. In 2 experiments, a new training and testing procedure was used with adult listeners given nonspeech instructions. They classified the /la/-/ra/ tone analogs consistently, showed categorical-like perception, and generalized their training to a new, /li/-/ri/ tone analog series. Two sets of auditory attributes were described for coding the /l/-/r/ distinction, and 1 was shown to quantitatively predict listeners' classification of both series. These results are consistent with models of perception in which a rich, abstract auditory code is computed and forms the basis for both speech and nonspeech auditory categories. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

17.
A central question in psycholinguistic research is how listeners isolate words from connected speech despite the paucity of clear word-boundary cues in the signal. A large body of empirical evidence indicates that word segmentation is promoted by both lexical (knowledge-derived) and sublexical (signal-derived) cues. However, an account of how these cues operate in combination or in conflict is lacking. The present study fills this gap by assessing speech segmentation when cues are systematically pitted against each other. The results demonstrate that listeners do not assign the same power to all segmentation cues; rather, cues are hierarchically integrated, with descending weights allocated to lexical, segmental, and prosodic cues. Lower level cues drive segmentation when the interpretive conditions are altered by a lack of contextual and lexical information or by white noise. Taken together, the results call for an integrated, hierarchical, and signal-contingent approach to speech segmentation. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

18.
This study was designed to determine if sign-object and sign-word training would lead to acquisition of word-object associations and to test the proposal that if two stimuli control the same response, training a new response to one of the stimuli would increase the probability of the second stimulus also controlling that response. The participants were six institutionalized retarded males, each having some receptive and productive speech as well as imitative motor and verbal skills. Nonsense words, signs, and objects were used as the stimuli in this study. All participants were sequentially trained to: (a) pair the objects with their identical matches, (b) imitate the manual signs, (c) pair the manual signs with the objects, (d) imitate the nonsense words, and (e) pair the manual signs with the words. Following this training, participants were given receptive and productive word-object association probes. All participants performed at an 87 percent correct level or better on the first receptive probes, and all performed at a 73 percent correct level or better on the first productive probes. These individuals demonstrated that following sign-object and sign-word training, they could correctly associate the word with the object.

19.
Memory for music: Effect of melody on recall of text.
The melody of a song, in some situations, can facilitate learning and recall. The experiments in this article demonstrate that text is better recalled when it is heard as a song rather than as speech, provided the music repeats so that it is easily learned. When Ss heard 3 verses of a text sung with the same melody, they had better recall than when the same text was spoken. However, the opposite occurred when Ss heard a single verse of a text sung or when Ss heard different melodies for each verse of a song; in these instances, Ss had better recall when the text was spoken. Furthermore, the experiments indicate that the melody contributes more than just rhythmic information. Music is a rich structure that chunks words and phrases, identifies line lengths, identifies stress patterns, and adds emphasis as well as focuses listeners on surface characteristics. The musical structure can assist in learning, in retrieving, and if necessary, in reconstructing a text. (PsycINFO Database Record (c) 2010 APA, all rights reserved)

20.
The contribution of reduced speaking rate to the intelligibility of "clear" speech (Picheny, Durlach, & Braida, 1985) was evaluated by adjusting the durations of speech segments (a) via nonuniform signal time-scaling, (b) by deleting and inserting pauses, and (c) by eliciting materials from a professional speaker at a wide range of speaking rates. Key words in clearly spoken nonsense sentences were substantially more intelligible than those spoken conversationally (15 points) when presented in quiet for listeners with sensorineural impairments and when presented in a noise background to listeners with normal hearing. Repeated presentation of conversational materials also improved scores (6 points). However, degradations introduced by segment-by-segment time-scaling rendered this time-scaling technique problematic as a means of converting speaking styles. Scores for key words excised from these materials and presented in isolation generally exhibited the same trends as in sentence contexts. Manipulation of pause structure reduced scores both when additional pauses were introduced into conversational sentences and when pauses were deleted from clear sentences. Key-word scores for materials produced by a professional talker were inversely correlated with speaking rate, but conversational rate scores did not approach those of clear speech for other talkers. In all experiments, listeners with normal hearing exposed to flat-spectrum background noise performed similarly to listeners with hearing loss.
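One of the duration manipulations mentioned above, deleting or lengthening pauses, can be approximated with a simple energy-based silence detector: frames whose level falls below a threshold are treated as pauses and either removed or repeated. The sketch below is a rough, assumption-laden illustration (frame size and threshold are arbitrary, and trailing partial frames are ignored), not the procedure used in the study.

```python
import numpy as np

def adjust_pauses(signal, fs, frame_ms=20, threshold=0.02, stretch=0.0):
    """Crude pause manipulation: frames whose RMS falls below `threshold` count as pauses.
    stretch=0.0 deletes pauses; stretch=2.0 doubles their duration (values are illustrative)."""
    signal = np.asarray(signal, dtype=float)
    frame = int(fs * frame_ms / 1000)
    out = []
    for start in range(0, len(signal) - frame, frame):
        chunk = signal[start:start + frame]
        rms = np.sqrt(np.mean(chunk ** 2))
        if rms < threshold:                        # pause frame: drop (stretch=0) or repeat it
            out.extend([chunk] * int(round(stretch)))
        else:                                      # speech frame: keep unchanged
            out.append(chunk)
    return np.concatenate(out) if out else signal.copy()
```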
