Similar literature
1.
2.
Abstract pictures, such as artistic drawings, may evoke subtle emotions in their observers via aesthetic experiences. In 44 participants, we examined emotional responses, as measured by facial electromyography (EMG), to aesthetic background images that varied on the emotional valence (unpleasant to pleasant) and arousal (calming to exciting) dimensions and were presented both separately and as background images for news messages. Effects of image arousal on free recall of news messages were also examined. A priori pleasant compared to unpleasant images were associated with higher orbicularis oculi EMG responses, both when presented alone and when presented as news message backgrounds. Analyses based on the participants' subjective pleasantness ratings also showed greater corrugator supercilii EMG activity for unpleasant compared to pleasant images. High-arousal as compared to low-arousal images improved recall for the superimposed news messages. In contrast, recall was not affected by a priori image valence or subjective pleasantness ratings. The results demonstrate that abstract images can be used to evoke emotional responses in the viewers that persist even when unrelated messages are superimposed on them. Similarly, high-arousal images can be used to enhance memory for superimposed textual messages.

3.
This research examined how off-angle or oblique viewing of a VDU screen and the physical location of a message on the screen influenced message legibility. Eight trained subjects viewed five-character-long common words, number strings, and alphanumeric messages presented at 15 different combinations of oblique viewing angle and location of message on the VDU screen for 2-5 seconds. It was found that common words and number strings showed little overall loss in legibility except when oblique viewing angle exceeded ± 32°. Alphanumeric messages were found to have a significantly lower legibility than the common words and number strings. For best overall legibility of the three types of messages studied here, it was recommended that oblique viewing angles be less than ± 32°. Predictor equations were also developed to aid in predicting loss of accuracy based on the type of message and physical features of the viewing task.

4.
Tsimhoni O, Smith D, Green P. Human Factors, 2004, 46(4): 600-610
A driving simulator experiment was conducted to determine the effects of entering addresses into a navigation system during driving. Participants drove on roads of varying visual demand while entering addresses. Three address entry methods were explored: word-based speech recognition, character-based speech recognition, and typing on a touch-screen keyboard. For each method, vehicle control and task measures, glance timing, and subjective ratings were examined. During driving, word-based speech recognition yielded the shortest total task time (15.3 s), followed by character-based speech recognition (41.0 s) and touch-screen keyboard (86.0 s). The standard deviation of lateral position when performing keyboard entry (0.21 m) was 60% higher than that for all other address entry methods (0.13 m). Degradation of vehicle control associated with address entry using a touch screen suggests that the use of speech recognition is favorable. Speech recognition systems with visual feedback, however, even with excellent accuracy, are not without performance consequences. Applications of this research include the design of in-vehicle navigation systems as well as other systems requiring significant driver input, such as E-mail, the Internet, and text messaging.

5.
OBJECTIVES: We investigated whether context or different speech rates could improve older adult performance on identification of synthetically generated words. BACKGROUND: Synthetic speech systems can potentially improve the daily functioning of older adults. However, research must determine whether older adults can effectively implement current text-to-speech technologies, which few studies have examined. Older adults' sensory and cognitive declines may cause difficulties in identifying words in synthetic speech. METHODS: Ninety-six participants (young, middle-aged, and older adults) identified auditory monosyllabic words (half natural, half synthetic) presented in isolation or at the ends of sentences. Participants heard speech at either normal or slower rates. RESULTS: We found an interaction of age, context, and voice type and that slower speech rates worsened performance for all groups. Contrasts revealed that context reduced age differences, though only for natural speech. Hearing acuity was highly correlated with age and fully accounts for the interaction. CONCLUSIONS: Context improves performance for everyone in natural speech. However, whereas context improves performance for synthetic speech, it does not differentially reduce the age impairment for older adults. Slower speed generally impairs everyone's performance compared with the normal rate. APPLICATIONS: Systems using synthetic speech should avoid presenting words in isolation, and rich contextual support should be consistently adopted. Synthetic speech fidelity must be improved significantly before becoming truly useful for older adult populations.

6.
Affective responses of users to system messages in human–computer interaction are key to studying user satisfaction. However, little is known about the particular affective patterns elicited by various types of system messages. In this experimental study we examined whether and how different system messages, presented in different modalities, influence users’ affective responses. Three types of messages (input requests, status notifications, and error messages) were presented either as text or speech, and either alone or in combination with icons or sounds, while users worked on several typical computer tasks. Affective responses following system messages were assessed employing a multi-modal approach, using subjective rating scales as well as physiological measures. Results show that affective responses vary systematically depending on the type of message, and that spoken messages generally elicit more positive affect than written messages. Implications for enhancing user satisfaction through appropriate message design are discussed.

7.
Past research has demonstrated that there are cognitive processing costs associated with comprehension of speech generated by text-to-speech synthesizers, relative to comprehension of natural speech. This finding has important performance implications for the many applications that use such systems. The purpose of this study was to ascertain whether certain characteristics of synthetic speech slow on-line, real-time cognitive processing. Whereas past research has focused on the phonemic acoustic structure of synthetic speech, we manipulated prosodic, syntactic, and semantic cues in a task requiring participants to recall sentences spoken either by a human or by one of two speech synthesizers. The findings were interpreted to suggest that inappropriate prosodic modeling in synthetic speech was the major source of a performance differential between natural and synthetic speech. Prosodic cues, along with others, guide the parsing of speech and provide redundancy. When these cues are absent or inaccurate, the additional burden placed on working memory may exceed its capacity, particularly in time-limited, demanding tasks. Actual or potential applications of this research include improvement of text-to-speech output systems in warning systems, feedback devices in aerospace vehicles, educational and training modules, aids for the handicapped, consumer products, and technologies designed to increase the functional independence of older adults.

8.
9.
This study investigated the effects of display method, number of message lines and text colour on comprehension performance and subjective preferences for reading Chinese on a light‐emitting diode display. The factors and levels studied were: two text display methods (rapid serial visual presentation (RSVP) and paged view scrolling), four numbers of message lines displayed at a time (one to four lines) and three text colours (amber, green and red). The RSVP method resulted in higher comprehension scores than the paged view scrolling method, and the green text produced better comprehension scores than amber or red. However, the paged view scrolling received better subjective evaluation ratings than the RSVP method. A multiline display was found to be superior to a single‐line display for both comprehension scores and subjective evaluations. The results here provide useful ergonomics recommendations for choice of appropriate display method and format setting for presenting Chinese messages on light‐emitting diode displays.

10.
The advent of mobile devices and the wireless Internet is having a profound impact on the way people communicate, as well as on the user interaction paradigms used to access information that was traditionally accessible only through visual interfaces. Applications for mobile devices entail the integration of various data sources optimized for delivery to limited hardware resources and intermittently connected devices through wireless networks. Although telephone interfaces arise as one of the most prominent pervasive applications, they present interaction challenges such as the augmentation of speech recognition through natural language (NL) understanding and high-quality text-to-speech conversion. This article presents an experience in building an automated assistant that is natural to use and could become an alternative to a human assistant. The Mobile Assistant (MA) can read e-mail messages, book appointments, take phone messages, and provide access to personal-organizer information. Key components are a conversational interface, enterprise integration, and notifications tailored to user preferences. The focus of the research has been on supporting the pressing communication needs of mobile workers and overcoming technological hurdles such as achieving high accuracy speech recognition in noisy environments, NL understanding, and optimal message presentation on a variety of devices and modalities. The article outlines findings from the 2 broad field trials and lessons learned regarding the support of mobile workers with pervasive computing devices and emerging technologies.

11.
Most intelligibility tests are based on the use of monosyllabic test stimuli. This constraint eliminates the ability to measure the effects of lexical stress patterns, complex phonotactic organizations, and morphological complexity on intelligibility. Since these aspects of lexical structure affect speech production (e.g., by changing syllable duration), it is likely that they affect the structure of acoustic-phonetic patterns. Thus, to the extent that text-to-speech systems fail to modify acoustic-phonetic patterns appropriately in polysyllabic words, intelligibility may suffer. This means that while most standard intelligibility tests may accurately estimate the intelligibility of monosyllabic words, this estimate may not generalize as well to predict the intelligibility of words with more complex lexical structures. The present study was carried out to measure how words varying in lexical complexity differ in intelligibility. Monosyllabic, bisyllabic, and polysyllabic words were used varying in morphological complexity (monomorphemic or polymorphemic). Listeners transcribed these stimuli spoken by two human talkers and two text-to-speech systems varying in speech quality. The results indicate that lexical complexity does affect the measured intelligibility of synthetic speech and should be manipulated in order to accurately predict the performance of text-to-speech systems with unrestricted natural text.

12.
Spoken European Portuguese (EP) is known to be difficult to understand for L2 learners, due to phenomena such as strong vowel reduction. In this paper, we present a method to automatically generate exercises aimed at improving listening comprehension skills in EP. Learners identify the words pronounced in real speech utterances. The exercises introduce two innovative aspects: using broadcast news videos for curriculum and automatically generating exercises with material updated on a daily basis. The videos are automatically transcribed by a speech recognition engine. A filtering chain, used to select appropriate sentences, was validated by a first survey comprised of both manually and automatically selected sentences. Both sets were assigned good to very good subjective quality scores. A second survey concerned the features of the exercise interface. Subjects with varying self-reported exposure to Portuguese as a second language tested several interfaces and functionalities and highlighted their preferred features. The results confirmed that the largest difficulty was the fast speech rate. All participants valued slowed-down audio and video documents, though this feature was more often used by the lowest proficiency subjects. The exercises were integrated into a Web platform where they are automatically updated daily. Though further evaluation is needed to find whether the platform affords skill acquisition, it is expected to be particularly valuable for distance learners who need opportunities to access authentic audio documents in EP.

13.
A general method that combines formant synthesis by rule and time-domain concatenation is proposed. This method utilizes the advantages of both techniques by maintaining naturalness while minimizing difficulties such as prosodic modification and spectral discontinuities at the point of concatenation. An integrated sampled natural glottal source (Matsui et al., 1991) and sampled voiceless consonants were incorporated into a real-time text-to-speech formant synthesizer. In special cases, voicing amplitude envelopes and formant transitions derived from natural speech were also utilized. Several listening tests were performed to evaluate these methods. We obtained a significant overall improvement in intelligibility over our previous formant synthesizer. Such improvements in intelligibility were previously obtained with a Japanese text-to-speech system using a related hybrid system (Kamai and Matsui, 1993), indicating the applicability of this method for multi-lingual synthesis. The results of subjective analyses showed that these methods can also improve naturalness and listenability factors.

14.
Bill Boni. Network Security, 2001, 2001(2): 19-20
There is a rising tide of concern, expressed by many commentators, about the advent of ever-broader workplace surveillance. In many places, especially the UK, video surveillance cameras have succeeded in reducing crimes against people and property, or at least in driving the crimes into the less desirable parts of the cities. There is now increased deployment of various products and technologies that allow employers not only to monitor employees’ access to and use of the Internet, but also to track and examine the contents of electronic mail messages. Indeed, the recent case of hapless white collar employees sacked for misusing their company E-mail to send risque or outright pornographic contents to friends and colleagues illustrates the current state of electronic surveillance. In another case, Microsoft was tipped off about internal intruders only because they were monitoring outbound E-mail messages for their network and detected the presence of passwords in a message directed to an address in St. Petersburg, Russia.

15.
Can synthetic speech be used in foreign language learning in the same way as natural speech? In this paper, we evaluated synthetic speech from the viewpoint of learners in order to find an answer. The results showed that learners do not perceive remarkable differences between synthetic voices and natural voices for words with short vowels and long vowels when they try to understand the meanings of the sounds. The data indicate that synthetic voice utterances of sentences are easier to understand and more acceptable to learners than synthetic voice utterances of words. In addition, the ratings of both synthetic voices and natural voices strongly depend upon the learners’ listening comprehension abilities. We conclude that some synthetic speech with specific pronunciations of vowels may be suitable for listening materials. We also suggest that evaluating TTS systems by comparing synthetic speech with natural speech, and building a lexical database of synthetic speech that closely approximates natural speech, will help teachers readily use many existing CALL tools.

16.
A large-scale, cluster-randomized controlled field trial (Nclassrooms = 47; Nstudents = 1,013) assessed the impact of a digital text-to-speech reading material that supported 8-year-olds’ decoding and reading comprehension. An active control group used the most prevalent Danish learning material with a research-based systematic, explicit phonics approach supporting primarily decoding. The digital tool allows children to read unfamiliar text for meaning. Students are supported in mapping between orthography and phonology by three levels of text-to-speech support and in identifying spelling patterns. The risk of students overusing text-to-speech was countered by postponing access to having words read aloud by directing students towards identifying and training relevant orthographic patterns before activating text-to-speech. Results showed no statistically significant difference in decoding, but treatment improved reading comprehension. The study demonstrates how digital tools can facilitate strengthening students' decoding skills as efficiently as a traditional phonics-based programme while students are reading text of relatively high orthographic complexity for meaning.

17.
The dual task is a data-rich paradigm for evaluating speech modes of a synthetic talking head. Three experiments manipulated auditory–visual (AV) and auditory-only (A-only) speech produced by text-to-speech synthesis from a talking head (Experiment 1: single task; Experiment 2: dual task), and natural speech produced by a human male similar in appearance to the talking head (Experiment 3: dual task). In a dual task, participants perform two tasks concurrently with a secondary reaction time (RT) task sensitive to cognitive processing demands of the primary task. In the primary task, participants either shadowed words or named the superordinate categories to which words belonged under AV (dynamic face with lips moving) or A-only (static face) speech modes. First, it was hypothesized that category naming is more difficult than shadowing. The hypothesis was supported in each experiment with significantly longer latencies on the primary task and slower RT on the secondary task. Second, an AV advantage was hypothesized and supported by significantly shorter latencies for the AV modality on the primary task of Experiment 3 and with partial support in Experiment 1. Third, it was hypothesized that while the AV modality helps it also creates great cognitive load. Significantly longer RT for AV presentation in the secondary tasks supported this hypothesis. The results indicate that task difficulty influences speech perception. Performance on a secondary task can reveal cognitive demand that is not evident in a single task or self-report ratings. A dual task will be an effective evaluation tool in operational environments where multiple tasks are conducted (e.g., responding to spoken directions and monitoring displays) and an implicit, sensitive measure of cognitive load is imperative.

18.
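A method based on cascaded conditional random fields is proposed for understanding natural-language navigation commands given to disaster-relief robots. The method consists of three layers of conditional random fields (CRFs). The first layer performs navigation part-of-speech tagging: words, parts of speech, and context are selected as feature templates to generate navigation part-of-speech labels. The second layer extracts the navigation process: words, navigation part-of-speech labels, and context are used to build feature templates that generate navigation-process labels. The third layer identifies start and end points: words, navigation part-of-speech labels, navigation-process labels, and context are used to build feature templates that determine whether a place-name word is a start point or an end point. Navigation information can then be extracted from a command according to the correspondence between navigation parts of speech and navigation elements. The method can handle completely unrestricted natural-language navigation commands, achieves an overall accuracy of 78.6%, and does not depend on specific instructions or maps, which makes it significant for human-robot interaction tasks in disaster-relief robot navigation.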

19.
This work focuses on the development of expressive text-to-speech synthesis techniques for a Chinese spoken dialog system, where the expressivity is driven by the message content. We adapt the three-dimensional pleasure-displeasure, arousal-nonarousal and dominance-submissiveness (PAD) model for describing expressivity in input text semantics. The context of our study is based on response messages generated by a spoken dialog system in the tourist information domain. We use the $P$ (pleasure) and $A$ (arousal) dimensions to describe expressivity at the prosodic word level based on lexical semantics. The $D$ (dominance) dimension is used to describe expressivity at the utterance level based on dialog acts. We analyze contrastive (neutral versus expressive) speech recordings to develop a nonlinear perturbation model that incorporates the PAD values of a response message to transform neutral speech into expressive speech. Two levels of perturbations are implemented: local perturbation at the prosodic word level, as well as global perturbation at the utterance level. Perceptual experiments involving 14 subjects indicate that the proposed approach can significantly enhance expressivity in response generation for a spoken dialog system.

20.
As mobile office technology becomes more advanced, drivers have increased opportunity to process information "on the move." Although speech-based interfaces can minimize direct interference with driving, the cognitive demands associated with such systems may still cause distraction. We studied the effects on driving performance of an in-vehicle simulated "E-mail" message system; E-mails were either system controlled or driver controlled. A high-fidelity, fixed-base driving simulator was used to test 19 participants on a car-following task. Virtual traffic scenarios varied in driving demand. Drivers compensated for the secondary task by adopting longer headways but showed reduced anticipation of braking requirements and shorter time to collision. Drivers were also less reactive when processing E-mails, demonstrated by a reduction in steering wheel inputs. In most circumstances, there were advantages in providing drivers with control over when E-mails were opened. However, during periods without E-mail interaction in demanding traffic scenarios, drivers showed reduced braking anticipation. This may be a result of increased cognitive costs associated with the decision-making process when using a driver-controlled interface, where the task of scheduling E-mail acceptance is added to those of driving and E-mail response. Actual or potential applications of this research include the design of speech-based in-vehicle messaging systems.
