
1.

Carrot and stick 2.0: The benefits of natural and motivational prosody in computer-assisted learning

《Computers in human behavior》2015

To acquire new skills or knowledge, contemporary learners frequently rely on educational technologies that supplement human teachers as a learning aid. In the interaction with such systems, speech-based communication between the human user and the technical system has increasingly gained importance. Since spoken computer output can take on a variety of forms depending on the method of speech generation and the employment of prosodic modulations, the effects of such auditory variations on the user’s learning achievement require systematic investigation. The experiment reported here examined the specific effects of speech generation method and prosody of spoken system feedback in a computer-supported learning environment, and may serve as a validation tool for future investigations of spoken computer feedback effects on learning. Learning performance in a basic cognitive task was compared between users receiving pre-recorded, naturally spoken system feedback with neutral prosody, pre-recorded feedback with motivating (praising or blaming) prosody, or computer-synthesized feedback. The observed results provide empirical evidence that users of technical tutoring systems benefit from pre-recorded, naturally spoken feedback, and do even more so from feedback with motivational prosodic modulations matching their performance success. Theoretical implications and considerations for future implementations of spoken feedback in computer-based educational systems are discussed.

2.

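Effects of dialogue intent and speech recognition errors on interaction experience

杨明浩 高廷丽 陶建华 张大伟 孙梦伊 李昊 巢林林 《软件学报》2016,27(S2):69-75

In natural human-computer dialogue, speech recognition errors caused by environmental noise, dialects, and accents, together with insufficient semantic analysis, lead the computer to misjudge the user's interaction intent. As a result, the computer responds on the wrong topic and the dialogue breaks down. Taking free-form human-computer chat on the theme of coffee as an example, dialogue breakdowns are divided into three cases: breakdowns caused by an inappropriate topic response, and, when the topic is correct, breakdowns caused by an inappropriate vague response or by an inappropriate precise response. Records of user-computer dialogues are analyzed to compare breakdowns across these three cases. The statistics show that inappropriate topic responses cause the most pronounced breakdowns, accounting for 60.1% of all breakdowns; when the topic response is correct, inappropriate vague answers and inappropriate precise answers account for 22.2% and 21.6% of breakdowns, respectively; and when speech recognition errors occur, semantic analysis produces a larger number of response errors. The experimental analysis shows that, when speech recognition errors occur, using contextual information to improve the accuracy of the computer's topic response effectively reduces dialogue breakdowns and makes human-computer dialogue more natural. This work provides data analysis and experimental evidence for the importance of intent classification in natural human-computer dialogue.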

3.

Comparison of speech input and manual control of in-car devices while on the move

Robert Graham Chris Carter 《Personal and Ubiquitous Computing》2000,4(2-3):155-164

Speech recognition has a number of potential advantages over traditional manual controls for the operation of in-car and other mobile devices. Two laboratory experiments aimed to test these proposed benefits, and to optimise the design of future speech interfaces. Participants carried out tasks with a phone or in-car entertainment system, while engaged in a concurrent driving task. Speech input reduced the adverse effects of system operation on driving performance, but manual control led to faster transaction times and improved task accuracy. Explicit feedback of the recognition results was found to be necessary, with audio-only feedback leading to better task performance than combined audio-plus-visual feedback. It is recommended that speech technology be incorporated into the user interface as a redundant alternative to manual operation. However, the importance of good human factors in the design of speech dialogues is emphasised.

4.

Flattery may get computers somewhere, sometimes: The moderating role of output modality, computer gender, and user gender

《International journal of human-computer studies》2008,66(11):789-800

This experiment extended the Computers Are Social Actors (CASA) paradigm by examining how output modality (text plus cartoon character vs. synthetic speech), computer gender (male vs. female), and user gender (male vs. female) moderate the ways in which people respond to computers that flatter. Specifically, participants played a trivia game with a computer, which they knew might provide incorrect answers. Participants in the generic-comment condition received strictly factual feedback, whereas those in the flattery condition were given additional remarks praising their performance. Consistent with Fogg and Nass's [1997. Silicon sycophants: the effects of computers that flatter. International Journal of Human–Computer Studies 46, 551–561] study, flattery led to more positive overall impressions and performance evaluations of the computer, but such effects were found only in the text plus character condition and among women. In addition, flattery increased participants’ suspicion about the validity of the computer's feedback and lowered conformity to the computer's suggestions. Participants conformed more to the male than to the female computers when computer gender was manifested in gendered cartoon characters in the text condition, with no corresponding effects in the speech condition. Results suggest that synthetic speech output might suppress social responses to computers, such as flattery effects and gender stereotyping.

5.

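A survey of emotional speech synthesis

李虎孬 赵晖 《现代计算机》2014,(7):31-37

Emotional speech synthesis is an emerging direction in speech synthesis that draws on physiology, psychology, linguistics, and information science. It can be applied to text reading, information query and publishing, and computer-assisted instruction, and it integrates spoken-language analysis and emotion analysis of speech with computer technology, laying a foundation for people-oriented speech synthesis systems with personalized characteristics. Current work on emotional speech synthesis falls into two categories: rule-based synthesis and waveform-concatenation synthesis. Research on emotional speech synthesis consists of two parts, emotion analysis and speech synthesis. The main tasks of emotion analysis are to collect speech data for different emotions, extract acoustic features, and analyze the relationship between acoustic features and emotion; the main tasks of speech synthesis are to build an emotion conversion model and to use that model for synthesis.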

6.

Synthesizing English emphatic speech for multimodal corrective feedback in computer-aided pronunciation training

Fanbo Meng Zhiyong Wu Jia Jia Helen Meng Lianhong Cai 《Multimedia Tools and Applications》2014,73(1):463-489

Emphasis plays an important role in expressive speech synthesis in highlighting the focus of an utterance to draw the attention of the listener. We present a hidden Markov model (HMM)-based emphatic speech synthesis model. The ultimate objective is to synthesize corrective feedback in a computer-aided pronunciation training (CAPT) system. We first analyze contrastive (neutral versus emphatic) speech recordings. The changes of the acoustic features of emphasis at different prosody locations and the local prominences of emphasis are analyzed. Based on the analysis, we develop a perturbation model that predicts the changes of the acoustic features from neutral to emphatic speech with high accuracy. Further based on the perturbation model, we develop an HMM-based emphatic speech synthesis model. Unlike previous work, the HMM model is trained on a neutral corpus, but the context features and additional acoustic-feature-related features are used during the growing of the decision tree. The output of the perturbation model can then be used to supervise the HMM model in synthesizing emphatic speech, instead of applying the perturbation model directly at the backend of a neutral speech synthesis model. In this way, the demand for an emphasis corpus is reduced and the loss of speech quality caused by speech modification algorithms is avoided. The experiments indicate that the proposed emphatic speech synthesis model improves the emphasis quality of synthesized speech while keeping a high degree of naturalness.

7.

Computer voice output technology

《Data Processing》1984,26(8):30-34

Speech as a means of feedback from computer-based machines is now a reality. In particular, single chip speech synthesis devices and microprocessors have now made it possible to incorporate speaking modules into microsystems and many consumer products. This paper reviews developments in the computer voice output technology and guides system designers in the choice of components for talking modules. General design considerations are also outlined.

8.

Optimization of string length for spoken digit input with error correction

《International journal of man-machine studies》1988,28(6):573-581

No matter how much the performance of speech recognition systems improves, it is unlikely that perfect recognition will always be possible in practical situations. Environmental sounds will interfere with the recognition. In such circumstances it is sensible to provide feedback so that any errors which occur may be detected and corrected. In some situations, such as when the eyes are busy or over the telephone, it is necessary to provide feedback auditorily. This takes time, so the most efficient procedure should be determined. In the case of entering digits into a computer the question arises as to whether feedback should be provided after each digit has been spoken or after a string of digits has been recognized. It has been found that this depends upon the accuracy of the recognizer and on the times required for recognizing the utterances and for changing from recognizing to synthesizing speech.

9.

Evaluation of Methods for the Determination of Factors Inducing Fatigue in Man at Work

《Ergonomics》2012,55(1):43-55

The aim of the study was to determine the influence of textual feedback on the content and outcome of spoken interaction with a natural language dialogue system. More specifically, the assumption that textual feedback could disrupt spoken interaction was tested in a human–computer dialogue situation. In total, 48 adult participants, familiar with the system, had to find restaurants based on simple or difficult scenarios using a real natural language service system in a speech-only (phone), speech plus textual dialogue history (multimodal) or text-only (web) modality. The linguistic contents of the dialogues differed as a function of modality, but were similar whether the textual feedback was included in the spoken condition or not. These results add to burgeoning research efforts on multimodal feedback, in suggesting that textual feedback may have little or no detrimental effect on information searching with a real system.

Statement of Relevance: The results suggest that adding textual feedback to interfaces for human–computer dialogue could enhance spoken interaction rather than create interference. The literature currently suggests that adding textual feedback to tasks that depend on the visual sense benefits human–computer interaction. The addition of textual output when the spoken modality is heavily taxed by the task was investigated.

10.

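Chinese text-to-visual speech synthesis based on a data-driven approach (total citations: 7; self-citations: 0; citations by others: 7)

王志明 蔡莲红 艾海舟 《软件学报》2005,16(6):1054-1063

A text-to-visual speech (TTVS) synthesis system can improve speech intelligibility and make the human-computer interface friendlier. This paper presents a Chinese text-to-visual speech synthesis system based on a data-driven (sample-based) approach, which generates new visual speech by concatenating short video segments. It gives an effective method for constructing visual confusion trees for Chinese initials and finals, and proposes a coarticulation model based on the visual confusion trees and a hardness factor; the model can be used for corpus selection in the analysis stage and for unit selection in the synthesis stage. Image morphing is used to smooth visible differences between the two frames at a concatenation boundary. Combined with an existing text-to-speech (TTS) system, a Chinese text-to-visual speech synthesis system is implemented.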

11.

Digital formant speech synthesis utilizing wave digital filters

Karl Renner Someshwar C. Gupta 《Computers & Electrical Engineering》1973,1(2):211-225

Digital computer processing of speech is of much current interest. This paper examines the synthesis of speech utilizing the wave digital filter which has been shown to have low coefficient sensitivity properties and to generate smaller roundoff error than conventional filters. Also examined is the coefficient quantization in the digital formant speech synthesis model and how implementation with the wave filter may serve as a better alternative. Simulation and generation of speech confirm the feasibility and corresponding advantages of implementation with the wave digital filter compared to conventional filters.
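To make the coefficient-quantization issue in the abstract above concrete, here is a minimal sketch of a conventional second-order digital formant resonator, not the wave digital filter the paper studies. The coefficient formulas are the standard two-pole resonator design; the function names, the fixed-point rounding scheme, and the example formant values are assumptions chosen for illustration only.

```python
import math

def resonator_coeffs(freq_hz, bandwidth_hz, sample_rate_hz):
    """Coefficients of y[n] = a*x[n] + b*y[n-1] + c*y[n-2] for one formant."""
    t = 1.0 / sample_rate_hz
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
    b = 2.0 * math.exp(-math.pi * bandwidth_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c  # normalizes the gain to 1 at zero frequency
    return a, b, c

def quantize(value, fractional_bits):
    """Round a coefficient to a fixed-point grid (assumed rounding scheme)."""
    step = 2.0 ** -fractional_bits
    return round(value / step) * step

def pole_frequency_hz(b, c, sample_rate_hz):
    """Recover the resonance frequency implied by the feedback coefficients."""
    radius = math.sqrt(-c)                # pole radius, since c = -radius**2
    angle = math.acos(b / (2.0 * radius)) # pole angle, since b = 2*radius*cos(angle)
    return angle * sample_rate_hz / (2.0 * math.pi)

# Hypothetical formant: 500 Hz centre, 60 Hz bandwidth, 10 kHz sampling.
a, b, c = resonator_coeffs(500.0, 60.0, 10_000.0)
exact = pole_frequency_hz(b, c, 10_000.0)
shifted = pole_frequency_hz(quantize(b, 8), quantize(c, 8), 10_000.0)
print(f"designed: {exact:.2f} Hz, after 8-bit quantization: {shifted:.2f} Hz")
```

Rounding the feedback coefficients moves the pole, so the realized formant frequency drifts away from the designed one; this is the sensitivity that, according to the abstract, the wave digital filter structure is meant to reduce.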

12.

MusicCube: a physical experience with digital music

Miguel Bruns Alonso David V. Keyson 《Personal and Ubiquitous Computing》2006,10(2-3):163-165

Listening to digital music on a computer has led to a loss of part of the physical experience associated with earlier media formats such as CDs and LPs. This paper presents a series of steps and decisions that led to the design of MusicCube, a tangible user interface that allows users to control digitally stored music on a computer by means of gestures and positioning. Interaction with the MusicCube is enriched by offering feedback through multi-coloured light effects and clicking sounds together with computer-generated speech. Despite some ergonomic shortcomings, when comparing it to the iPod, users appreciated the design and enjoyed using it.

13.

The more humanlike, the better? How speech type and users’ cognitive style affect social responses to computers

Eun-Ju Lee 《Computers in human behavior》2010

The present experiment investigated if anthropomorphic interfaces facilitate people’s tendency to project social expectations onto computers and how such effects might vary depending on users’ cognitive style. In a 2 (synthetic vs. recorded speech) × 2 (flattering vs. generic feedback) × 2 (low vs. high rationality) × 2 (low vs. high experientiality) experiment, participants played a trivia game with a computer. Use of recorded speech did not amplify the previously documented flattery effects (Fogg & Nass, 1997), challenging the notion that anthropomorphism will promote social responses to computers. Participants evaluated the human-voiced computer more positively and conformed more to its suggestions than the one using synthetic speech, but such effects were found only among less analytical or more intuition-driven individuals, suggesting dispositional differences in people’s susceptibility to anthropomorphic cues embedded in the interface.

14.

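A survey of text-to-visual speech synthesis (total citations: 3; self-citations: 1; citations by others: 2)

王志明 陶建华 《计算机研究与发展》2006,43(1):145-152

Visual information is very important for understanding the content of speech. Not only people with hearing impairments but also people with normal hearing lip-read to some degree during conversation, especially in noisy environments where speech quality is degraded. Just as a text-to-speech system lets a computer speak like a person, a text-to-visual speech synthesis system lets a computer simulate the bimodal nature of human speech and makes the computer interface friendlier. This paper reviews the development of text-to-visual speech synthesis. Implementation methods for text-driven visual speech synthesis fall into two categories: parameter-control methods and data-driven methods. Several key issues in the parameter-control category and several different implementations in the data-driven category are described in detail, and the advantages, disadvantages, and suitable application environments of the two categories are compared.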

15.

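Research progress in speech user interfaces

韩勇 须德 戴国忠 《计算机科学》2004,31(6):1-4

Speech is one of the efficient and natural means of communication in daily life. Until now, however, speech interaction has seen relatively little use in computer technology. In recent years, the emergence of ubiquitous computing and portable computers has again created an urgent demand for speech user interfaces, and advances in speech recognition and synthesis provide the technical basis for implementing them. Drawing on examples of speech interface applications in China and abroad, and on the advantages and limitations of speech as a distinctive communication medium, this paper summarizes the environments in which speech user interfaces are appropriate and the guidelines for their design, and offers an outlook on their future development.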

16.

Speech technology and its applications: A technical overview

《Data Processing》1986,28(9):453-460

The first stage of capturing a speech signal is to change it from an acoustic signal into an electrical signal, using microphone and amplifier. This signal must be digitized before the computer can handle it, using sampling techniques. Speech is produced by capturing human speech patterns and analysing them for later reproduction by mechanical means. Text-to-speech synthesis is used in driving voice synthesizers. Speech recognition is based on the matching of frames of speech. Dynamic programming is a significant step forward in speech recognition. An application is currently being developed using speech recognition on a PABX.
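The frame matching by dynamic programming mentioned in this abstract is commonly realized as dynamic time warping (DTW). The sketch below is a generic textbook formulation, not the method of the article itself; the function names, the Euclidean frame distance, and the toy one-dimensional "frames" are assumptions for illustration.

```python
import math

def dtw_distance(a, b):
    """Accumulated distance of the best time alignment between two frame sequences."""
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal accumulated distance aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            frame_dist = math.dist(a[i - 1], b[j - 1])  # Euclidean distance between frames
            cost[i][j] = frame_dist + min(
                cost[i - 1][j],      # repeat the frame of b (a is spoken more slowly here)
                cost[i][j - 1],      # repeat the frame of a (b is spoken more slowly here)
                cost[i - 1][j - 1],  # advance both sequences together
            )
    return cost[n][m]

def recognize(utterance, templates):
    """Return the label of the stored template closest to the utterance."""
    return min(templates, key=lambda label: dtw_distance(utterance, templates[label]))

# Toy data: each frame is a 1-dimensional feature vector (hypothetical values).
templates = {
    "one": [(0.0,), (1.0,), (2.0,)],
    "two": [(5.0,), (5.0,), (6.0,)],
}
slow_one = [(0.0,), (0.0,), (1.0,), (2.0,), (2.0,)]
print(recognize(slow_one, templates))
```

Because the alignment may repeat frames, a slowly spoken word can still match its template closely, which is why dynamic programming helped with variation in speaking rate.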

17.

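Application of artificial intelligence technology in the design of a natural speech error correction and feedback system

冯义 金宇 朱鹏 《计算技术与自动化》2022,(2):184-188

To achieve natural speech error correction and improve the accuracy of natural speech recognition and pronunciation, this paper studies the application of artificial intelligence technology in the design of a natural speech error correction and feedback system. The system consists of a front-end learning unit and a back-end support unit. Collected natural speech segments are preprocessed; factors are partitioned according to the distance between segments, and features of the segments are extracted. A hidden Markov model is used to recognize the natural speech and, based on a B2 standard corpus, dynamic time warping is used to correct and score the recognized speech. A feedback module returns the recognition, correction, and scoring results to the user. Comparative experiments show that the speech recognition rate of the system exceeds 95%, that the correction results agree with the actual errors, and that the system can improve the accuracy of natural speech pronunciation.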

18.

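Real-time implementation of an LSF filter (total citations: 1; self-citations: 0; citations by others: 1)

茆邦琴 《计算机与网络》1999,(20)

Speech signal processing plays a very important role in fully digital communication networks and integrated services digital networks. This article describes the algorithms for the LSF filter, LSF-to-LPC coefficient conversion, and the direct-form IIR filter, and implements them in real time on DSP56L811 hardware. Several optimization techniques effectively increase computation speed and limit memory consumption. Finally, the hardware implementation is compared with the theoretical values.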

19.

Persuasion and social perception of human vs. synthetic voice across person as source and computer as source conditions

《International journal of human-computer studies》2006,64(1):43-52

There is evidence that people react more positively when they are presented with faces that are consistent with their voices. Nass and Brave [2005. Wired for Speech: How Voice Activates and Advances the Human–Computer Relationship. MIT Press, Cambridge, MA] found that computerized and human faces were perceived more positively when paired, respectively, with synthesized versus human voices than when paired with inconsistent voices. The present study sought to examine whether this type of inconsistency would affect perceptions of persuasive messages delivered by humans versus computers. We created a situation in which reactions to computer synthesized speech were compared to human speech when the speech was either from a person or a computer. This paper presents two studies, one using audio taped stimuli and one using videotaped stimuli, with type of speech (human versus computer synthesized) manipulated factorially with source (person versus computer). As hypothesized, both studies suggest that in the human as source condition, human voice is perceived more favorably than synthetic voice. However, in the computer as source condition, both human and computer voice were rated similarly. We discuss these findings in terms of consistency as well as group processes effects that may be occurring.

20.

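An automatic speech unit labeling method for Tibetan speech synthesis

徐世鹏 杨鸿武 王海燕 《计算机工程与应用》2015,51(6):199-203

The Deterministic Annealing EM (DAEM) algorithm is introduced into statistical parametric Tibetan speech synthesis based on the hidden Markov model (HMM) in order to automatically time-label Tibetan training speech that has no time labels. Initials and finals serve as the synthesis units. During training of the acoustic models for initials and finals, the DAEM algorithm is used to determine the best parameters for the embedded re-estimation of the HMMs. Once the acoustic models are trained, forced alignment is used to obtain time labels for the initials and finals automatically. Experimental results show that the time labels produced by this method are close to manual labels. Subjective evaluation of the synthesized Tibetan speech shows that its quality is close to that of speech synthesized with manually labeled initial and final times. The method therefore makes it possible to build acoustic models of the synthesis units without time labels for the initials and finals.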