Found 20 similar documents (search time: 93 ms)
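Yan Ying (严英), 《微计算机信息》, 2004, 20(4): 120-121
Computer speech technology draws on many disciplines, including artificial intelligence, pattern recognition, microcomputer technology, speech acoustics, linguistics, and cognitive science. The DARPA Strategic Computing Program in the United States proposed research on spoken language systems, that is, combining speech recognition with natural language understanding and moving the result toward practical use. In China, research on speech technology started relatively late, and few speech recognition products are available. Taking into account how Chinese students learn English, this paper uses speech recognition, speech output, and artificial intelligence on a computer platform to build a model of a simulated English conversation robot.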

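An Intelligent Panda Spoken Dialogue System   Total citations: 1 (self-citations: 0, citations by others: 1)
This paper presents a spoken dialogue system deployed in a panda model at a museum. The system combines large-vocabulary, speaker-independent continuous speech recognition with a spoken dialogue model to support human-machine knowledge question answering. It uses a statistical regular language model and a machine-led (system-initiative) dialogue strategy to improve recognition speed and accuracy. Because the acoustic models are subword-based, the recognition vocabulary is easy to extend and is not restricted. The system has been in regular operation at the Beijing Museum of Natural History since July 2001 and has proved robust both to environmental noise and to Mandarin spoken with various dialect accents. Tests in the real environment show a speech recognition rate of 99.07%.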

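Voice browser systems offer a mode of web browsing that people accept more readily, broadening the reach of the Internet. VoiceXML is an application of XML to voice browsers. This paper designs and implements a voice browser system based on VoiceXML technology.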

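With the rapid growth of the Internet, more and more browsers are in use, the most common being Web browsers such as IE and Netscape. However, conversation is the most convenient way for people to communicate, so voice browsers are likely to see increasing use on the Web. This paper introduces a speech framework based on the Voice Browser and also introduces the Multimodal Browser, which lets people interact through multiple modalities and allows a range of electronic devices to access the Internet multimodally. The paper also discusses the technical difficulties of both and their solutions, and closes with their advantages and development trends.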

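Spoken dialogue systems have long been a focus of human language technology within computer science; they can be applied in many domains and have broad prospects. This paper analyzes three representative conversational systems from different domains outside China: CommandTalk, ITSPOKE, and NICE. It examines them in terms of scope of use and interaction style, speech recognition, dialogue management, and speech synthesis, and offers observations intended as reference and suggestions for spoken dialogue system research and development in China.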

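To give English learners a virtual environment in which they can learn by practicing conversation with a machine, this paper uses case-based reasoning, combined with human-machine dialogue, speech recognition, and speech synthesis technologies, to study the design and implementation of a human-machine dialogue system that supports English learning. The paper focuses on the system's speech functions, dialogue management, and case base access. Experiments show that the system meets users' needs for improving their English listening and speaking skills.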

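Spoken dialogue systems are a core technology in human-computer interaction and an important route to natural human-computer interaction. They have considerable research significance and application value, and progress in the underlying theories and technologies has long attracted attention. This paper gives a fairly comprehensive and in-depth survey of research progress and the current state of dialogue management and spoken dialogue systems. It first describes the main research problems in spoken dialogue systems, including the research content and key technologies of each module and the design of systems for portability and robustness. It then systematically examines a range of existing spoken dialogue management strategies from the perspectives of theoretical models, research progress, and usability. Finally, it looks ahead to possible research directions and problems that urgently need solving.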

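An Analysis of the Differences Between Read Speech and Spontaneous Speech   Total citations: 4 (self-citations: 1, citations by others: 4)
Through a statistical analysis of the read speech corpus ASCCD, the spontaneous monologue speech corpus CASS, and the spontaneous dialogue speech corpus CADCC, this paper attempts to identify the main differences between read speech and spontaneous speech. It presents statistics on the two in terms of syllables, initials and finals, paralinguistic and non-linguistic phenomena, discourse topics, turn-taking, fundamental frequency variation, and segmental sound changes, and from these summarizes several ways in which read speech and spontaneous speech differ.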

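A Chinese Dialogue System for Real-Time Stock Quote Queries   Total citations: 1 (self-citations: 0, citations by others: 1)
This paper introduces a spoken human-machine dialogue system for querying real-time stock quotes. The system integrates speech recognition, language understanding, and dialogue control. The paper defines a situational semantic frame model that handles several difficult aspects of spoken language understanding systems reasonably well.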

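POP, 《网络与信息》, 2012, 26(3): 78-79
Being able to tease Siri is probably one of the perks every iPhone 4S user enjoys. Now the many Android users have their chance too. Beyond voice input methods and other voice apps, several Chinese mobile browser companies have released their own voice-enabled mobile browsers. As one of the most important applications on a phone, the mobile browser naturally wants to play with voice as well.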

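Design and Implementation of an Online Spoken Language Training System Based on the B/S Model   Total citations: 2 (self-citations: 0, citations by others: 2)
Compared with traditional C/S (client/server) architectures and stand-alone systems, an online spoken language training system based on the B/S (browser/server) model offers many advantages and meets usage requirements better. This paper describes in detail the design and implementation of such a system.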

This work focuses on the development of expressive text-to-speech synthesis techniques for a Chinese spoken dialog system, where the expressivity is driven by the message content. We adapt the three-dimensional pleasure-displeasure, arousal-nonarousal and dominance-submissiveness (PAD) model for describing expressivity in input text semantics. The context of our study is based on response messages generated by a spoken dialog system in the tourist information domain. We use the $P$ (pleasure) and $A$ (arousal) dimensions to describe expressivity at the prosodic word level based on lexical semantics. The $D$ (dominance) dimension is used to describe expressivity at the utterance level based on dialog acts. We analyze contrastive (neutral versus expressive) speech recordings to develop a nonlinear perturbation model that incorporates the PAD values of a response message to transform neutral speech into expressive speech. Two levels of perturbation are implemented: local perturbation at the prosodic word level and global perturbation at the utterance level. Perceptual experiments involving 14 subjects indicate that the proposed approach can significantly enhance expressivity in response generation for a spoken dialog system.

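How to incorporate semantic and pragmatic information into spoken language translation remains one of the open difficulties in current research. Dialogue acts, as features describing shallow discourse structure, have in recent years been applied to various types of translation systems. After introducing dialogue act theory and annotated spoken language corpora, this paper takes a phrase-based statistical translation system as its target and proposes three ways of applying dialogue acts in the translation process. By classifying dialogue acts automatically, the method improves the consistency between training and test corpora, between development and test sets, and between source and target languages. This improves the performance of the translation system, so that the final translation reflects the dialogue intent of the source language more accurately. Experiments on Chinese-English spoken language translation evaluation data show that adding dialogue act information effectively improves translation performance.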

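To understand spoken dialogue correctly and capture the speaker's intent accurately, a spoken language system needs speech act analysis at the pragmatic level in addition to the necessary syntactic and semantic analysis. This paper proposes a speech act analysis method for spoken Chinese based on a simplified recurrent network, which combines utterance-segment-level microstructure information with discourse-level macrostructure features. Training and testing on a spoken language corpus in the meeting scheduling domain gave satisfactory results.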

IP telephony has developed substantially in recent years. Improvements in voice quality and the integration of voice and data, especially interaction with multimedia, have made 3G communication more promising. The value-added services of telephony techniques reduce dependence on the phone and provide a universal platform for multimodal telephony applications. For example, web-based applications using VoiceXML have been developed to simplify human-machine interaction, because they take advantage of speech-enabled services and make telephone-based web access a reality. However, it is not cost-efficient to build voice-only, stand-alone web applications; it is more reasonable to retrofit voice interfaces so that they are compatible with, or collaborate with, existing HTML or XML-based web applications. This paper therefore takes the view that a web service should support multiple access modalities, so that users can perceive and interact with the site through visual and speech responses at the same time. Under this principle, our research develops a prototype multimodal VoIP system with an integrated web-based Mandarin dialog system, which adopts automatic speech recognition (ASR), text-to-speech (TTS), a VoiceXML browser, and VoIP technologies to create a user-friendly graphical user interface (GUI) and voice user interface (VUI). Users can interact with the VoiceXML server by traditional telephone, cellular phone, or a VoIP connection from a personal computer. In the meantime, users can browse the web and access the same content with a common HTML or XML-based browser. The proposed system shows excellent performance and can be easily incorporated into a voice ordering service for wider accessibility.

This paper proposes a domain-independent statistical methodology to develop dialog managers for spoken dialog systems. Our methodology employs a data-driven classification procedure to generate abstract representations of system turns, taking into account the previous history of the dialog. A statistical framework, based on a dialog simulation technique, is also introduced for the development and evaluation of dialog systems created using the methodology. The benefits and flexibility of the proposed methodology have been validated by developing statistical dialog managers for four spoken dialog systems of different complexity, designed for different languages (English, Italian, and Spanish) and application domains (from transactional to problem-solving tasks). The evaluation results show that the proposed methodology allows rapid development of new dialog managers and exploration of new dialog strategies, which makes it possible to develop enhanced versions of existing systems.

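Extracting the topic of an utterance is part of the discourse analysis component of a spoken dialogue system. Most current spoken dialogue systems concentrate their natural language processing on the syntactic and semantic levels and neglect analysis of the surrounding context. This paper proposes a rule-based method for utterance topic extraction that uses two parsers, one bottom-up and one top-down, to extract the topic and the user's intent. This gives the system's natural language generation more accurate domain knowledge and thereby substantially improves overall system performance.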

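Spoken language understanding is an important component of task-oriented dialogue systems. Pre-trained language models have brought major advances in spoken language understanding, yet most of these models are trained on large-scale written text corpora. Given the clear differences between spoken and written language in structure, conditions of use, and modes of expression, the authors build a large-scale, two-role, multi-turn spoken dialogue corpus and propose four self-supervised pre-training tasks that combine role, structure, and semantics: whole-word masking, role prediction, intra-utterance reversal prediction, and inter-turn swap prediction. A pre-trained language model for spoken language, SPD-BERT (SPoken Dialog-BERT), is trained jointly on these tasks. Detailed experiments on three manually annotated datasets from an intelligent customer service scenario in the financial domain (intent recognition, entity recognition, and pinyin error correction) confirm the effectiveness of the model.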
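
To make three of the four self-supervised tasks named above more concrete, the following is a minimal sketch of how training examples for them might be constructed. The abstract does not define the tasks precisely, so everything here is an assumption made for illustration: the function names, the `[ROLE_MASK]` placeholder, the 0.5 corruption probability, and the reading of "reversal" as swapping the two parts of an utterance around a random pivot. Whole-word masking is omitted because it depends on a word segmenter.

```python
import random
from typing import List, Tuple

# A turn is (role, tokens). The roles and tokens below are illustrative only.
Turn = Tuple[str, List[str]]


def role_prediction_example(dialog: List[Turn], rng: random.Random):
    """Hide the role of one randomly chosen turn; the hidden role is the label."""
    idx = rng.randrange(len(dialog))
    roles = [role for role, _ in dialog]
    label = roles[idx]
    roles[idx] = "[ROLE_MASK]"  # hypothetical placeholder token
    return roles, idx, label


def intra_utterance_reversal_example(tokens: List[str], rng: random.Random):
    """With probability 0.5, swap the two parts of an utterance around a pivot.

    The label is 1 only if the token order actually changed.
    """
    original = list(tokens)
    out = list(original)
    if len(original) >= 2 and rng.random() < 0.5:
        pivot = rng.randrange(1, len(original))
        out = original[pivot:] + original[:pivot]
    return out, int(out != original)


def inter_turn_swap_example(dialog: List[Turn], rng: random.Random):
    """With probability 0.5, swap two adjacent turns.

    The label is 1 only if the turn order actually changed.
    """
    original = list(dialog)
    out = list(original)
    if len(original) >= 2 and rng.random() < 0.5:
        i = rng.randrange(len(original) - 1)
        out[i], out[i + 1] = out[i + 1], out[i]
    return out, int(out != original)
```

In a joint training setup, each function would supply the input and label for one classification head; how SPD-BERT actually combines the losses is not stated in the abstract.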

Spoken dialog systems have difficulty selecting which action to take in a given situation because recognition and understanding errors are prevalent due to noise and unexpected inputs. To solve this problem, this paper presents a hybrid approach to improving robustness of the dialog manager by using agenda-based and example-based dialog modeling. This approach can exploit n-best hypotheses to determine the current dialog state in the dialog manager and keep track of the dialog state using a discourse interpretation algorithm based on an agenda graph and a focus stack. Given the agenda graph and multiple recognition hypotheses, the system can predict the next action to maximize multi-level score functions and trigger error recovery strategies to handle exceptional cases due to misunderstandings or unexpected focus shifts. The proposed method was tested by developing a spoken dialog system for a building guidance domain in an intelligent service robot. This system was then evaluated by simulated and real users. The experimental results show that our approach can effectively develop robust dialog management for spoken dialog systems.
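
The idea of rescoring n-best hypotheses against an agenda graph and a focus stack can be sketched as follows. This is not the paper's algorithm: the single additive bonus stands in for its multi-level score functions, and the graph contents, `bonus`, `threshold`, and all names are hypothetical values chosen for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Hypothesis:
    intent: str        # dialog node the recognizer thinks the user addressed
    confidence: float  # recognizer confidence in [0, 1]


# Hypothetical agenda graph: node -> nodes that may plausibly follow it.
AGENDA: Dict[str, Set[str]] = {
    "greet": {"ask_room"},
    "ask_room": {"ask_room", "guide"},
    "guide": {"bye"},
    "bye": set(),
}


def select_next(
    nbest: List[Hypothesis],
    focus_stack: List[str],
    agenda: Dict[str, Set[str]],
    bonus: float = 0.3,      # assumed reward for agreeing with the agenda
    threshold: float = 0.5,  # assumed minimum score to accept a hypothesis
) -> Optional[str]:
    """Pick the hypothesis with the best combined score, or None.

    Returning None signals that an error recovery strategy (for example,
    asking the user to rephrase) should be triggered instead.
    """
    # Nodes expected after the current focus; any node if the stack is empty.
    expected = agenda.get(focus_stack[-1], set()) if focus_stack else set(agenda)
    best: Optional[Hypothesis] = None
    best_score = float("-inf")
    for hyp in nbest:
        score = hyp.confidence + (bonus if hyp.intent in expected else 0.0)
        if score > best_score:
            best, best_score = hyp, score
    if best is None or best_score < threshold:
        return None
    focus_stack.append(best.intent)  # the accepted node becomes the new focus
    return best.intent
```

With the focus on `greet`, a lower-confidence `ask_room` hypothesis can outrank a higher-confidence `bye` hypothesis because only the former is a successor of the current focus in the graph.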
