Similar Documents
A total of 20 similar documents were retrieved.
1.
A method is proposed for driving the Jack 3D skeletal model with BVH-format motion capture data to produce human motion. The Peabody-structured Jack virtual human model is simplified into a tree-shaped skeleton that can be mapped onto BVH data; Euler-angle rotation equations are used to establish joint-data mapping formulas between the motion capture data and the Jack character model; and the method is implemented on the Jack platform with scripting languages such as Python. This makes large-scale reuse of motion capture data on the Jack platform possible.
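The Euler-angle joint mapping described above can be illustrated with a short Python sketch (the abstract mentions Python scripting on the Jack platform). Everything here is a hedged stand-in: the BVH-to-Jack name table and the way a pose is applied are assumptions, not the actual Peabody segment names or the Jack API.

import numpy as np

# Assumed BVH-to-Jack joint name table (illustrative only).
BVH_TO_JACK = {
    "Hips": "lower_torso",
    "LeftUpLeg": "left_upper_leg",
    "LeftLeg": "left_lower_leg",
}

def euler_zxy_to_matrix(z_deg, x_deg, y_deg):
    """Compose R = Rz @ Rx @ Ry from BVH channel angles in degrees
    (the common BVH channel order is Zrotation, Xrotation, Yrotation)."""
    z, x, y = np.radians([z_deg, x_deg, y_deg])
    Rz = np.array([[np.cos(z), -np.sin(z), 0.0],
                   [np.sin(z),  np.cos(z), 0.0],
                   [0.0,        0.0,       1.0]])
    Rx = np.array([[1.0, 0.0,        0.0],
                   [0.0, np.cos(x), -np.sin(x)],
                   [0.0, np.sin(x),  np.cos(x)]])
    Ry = np.array([[ np.cos(y), 0.0, np.sin(y)],
                   [ 0.0,       1.0, 0.0      ],
                   [-np.sin(y), 0.0, np.cos(y)]])
    return Rz @ Rx @ Ry

def apply_frame(frame_angles):
    """frame_angles: {bvh_joint: (z_deg, x_deg, y_deg)} for one captured frame."""
    for bvh_joint, (z, x, y) in frame_angles.items():
        jack_joint = BVH_TO_JACK.get(bvh_joint)
        if jack_joint is None:
            continue                        # joint dropped in the simplified skeleton
        R = euler_zxy_to_matrix(z, x, y)
        print(jack_joint, np.round(R, 3))   # here the real system would pose the Jack joint

apply_frame({"Hips": (5.0, 0.0, 10.0), "LeftUpLeg": (0.0, 30.0, 0.0)})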

2.
3.
Audio-visual speech recognition (AVSR) has shown impressive improvements over audio-only speech recognition in the presence of acoustic noise. However, region-of-interest detection and feature extraction can limit recognition performance because the visual speech information is typically obtained from planar video data. In this paper, we deviate from traditional visual speech information and propose an AVSR system integrating 3D lip information. The Microsoft Kinect multi-sensory device was adopted for data collection. Different feature extraction and selection algorithms were applied to the planar images and the 3D lip information so as to fuse them into a joint visual-3D lip feature. For automatic speech recognition (ASR), fusion methods were investigated and the audio-visual speech information was integrated into a state-synchronous two-stream Hidden Markov Model. The experimental results demonstrate that our AVSR system integrating 3D lip information improves the recognition performance of traditional ASR and AVSR systems in acoustically noisy environments.
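A state-synchronous two-stream HMM typically combines the streams by weighting their per-state log-likelihoods. The sketch below shows only that fusion step; the stream weights and scores are illustrative values, not figures from the paper.

import numpy as np

def fused_log_likelihood(log_b_audio, log_b_visual, w_audio=0.7):
    """Weighted per-state fusion: w_a * log b(o_audio) + (1 - w_a) * log b(o_visual)."""
    log_b_audio = np.asarray(log_b_audio, dtype=float)
    log_b_visual = np.asarray(log_b_visual, dtype=float)
    return w_audio * log_b_audio + (1.0 - w_audio) * log_b_visual

# In heavy acoustic noise, weight is shifted toward the visual/3D-lip stream.
scores = fused_log_likelihood([-4.1, -2.3, -5.0], [-1.2, -3.4, -2.8], w_audio=0.4)
print(int(scores.argmax()))   # index of the best-scoring HMM state for this frame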

4.
A Multimodal Game User Interface System Based on Video and Speech (total citations: 2; self-citations: 1; citations by others: 2)
A multimodal game user interface system based on video and speech is designed and implemented to enhance the interactivity of computer games and the immersion of players. The system creates and effectively integrates two interaction channels, video and speech, comprising three modules: facial model reconstruction, head pose estimation, and Mandarin speech recognition. It can quickly build personalized facial models for game characters and lets players control game characters and game progress in real time with head poses and voice commands. Testing and application results show that the system is suitable for ordinary players and real game environments.

5.
This paper studies a seamless compositing technique for merging live footage of a program host with a computer-generated virtual environment in a virtual studio. An algorithm is proposed that fits a closed B-spline curve, taking equally spaced sample points on the segmentation edge of the video object as its data points, and builds a photorealistic planar-mesh model of the video object; it is applied to constructing a photorealistic model of the host image in the virtual studio. Experiments on the camera input of a single program host verify the feasibility of the algorithm; the results show that it is effective and adaptive for simple 3D reconstruction of video objects based on fitting the segmentation edge.
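The closed B-spline fit through equally spaced edge samples can be sketched with SciPy's periodic parametric spline routines; the silhouette points below are synthetic stand-ins for the segmented host image, and the smoothing settings are illustrative.

import numpy as np
from scipy.interpolate import splprep, splev

# Synthetic, equally spaced samples standing in for the video object's segmentation edge.
t = np.linspace(0.0, 2.0 * np.pi, 40, endpoint=False)
x = 100.0 + 60.0 * np.cos(t)
y = 120.0 + 90.0 * np.sin(t)

# Close the point list and fit a periodic (closed) cubic B-spline through it.
xc = np.append(x, x[0])
yc = np.append(y, y[0])
tck, _ = splprep([xc, yc], s=0.0, per=True)

# Densely resample the closed boundary; these points would seed the planar mesh
# used to render the host image into the virtual set.
u_new = np.linspace(0.0, 1.0, 200)
bx, by = splev(u_new, tck)
print(len(bx), "points on the fitted closed B-spline boundary")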

6.
A Multimodal Interaction System for Children (total citations: 9; self-citations: 2; citations by others: 9)
李杰, 田丰, 王维信, 戴国忠. 《软件学报》, 2002, 13(9): 1846-1851
A pen- and speech-based multimodal 3D interaction system for children is designed and implemented. The system contains a pen-and-speech interaction-information integration framework for fusing the pen and speech input of children, and defines a set of pen-and-speech interaction techniques that let children interact with the system in a natural way: they can sketch 3D scenes and entities such as small animals with the pen, and use pen and speech together to interact with the scene and the entities in it.

7.
A Chinese Sign Language Synthesis Method Based on Virtual Human Synthesis Technology (total citations: 13; self-citations: 1; citations by others: 13)
王兆其, 高文. 《软件学报》, 2002, 13(10): 2051-2056
This paper introduces a Chinese Sign Language synthesis method that automatically translates text into Chinese Sign Language and uses virtual human synthesis technology to synthesize and display the signing, helping deaf people communicate naturally with hearing people. In this method, two data gloves and three 6-DOF position trackers are first used, on the principle of motion tracking, to record the motion data of a real person demonstrating each sign word, building an initial sign-word motion database. A control-point-based human motion editing method is then applied to edit and fine-tune the motion data of each sign word, yielding a high-quality sign-word motion database. Given a text sentence, a human motion synthesis method concatenates the motion segments of the individual sign words into a complete signing motion, which is displayed realistically with a VRML-based human motion display method. Based on this approach, a Chinese Sign Language synthesis system for the deaf was implemented under PC/Windows/VC6.0. The system covers the 5,596 sign words collected in 《中国手语》 (including its sequel volume) and can synthesize everyday and teaching language. Teachers and students at schools for the deaf confirmed that the synthesized signing is accurate and lifelike; it can be widely applied in teaching, television, the Internet, and other mass media to help deaf people take part in the activities of hearing people, and it has broad application prospects and important social significance.
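The sentence-level concatenation step can be pictured with a minimal sketch: per-word motion clips (frames of joint channels) are joined with a short linear cross-fade so transitions between sign words stay smooth. The clip format and blend length are assumptions; the paper's control-point-based editing is not reproduced here.

import numpy as np

def concatenate_signs(clips, blend_frames=8):
    """Join motion clips of shape (n_frames, n_channels), cross-fading the end
    of each clip into the start of the next."""
    out = clips[0].copy()
    for clip in clips[1:]:
        b = min(blend_frames, len(out), len(clip))
        w = np.linspace(0.0, 1.0, b)[:, None]           # blend weights 0 -> 1
        out[-b:] = (1.0 - w) * out[-b:] + w * clip[:b]  # cross-fade the overlap
        out = np.vstack([out, clip[b:]])
    return out

# Toy example: two "sign words" of random joint-channel frames.
rng = np.random.default_rng(0)
word_a = rng.normal(size=(30, 12))
word_b = rng.normal(size=(25, 12))
print(concatenate_signs([word_a, word_b]).shape)   # (30 + 25 - 8, 12) = (47, 12)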

8.
Humanoid three-dimensional (3D) models can be easily acquired from various sources, including online marketplaces. The use of such models within a game or simulation environment requires human input and intervention in order to associate the model with a relevant set of motions and control mechanisms. In this paper, we demonstrate a pipeline where humanoid 3D models can be incorporated within seconds into an animation system and infused with a wide range of capabilities, such as locomotion, object manipulation, gazing, speech synthesis and lip syncing. We offer a set of heuristics that can associate arbitrary joint names with canonical ones and describe a fast retargeting algorithm that enables us to instill a set of behaviors onto an arbitrary humanoid skeleton on the fly. We believe that such a system will vastly increase the use of 3D interactive characters due to the ease with which new models can be animated. Copyright © 2013 John Wiley & Sons, Ltd.
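The abstract does not spell out the joint-name heuristics, but one plausible form is substring matching against an alias table after normalization; the table and examples below are assumptions, not the paper's actual rules.

CANONICAL_ALIASES = {                 # assumed aliases, illustrative only
    "hips":       ["hips", "pelvis", "root"],
    "spine":      ["spine", "chest", "torso"],
    "head":       ["head"],
    "l_shoulder": ["leftshoulder", "lshoulder", "shoulder_l", "clavicle_l"],
    "l_elbow":    ["leftforearm", "lelbow", "forearm_l"],
}

def normalize(name):
    """Lower-case the joint name and keep only alphanumeric characters."""
    return "".join(ch for ch in name.lower() if ch.isalnum())

def canonical_joint(raw_name):
    """Return the canonical joint name for an arbitrary rig joint, or None."""
    n = normalize(raw_name)
    for canonical, aliases in CANONICAL_ALIASES.items():
        if any(alias in n or n in alias for alias in aliases):
            return canonical
    return None

print(canonical_joint("mixamorig:LeftShoulder"))   # -> "l_shoulder"
print(canonical_joint("Bip01_Pelvis"))             # -> "hips"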

9.
葛磊, 强彦, 赵涓涓. 《软件学报》, 2016, 27(S2): 130-136
Speech emotion recognition is an important research topic in human-computer interaction, and a speech emotion recognition system used in intervention therapy for children with autism can aid their rehabilitation. However, the emotional features in speech signals are numerous and heterogeneous, and feature extraction is itself a challenging task, which hurts the recognition performance of the whole system. To address this problem, a speech emotion feature extraction algorithm is proposed that automatically learns emotional features from the speech signal with an unsupervised autoencoder network: a 3-layer autoencoder network is built to extract speech emotion features, and the high-level features learned by the multi-layer encoder are fed into an extreme learning machine classifier. The recognition rate is 84.14%, an improvement over traditional methods based on hand-defined features.
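The classifier half of this pipeline can be sketched compactly: an extreme learning machine uses random hidden weights and a closed-form least-squares solve for the output weights. The encoder below is only a mock stand-in for the trained 3-layer autoencoder, and all sizes and data are illustrative.

import numpy as np

rng = np.random.default_rng(0)

def encode(x):
    """Mock stand-in for the stacked autoencoder's top-level encoder output."""
    return np.tanh(x @ rng.normal(size=(x.shape[1], 64)))

class ELM:
    def __init__(self, n_hidden=200, seed=1):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y_onehot):
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))   # random input weights
        self.b = self.rng.normal(size=self.n_hidden)
        H = np.tanh(X @ self.W + self.b)                             # hidden activations
        self.beta = np.linalg.pinv(H) @ y_onehot                     # closed-form output weights
        return self

    def predict(self, X):
        return (np.tanh(X @ self.W + self.b) @ self.beta).argmax(axis=1)

# Toy run: 4 emotion classes over mocked acoustic feature vectors.
X_raw = rng.normal(size=(120, 120))
labels = rng.integers(0, 4, size=120)
feats = encode(X_raw)
elm = ELM().fit(feats, np.eye(4)[labels])
print((elm.predict(feats) == labels).mean())   # training accuracy on the toy data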

10.
Synthesizing expressive facial animation is a very challenging topic within the graphics community. In this paper, we present an expressive facial animation synthesis system enabled by automated learning from facial motion capture data. Accurate 3D motions of the markers on the face of a human subject are captured while he/she recites a predesigned corpus, with specific spoken and visual expressions. We present a novel motion capture mining technique that "learns" speech coarticulation models for diphones and triphones from the recorded data. A phoneme-independent expression eigenspace (PIEES) that encloses the dynamic expression signals is constructed by motion signal processing (phoneme-based time-warping and subtraction) and principal component analysis (PCA) reduction. New expressive facial animations are synthesized as follows: First, the learned coarticulation models are concatenated to synthesize neutral visual speech according to novel speech input, then a texture-synthesis-based approach is used to generate a novel dynamic expression signal from the PIEES model, and finally the synthesized expression signal is blended with the synthesized neutral visual speech to create the final expressive facial animation. Our experiments demonstrate that the system can effectively synthesize realistic expressive facial animation.
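A rough sketch of the eigenspace construction: after phoneme-based time alignment (assumed done upstream), the neutral motion is subtracted from the expressive motion and the residual expression signal is reduced with PCA. Shapes and data below are illustrative, not the paper's corpus.

import numpy as np

rng = np.random.default_rng(0)
expressive = rng.normal(size=(500, 90))   # 500 frames x 90 marker coordinates
neutral    = rng.normal(size=(500, 90))   # same utterance, neutral, time-aligned

residual = expressive - neutral           # dynamic expression signal
mean = residual.mean(axis=0)
U, S, Vt = np.linalg.svd(residual - mean, full_matrices=False)

k = 10                                    # keep the leading k expression eigenvectors
eigenspace = Vt[:k]                       # rows span the PIEES-style subspace
coeffs = (residual - mean) @ eigenspace.T # per-frame expression coefficients
print(eigenspace.shape, coeffs.shape)     # (10, 90) (500, 10)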

11.
A Voice-Controlled Car Audio System Based on the UniSpeech-SDA80D51 (total citations: 1; self-citations: 0; citations by others: 1)
A design scheme is proposed for controlling car audio operations with voice commands. A speaker-independent voice control system for car audio is built around the UniSpeech-SDA80D51, a dual-core (DSP plus microcontroller) SoC speech-processing chip newly released by Infineon of Germany, and a system prototype was developed. Voice control was field-tested on a JAC Tongyue SL1102C1 car audio unit; the measured data show that the system's speech recognition rate reaches 95...

12.
Emotional speech synthesis is one of the hot topics in affective computing and speech signal processing, and accurate emotion analysis of speech is a prerequisite for synthesizing high-quality emotional speech. This paper adopts the PAD emotion model as the quantitative model for emotion analysis: the speech in an emotional corpus is analyzed and clustered to obtain a PAD parameter model for each emotion. Emotional speech synthesized by an HMM speech synthesis system is then corrected with the PAD model parameters, making the emotional parameters of the synthesized speech more accurate and thereby improving the quality of the emotional speech synthesis. Experiments show that the method noticeably improves the naturalness and emotional clarity of the synthesized speech and also performs well across different speakers of the same gender.
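The clustering step, which yields a PAD parameter model per emotion, can be sketched with a plain k-means over PAD triples; the PAD values, the number of clusters and the distance choice are illustrative assumptions, not details from the paper.

import numpy as np

def kmeans_pad(points, k=4, iters=50, seed=0):
    """Plain k-means over PAD (pleasure-arousal-dominance) triples."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return centers, labels

rng = np.random.default_rng(1)
pad_values = rng.uniform(-1.0, 1.0, size=(200, 3))    # mocked PAD annotations
centers, _ = kmeans_pad(pad_values, k=4)
print(centers)   # each row is one cluster's PAD parameter model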

13.
We present a novel approach to synthesizing accurate visible speech based on searching and concatenating optimal variable-length units in a large corpus of motion capture data. Based on a set of visual prototypes selected on a source face and a corresponding set designated for a target face, we propose a machine learning technique to automatically map the facial motions observed on the source face to the target face. In order to model the long-distance coarticulation effects in visible speech, a large-scale corpus that covers the most common syllables in English was collected, annotated and analyzed. For any input text, a search algorithm to locate the optimal sequences of concatenated units for synthesis is described. A new algorithm to adapt lip motions from a generic 3D face model to a specific 3D face model is also proposed. A complete, end-to-end visible speech animation system is implemented based on the approach. This system is currently used in more than 60 kindergarten through third-grade classrooms to teach students to read using a lifelike conversational animated agent. To evaluate the quality of the visible speech produced by the animation system, both subjective and objective evaluations were conducted. The evaluation results show that the proposed approach is accurate and powerful for visible speech synthesis.
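The search for optimal unit sequences is, at its core, a dynamic-programming minimization of target plus concatenation costs. The sketch below shows that skeleton only; the cost values and candidate counts are illustrative, not the corpus's actual features.

import numpy as np

def select_units(target_costs, concat_costs):
    """target_costs[t][i]: cost of candidate i at position t.
    concat_costs[t][i][j]: cost of joining candidate i at t with j at t+1.
    Returns the chosen candidate index at each position."""
    best = [np.asarray(target_costs[0], dtype=float)]
    back = []
    for t in range(1, len(target_costs)):
        tc = np.asarray(target_costs[t], dtype=float)
        cc = np.asarray(concat_costs[t - 1], dtype=float)
        total = best[-1][:, None] + cc + tc[None, :]
        back.append(total.argmin(axis=0))
        best.append(total.min(axis=0))
    path = [int(best[-1].argmin())]               # trace back the cheapest path
    for t in range(len(back) - 1, -1, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]

# Toy example: 3 positions with 2 candidate units each.
tc = [[1.0, 2.0], [0.5, 0.4], [2.0, 0.1]]
cc = [[[0.0, 1.0], [1.0, 0.0]], [[0.0, 0.5], [0.5, 0.0]]]
print(select_units(tc, cc))   # -> [0, 0, 1]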

14.
Modeling and Implementation of Virtual Human Walking Based on JACK (total citations: 2; self-citations: 0; citations by others: 2)
Addressing shortcomings in the simulation of upright human walking in the current Jack software (unnatural motion and abrupt transitions between postures), this paper analyzes Jack's human motion model and its control method for upright walking, and improves the motion control process with parameterized keyframe techniques, making the walking simulation more lifelike and natural and effectively enhancing the realism of human motion simulation. The improved motion control process was then applied to generalized walking simulations of the virtual human, including walking while bending over, walking while carrying objects, running, crawling, climbing, and creeping, realizing walking motions in various postures and effectively improving the practical engineering applicability of the Jack software.
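Parameterized keyframe control can be pictured as interpolating key poses of the gait cycle while exposing parameters (such as a stride scale) that reshape the motion. The joints, key values and scaling rule below are illustrative, not Jack's actual walking model.

import numpy as np

KEY_PHASES = np.array([0.0, 0.25, 0.5, 0.75, 1.0])     # gait-cycle phase of each key frame
KEY_HIP    = np.array([20.0, 0.0, -20.0, 0.0, 20.0])   # right hip flexion (degrees)
KEY_KNEE   = np.array([5.0, 40.0, 5.0, 40.0, 5.0])     # right knee flexion (degrees)

def gait_pose(phase, stride_scale=1.0):
    """Interpolate the key frames at a gait phase; stride_scale widens the hip swing."""
    phase = phase % 1.0
    hip = stride_scale * np.interp(phase, KEY_PHASES, KEY_HIP)
    knee = np.interp(phase, KEY_PHASES, KEY_KNEE)
    return {"r_hip_deg": float(hip), "r_knee_deg": float(knee)}

for ph in (0.0, 0.1, 0.5):
    print(ph, gait_pose(ph, stride_scale=1.2))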

15.
贺越生, 卢晓军, 李焱. 《计算机仿真》, 2006, 23(4): 265-268, 321
In the maintainability design of industrial products and weapon equipment, human-factors analysis with virtual reality technology can effectively improve product maintainability. This paper proposes a software framework for virtual human-factors analysis and, on that basis, designs and implements a virtual human-factors analysis system on the virtual human platform Jack. The system performs human-factors analysis on a given virtual human model and virtual prototype and produces a human-factors analysis report. The software implements modeling of the maintenance simulation process and human-factors analysis based on that process, giving users an efficient analysis environment. A case study of the software system shows that the proposed framework is reasonable and effective.

16.
17.
The emergence of portable 3D mapping systems is revolutionizing the way we generate digital 3D models of environments. These systems are human-centric and require the user to hold or carry the device while continuously walking and mapping an environment. In this paper, we build on this unique coexistence of human and machine to propose SAGE (Semantic Annotation of Georeferenced Environments). SAGE consists of a portable 3D mobile mapping system and a smartphone that enables the user to assign semantic content to georeferenced 3D point clouds while scanning a scene. The proposed system contains several components, including touchless speech acquisition, background noise adaptation, real-time audio and vibrotactile feedback, automatic speech recognition, distributed clock synchronization, 3D annotation localization, user interaction, and interactive visualization. The most crucial advantage of SAGE technology is that it can be used to infer dynamic activities within an environment. Such activities are difficult to identify with existing post-processing semantic annotation techniques. The capability of SAGE leads to many promising applications such as intelligent scene classification, place recognition and navigational aid tasks. We conduct several experiments to demonstrate the effectiveness of the proposed system.

18.
Assistance is currently a pivotal research area in robotics, with huge societal potential. Since assistant robots directly interact with people, finding natural and easy-to-use user interfaces is of fundamental importance. This paper describes a flexible multimodal interface based on speech and gesture modalities for controlling our mobile robot named Jido. The vision system uses a stereo head mounted on a pan-tilt unit and a bank of collaborative particle filters devoted to the upper human body extremities to track and recognize pointing and symbolic gestures, both single-handed and bi-manual. This framework constitutes our first contribution: it is shown to handle natural artifacts properly (self-occlusion, hands leaving the camera's field of view, hand deformation) when 3D gestures are performed with either hand or with both. A speech recognition and understanding system based on the Julius engine is also developed and embedded in order to process deictic and anaphoric utterances. The second contribution is a probabilistic, multi-hypothesis interpreter framework that fuses results from the speech and gesture components. This interpreter is shown to improve the classification rates of multimodal commands compared to using either modality alone. Finally, we report on successful live experiments in human-centered settings. Results are reported in the context of an interactive manipulation task, where users specify local motion commands to Jido and perform safe object exchanges.
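One simple way to picture the multi-hypothesis fusion is a log-linear combination of the per-modality command posteriors; the commands, probabilities and weight below are illustrative, not the interpreter actually described in the paper.

import numpy as np

def fuse_commands(speech_hyps, gesture_hyps, w_speech=0.6):
    """speech_hyps / gesture_hyps: {command: probability}. Returns the best joint command."""
    shared = set(speech_hyps) & set(gesture_hyps)
    scored = {
        c: w_speech * np.log(speech_hyps[c]) + (1.0 - w_speech) * np.log(gesture_hyps[c])
        for c in shared
    }
    return max(scored, key=scored.get) if scored else None

speech  = {"give_object": 0.55, "move_left": 0.30, "stop": 0.15}
gesture = {"give_object": 0.70, "move_left": 0.20, "stop": 0.10}
print(fuse_commands(speech, gesture))   # -> "give_object"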

19.
A kinematic model of the human spine and torso (total citations: 1; self-citations: 0; citations by others: 1)
Efforts to develop a more accurate model of the human spine and torso in order to improve realism in human motion modeling are discussed. The model of spinal motion, which is represented within Jack (a software system for human figure modeling and manipulation), is described. The impact parameters, vertebral joint movement, and the spine database are considered. Application of the motion model is examined, and examples of its use are given.

20.
This paper describes a speaker-adaptive HMM-based speech synthesis system. The new system, called "HTS-2007," employs speaker adaptation (CSMAPLR+MAP), feature-space adaptive training, mixed-gender modeling, and full-covariance modeling using CSMAPLR transforms, in addition to several other techniques that have proved effective in our previous systems. Subjective evaluation results show that the new system generates significantly better quality synthetic speech than speaker-dependent approaches with realistic amounts of speech data, and that it bears comparison with speaker-dependent approaches even when large amounts of speech data are available. In addition, a comparison study with several speech synthesis techniques shows the new system is very robust: it is able to build voices from less-than-ideal speech data and synthesize good-quality speech even for out-of-domain sentences.
