
1.

张晓静 蒋冬梅 FAN Ping SAHLI Hichem 《计算机工程与应用》2014,(21):162-165,170

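An improved three-stream audio-visual fusion asynchronous dynamic Bayesian network emotion model (VVA_AsyDBN) is proposed. Facial geometric features (GF) and facial active appearance model (AAM) features serve as the two visual input streams, and speech Mel-frequency cepstral coefficients (MFCC) serve as the audio input stream; the states of the visual streams and of the audio stream may be asynchronous within a constraint. Emotion recognition experiments were carried out on the eNTERFACE'05 audio-visual emotion database, and the model was compared with the conventional multi-stream synchronous hidden Markov model (MSHMM) and with an audio-visual asynchronous DBN model (T_AsyDBN) that has two audio feature streams (speech MFCC and local prosodic features, LP) and one visual feature stream. The results show that VVA_AsyDBN achieves the highest recognition rate, 75.61%, which is 12.50% higher than the visual single-stream HMM, 2.32% higher than the MSHMM using AAM, GF, and MFCC features, and 1.65% higher than the best recognition rate of T_AsyDBN.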
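For context on the MSHMM baseline named in this abstract: state-synchronous multi-stream HMMs are commonly described as scoring each state with a weighted sum of per-stream log-likelihoods. The following is a minimal sketch of that scoring step only, assuming a single diagonal-covariance Gaussian per stream. The function names, the stream weights, and the toy feature vectors are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def diag_gauss_loglik(x, mean, var):
    """Log-density of x under a diagonal-covariance Gaussian."""
    x, mean, var = (np.asarray(a, dtype=float) for a in (x, mean, var))
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

def multistream_loglik(observations, means, variances, weights):
    """Weighted sum of per-stream log-likelihoods for one shared state.

    Each argument is a sequence with one entry per stream
    (for example GF, AAM, and MFCC); `weights` are hypothetical
    stream exponents chosen by the user.
    """
    return sum(
        w * diag_gauss_loglik(o, m, v)
        for o, m, v, w in zip(observations, means, variances, weights)
    )

# Toy usage: two 1-D streams that both sit exactly on their state means.
score = multistream_loglik(
    observations=[[0.0], [0.0]],
    means=[[0.0], [0.0]],
    variances=[[1.0], [1.0]],
    weights=[0.5, 0.5],
)
```

An asynchronous DBN such as the one in the abstract relaxes the shared-state assumption, which this sketch does not attempt to model.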

2.

Audio-visual emotion recognition based on a three-stream DBN model

吕兰兰 蒋冬梅 王风娜 Hichem Sahli Werner Verhelst 《计算机工程》2012,38(5):161-162,166

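To better model the correlations between audio and visual emotional information, a three-stream hybrid dynamic Bayesian network emotion recognition model (T_AsyDBN) is proposed. MFCC features and local prosodic features based on fundamental frequency and short-time energy serve as the audio input streams and are synchronized at the state level. Facial geometric features and facial action parameter features serve as the visual input stream, which is asynchronous with the audio input streams at the state level. Experimental results show that the model outperforms the audio-visual two-stream DBN model with state asynchrony constraints, raising the average recognition rate over six emotions from 52.14% to 63.71%.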

3.

A Chinese text-driven face-speech synchronized animation system

杜鹏 房宁 赵群飞 《计算机工程》2012,38(13):260-262,265

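To solve the synchronization problem between the animation stream and the speech stream, a face-speech synchronized animation system is designed and implemented. All Chinese phonemes are divided into 16 groups of Chinese visemes, and the corresponding keyframes are synthesized from the input face image. The input text is analyzed to obtain the Chinese viseme sequence and the keyframe sequence of the animation; the keyframe sequence is aligned with the speech stream; and transition frames are inserted between keyframes while the speech stream and the animation stream are played, achieving synchronized face-speech animation. Experimental results show that the system produces synchronized face-speech animation consistent with human visual and auditory perception.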
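The step of inserting transition frames between speech-aligned keyframes can be illustrated with plain linear interpolation. This is a minimal sketch under stated assumptions: the abstract does not say which interpolation scheme the system uses, and the function name, the frame rate, and the representation of a keyframe as a parameter vector are all hypothetical.

```python
import numpy as np

def insert_transition_frames(keyframes, times, fps=25.0):
    """Linearly interpolate face parameters between timed keyframes.

    keyframes: array-like of shape (K, D), one parameter vector per keyframe.
    times:     increasing keyframe timestamps in seconds, e.g. taken from
               the speech alignment (assumed to be given).
    Returns the output frame timestamps and the (N, D) interpolated frames.
    """
    keyframes = np.asarray(keyframes, dtype=float)
    times = np.asarray(times, dtype=float)
    n_frames = int(round((times[-1] - times[0]) * fps)) + 1
    frame_times = times[0] + np.arange(n_frames) / fps
    # Interpolate every parameter dimension independently.
    frames = np.stack(
        [np.interp(frame_times, times, keyframes[:, d])
         for d in range(keyframes.shape[1])],
        axis=1,
    )
    return frame_times, frames

# Toy usage: a mouth-opening parameter that rises and falls over 0.4 s.
frame_times, frames = insert_transition_frames(
    keyframes=[[0.0], [1.0], [0.0]], times=[0.0, 0.2, 0.4], fps=10.0
)
```

Because the output timestamps are derived from the same clock as the speech alignment, playing the frames at `fps` keeps the animation in step with the audio.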

4.

Lip-shape dynamic features based on BTSM-LDA and multi-stream asynchronous audio-visual speech recognition

吕国云 赵荣椿 蒋冬梅 H.Sahli 樊养余 W.Verhelst 《数据采集与处理》2008,23(4)

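A lip-contour feature extraction method based on the Bayesian tangent shape model (BTSM) and a visual speech dynamic feature extraction method based on linear discriminant analysis (LDA) are introduced. The resulting features fully capture the dynamics of lip-shape changes and remove the redundancy of raw lip-contour geometric features. A novel multi-stream asynchronous dynamic Bayesian network (MS-ADBN) model is also adopted for audio-visual continuous speech recognition; the model expresses the synchrony and asynchrony of the audio and visual streams at the word-node level. Recognition experiments show that the system using LDA visual speech dynamic features clearly outperforms the one using static lip-contour geometric features. Under test conditions with speech signal-to-noise ratios of 0 to 30 dB, the MS-ADBN model fused with LDA visual features improves the average recognition rate by 4.92% over the multi-stream asynchronous HMM, indicating that the MS-ADBN model better represents the asynchrony between the audio and visual streams.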

5.

An audio-visual fusion speech recognition model based on articulatory features

吴鹏 蒋冬梅 王风娜 Hichem SAHLI Werner VERHELST 《计算机工程》2011,37(22):268-269

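An articulatory-feature-based audio-visual two-stream dynamic Bayesian network (DBN) speech recognition model (AFAV_DBN) is constructed, and the conditional probability relations of its nodes are defined so that changes in the articulatory feature states can be asynchronous. Speech recognition experiments on an audio-visual speech database show that, by adjusting the asynchrony constraints between articulatory features, the AFAV_DBN model achieves higher recognition rates than state-based synchronous and asynchronous DBN models and an audio single-stream model, and under noise it also shows...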

6.

A facial speech animation synthesis system based on an improved HMM

叶静 董兰芳 王洵 万寿红 《计算机工程》2005,31(13):165-167,219

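Exploiting the statistical properties of the HMM, the HMM structure is modified so that it serves as a mapping model from speech features to image features in facial speech animation synthesis. After some necessary preprocessing, an HMM for a specific speaker can be built from training samples. With this model and some necessary post-processing, language-independent, smooth, and realistic facial speech animation can be synthesized from an input speech signal.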

7.

A speech visualization system based on a physiological tongue model

江辰 於俊 罗常伟 李睿 汪增福 《中国图象图形学报》2015,20(9):1237-1246

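Objective: Speech-synchronized animation of the tongue has not yet been widely studied. Against this background, a tongue animation synthesis method based on a physiological model is proposed. Method: First, a detailed physiological tongue model is built that produces realistic tongue deformation under muscle activation. Second, the model is used to synthesize a large number of tongue motion samples, from which a conversion model from muscle activation to tongue contour is learned. Third, motion parameters are estimated from captured dynamic 2D tongue contour data to obtain, for each phoneme, a corresponding motion unit (体素 in the original), consisting of a muscle activation sequence and a rigid-body displacement sequence. Finally, these units are fed into the physiological tongue model in a given arrangement and simulated to generate the corresponding tongue animation. Result: The system can synthesize speech that sounds realistic together with tongue animation that looks realistic and is synchronized with the synthesized speech. Conclusion: From 2D tongue contour data of Mandarin Chinese or other languages, the method can build a phoneme-to-unit database and use it to synthesize highly realistic 3D tongue animation for that language.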

8.

Model-based head motion estimation and facial image synthesis  Cited by: 9 (self-citations: 0; citations by others: 9)

尹宝才 高文 晏洁 宋益波 《计算机研究与发展》1999,36(1):67-71

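This paper discusses a model-based method for head motion estimation and facial image synthesis. A deformable 3D face model based on a geometric face model is first built; this model can adapt a specific face model according to the features of different face images. To match the specific face model to a specific face image, the face model must be corrected according to the deformable model. A combination of automatic adjustment and human-computer interaction is used to match the specific face model. After the model shape is adjusted, face images from 3 directions are used for texture mapping to generate face images from different viewpoints. Applying the best match between the synthesized face image and the input face image...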

9.

Virtual-human "shuanghuang" (double act): research on speech-synchronized 3D facial animation

《计算机应用与软件》2015,(8)

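To synthesize facial speech animation effectively, a method is proposed for synthesizing 3D facial mouth-shape and expression animation synchronized with speech. First, based on a study of the anatomy of facial motion, a 3D face control model built on a muscle model and a differential geometry model is constructed; the muscle model and the differential geometry model are controlled through data structures to produce facial motion, and thus the various mouth-shape and expression movements. Then, taking full account of the pronunciation characteristics of Chinese, a coarticulation model is proposed that is based on a geometric description and conforms to Chinese pronunciation habits, yielding realistic 3D facial speech animation. Simulation results and experimental comparisons show that the method produces 3D mouth-shape animation consistent with Chinese pronunciation habits and that the synthesized 3D facial expressions are fairly natural and realistic.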

10.

AAM-based face image description and coding

谢玉鹏 吴海燕 《计算机仿真》2009,26(6):272-276

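An active appearance model (AAM) is used to describe and encode face images: after a number of iterations, the model is matched to the face and a face image is synthesized. The method describes the target image through modeling based on statistical information. Because an optimization algorithm is used, the iterations bring the synthesized model progressively closer to the target image, finally yielding a synthesized model that reflects the texture and shape of the target image. Experiments demonstrate the effectiveness of the AAM method for face description and coding. The method is significant for face image coding.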

11.

Speech driven photo realistic facial animation based on an articulatory DBN model and AAM features

Dongmei Jiang Yong Zhao Hichem Sahli Yanning Zhang 《Multimedia Tools and Applications》2014,73(1):397-415

This paper presents a photo realistic facial animation synthesis approach based on an audio visual articulatory dynamic Bayesian network model (AF_AVDBN), in which the maximum asynchronies between the articulatory features, such as lips, tongue and glottis/velum, can be controlled. Perceptual Linear Prediction (PLP) features from audio speech, as well as active appearance model (AAM) features from face images of an audio visual continuous speech database, are adopted to train the AF_AVDBN model parameters. Based on the trained model, given an input audio speech, the optimal AAM visual features are estimated via a maximum likelihood estimation (MLE) criterion, which are then used to construct face images for the animation. In our experiments, facial animations are synthesized for 20 continuous audio speech sentences, using the proposed AF_AVDBN model, as well as the state-of-the-art methods, being the audio visual state synchronous DBN model (SS_DBN) implementing a multi-stream Hidden Markov Model, and the state asynchronous DBN model (SA_DBN). Objective evaluations on the learned AAM features show that much more accurate visual features can be learned from the AF_AVDBN model. Subjective evaluations show that the synthesized facial animations using AF_AVDBN are better than those using the state based SA_DBN and SS_DBN models, in the overall naturalness and matching accuracy of the mouth movements to the speech content.

12.

A survey of the state of research on speech-driven facial animation

李欣怡 张志超 《计算机工程与应用》2017,53(22):21-28

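Driving facial animation with speech is an important intelligent technology in fields such as virtual reality (VR), and the rapid development of VR in recent years has further highlighted the pressing need for natural human-computer communication in immersive environments. Speech-driven facial animation can create natural, vivid, emotionally expressive animation; compared with conventional preset facial animation, it better supports human-computer interaction and improves the user experience. To advance the intelligence and application of this technology, this survey addresses the key problem of speech-driven facial animation, audio-visual mapping, and reviews frame-by-frame, multi-frame, and phoneme-by-phoneme mapping methods. It also summarizes the ideas behind various face models; methods for animation synthesis, emotion fusion, and facial animation evaluation; and possible directions for future research.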

13.

Realistic Mouth-Synching for Speech-Driven Talking Face Using Articulatory Modelling  Cited by: 1 (self-citations: 0; citations by others: 1)

Lei Xie Zhi-Qiang Liu 《Multimedia, IEEE Transactions on》2007,9(3):500-510

This paper presents an articulatory modelling approach to convert acoustic speech into realistic mouth animation. We directly model the movements of articulators, such as lips, tongue, and teeth, using a dynamic Bayesian network (DBN)-based audio-visual articulatory model (AVAM). A multiple-stream structure with a shared articulator layer is adopted in the model to synchronously associate the two building blocks of speech, i.e., audio and video. This model not only describes the synchronization between visual articulatory movements and audio speech, but also reflects the linguistic fact that different articulators evolve asynchronously. We also present a Baum-Welch DBN inversion (DBNI) algorithm to generate optimal facial parameters from audio given the trained AVAM under maximum likelihood (ML) criterion. Extensive objective and subjective evaluations on the JEWEL audio-visual dataset demonstrate that compared with phonemic HMM approaches, facial parameters estimated by our approach follow the true parameters more accurately, and the synthesized facial animation sequences are so lively that 38% of them are undistinguishable

14.

A speech enhancement algorithm using a deep belief network with joint feature optimization

王雁 贾海蓉 吉慧芳 王卫梅 《计算机工程与应用》2019,55(9):38-42

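To address the weak generalization ability of deep belief network (DBN) models, which leads to poor speech enhancement, a regression DBN speech enhancement algorithm with joint feature optimization is proposed. The algorithm makes no assumptions about the speech or the noise. It extracts both LMPS (log-Mel frequency power spectrum) and MFCC (Mel-frequency cepstral coefficients) features from the speech signal. LMPS is used to reconstruct the enhanced speech directly, which preserves perceptual quality, while MFCC serves as an auxiliary secondary feature. The two features are fed jointly into the DBN to optimize the network parameters. This joint optimization adds an MFCC constraint to the direct prediction of LMPS, which improves the model's generalization in estimating LMPS and yields a more accurate reconstruction of the enhanced speech. Simulation results show that, across different signal-to-noise ratios, joint LMPS and MFCC optimization gives the enhanced speech higher PESQ and SNR than single-feature optimization with LPS (log power spectrum) or LMPS, improving speech quality and intelligibility.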

15.

A coupled HMM approach to video-realistic speech animation

Lei Xie Zhi-Qiang Liu 《Pattern recognition》2007,40(8):2325-2340

We propose a coupled hidden Markov model (CHMM) approach to video-realistic speech animation, which realizes realistic facial animations driven by speaker independent continuous speech. Different from hidden Markov model (HMM)-based animation approaches that use a single-state chain, we use CHMMs to explicitly model the subtle characteristics of audio-visual speech, e.g., the asynchrony, temporal dependency (synchrony), and different speech classes between the two modalities. We derive an expectation maximization (EM)-based A/V conversion algorithm for the CHMMs, which converts acoustic speech into decent facial animation parameters. We also present a video-realistic speech animation system. The system transforms the facial animation parameters to a mouth animation sequence, refines the animation with a performance refinement process, and finally stitches the animated mouth with a background facial sequence seamlessly. We have compared the animation performance of the CHMM with the HMMs, the multi-stream HMMs and the factorial HMMs both objectively and subjectively. Results show that the CHMMs achieve superior animation performance. The ph-vi-CHMM system, which adopts different state variables (phoneme states and viseme states) in the audio and visual modalities, performs the best. The proposed approach indicates that explicitly modelling audio-visual speech is promising for speech animation.

16.

Speech-driven facial animation with spectral gathering and temporal attention

Yujin CHAI Yanlin WENG Lvdi WANG Kun ZHOU 《Frontiers of Computer Science》2022,16(3):163703

In this paper, we present an efficient algorithm that generates lip-synchronized facial animation from a given vocal audio clip. By combining spectral-dimensional bidirectional long short-term memory and temporal attention mechanism, we design a light-weight speech encoder that learns useful and robust vocal features from the input audio without resorting to pre-trained speech recognition modules or large training data. To learn subject-independent facial motion, we use deformation gradients as the internal representation, which allows nuanced local motions to be better synthesized than using vertex offsets. Compared with state-of-the-art automatic-speech-recognition-based methods, our model is much smaller but achieves similar robustness and quality most of the time, and noticeably better results in certain challenging cases.

17.

Research on facial expression recognition based on local binary patterns  Cited by: 1 (self-citations: 0; citations by others: 1)

应自炉 方谢燕 《计算机工程与应用》2009,45(29):180-183

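A facial expression recognition method combining local binary patterns (LBP) with a support vector machine (SVM) is proposed. The LBP operator is applied to the image, and the resulting patterns are counted to form facial expression features; linear discriminant analysis reduces the dimensionality of the features; and an SVM classifies the expressions. The method was implemented in Matlab and tested on the Japanese Female Facial Expression (JAFFE) database, achieving a recognition rate of 70.95%.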
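The feature-extraction step described above (apply the LBP operator, then count the patterns) can be sketched as follows. This is a minimal illustration assuming the basic 3x3, 8-neighbour LBP operator and a single whole-image histogram; the abstract does not state which LBP variant or region layout the authors used, and the function names are hypothetical. The LDA and SVM stages are not shown.

```python
import numpy as np

def lbp_image(gray):
    """Basic 3x3 LBP code for every interior pixel of a 2-D grayscale array."""
    g = np.asarray(gray, dtype=np.int32)
    h, w = g.shape
    center = g[1:-1, 1:-1]
    # Eight neighbours, clockwise from the top-left; each contributes one bit.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(center)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = g[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # Set the bit when the neighbour is at least as bright as the centre.
        codes |= (neighbour >= center).astype(np.int32) << bit
    return codes

def lbp_histogram(gray):
    """Normalized 256-bin histogram of LBP codes, usable as a feature vector."""
    codes = lbp_image(gray)
    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / hist.sum()

# Toy usage: only the top-left neighbour is brighter than the centre pixel.
patch = [[10, 0, 0],
         [0, 5, 0],
         [0, 0, 0]]
code = lbp_image(patch)
features = lbp_histogram(patch)
```

In practice the face is often split into blocks and the per-block histograms are concatenated before dimensionality reduction, but that layout is a design choice this sketch leaves open.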