1.
This paper proposes a statistical parametric approach to a video-realistic, text-driven talking avatar. We follow the trajectory HMM approach, in which audio and visual speech are jointly modeled by HMMs and continuous audiovisual speech parameter trajectories are synthesized under the maximum likelihood criterion. Previous trajectory HMM approaches focus only on mouth animation, synthesizing either simple geometric mouth shapes or video-realistic lip motion. Our approach uses a trajectory HMM to generate visual parameters of the lower face and realizes video-realistic animation of the whole face. Specifically, we use an active appearance model (AAM) to model the visual speech, which offers a convenient and compact statistical model of both the shape and the appearance variations of the face. To achieve video-realistic effects with high fidelity, we use the Poisson image editing technique to stitch the synthesized lower-face image seamlessly into a whole-face image. Objective and subjective experiments show that the proposed approach can produce natural facial animation.
2.
We propose a coupled hidden Markov model (CHMM) approach to video-realistic speech animation, which produces realistic facial animation driven by speaker-independent continuous speech. Unlike hidden Markov model (HMM)-based animation approaches that use a single state chain, we use CHMMs to explicitly model the subtle characteristics of audio-visual speech, e.g., the asynchrony, temporal dependency (synchrony), and different speech classes between the two modalities. We derive an expectation maximization (EM)-based A/V conversion algorithm for the CHMMs, which converts acoustic speech into decent facial animation parameters. We also present a video-realistic speech animation system. The system transforms the facial animation parameters into a mouth animation sequence, refines the animation with a performance refinement process, and finally stitches the animated mouth seamlessly onto a background facial sequence. We have compared the animation performance of the CHMMs with that of HMMs, multi-stream HMMs, and factorial HMMs, both objectively and subjectively. Results show that the CHMMs achieve superior animation performance. The ph-vi-CHMM system, which adopts different state variables (phoneme states and viseme states) in the audio and visual modalities, performs best. The results indicate that explicitly modelling audio-visual speech is promising for speech animation.
3.
The Journal of Supercomputing - Machine health monitoring systems are vital components of modern manufacturing industries. As advanced sensors collecting machine health-related data become...
4.
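To reduce the substantial manual feature engineering that traditional named entity recognition requires, distributed vector representations of a military-domain corpus are obtained through unsupervised training, and a bidirectional LSTM recurrent neural network model is used to recognize military-domain named entities. The model is further extended and improved by adding input vectors that combine characters and words and by adding an attention mechanism, which improves military-domain named entity recognition. Experimental results show that the proposed method can recognize military-domain named entities and achieves an F-value of 87.38% on the test corpus.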
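For reference, the F-value reported in the entry above is conventionally computed from precision and recall. The block below is a sketch that assumes the balanced F1 variant; the abstract does not say which weighting was used, and the symbols TP, FP, and FN (true positives, false positives, false negatives) are introduced here only for illustration.

```latex
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 P R}{P + R}
```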
5.
Multimedia Tools and Applications - Cardiac arrest is a common issue in Intensive Care Units (ICU) with low survival rate. Deep learning algorithms have been used to predict cardiac arrest...
6.
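Traditional action recognition relies on hand-crafted features, is not very intelligent, and has low recognition accuracy. To address these problems, a hybrid model of a convolutional neural network (CNN) and a bidirectional long short-term memory network (Bi-LSTM) based on 3D skeleton data is proposed. With 3D skeleton data as the network input, the CNN extracts spatial features among the 3D input data at each time step, and the Bi-LSTM extracts temporal features of the 3D data sequence at a deeper level. The hybrid model extracts features automatically to perform classification, achieving end-to-end learning from skeleton data to recognition results. On the standard UTKinect-Action3D dataset the model reaches a recognition rate of 97.5%, and on a self-built Kinect dataset it reaches an accuracy of 98.6%. The experimental results show that the network effectively improves classification accuracy and is both usable and effective.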
7.
Multimedia Tools and Applications - Due to the availability of an enormous amount of multimodal content on the social web and its applications, automatic sentiment analysis, and emotion detection...
9.
Multimedia Tools and Applications - Depression has become one of the most widespread mental health disorders across the globe. Depression is a state of mind which affects how we think, feel, and...
10.
Spare parts demand forecasting is essential for organizations to minimize cost and prevent stock-outs. The demand distribution of spare parts and car sales is an important factor in inventory control. Estimating this demand is challenging because demand for automobile spare parts and car sales is often recurrent. The well-known empirical method uses historical demand data to build the distribution of lead-time demand. Although it works reasonably well when service requirements are relatively low, it has difficulty reaching high target service levels. In this paper, we propose a Recurrent Neural Network / Long Short-Term Memory (RNN/LSTM) model with a modified Adam optimizer to predict the demand for spare parts. The weight vectors generated in the LSTM are optimized using the Modified-Adam algorithm. The experiments consider both forecast accuracy and inventory performance. Experimental results confirm that RNN/LSTM with Modified-Adam achieves lower error than other existing methods. We conclude that the proposed RNN/LSTM with the Modified-Adam algorithm is well suited to predicting automobile spare parts demand.
11.
Neural Computing and Applications - E-commerce websites produce a large number of online reviews, posts, and comments about a product or service. These reviews are used to assist consumers in...
12.
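Using softmax as the classifier of a long short-term memory network leaves entity relation recognition models with insufficient generalization ability, making them poorly suited to tasks such as relation extraction for traditional Chinese medicine (TCM) entities. To address this, a relation recognition algorithm is proposed that combines a bidirectional long short-term memory network with gradient boosting decision trees (BILSTM-GBDT). TCM texts are first vectorized with word2vec; an attention-based bidirectional LSTM then extracts high-order features; finally, the ensemble classifier GBDT serves as the feature classifier to improve relation recognition. Experiments on several relation corpora, including a TCM corpus, show that the model achieves higher precision, recall, and F-score than the traditional SVM method, the GBDT method, and deep learning methods.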
13.
Multimedia Tools and Applications - A Correction to this paper has been published: https://doi.org/10.1007/s11042-021-10591-y
14.
The Journal of Supercomputing - Online ride-hailing order forecasting is a very important part of the intelligent traffic dispatch system. Accurate order forecasting can reduce the flow of invalid...
15.
In speech recognition research, because of the variety of languages, a corresponding speech recognition system needs to be constructed for each language. A dialect speech recognition system in particular must handle many special words and features of spoken language. In addition, dialect speech data is very scarce. Constructing a dialect speech recognition system is therefore difficult. This paper constructs a speech recognition system for the Sichuan dialect by combining a hidden Markov model (HMM) and a deep long short-term memory (LSTM) network. We created a Sichuan dialect dataset and implemented a speech recognition system for it using the HMM-LSTM architecture. Compared with a deep neural network (DNN), the LSTM network overcomes the limitation that the DNN captures the context of only a fixed number of information items. Moreover, to accurately identify polyphones and vocabulary with special pronunciations in the Sichuan dialect, we collect all the characters in the dataset and their common phoneme sequences to form a lexicon. Finally, the system yields an 11.34% character error rate on the Sichuan dialect evaluation dataset. As far as we know, this is the best performance reported for this corpus at present.
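The character error rate quoted in the entry above is commonly defined as the character-level edit distance between the recognized text and the reference, divided by the reference length. The following is a minimal sketch under that assumption; the function names are hypothetical and the abstract does not describe the authors' own scoring tool.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance (substitutions, insertions, deletions) between two sequences."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(ref, hyp):
    """Edit distance normalized by the reference length (assumed CER definition)."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)
```

Under this definition a CER can exceed 1.0 when the hypothesis contains many insertions, which is why it is reported as an error rate and not as an accuracy.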
16.
Action recognition is an important research topic in video analysis that remains very challenging. Effective recognition relies on learning a good representation of both spatial information (for appearance) and temporal information (for motion). These two kinds of information are highly correlated but have quite different properties, so both connecting independent models (e.g., CNN-LSTM) and direct unbiased co-modeling (e.g., 3DCNN) give unsatisfying results. Moreover, deep learning models for this task have traditionally used just 8 or 16 consecutive frames as input, making it hard to extract discriminative motion features. In this work, we propose a novel network structure called ResLNet (Deep Residual LSTM network), which can take longer inputs (e.g., 64 frames) and lets convolutions collaborate with the LSTM more effectively under the residual structure to learn better spatial-temporal representations; the proposed embedded variable-stride convolution avoids extra computational cost. The superiority of this proposal and its ablation study are shown on the three most popular benchmark datasets: Kinetics, HMDB51, and UCF101. The proposed network could be adopted for various features, such as RGB and optical flow. Because of the limited computing power of our experimental equipment and the real-time requirement, the proposed network is tested on RGB only, where it shows strong performance.
17.
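Automatically extracting opinion targets from large volumes of product reviews is an important topic in opinion mining research. However, current opinion target extraction results provide only limited information, so a context-dependent bidirectional bootstrapping method is proposed that obtains product names and product attributes simultaneously. The method uses an initial seed set and a set of part-of-speech templates to obtain candidate opinion targets, applies a context-dependent approach to every sentence containing a candidate to extract opinion targets and identify their boundaries, and at the same time extracts the part-of-speech templates of the opinion targets and scores them. High-scoring templates are added to the template set, and the process iterates until no new opinion targets appear. Experimental results show that the context-dependent method improves opinion target extraction performance by more than 10% over a context-independent method.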
18.
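Anomaly detection through data analysis helps identify abnormal behavior accurately, thereby improving service quality and decision-making. However, because of the spatio-temporal dependencies in multidimensional time series data and the randomness of anomalous events, existing methods still have limitations. To address these problems, MBCLE is proposed, an anomaly detection method for multidimensional time series that combines a new statistical method with bidirectional convolutional LSTM. The method introduces stacked median filtering to handle point anomalies in the input data and smooth data fluctuations; designs a predictor that combines a bidirectional convolutional long short-term memory network (Bi-ConvLSTM) with a bidirectional long short-term memory network (Bi-LSTM) for data modeling and prediction; smooths the prediction errors with a bidirectional recurrent exponentially weighted moving average (BrEWMA); and uses a dynamic thresholding method to compute thresholds for detecting contextual anomalies. Experimental results show that MBCLE achieves good detection performance and that each step contributes to the improvement.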
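The statistical steps around the predictor in the entry above (median filtering of the input, exponentially weighted smoothing of prediction errors, thresholding) can be illustrated with a small sketch. Everything below is an assumption made for illustration: the abstract does not define BrEWMA or the dynamic threshold, so this sketch averages a forward and a backward EWMA pass and uses a simple mean-plus-k-standard-deviations threshold as stand-ins. All function names and parameters are hypothetical.

```python
from statistics import mean, median, pstdev


def median_filter(xs, window=3):
    """Replace each point by the median of its neighbourhood (window shrinks at the edges)."""
    half = window // 2
    return [median(xs[max(0, i - half):i + half + 1]) for i in range(len(xs))]


def stacked_median_filter(xs, window=3, passes=2):
    """Apply the median filter several times in sequence (assumed meaning of 'stacked')."""
    for _ in range(passes):
        xs = median_filter(xs, window)
    return xs


def ewma(xs, alpha=0.5):
    """Standard exponentially weighted moving average, seeded with the first value."""
    out, s = [], None
    for x in xs:
        s = x if s is None else alpha * x + (1 - alpha) * s
        out.append(s)
    return out


def bidirectional_ewma(xs, alpha=0.5):
    """Average of a forward and a backward EWMA pass (a stand-in, not the paper's BrEWMA)."""
    fwd = ewma(xs, alpha)
    bwd = ewma(xs[::-1], alpha)[::-1]
    return [(f + b) / 2 for f, b in zip(fwd, bwd)]


def flag_anomalies(errors, k=3.0):
    """Flag errors above mean + k * std (a simple stand-in for a dynamic threshold)."""
    threshold = mean(errors) + k * pstdev(errors)
    return [e > threshold for e in errors]
```

In a pipeline of this shape, the smoothed errors would come from the difference between the predictor's output and the filtered input; the predictor itself (Bi-ConvLSTM plus Bi-LSTM in the abstract) is outside the scope of this sketch.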
19.
This article investigates speech feature enhancement based on deep bidirectional recurrent neural networks. The Long Short-Term Memory (LSTM) architecture is used to exploit a self-learnt amount of temporal context in learning the correspondences between noisy, reverberant speech features and undistorted ones. The resulting networks are applied to feature enhancement in the context of the 2013 2nd Computational Hearing in Multisource Environments (CHiME) Challenge track 2 task, which consists of the Wall Street Journal (WSJ-0) corpus distorted by highly non-stationary, convolutive noise. In extensive test runs, different feature front-ends, network training targets, and network topologies are evaluated in terms of frame-wise regression error and speech recognition performance. Furthermore, we consider gradually refined speech recognition back-ends, from baseline ‘out-of-the-box’ clean models to discriminatively trained multi-condition models adapted to the enhanced features. As a result, deep bidirectional LSTM networks processing log Mel filterbank outputs deliver the best results with clean models, reaching down to 42% word error rate (WER) at signal-to-noise ratios ranging from −6 to 9 dB (multi-condition CHiME Challenge baseline: 55% WER). Discriminative training of the back-end using LSTM-enhanced features is shown to further decrease WER to 22%. To our knowledge, this is the best result reported for the 2nd CHiME Challenge WSJ-0 task yet.
20.
Published online: 18 September 2001