1.

A statistical parametric approach to video-realistic text-driven talking avatar

Lei Xie Naicai Sun Bo Fan 《Multimedia Tools and Applications》2014,73(1):377-396

This paper proposes a statistical parametric approach to video-realistic text-driven talking avatar. We follow the trajectory HMM approach where audio and visual speech are jointly modeled by HMMs and continuous audiovisual speech parameter trajectories are synthesized based on the maximum likelihood criterion. Previous trajectory HMM approaches only focus on mouth animation, which synthesizes simple geometric mouth shapes or video-realistic effects of the lip motion. Our approach uses trajectory HMM to generate visual parameters of the lower face and it realizes video-realistic animation of the whole face. Specifically, we use active appearance model (AAM) to model the visual speech, which offers a convenient and compact statistical model of both the shape and the appearance variations of the face. To realize video-realistic effects with high fidelity, we use Poisson image editing technique to stitch the synthesized lower-face image to a whole face image seamlessly. Objective and subjective experiments show that the proposed approach can produce natural facial animation.

2.

A coupled HMM approach to video-realistic speech animation

Lei Xie Zhi-Qiang Liu 《Pattern recognition》2007,40(8):2325-2340

We propose a coupled hidden Markov model (CHMM) approach to video-realistic speech animation, which realizes realistic facial animations driven by speaker independent continuous speech. Different from hidden Markov model (HMM)-based animation approaches that use a single-state chain, we use CHMMs to explicitly model the subtle characteristics of audio-visual speech, e.g., the asynchrony, temporal dependency (synchrony), and different speech classes between the two modalities. We derive an expectation maximization (EM)-based A/V conversion algorithm for the CHMMs, which converts acoustic speech into decent facial animation parameters. We also present a video-realistic speech animation system. The system transforms the facial animation parameters to a mouth animation sequence, refines the animation with a performance refinement process, and finally stitches the animated mouth with a background facial sequence seamlessly. We have compared the animation performance of the CHMM with the HMMs, the multi-stream HMMs and the factorial HMMs both objectively and subjectively. Results show that the CHMMs achieve superior animation performance. The ph-vi-CHMM system, which adopts different state variables (phoneme states and viseme states) in the audio and visual modalities, performs the best. The proposed approach indicates that explicitly modelling audio-visual speech is promising for speech animation.

3.

Tool wear prediction using convolutional bidirectional LSTM networks

Chan Yu-Wei Kang Tsan-Ching Yang Chao-Tung Chang Chih-Hung Huang Shih-Meng Tsai Yin-Te 《The Journal of supercomputing》2022,78(1):810-832

The Journal of Supercomputing - Machine health monitoring systems are vital components of modern manufacturing industries. As advanced sensors collecting machine health-related data become...

4.

Military named entity recognition based on bidirectional LSTM

李健龙 王盼卿 韩琪羽 《计算机工程与科学》2019,41(4):711-718

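To reduce the heavy manual feature engineering that traditional named entity recognition requires, distributed vector representations of a military-domain corpus are obtained through unsupervised training, and a bidirectional LSTM recurrent neural network model is used to recognize named entities in the military domain. The model is further extended and improved by adding input vectors that combine characters and words and an attention mechanism, which improves military named entity recognition. Experimental results show that the proposed method can recognize military-domain named entities, achieving an F-score of 87.38% on the test corpus.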

5.

A novel solution of using deep learning for early prediction cardiac arrest in Sepsis patient: enhanced bidirectional long short-term memory (LSTM)

Baral Samit Alsadoon Abeer Prasad P. W. C. Al Aloussi Sarmad Alsadoon Omar Hisham 《Multimedia Tools and Applications》2021,80(21-23):32639-32664

Multimedia Tools and Applications - Cardiac arrest is a common issue in Intensive Care Units (ICU) with low survival rate. Deep learning algorithms have been used to predict cardiac arrest...

6.

Action recognition algorithm based on CNN and bidirectional LSTM

吴潇颖 李锐 吴胜昔 《计算机工程与设计》2020,41(2):361-366

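To address the problems that traditional action recognition relies on hand-crafted features, offers little automation, and has low recognition accuracy, a hybrid model of a convolutional neural network (CNN) and a bidirectional long short-term memory network (Bi-LSTM) based on 3D skeleton data is proposed. With 3D skeleton data as the network input, the CNN extracts spatial features among the 3D input data at each time step, and the Bi-LSTM extracts deeper temporal features from the 3D data sequence. The hybrid model extracts features automatically to perform classification, achieving end-to-end learning from skeleton data to recognition results. The model reaches a recognition rate of 97.5% on the UTKinect-Action3D benchmark dataset and an accuracy of 98.6% on a self-collected Kinect dataset. The experimental results show that the network effectively improves classification accuracy and is both usable and effective.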

7.

Attention-based multimodal contextual fusion for sentiment and emotion classification using bidirectional LSTM

Huddar Mahesh G. Sannakki Sanjeev S. Rajpurohit Vijay S. 《Multimedia Tools and Applications》2021,80(9):13059-13076

Multimedia Tools and Applications - Due to the availability of an enormous amount of multimodal content on the social web and its applications, automatic sentiment analysis, and emotion detection...

8.

HMM trajectory-guided sample selection for photo-realistic talking head

Lijuan Wang Frank K. Soong 《Multimedia Tools and Applications》2015,74(22):9849-9869

9.

An hybrid deep learning approach for depression prediction from user tweets using feature-rich CNN and bi-directional LSTM

Kour Harnain Gupta Manoj K. 《Multimedia Tools and Applications》2022,81(17):23649-23685

Multimedia Tools and Applications - Depression has become one of the most widespread mental health disorders across the globe. Depression is a state of mind which affects how we think, feel, and...

10.

RNN / LSTM with modified Adam optimizer in deep learning approach for automobile spare parts demand forecasting

Chandriah Kiran Kumar Naraganahalli Raghavendra V. 《Multimedia Tools and Applications》2021,80(17):26145-26159

Spare parts demand forecasting is essential for organizations that want to minimize cost and prevent stockouts. The demand for spare parts and the distribution of car sales are important factors in inventory control. Estimating this demand is challenging because demand for automobile spare parts and car sales is often recurrent. The well-known empirical method uses historical demand data to build the distribution of lead-time demand. Although it works reasonably well when service requirements are relatively low, it has difficulty reaching high target service levels. In this paper, we propose Recurrent Neural Networks/Long Short-Term Memory (RNN/LSTM) with a modified Adam optimizer to predict the demand for spare parts. In this LSTM, weight vectors are generated, and these weights are optimized using the Modified-Adam algorithm. The experiments consider both the accuracy of the forecast and the performance of the inventory. Experimental results confirm that RNN/LSTM with Modified-Adam achieves a lower error than other existing methods. We conclude that the proposed RNN/LSTM with the Modified-Adam algorithm is well suited to predicting demand for automobile spare parts.

11.

Ensemble application of bidirectional LSTM and GRU for aspect category detection with imbalanced data

Kumar J. Ashok Abirami S. 《Neural computing & applications》2021,33(21):14603-14621

Neural Computing and Applications - E-commerce websites produce a large number of online reviews, posts, and comments about a product or service. These reviews are used to assist consumers in...

12.

Relation extraction model for traditional Chinese medicine text based on bidirectional LSTM and GBDT

罗计根 杜建强 聂斌 熊旺平 刘蕾 贺佳 《计算机应用研究》2019,36(12)

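To address the problem that using softmax as the classifier of a long short-term memory network leaves entity relation recognition models with insufficient generalization ability and makes them poorly suited to relation extraction between traditional Chinese medicine (TCM) entities, a relation recognition algorithm that combines a bidirectional long short-term memory network with gradient boosting decision trees (BILSTM-GBDT) is proposed. TCM text is first vectorized with word2vec; an attention-based bidirectional long short-term memory network then extracts high-order features; finally, the ensemble classifier gradient boosting decision tree serves as the feature classifier to improve relation recognition. Experimental results on several relation corpora, including a TCM corpus, show that the model achieves higher precision, recall, and F-score than the traditional SVM method, the GBDT method, and deep learning methods.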
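The abstract above compares models by precision, recall, and F-score. As a reminder of how these three numbers relate, here is a minimal sketch. It assumes the balanced F1 variant, which the abstract does not specify. The function name and the counts in the usage note are hypothetical and are not taken from the paper.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from raw counts.

    tp: correctly predicted relations (true positives)
    fp: predicted relations that are wrong (false positives)
    fn: gold relations that were missed (false negatives)
    Assumption: the balanced F1 score (harmonic mean of precision and recall).
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For example, `precision_recall_f1(8, 2, 2)` uses illustrative counts in which 8 of 10 predictions are correct and 8 of 10 gold relations are found, so all three scores come out at about 0.8.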

13.

Correction to: Attention-based multimodal contextual fusion for sentiment and emotion classification using bidirectional LSTM

Huddar Mahesh G. Sannakki Sanjeev S. Rajpurohit Vijay S. 《Multimedia Tools and Applications》2021,80(9):13077-13077

Multimedia Tools and Applications - A Correction to this paper has been published: https://doi.org/10.1007/s11042-021-10591-y

14.

MVDLSTM: MultiView deep LSTM framework for online ride-hailing order prediction

Wu Yonghao Zhang Huyin Li Cong Tao Shiming Yang Fei 《The Journal of supercomputing》2022,78(6):8531-8559

The Journal of Supercomputing - Online ride-hailing order forecasting is a very important part of the intelligent traffic dispatch system. Accurate order forecasting can reduce the flow of invalid...

15.

Sichuan dialect speech recognition with deep LSTM network

Wangyang YING Lei ZHANG Hongli DENG 《Frontiers of Computer Science》2020,14(2):378-387

In speech recognition research, because of the variety of languages, corresponding speech recognition systems need to be constructed for different languages. Especially in a dialect speech recognition system, there are many special words and oral language features. In addition, dialect speech data is very scarce. Therefore, constructing a dialect speech recognition system is difficult. This paper constructs a speech recognition system for Sichuan dialect by combining a hidden Markov model (HMM) and a deep long short-term memory (LSTM) network. Using the HMM-LSTM architecture, we created a Sichuan dialect dataset and implemented a speech recognition system for this dataset. Compared with the deep neural network (DNN), the LSTM network can overcome the problem that the DNN only captures the context of a fixed number of information items. Moreover, to identify polyphone and special pronunciation vocabularies in Sichuan dialect accurately, we collect all the characters in the dataset and their common phoneme sequences to form a lexicon. Finally, this system yields an 11.34% character error rate on the Sichuan dialect evaluation dataset. As far as we know, it is the best performance for this corpus at present.
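The abstract above reports its result as a character error rate (CER). The sketch below shows how CER is commonly computed: the Levenshtein edit distance between the reference and the recognized character sequences, divided by the reference length. The abstract does not describe the paper's scoring script, so treat this as an illustration of the standard definition. The function names are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (each edit costs 1)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference symbols
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(ref, hyp):
    """CER = edit distance over characters / number of reference characters."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)
```

Under this definition, a hypothesis with one wrong character out of a four-character reference scores a CER of 0.25. Because insertions count as errors, CER can exceed 1.0 for very long hypotheses.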

16.

ResLNet: deep residual LSTM network with longer input for action recognition

Tian WANG Jiakun LI Huai-Ning WU Ce LI Hichem SNOUSSI Yang WU 《Frontiers of Computer Science》2022,16(6):166334

Action recognition is an important research topic in video analysis that remains very challenging. Effective recognition relies on learning a good representation of both spatial information (for appearance) and temporal information (for motion). These two kinds of information are highly correlated but have quite different properties, leading to unsatisfying results of both connecting independent models (e.g., CNN-LSTM) and direct unbiased co-modeling (e.g., 3DCNN). Besides, a long-lasting tradition on this task with deep learning models is to just use 8 or 16 consecutive frames as input, making it hard to extract discriminative motion features. In this work, we propose a novel network structure called ResLNet (Deep Residual LSTM network), which can take longer inputs (e.g., of 64 frames) and have convolutions collaborate with LSTM more effectively under the residual structure to learn better spatial-temporal representations than ever without the cost of extra computations with the proposed embedded variable stride convolution. The superiority of this proposal and its ablation study are shown on the three most popular benchmark datasets: Kinetics, HMDB51, and UCF101. The proposed network could be adopted for various features, such as RGB and optical flow. Due to the limitation of the computation power of our experiment equipment and the real-time requirement, the proposed network is tested on the RGB only and shows great performance.

17.

A context-dependent bidirectional bootstrapping method for opinion target extraction

杨晓燕 徐戈 廖祥文 《计算机工程与应用》2015,51(15):143-147

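Automatically extracting opinion targets from large volumes of product reviews is an important topic in opinion mining research. However, current opinion target extraction results provide only limited information, so a context-dependent bidirectional bootstrapping method is proposed to obtain product names and product attributes simultaneously. The method uses an initial seed set and a set of part-of-speech templates to obtain candidate opinion targets. It then applies a context-dependent method to every sentence containing a candidate to extract the opinion targets and identify their boundaries, while also extracting the part-of-speech templates of the opinion targets and scoring them. High-scoring templates are added to the template set, and the process iterates until no new opinion targets appear. Experimental results show that context-dependent opinion target extraction improves performance by more than 10% over the context-independent method.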

18.

Anomaly detection for multidimensional time series data combining statistical methods and bidirectional convolutional LSTM

夏英 韩星雨 《计算机应用研究》2022,39(5)

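Anomaly detection through data analysis helps identify abnormal behavior accurately, thereby improving service quality and decision-making. However, because of the spatiotemporal dependencies in multidimensional time series data and the randomness with which anomalous events occur, existing methods still have limitations. To address these problems, MBCLE, an anomaly detection method for multidimensional time series data that combines a novel statistical method with bidirectional convolutional LSTM, is proposed. The method introduces stacked median filters to handle point anomalies in the input data and smooth data fluctuations; designs a predictor that combines a bidirectional convolutional long short-term memory network (Bi-ConvLSTM) with a bidirectional long short-term memory network (Bi-LSTM) for data modeling and prediction; smooths the prediction errors with a bidirectional recurrent exponentially weighted moving average (BrEWMA); and uses a dynamic thresholding method to compute thresholds for detecting contextual anomalies. Experimental results show that MBCLE achieves good detection performance and that each step contributes to the performance gain.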
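Two of the statistical steps named in the abstract above, median filtering of the input and bidirectional EWMA smoothing of the prediction errors, can be illustrated with a short sketch. The abstract does not define BrEWMA or the filter settings. The code below therefore assumes a plain sliding-window median filter and takes the "bidirectional" EWMA to be the average of a forward and a backward pass. All function names, the window size, and `alpha` are hypothetical choices, not the authors' implementation.

```python
from statistics import median


def median_filter(xs, window=3):
    """Sliding-window median; the window is truncated at the sequence edges."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    n = len(xs)
    return [median(xs[max(0, i - half):min(n, i + half + 1)]) for i in range(n)]


def ewma(xs, alpha):
    """Exponentially weighted moving average, seeded with the first value."""
    out, s = [], None
    for x in xs:
        s = x if s is None else alpha * x + (1 - alpha) * s
        out.append(s)
    return out


def bidirectional_ewma(xs, alpha=0.5):
    """Assumed form: mean of a forward EWMA and a backward EWMA."""
    fwd = ewma(xs, alpha)
    bwd = ewma(xs[::-1], alpha)[::-1]  # smooth right-to-left, then restore order
    return [(f + b) / 2 for f, b in zip(fwd, bwd)]
```

With these definitions, a single spike such as the 100 in `[1, 1, 100, 1, 1]` is removed by a window-3 median filter. Averaging the two EWMA directions avoids the lag that a one-directional EWMA introduces.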

19.

Feature enhancement by deep LSTM networks for ASR in reverberant multisource environments

《Computer Speech and Language》2014,28(4):888-902

This article investigates speech feature enhancement based on deep bidirectional recurrent neural networks. The Long Short-Term Memory (LSTM) architecture is used to exploit a self-learnt amount of temporal context in learning the correspondences of noisy and reverberant with undistorted speech features. The resulting networks are applied to feature enhancement in the context of the 2013 2nd Computational Hearing in Multisource Environments (CHiME) Challenge track 2 task, which consists of the Wall Street Journal (WSJ-0) corpus distorted by highly non-stationary, convolutive noise. In extensive test runs, different feature front-ends, network training targets, and network topologies are evaluated in terms of frame-wise regression error and speech recognition performance. Furthermore, we consider gradually refined speech recognition back-ends from baseline ‘out-of-the-box’ clean models to discriminatively trained multi-condition models adapted to the enhanced features. In the result, deep bidirectional LSTM networks processing log Mel filterbank outputs deliver best results with clean models, reaching down to 42% word error rate (WER) at signal-to-noise ratios ranging from −6 to 9 dB (multi-condition CHiME Challenge baseline: 55% WER). Discriminative training of the back-end using LSTM enhanced features is shown to further decrease WER to 22%. To our knowledge, this is the best result reported for the 2nd CHiME Challenge WSJ-0 task yet.

20.

A feature-based approach for individualized human head modeling (total citations: 4; self-citations: 0; citations by others: 4)

Y.-J. Liu M.M.-F. Yuen S. Xiong 《The Visual computer》2002,18(5-6):368-381

Published online: 18 September 2001