
1.

Liu Dayun, Fang Guozhi, Luo Tianyi, Wei Huajie, Wang Qian, Li Xiuzheng, Li Ao 《Computing Technology and Automation》2020,39(1):150-155

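To address the problems of lip feature extraction and temporal-relationship recognition in lip reading, this paper proposes a deep learning model that combines a bidirectional long short-term memory network (BiLSTM) with an attention mechanism. First, the heights and widths at different positions of the lips, derived from 20 lip key points, are used as lip features, and a BiLSTM encodes the lip feature sequence over time. An attention mechanism then learns how much the temporal lip features at each time step contribute to the overall recognition result, and Softmax performs the final classification. The model is compared experimentally with traditional lip-reading models on the public lip-reading datasets GRID and MIRACL-VC. Accuracy improves by at least 13.4% on GRID, by at least 15.3% on the MIRACL-VC word dataset, and by at least 9.2% on the phrase dataset. Comparisons with other encoding models are also reported, and the results show that the proposed model effectively improves lip-reading accuracy.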
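
The attention and Softmax steps described in this abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the BiLSTM encoder is replaced by a hard-coded list of per-frame vectors, and the `query` context vector, `class_weights` matrix, and all numeric values are hypothetical stand-ins for parameters that would normally be learned.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(hidden_states, query):
    """Weight each time step by softmax(h_t . query) and sum the vectors.

    hidden_states: list of T vectors of length D (stand-in for BiLSTM outputs).
    query: hypothetical learned context vector of length D.
    Returns (weights, context), where context has length D.
    """
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(query)
    context = [sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)]
    return weights, context

def classify(context, class_weights):
    """Linear layer followed by softmax; class_weights has one row per class."""
    logits = [sum(w_i * c_i for w_i, c_i in zip(row, context)) for row in class_weights]
    return softmax(logits)

# Toy data (assumed): 3 time steps, 2-dimensional encoder outputs, 2 classes.
hidden_states = [[0.1, 0.0], [2.0, 1.0], [0.2, 0.1]]
query = [1.0, 1.0]
class_weights = [[1.0, 0.0], [0.0, 1.0]]

weights, context = attention_pool(hidden_states, query)
probs = classify(context, class_weights)
print("attention weights:", weights)
print("class probabilities:", probs)
```

In this toy input the middle frame has the largest dot product with `query`, so it receives the largest attention weight and dominates the pooled vector passed to the classifier.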

2.

HLR-Net: A Hybrid Lip-Reading Model Based on Deep Convolutional Neural Networks

Amany M. Sarhan, Nada M. Elshennawy, Dina M. Ibrahim 《Computers, Materials & Continua》2021,68(2):1531-1549

Lip reading is typically regarded as visually interpreting a speaker's lip movements while speaking, that is, decoding text from the movement of the speaker's mouth. This paper proposes a lip-reading model that helps deaf people and people with hearing problems understand a speaker: a video of the speaker is captured and fed into the model to obtain the corresponding subtitles. Deep learning makes it easier to extract a large number of different features, which can then be converted into letter probabilities to obtain accurate results. Recently proposed lip-reading methods are based on sequence-to-sequence architectures designed for neural machine translation and audio speech recognition. In this paper, however, a deep convolutional neural network model called the hybrid lip-reading (HLR-Net) model is developed for lip reading from video. The proposed model has three stages, namely pre-processing, encoder, and decoder, which together produce the output subtitle. The encoder is built from inception, gradient, and bidirectional GRU layers; the decoder is built from attention, fully connected, and activation-function layers and performs connectionist temporal classification (CTC). Compared with three recent models on the GRID corpus, namely LipNet, the lip-reading model with cascaded attention (LCANet), and the attention-CTC (A-ACA) model, HLR-Net achieves significant improvements: a CER of 4.9%, a WER of 9.7%, and a BLEU score of 92% for unseen speakers, and a CER of 1.4%, a WER of 3.3%, and a BLEU score of 99% for overlapped speakers.
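
The CER and WER figures quoted in this abstract are edit-distance metrics. The sketch below shows how such rates are commonly computed, assuming the usual definition (Levenshtein distance divided by reference length). The abstract does not say how the authors tokenized text or whether spaces count as characters, so the functions and the GRID-style example sentence are illustrative assumptions only.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (each edit costs 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length.

    Spaces are counted as characters here, which is an assumption.
    """
    return edit_distance(list(reference), list(hypothesis)) / len(reference)

# Hypothetical example: one letter misread in a six-word command sentence.
reference = "bin blue at f two now"
hypothesis = "bin blue at s two now"
word_error = wer(reference, hypothesis)
char_error = cer(reference, hypothesis)
print(f"WER = {word_error:.3f}, CER = {char_error:.3f}")
```

A single wrong letter costs one of six words but only one of twenty-one characters, which is why CER is usually lower than WER on the same output.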

3.

Research on Lip-Reading Algorithms in High-Security Face Recognition Systems

Ren Yuqiang, Tian Guodong, Zhou Xiangdong, Lü Jiangjing, Zhou Xi 《Application Research of Computers》2017,34(4)

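To counter the photo and video attacks that current face recognition systems face, a high-security identity authentication system is built that combines face recognition with a spoken passcode and uses lip reading for liveness detection. First, because Chinese lip-reading data is scarce, two relatively large Chinese lip-reading databases, CNLIP1 and CNLIP2, are established. Second, to preserve the temporal nature of lip motion, a stacked convolutional independent subspace analysis (ISA) deep neural network model is used to extract temporal lip-motion features. Finally, a transfer learning algorithm is proposed for training speaker-specific lip-reading models. Experiments show that temporal lip-motion features represent digit-string lip sequences better, that speaker-specific models trained by transfer learning meet the needs of liveness detection, and that the resulting high-security face recognition system resists attacks well.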

4.

Analysis of Feature Discriminative Power for Lip Reading (Cited by: 1; self-citations: 0; citations by others: 1)

Lü Pinxuan, Wang Shilin, Li Shenghong 《Information Security and Communications Privacy》2008,1(5):60-62

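Lip reading can supply auxiliary visual information to speech recognition and greatly improves a system's recognition rate against background noise. This paper studies the discriminative power of lip-reading features and selects effective features to represent the lip model. All lip features are derived from a 14-point active shape model (ASM), and the individual features and their combinations are processed with hidden Markov models (HMMs). Experiments show that the chosen features and feature combinations achieve considerable recognition rates.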
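
As an illustration of HMM-based classification, the sketch below scores an observation sequence against two small discrete HMMs with the forward algorithm and picks the more likely model. Everything in it is assumed for the example: the abstract does not give the HMM topology, and the real features are continuous ASM measurements, whereas here they are pretended to be quantized into two symbols (0 = narrow mouth, 1 = wide mouth). The word labels and probabilities are hypothetical.

```python
def forward_likelihood(obs, start, trans, emit):
    """Return P(obs | model) for a discrete HMM using the forward algorithm.

    start[i]    : probability of starting in state i
    trans[i][j] : probability of moving from state i to state j
    emit[i][k]  : probability of emitting symbol k in state i
    """
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][symbol]
            for j in range(n)
        ]
    return sum(alpha)

# Two hypothetical left-to-right word models sharing the same topology.
left_to_right = [[0.5, 0.5], [0.0, 1.0]]
models = {
    "open":  {"start": [1.0, 0.0], "trans": left_to_right,
              "emit": [[0.9, 0.1], [0.1, 0.9]]},   # narrow, then wide
    "close": {"start": [1.0, 0.0], "trans": left_to_right,
              "emit": [[0.1, 0.9], [0.9, 0.1]]},   # wide, then narrow
}

observation = [0, 0, 1, 1]  # quantized mouth-width symbols (assumed)
likelihoods = {
    word: forward_likelihood(observation, m["start"], m["trans"], m["emit"])
    for word, m in models.items()
}
best_word = max(likelihoods, key=likelihoods.get)
print(best_word, likelihoods)
```

For long sequences the raw probabilities underflow, so practical implementations work in log space or rescale `alpha` at every step.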

5.

Sentence-Level Lip-Reading Technology (Cited by: 1; self-citations: 0; citations by others: 1)

Xu Minghui, Yao Hongxun 《Computer Engineering and Applications》2005,41(8):86-88

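Lip reading infers what a speaker is saying from changes in mouth shape, and lip-reading research belongs to the field of human-computer interaction. In the biological domain, recognition broadly covers two aspects: identifying a person and understanding the content conveyed. Current lip-reading research focuses mainly on recognizing what the speaker says. The captured image sequence of the speaker's lip motion is first preprocessed (video segmentation, image enhancement, and lip-edge localization); choosing suitable features after preprocessing is the key to recognition accuracy. The lip-reading experiments use shape features and image features based on the sequence of mouth-shape changes. Finally, a semi-continuous HMM performs the computer lip reading.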

6.

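A Lip Detection Algorithm Based on Differences in Chromaticity Distribution (Cited by: 1; self-citations: 0; citations by others: 1)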

Zhang Zhiwen, Shen Haibin 《Journal of Zhejiang University (Engineering Science)》2008,42(8):1355-1359

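To let a lip-reading system locate the lip image accurately, the chromaticity distributions of skin color and lip color in color face images are studied. Histograms are used to analyze how the R, G, and B chromaticity components are distributed in skin and lip regions, and a lip detection algorithm is proposed on this basis. The algorithm considers both the distribution difference between the G and B components and that between the R and G components, and lets the two differences reinforce each other for lip discrimination. Two sets of experiments verify its effect. Compared with the Red Exclusion algorithm, the proposed method shows clear improvements in effectiveness, robustness, and support for the skin colors of different ethnic groups.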

7.

Multimodal interfaces

Alex Waibel, Minh Tue Vo, Paul Duchnowski, Stefan Manke 《Artificial Intelligence Review》1996,10(3-4):299-319

In this paper, we present an overview of research in our laboratories on Multimodal Human Computer Interfaces. The goal for such interfaces is to free human computer interaction from the limitations and acceptance barriers due to rigid operating commands and keyboards as the only/main I/O-device. Instead we move to involve all available human communication modalities. These human modalities include Speech, Gesture and Pointing, Eye-Gaze, Lip Motion and Facial Expression, Handwriting, Face Recognition, Face Tracking, and Sound Localization.

8.

Advances in Computer Lip-Reading Research

Wang Xiaoping, Hao Yufeng, Fu Degang, Yuan Chunwei 《Journal of Data Acquisition and Processing》2007,22(3):353-359

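Computer lip reading is the process of using a computer to analyze a speaker's visual speech information, such as lip motion, to recognize what is being said. It can be fused with auditory speech information to further raise the recognition rate and so make human-computer interaction more natural. This paper reviews progress in the field by working through each stage of a computer lip-reading system, discusses the strengths and weaknesses of existing methods, and finally identifies problems that need further study.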

9.

Lip Reading Based on Inner-Lip Feature Extraction

Wang Xiaozhong, Pan Baochang, Zheng Shenglin 《Computer and Modernization》2009,(2)

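The double-lip-contour mouth-shape template commonly used in lip reading is simplified and optimized into a single-lip-contour template, that is, an inner-lip template. Combined with the grayscale features of the inner lip, lip reading is performed by similarity matching with a correlation function. The method extracts features accurately and effectively while lowering computational complexity. With purely visual experimental data and a set of single utterances, the recognition rate reaches 90%. The experiments show that the new approach is feasible.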
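
Similarity matching with a correlation function can be sketched as follows. The abstract does not specify which correlation function the authors used, so this example assumes a normalized (Pearson-style) correlation between an observed feature vector and stored templates. The template labels and all feature values are made up for illustration.

```python
import math

def normalized_correlation(a, b):
    """Normalized correlation of two equal-length vectors, in [-1, 1]."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:
        return 0.0  # a constant vector carries no shape information
    return sum(x * y for x, y in zip(da, db)) / denom

def match_template(features, templates):
    """Return the best-matching label and the score of every template."""
    scores = {label: normalized_correlation(features, t)
              for label, t in templates.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical inner-lip profiles (e.g. opening height sampled at 5 positions).
templates = {
    "a": [1.0, 3.0, 5.0, 3.0, 1.0],
    "u": [4.0, 2.0, 1.0, 2.0, 4.0],
}
observed = [1.1, 2.9, 5.2, 3.1, 0.9]

best_label, scores = match_template(observed, templates)
print(best_label, scores)
```

Because the correlation is normalized, the match depends on the shape of the profile rather than its absolute scale, which helps when the mouth appears at different sizes in the image.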

10.

A Mouth-Shape Recognition Method for Chinese Speech Recognition (Cited by: 3; self-citations: 0; citations by others: 3)

Zhong Xiao, Zhou Changle, Yu Ruizhao 《Journal of Software》1999,10(2):205-209

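Drawing on the physiological characteristics of mouth shapes in Chinese pronunciation, the paper assists Chinese speech recognition at the phoneme level. It gives a method for mouth-shape recognition and grayscale statistics together with its implementation. The experimental results largely agree with the theoretical estimates: the mouth shapes of five vowels are distinguished with an accuracy above 80%, providing a useful auxiliary means for acoustic speech recognition.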