Chinese Sign Language Recognition Based on Spatial-Temporal Attention Network

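Cite this article:	LUO Yuan, LI Dan, ZHANG Yi. Chinese Sign Language Recognition Based on Spatial-Temporal Attention Network[J]. Semiconductor Optoelectronics, 2020, 41(3): 414-419.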

Authors:	LUO Yuan, LI Dan, ZHANG Yi

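Affiliation:	Institute of Photoelectric Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065; Engineering Research Center for Information Accessibility and Service Robots, Chongqing University of Posts and Telecommunications, Chongqing 400065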

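Funding:	National Natural Science Foundation of China (61801061); Science and Technology Research Program of the Chongqing Municipal Education Commission (KJQN201800607).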

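Abstract:	Sign language recognition is widely used in communication between deaf and hearing people. To address the low recognition rate caused by insufficient extraction of spatial-temporal features in sign language recognition tasks, a novel sign language recognition model based on spatial-temporal attention is proposed. First, a spatial attention module based on a residual 3D convolutional neural network (Res3DCNN) is proposed to automatically attend to salient regions in space. Then, a temporal attention module based on convolutional long short-term memory (ConvLSTM) is proposed to measure the importance of video frames. The key to the proposed algorithm is attending to salient regions in space and automatically selecting key frames in time. Finally, the effectiveness of the algorithm is verified on the CSL sign language dataset.
Keywords:	sign language recognition; spatial-temporal attention; residual 3D network; convolutional LSTM network
Received:	2019/12/30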
Chinese Sign Language Recognition Based on Spatial-Temporal Attention Network

LUO Yuan, LI Dan, ZHANG Yi. Chinese Sign Language Recognition Based on Spatial-Temporal Attention Network[J]. Semiconductor Optoelectronics, 2020, 41(3): 414-419.

Authors:	LUO Yuan, LI Dan, ZHANG Yi

Affiliation:	Institute of Photoelectric Engineering; Engineering Research Center for Information Accessibility and Service Robots, Chongqing University of Posts and Telecommunications, Chongqing 400065, China

Abstract:	Sign language recognition is widely used in communication between deaf and hearing people. Inadequate extraction of spatial-temporal features in sign language recognition tasks is likely to result in a low recognition rate. This paper proposes a novel sign language recognition model based on spatial-temporal attention that can learn more discriminative spatial-temporal features. Specifically, a new spatial attention module based on a residual 3D convolutional neural network (Res3DCNN) is proposed, which automatically focuses on salient spatial regions. Then, to measure the importance of video frames, a new temporal attention module based on convolutional long short-term memory (ConvLSTM) is introduced. The key idea of the proposed model is to focus on salient regions spatially and to attend to key video frames temporally. Finally, experimental results on the Chinese Sign Language (CSL) dataset demonstrate the effectiveness of the proposed method.

Keywords:	sign language recognition; spatial-temporal attention; Res3DCNN; ConvLSTM
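
Illustrative note:	The abstract describes a temporal attention module that measures the importance of each video frame. The sketch below shows only the generic idea behind such a module: per-frame scores are turned into softmax weights, and frame features are combined as a weighted sum. It is not the authors' Res3DCNN/ConvLSTM architecture. The function names `softmax` and `temporal_attention_pool`, the toy features, and the hand-set scores are all assumptions made for illustration; in a trained model the scores would be produced by the network.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention_pool(frame_features, frame_scores):
    """Combine per-frame feature vectors using softmax attention weights.

    frame_features: list of equal-length feature vectors, one per frame.
    frame_scores:   one unnormalized importance score per frame
                    (hypothetical here; a real model would learn them).
    Returns (weights, pooled_feature).
    """
    weights = softmax(frame_scores)
    dim = len(frame_features[0])
    pooled = [0.0] * dim
    for w, feat in zip(weights, frame_features):
        for i, x in enumerate(feat):
            pooled[i] += w * x
    return weights, pooled


# Toy input: 3 frames with 2-dimensional features; the scores are made up.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = [0.0, 0.0, 2.0]
weights, pooled = temporal_attention_pool(features, scores)
```

In this toy example the third frame has the highest score, so it receives the largest weight and dominates the pooled feature, which is the sense in which attention "selects" key frames.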
This article is indexed in VIP (维普) and other databases.