Multi-Modal Sign Language Recognition Fusing Attention Mechanism and Connectionist Temporal Classification / Modelling Long-Term Temporal Relationship and Spatial Attention for Multi-Modal Sign Language Recognition

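Cite this article:	WANG Jun, LU Shu, LI Yunwei. Multi-modal sign language recognition fusing attention mechanism and connectionist temporal classification[J]. Journal of Signal Processing, 2020, 36(9): 1429-1439.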

Authors:	Wang Jun, Lu Shu, Li Yunwei

Affiliation:	College of Information and Control Engineering, China University of Mining and Technology

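Funding:	National Natural Science Foundation of China (61876184); Major Project Cultivation Program of China University of Mining and Technology (2020ZDPY0217)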

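Abstract:	One of the difficulties in continuous sign language recognition is the redundant information in the spatio-temporal dimensions of sign language data, together with the problem of aligning sign language data with a given label sequence. This paper therefore proposes a continuous sign language recognition model that fuses an attention mechanism with connectionist temporal classification (CTC). The model extracts short-term spatio-temporal features from the color and depth video clips in the sign language data, as well as hand motion trajectory features. The features of the three modalities are fused, weighted with spatial attention, and fed in temporal order into a bidirectional long short-term memory (BiLSTM) network for temporal modeling, yielding long-term spatio-temporal features. Finally, a decoding network that fuses the attention mechanism with the CTC model recognizes continuous sign language end to end. Tested on a self-collected Chinese Sign Language dataset, the model achieved an accuracy of up to 0.935.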
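Keywords:	sign language recognition; 3D convolutional neural network; long short-term memory network; attention mechanism; connectionist temporal classification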
Received:	2020-03-04
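The abstract does not specify how the three modalities are fused or how the spatial attention scores are computed. The following minimal Python sketch assumes channel-wise concatenation of color and depth features at each spatial position, appends the frame's hand-trajectory vector to every position, and scores positions with a hypothetical learned vector `w`. The names `fuse_modalities`, `spatial_attention` and `attend_sequence` are illustrative, not the authors' code; the sketch only shows how softmax attention turns a per-frame feature map into one weighted vector per time step, which could then be fed to the BiLSTM.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_modalities(color: List[Vector], depth: List[Vector],
                    trajectory: Vector) -> List[Vector]:
    """Hypothetical fusion: concatenate color and depth features at each
    spatial position and append the frame's hand-trajectory vector."""
    if len(color) != len(depth):
        raise ValueError("color and depth maps need the same number of positions")
    return [c + d + trajectory for c, d in zip(color, depth)]


def spatial_attention(fused: List[Vector], w: Vector) -> Vector:
    """Score every position with a (hypothetical) learned vector w, softmax the
    scores over positions, and return the attention-weighted feature vector."""
    scores = [sum(wi * xi for wi, xi in zip(w, x)) for x in fused]
    alpha = softmax(scores)
    dim = len(fused[0])
    return [sum(a * x[k] for a, x in zip(alpha, fused)) for k in range(dim)]


def attend_sequence(frames: List[Tuple[List[Vector], List[Vector], Vector]],
                    w: Vector) -> List[Vector]:
    """Apply fusion and spatial attention frame by frame; the resulting
    sequence (one vector per time step) is what a BiLSTM would consume."""
    return [spatial_attention(fuse_modalities(c, d, t), w) for c, d, t in frames]
```

With an all-zero `w` every position gets equal weight, so the output is the plain average over positions; a larger score on one position pulls the output toward that position's features.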

Affiliation:	College of Information and Control Engineering, China University of Mining and Technology

Abstract:	One of the difficulties in continuous sign language recognition is the redundant information in the spatio-temporal dimensions of sign language data, together with the alignment of the sign language data with a given label sequence. We therefore propose a sign language sentence recognition model that combines an attention mechanism with connectionist temporal classification (CTC). The model extracts short-term spatio-temporal features from color and depth video segments, as well as hand motion trajectory features, from the sign language data. To obtain long-term spatio-temporal features, the features of the three modalities are fused, weighted with spatial attention, and fed in temporal order into a bidirectional long short-term memory (BiLSTM) network for temporal modeling. Finally, a decoder network that integrates the attention mechanism and the CTC model performs end-to-end recognition of continuous sign language. Tested on a self-collected Chinese Sign Language dataset, the model achieved an accuracy of 0.943.
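The alignment problem named in the abstract is what CTC addresses: the network emits one distribution per frame over the vocabulary plus a blank symbol, and the probability of a label sequence is the sum over every frame-level path that collapses to it. The sketch below implements the standard CTC forward recursion and greedy (best-path) decoding in plain Python, plus the weighted log-probability combination commonly used in hybrid CTC/attention decoders. The blank index 0, the weight `lam`, the attention-branch probability in the usage lines, and all function names are assumptions; the abstract does not state how the authors combine the two branches.

```python
import math
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank symbol


def ctc_greedy_decode(frame_labels: Sequence[int], blank: int = BLANK) -> List[int]:
    """Best-path decoding: merge consecutive repeats, then remove blanks."""
    out: List[int] = []
    prev = None
    for k in frame_labels:
        if k != prev and k != blank:
            out.append(k)
        prev = k
    return out


def ctc_probability(probs: Sequence[Sequence[float]], target: Sequence[int],
                    blank: int = BLANK) -> float:
    """p(target | input) summed over all frame alignments that collapse to
    target, via the CTC forward (alpha) recursion.
    probs[t][k] is the per-frame softmax probability of symbol k at frame t."""
    ext = [blank]  # blank-extended target: _ y1 _ y2 _ ...
    for y in target:
        ext += [y, blank]
    S = len(ext)
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    if S > 1:
        alpha[1] = probs[0][ext[1]]
    for t in range(1, len(probs)):
        new = [0.0] * S
        for s in range(S):
            a = alpha[s]                  # stay on the same symbol
            if s >= 1:
                a += alpha[s - 1]         # advance by one position
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[s - 2]         # skip a blank between distinct labels
            new[s] = a * probs[t][ext[s]]
        alpha = new
    # valid paths end on the last label or on the trailing blank
    return alpha[-1] + (alpha[-2] if S > 1 else 0.0)


def joint_score(log_p_ctc: float, log_p_att: float, lam: float = 0.3) -> float:
    """Hybrid CTC/attention score: lam * log p_ctc + (1 - lam) * log p_att.
    The weight lam is a hypothetical hyperparameter."""
    return lam * log_p_ctc + (1.0 - lam) * log_p_att


p = [[0.5, 0.5], [0.5, 0.5]]  # two frames, symbols {blank=0, 1}
print(ctc_probability(p, [1]))                  # 0.75
print(ctc_greedy_decode([1, 1, 0, 2, 2, 0, 2]))  # [1, 2, 2]
# 0.6 is a hypothetical attention-branch probability for the same hypothesis
score = joint_score(math.log(ctc_probability(p, [1])), math.log(0.6))
```

For two frames with uniform probabilities over {blank, 1}, the label sequence [1] collects the paths (1, 1), (blank, 1) and (1, blank), giving 0.75; the remaining 0.25 belongs to the empty sequence.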
