Research on speech recognition based on hybrid attention mechanism (基于混合式注意力机制的语音识别研究)

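Cite this article:	Li Yeliang, Zhang Erhua, Tang Zhenmin. Research on speech recognition based on hybrid attention mechanism[J]. Application Research of Computers (计算机应用研究), 2020, 37(1): 131-134.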

Authors:	Li Yeliang (李业良), Zhang Erhua (张二华), Tang Zhenmin (唐振民)

Affiliation:	School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094 (all three authors)

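Funding:	Supported by the "13th Five-Year Plan" Equipment Pre-Research Fund of the Equipment Development Department of the Central Military Commission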

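Abstract:	In speech recognition, hybrid attention mechanisms based on convolutional location information cannot extract location information that remains valid over the long term. To address this, the paper proposes a new hybrid attention mechanism that captures long-term location information. First, the attention scores generated at the current time step are convolved to extract multi-channel feature maps, and global average pooling reduces them to a feature vector of constant dimension. Next, a long short-term memory (LSTM) unit is introduced as an external memory module; it takes the generated feature vector as input and produces the location information vector for the next time step. Finally, the scheme is combined with the classic LAS (listen, attend and spell) model to verify its effectiveness. Experimental results show that the scheme can take full account of the attention scores from multiple past time steps. Compared with the LAS model based on convolutional location information, the scheme reduces the label error rate by 1.8% on the clean speech dataset and by 2.21% on the noisy speech dataset.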
Keywords:	convolution attention mechanism; global average pooling; long short-term memory network; LAS model
Received:	2018/6/3
Revised:	2019/11/30
Research on speech recognition based on hybrid attention mechanism

Li Yeliang, Zhang Erhua and Tang Zhenmin. Research on speech recognition based on hybrid attention mechanism[J]. Application Research of Computers, 2020, 37(1): 131-134.

Authors:	Li Yeliang, Zhang Erhua and Tang Zhenmin

Affiliation:	Nanjing University of Science & Technology

Abstract:	In speech recognition, the convolutional location-based hybrid attention mechanism cannot extract location information that remains valid over the long term. This paper proposes a new hybrid attention mechanism to solve this problem. First, it convolves the attention scores generated at the current time step to extract multi-channel features, and then obtains a feature vector of constant dimension via global average pooling. Next, it introduces an LSTM unit as an external memory module, which takes the generated feature vector as input and generates the location vector for the next time step. Finally, the paper uses the classic LAS model to verify the effect of the new hybrid attention mechanism. Experimental results show that the new mechanism can take full account of the attention scores at many past time steps. Compared with the convolutional location-based LAS model, the proposed scheme reduces the label error rate on clean and noisy speech datasets by 1.8% and 2.21%, respectively.

Keywords:	convolution attention mechanism; global average pooling; LSTM; LAS model
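
The pipeline summarized in the abstract (convolve the current attention scores, apply global average pooling, then feed the pooled vector to an LSTM unit) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The filter count (4), kernel width (3), hidden size (8), random weights, and all function and class names are assumptions chosen only for illustration. The abstract does not say how the resulting location vector enters the attention scoring function, so that step is left out.

```python
import math
import random


def conv1d_same(signal, kernel):
    """1-D cross-correlation with zero padding; output length equals len(signal)."""
    half = len(kernel) // 2
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t + j - half
            if 0 <= idx < len(signal):
                acc += w * signal[idx]
        out.append(acc)
    return out


def global_average_pool(feature_maps):
    """Average each feature map over time: one value per channel."""
    return [sum(fm) / len(fm) for fm in feature_maps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vec):
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


class LSTMCell:
    """Standard LSTM cell acting as the external memory (hypothetical sizes)."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size

        def init(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        # Gate weights stacked in the order: input, forget, candidate, output.
        self.w_x = init(4 * hidden_size, input_size)
        self.w_h = init(4 * hidden_size, hidden_size)
        self.bias = [0.0] * (4 * hidden_size)

    def step(self, x, h, c):
        z = [a + b + d for a, b, d in
             zip(matvec(self.w_x, x), matvec(self.w_h, h), self.bias)]
        n = self.hidden_size
        i = [sigmoid(v) for v in z[:n]]
        f = [sigmoid(v) for v in z[n:2 * n]]
        g = [math.tanh(v) for v in z[2 * n:3 * n]]
        o = [sigmoid(v) for v in z[3 * n:]]
        c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
        h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
        return h_new, c_new


def location_vector(attention, kernels, cell, h, c):
    """Attention scores -> multi-channel conv -> global average pool -> LSTM step."""
    feature_maps = [conv1d_same(attention, k) for k in kernels]
    pooled = global_average_pool(feature_maps)  # length == number of kernels
    return cell.step(pooled, h, c)


rng = random.Random(0)
kernels = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(4)]
cell = LSTMCell(input_size=4, hidden_size=8, rng=rng)
h, c = [0.0] * 8, [0.0] * 8

# Attention scores from three successive decoder steps over five encoder frames.
for attention in ([0.7, 0.2, 0.1, 0.0, 0.0],
                  [0.1, 0.6, 0.3, 0.0, 0.0],
                  [0.0, 0.1, 0.5, 0.4, 0.0]):
    h, c = location_vector(attention, kernels, cell, h, c)
```

Because pooling averages over the time axis, the pooled vector has one entry per filter regardless of utterance length, which is what lets a fixed-size LSTM input be used. The cell state `c` carries information across decoder steps, so `h` after the loop depends on all three attention distributions and not only on the most recent one.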
This article is indexed in Wanfang Data and other databases.