Research on Speech Recognition Based on Residual Network and Gated Convolution Network
Citation: ZHU Xuechao, ZHANG Fei, GAO Lu, REN Xiaoying, HAO Bin. Research on Speech Recognition Based on Residual Network and Gated Convolution Network[J]. Computer Engineering and Applications, 2022, 58(7): 185-191.
Authors: ZHU Xuechao  ZHANG Fei  GAO Lu  REN Xiaoying  HAO Bin
Affiliation: School of Information Engineering, Inner Mongolia University of Science and Technology, Baotou, Inner Mongolia 014000, China
Funding: Inner Mongolia Autonomous Region Science and Technology Plan Project; sub-project of the Key Special Project for Intergovernmental International Science and Technology Innovation Cooperation
Abstract: Traditional recurrent neural networks have complex structures, need large amounts of data to train properly for continuous speech recognition, take a long time to train, and place heavy demands on hardware. To address these problems, an algorithm based on a residual network and a gated convolutional neural network is proposed and combined with connectionist temporal classification (CTC) to build an end-to-end Chinese speech recognition model. The model takes the spectrogram as input, extracts high-level abstract features with the residual network, and then captures effective long-term memory with stacked gated convolutional neural networks. This removes the dependence on traditional recurrent neural networks for modeling contextual correlation and speeds up training. The residual network is also optimized, and a feedforward neural network is added to the gated convolutional neural network, which greatly improves model performance. Experimental results show that on the Aishell-1 Chinese dataset the model's character error rate is reduced to 11.43%, and in a low signal-to-noise-ratio environment of −5 dB the character error rate is 19.77%.

Keywords: residual network  gated convolutional neural network  connectionist temporal classification  Swish activation function
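
The abstract names stacked gated convolutions and the Swish activation but gives no layer details. As a rough illustration only, the sketch below assumes the common gated-linear-unit form, in which the output of one convolution is multiplied elementwise by the sigmoid of a second convolution, and the usual definition Swish(x) = x · sigmoid(x). The function names, the single-channel layout, the odd kernel length, and all weights are hypothetical and are not taken from the paper.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def swish(x):
    """Swish activation in its usual form: x * sigmoid(x)."""
    return x * sigmoid(x)


def conv1d(frames, kernel, bias):
    """Zero-padded 1-D convolution over time for one feature channel.

    Uses the deep-learning convention (cross-correlation, no kernel flip)
    and assumes an odd kernel length so the output keeps the input length.
    """
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + [float(v) for v in frames] + [0.0] * pad
    return [
        sum(kernel[j] * padded[t + j] for j in range(k)) + bias
        for t in range(len(frames))
    ]


def gated_conv(frames, w_lin, b_lin, w_gate, b_gate):
    """Gated convolution (assumed GLU form): conv_a(x) * sigmoid(conv_b(x))."""
    linear = conv1d(frames, w_lin, b_lin)
    gate = conv1d(frames, w_gate, b_gate)
    return [a * sigmoid(g) for a, g in zip(linear, gate)]


def stacked_gated_conv(frames, layers):
    """Apply several gated convolutions in sequence.

    `layers` is a list of (w_lin, b_lin, w_gate, b_gate) tuples; stacking
    widens the span of time steps that each output frame can depend on.
    """
    out = frames
    for w_lin, b_lin, w_gate, b_gate in layers:
        out = gated_conv(out, w_lin, b_lin, w_gate, b_gate)
    return out
```

For example, with an identity linear kernel `[0, 1, 0]` and an all-zero gate kernel, every gate equals sigmoid(0) = 0.5, so `gated_conv([1, 2, 3], [0, 1, 0], 0.0, [0, 0, 0], 0.0)` halves each frame. With kernel length 3, each extra layer in `stacked_gated_conv` lets an output frame see one more neighbor on each side, which is the sense in which stacking can stand in for recurrence when modeling context.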
This article is indexed in Wanfang Data and other databases.