
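Research on Speech Recognition Based on Residual Network and Gated Convolution Network
Cite this article: ZHU Xuechao, ZHANG Fei, GAO Lu, REN Xiaoying, HAO Bin. Research on Speech Recognition Based on Residual Network and Gated Convolution Network[J]. Computer Engineering and Applications (计算机工程与应用), 2022, 58(7): 185-191. DOI: 10.3778/j.issn.1002-8331.2108-0265
Authors: ZHU Xuechao, ZHANG Fei, GAO Lu, REN Xiaoying, HAO Bin
Affiliation: School of Information Engineering, Inner Mongolia University of Science and Technology, Baotou, Inner Mongolia 014000, China
Funding: Inner Mongolia Autonomous Region Science and Technology Plan Project; sub-project of the Key Special Project for Intergovernmental International Science and Technology Innovation Cooperation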
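Abstract (from the Chinese original, truncated on the source page): Because traditional recurrent neural networks have complex structures, they need large amounts of data to be trained properly for continuous speech recognition, and training is time-consuming and places heavy demands on hardware. To address these problems, an algorithm based on a residual network and a gated convolutional neural network is proposed and combined with connectionist temporal classification to build an end-to-end Chinese speech recognition model. The model takes the spectrogram as input, extracts high-level abstract features through the residual network, and then, by stacking gated convolutional neural...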

Keywords: residual network; gated convolutional neural network; connectionist temporal classification; Swish activation function

Research on Speech Recognition Based on Residual Network and Gated Convolution Network
ZHU Xuechao, ZHANG Fei, GAO Lu, REN Xiaoying, HAO Bin. Research on Speech Recognition Based on Residual Network and Gated Convolution Network[J]. Computer Engineering and Applications, 2022, 58(7): 185-191. DOI: 10.3778/j.issn.1002-8331.2108-0265
Authors: ZHU Xuechao, ZHANG Fei, GAO Lu, REN Xiaoying, HAO Bin
Affiliation: School of Information Engineering, Inner Mongolia University of Science and Technology, Baotou, Inner Mongolia 014000, China
Abstract: Traditional recurrent neural networks have complex structures, so they need a large amount of data to be trained properly for continuous speech recognition, and training is time-consuming and places heavy demands on hardware. To address these problems, an algorithm based on a residual network and a gated convolutional neural network is proposed and combined with connectionist temporal classification (CTC) to build an end-to-end Chinese speech recognition model. The model takes the spectrogram as input, extracts high-level abstract features through the residual network, and then captures effective long-term memory through stacked gated convolutional neural networks, which removes the reliance on traditional recurrent neural networks for modeling contextual dependencies and speeds up model training. In addition, the residual network is optimized and a feedforward neural network is added to the gated convolutional neural network, which greatly improves the performance of the model. Experimental results show that on the Aishell-1 Chinese data set the word error rate of the model is reduced to 11.43%, and that at a low signal-to-noise ratio of -5 dB the word error rate is 19.77%.
Keywords: residual network; gated convolutional neural network; connectionist temporal classification; Swish activation function
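
The abstract names three building blocks: a residual (skip) connection, a gated convolution, and the Swish activation. The NumPy sketch below illustrates each in its simplest form. It is not the authors' architecture: the kernel size, channel count, sigmoid gate, placement of the skip connection, and all function names are assumptions chosen for illustration, and the page gives no layer configuration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def swish(x):
    """Swish activation: x * sigmoid(x)."""
    return x * sigmoid(x)

def conv1d_same(x, w, b):
    """1-D convolution over time with zero padding (output length == input length).

    x: (T, C_in) feature sequence, w: (K, C_in, C_out) kernel, b: (C_out,) bias.
    """
    k = w.shape[0]
    pad_left = (k - 1) // 2
    pad_right = k - 1 - pad_left
    xp = np.pad(x, ((pad_left, pad_right), (0, 0)))
    out = np.empty((x.shape[0], w.shape[2]))
    for t in range(x.shape[0]):
        window = xp[t:t + k]  # (K, C_in) slice centred on frame t
        out[t] = np.tensordot(window, w, axes=([0, 1], [0, 1])) + b
    return out

def gated_conv_block(x, w_lin, b_lin, w_gate, b_gate):
    """Gated convolution with a skip connection (assumed layout, not the paper's).

    One convolution produces candidate features, a second produces a
    sigmoid gate in (0, 1); their product is added back to the input.
    """
    linear = conv1d_same(x, w_lin, b_lin)
    gate = sigmoid(conv1d_same(x, w_gate, b_gate))
    return x + linear * gate

# Hypothetical sizes: 50 frames, 8 channels, kernel width 3.
rng = np.random.default_rng(0)
x = rng.standard_normal((50, 8))
w_lin = rng.standard_normal((3, 8, 8)) * 0.1
w_gate = rng.standard_normal((3, 8, 8)) * 0.1
b = np.zeros(8)
y = gated_conv_block(x, w_lin, b, w_gate, b)
```

Because the gate is computed from the same local window as the candidate features, each stacked block widens the temporal context without any recurrence, which is the property the abstract relies on for faster training.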
This article is indexed in Wanfang Data (万方数据) and other databases.