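Speech Enhancement Method Based on Convolutional Recurrent Network and Non-Local Module
Cite this article: LI Hui, JING Hao, YAN Kanghua, XU Lianghao. Speech enhancement method based on convolutional recurrent network and non-local module[J]. Electronic Science and Technology, 2022, 35(3): 8-15.
Authors: LI Hui, JING Hao, YAN Kanghua, XU Lianghao
Affiliation: 1. School of Physics and Electronic Information Engineering, Henan Polytechnic University, Jiaozuo, Henan 454000, China; 2. School of Electrical Engineering and Automation, Henan Polytechnic University, Jiaozuo, Henan 454000, China
Funding: National Natural Science Foundation of China; Henan Province Basic and Frontier Technology Research Program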
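Abstract: Existing deep-neural-network speech enhancement methods neglect phase spectrum learning, and as a result the enhanced speech quality is unsatisfactory. To address this problem, this paper proposes a speech enhancement method based on a convolutional recurrent network and non-local modules. The paper designs an encoder-decoder network whose encoder takes the time-domain representation of the speech signal as input for deep feature extraction. This makes full use of both the magnitude and the phase information of the signal. Non-local modules are inserted into the convolutional layers of both the encoder and the decoder to extract the key features of the speech sequence while suppressing irrelevant ones. A gated recurrent unit (GRU) network then captures the temporal correlations across the speech sequence. Experiments on the ST-CMDS Chinese speech dataset show that the proposed method improves speech quality by 61% and intelligibility by 7.93% on average, relative to the unprocessed noisy speech.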
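Why phase matters here: methods that estimate only the magnitude spectrum typically resynthesize speech with the noisy phase. A time-domain input, by contrast, carries both magnitude and phase. The minimal sketch below illustrates this point on one synthetic 16-sample frame using a naive DFT. It is not the authors' pipeline. The frame length `N`, the `clean` tones and the `interference` tone are hypothetical.

```python
import cmath
import math

N = 16  # frame length (hypothetical)

def dft(frame):
    """Naive O(N^2) discrete Fourier transform of a real-valued frame."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Inverse DFT; keeps the real part because the original frame is real."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]

def magnitude_phase(spectrum):
    """Polar decomposition X_k = |X_k| * exp(j * phase_k)."""
    return [abs(x) for x in spectrum], [cmath.phase(x) for x in spectrum]

def combine(magnitude, phase):
    """Rebuild complex bins from a magnitude spectrum and a phase spectrum."""
    return [cmath.rect(m, p) for m, p in zip(magnitude, phase)]

def max_error(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

# Hypothetical clean frame: two tones in DFT bins 3 and 5.
clean = [math.sin(2 * math.pi * 3 * t / N) + 0.5 * math.cos(2 * math.pi * 5 * t / N)
         for t in range(N)]
# Hypothetical interference in the same bin as the first tone, so it changes that bin's phase.
interference = [0.8 * math.cos(2 * math.pi * 3 * t / N) for t in range(N)]
noisy = [c + i for c, i in zip(clean, interference)]

clean_mag, clean_phase = magnitude_phase(dft(clean))
_, noisy_phase = magnitude_phase(dft(noisy))

oracle = idft(combine(clean_mag, clean_phase))          # clean magnitude + clean phase
magnitude_only = idft(combine(clean_mag, noisy_phase))  # clean magnitude + noisy phase

print(f"clean magnitude + clean phase: max error {max_error(oracle, clean):.2e}")
print(f"clean magnitude + noisy phase: max error {max_error(magnitude_only, clean):.2f}")
```

Even with a perfectly estimated magnitude, reusing the noisy phase leaves a clearly visible waveform error. Pairing the clean magnitude with the clean phase reconstructs the frame up to floating-point rounding.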

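Keywords: speech enhancement; deep neural network; convolutional recurrent network; non-local module; supervised learning; gated recurrent unit; magnitude spectrum; phase spectrum
Received: 2020-11-16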

Speech Enhancement Method Based on Convolutional Recurrent Network and Non-Local Module
LI Hui, JING Hao, YAN Kanghua, XU Lianghao. Speech Enhancement Method Based on Convolutional Recurrent Network and Non-Local Module[J]. Electronic Science and Technology, 2022, 35(3): 8-15.
Authors: LI Hui, JING Hao, YAN Kanghua, XU Lianghao
Affiliation: 1. School of Physics and Electronic Information Engineering, Henan Polytechnic University, Jiaozuo 454000, China; 2. School of Electrical Engineering and Automation, Henan Polytechnic University, Jiaozuo 454000, China
Abstract: Existing deep neural network speech enhancement methods neglect phase spectrum learning, and as a result the enhanced speech quality is unsatisfactory. To address this problem, this study proposes a speech enhancement method based on a convolutional recurrent network and non-local modules. The study designs an encoder-decoder network whose encoder takes the time-domain representation of the speech signal as input for deep feature extraction. This makes full use of both the magnitude and the phase information of the speech signal. Non-local modules are added to the convolutional layers of the encoder and decoder to extract key features of the speech sequence while suppressing useless features. A gated recurrent unit (GRU) network is introduced to capture the temporal correlations across the speech sequence. Experimental results on the ST-CMDS Chinese speech dataset show that the enhanced speech improves quality by 61% and intelligibility by 7.93% on average, relative to the unprocessed noisy speech.
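The abstract does not give the internal form of the non-local module. The following minimal sketch assumes the embedded-Gaussian variant of the non-local block from Wang et al.'s non-local networks. That variant applies softmax attention over all positions and adds a residual connection. The sketch runs the block over a 1-D sequence of `T` feature vectors, whereas the paper uses it on convolutional feature maps. The weight matrices `w_theta`, `w_phi`, `w_g`, `w_z` and all sizes are hypothetical and randomly initialized.

```python
import math
import random

def matvec(w, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in w]

def softmax(scores):
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def non_local_1d(x, w_theta, w_phi, w_g, w_z):
    """Embedded-Gaussian non-local block over a sequence x of T feature vectors.

    y_i = sum_j softmax_j(theta(x_i) . phi(x_j)) * g(x_j)
    z_i = x_i + W_z y_i   (residual connection keeps the input shape)
    """
    theta = [matvec(w_theta, xi) for xi in x]
    phi = [matvec(w_phi, xj) for xj in x]
    g = [matvec(w_g, xj) for xj in x]
    outputs, attention = [], []
    for xi, ti in zip(x, theta):
        # Each position attends to every position in the sequence, not only its neighbours.
        weights = softmax([sum(a * b for a, b in zip(ti, pj)) for pj in phi])
        attention.append(weights)
        y = [sum(w * gj[c] for w, gj in zip(weights, g)) for c in range(len(g[0]))]
        outputs.append([a + b for a, b in zip(xi, matvec(w_z, y))])
    return outputs, attention

def rand_matrix(rows, cols, scale=0.5):
    return [[random.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

random.seed(0)
T, C, C_INNER = 6, 4, 2  # sequence length, channels, bottleneck width (hypothetical)
x = [[random.gauss(0.0, 1.0) for _ in range(C)] for _ in range(T)]
w_theta, w_phi, w_g = rand_matrix(C_INNER, C), rand_matrix(C_INNER, C), rand_matrix(C_INNER, C)
w_z = rand_matrix(C, C_INNER)

z, attn = non_local_1d(x, w_theta, w_phi, w_g, w_z)
```

Each row of `attn` is a probability distribution over all `T` positions. This lets the block weight informative frames up and irrelevant ones down regardless of their distance. Setting `w_z` to zeros turns the block into an identity mapping. That is one common way to initialize it so that inserting the block does not disturb the rest of the network.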
Keywords: speech enhancement; deep neural network; convolutional recurrent network; non-local module; supervised learning; gated recurrent unit; magnitude spectrum; phase spectrum
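On the temporal-modeling step, the abstract states only that a GRU network captures correlations across the speech sequence. The sketch below assumes the standard GRU update equations, with biases omitted for brevity. It runs a single layer over a sequence of hypothetical encoder feature frames. The parameter names (`W_r`, `U_r`, and so on) and all sizes are illustrative and do not come from the paper.

```python
import math
import random

def matvec(w, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in w]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]

def gru_step(x, h, p):
    """One GRU step (biases omitted):
    r  = sigmoid(W_r x + U_r h)        reset gate
    u  = sigmoid(W_u x + U_u h)        update gate
    n  = tanh(W_n x + r * (U_n h))     candidate state
    h' = (1 - u) * n + u * h
    """
    r = sigmoid([a + b for a, b in zip(matvec(p["W_r"], x), matvec(p["U_r"], h))])
    u = sigmoid([a + b for a, b in zip(matvec(p["W_u"], x), matvec(p["U_u"], h))])
    n = [math.tanh(a + ri * b) for a, ri, b in zip(matvec(p["W_n"], x), r, matvec(p["U_n"], h))]
    return [(1.0 - ui) * ni + ui * hi for ui, ni, hi in zip(u, n, h)]

def gru_sequence(frames, hidden_size, p):
    """Run the GRU over a sequence of frames, starting from a zero state."""
    h = [0.0] * hidden_size
    states = []
    for x in frames:
        h = gru_step(x, h, p)
        states.append(h)
    return states

def rand_matrix(rows, cols, scale=0.5):
    return [[random.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

random.seed(1)
T, D_IN, D_H = 5, 3, 4  # frames, input features, hidden units (hypothetical)
params = {name: rand_matrix(D_H, D_IN if name.startswith("W") else D_H)
          for name in ("W_r", "W_u", "W_n", "U_r", "U_u", "U_n")}
frames = [[random.gauss(0.0, 1.0) for _ in range(D_IN)] for _ in range(T)]

states = gru_sequence(frames, D_H, params)
edited = frames[:-1] + [[5.0] * D_IN]  # change only the last frame
edited_states = gru_sequence(edited, D_H, params)
```

Each hidden state depends only on the current frame and the previous state, so the recurrence is causal. Editing the last frame changes only the last state. The update gate `u` decides how much of the earlier context carries forward at each frame.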
This article is indexed in Wanfang Data and other databases.