Speech Enhancement Method Based on Convolutional Recurrent Network and Non-Local Module

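Cite this article:	LI Hui, JING Hao, YAN Kanghua, XU Lianghao. Speech Enhancement Method Based on Convolutional Recurrent Network and Non-Local Module[J]. Electronic Science and Technology, 2022, 35(3): 8-15.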

Authors:	LI Hui, JING Hao, YAN Kanghua, XU Lianghao

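Affiliation:	1. School of Physics and Electronic Information Engineering, Henan Polytechnic University, Jiaozuo 454000, Henan, China; 2. School of Electrical Engineering and Automation, Henan Polytechnic University, Jiaozuo 454000, Henan, China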

Funding:	National Natural Science Foundation of China; Henan Province Basic and Frontier Technology Research Program

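Abstract:	Existing deep neural network speech enhancement methods overlook the importance of learning the phase spectrum, which leads to unsatisfactory enhanced speech quality. To address this problem, this paper proposes a speech enhancement method based on a convolutional recurrent network and non-local modules. An encoder-decoder network is designed that takes the time-domain representation of the speech signal as the encoder input for deep feature extraction, thereby making full use of both the magnitude and the phase information of the speech signal. Non-local modules are added to the convolutional layers of the encoder and decoder to extract key features of the speech sequence while suppressing irrelevant ones, and a gated recurrent unit (GRU) network is introduced to capture temporal correlations across the speech sequence. Experimental results on the ST-CMDS Chinese speech dataset show that, compared with the unprocessed noisy speech, the enhanced speech generated by the proposed method improves quality and intelligibility by 61% and 7.93% on average, respectively.
Keywords:	speech enhancement; deep neural network; convolutional recurrent network; non-local module; supervised learning; gated recurrent unit; magnitude spectrum; phase spectrum
Received:	2020-11-16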
Speech Enhancement Method Based on Convolutional Recurrent Network and Non-Local Module

LI Hui, JING Hao, YAN Kanghua, XU Lianghao. Speech Enhancement Method Based on Convolutional Recurrent Network and Non-Local Module[J]. Electronic Science and Technology, 2022, 35(3): 8-15.

Authors:	LI Hui, JING Hao, YAN Kanghua, XU Lianghao

Affiliation:	1. School of Physics and Electronic Information Engineering, Henan Polytechnic University, Jiaozuo 454000, China; 2. School of Electrical Engineering and Automation, Henan Polytechnic University, Jiaozuo 454000, China

Abstract:	Existing deep neural network speech enhancement methods ignore the importance of phase spectrum learning, which leaves the enhanced speech quality unsatisfactory. To address this problem, a speech enhancement method based on a convolutional recurrent network and non-local modules is proposed in the present study. An encoder-decoder network is designed in which the time-domain representation of the speech signal is used as the encoder input for deep feature extraction, so as to make full use of both the magnitude information and the phase information of the speech signal. Non-local modules are added to the convolutional layers of the encoder and decoder to extract key features of the speech sequence while suppressing useless features. A gated recurrent unit (GRU) network is introduced to capture the temporal correlation information across the speech sequence. Experimental results on the ST-CMDS Chinese speech dataset show that, compared with the unprocessed noisy speech, the quality and intelligibility of the enhanced speech are improved by 61% and 7.93% on average, respectively.

Keywords:	speech enhancement; deep neural network; convolutional recurrent network; non-local module; supervised learning; gated recurrent unit; magnitude spectrum; phase spectrum
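
The abstract does not give the configuration of the non-local modules, so the following is only a minimal sketch of the commonly used embedded-Gaussian form of a non-local block, applied to a 1-D feature sequence. All names (`non_local_block_1d`, `w_theta`, `w_phi`, `w_g`, `w_out`), the shapes, and the random inputs are assumptions made for illustration. The 1x1 convolutions of a typical implementation are written here as plain matrix products, and nothing below is taken from the authors' network.

```python
import numpy as np


def softmax(scores, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = scores - scores.max(axis=axis, keepdims=True)
    weights = np.exp(shifted)
    return weights / weights.sum(axis=axis, keepdims=True)


def non_local_block_1d(x, w_theta, w_phi, w_g, w_out):
    """Embedded-Gaussian non-local block on a sequence (illustrative only).

    x:                    (T, C) features, T time steps and C channels
    w_theta, w_phi, w_g:  (C, C_inner) projections (stand-ins for 1x1 convs)
    w_out:                (C_inner, C) output projection
    """
    theta = x @ w_theta                 # queries, (T, C_inner)
    phi = x @ w_phi                     # keys,    (T, C_inner)
    g = x @ w_g                         # values,  (T, C_inner)
    # Each time step attends to every other time step in the sequence.
    attention = softmax(theta @ phi.T, axis=-1)   # (T, T), rows sum to 1
    y = attention @ g                   # aggregated context, (T, C_inner)
    return x + y @ w_out                # residual connection keeps the input


# Hypothetical sizes and random data, chosen only to exercise the function.
rng = np.random.default_rng(0)
T, C, C_inner = 16, 8, 4
x = rng.standard_normal((T, C))
w_theta = rng.standard_normal((C, C_inner)) * 0.1
w_phi = rng.standard_normal((C, C_inner)) * 0.1
w_g = rng.standard_normal((C, C_inner)) * 0.1
w_out = np.zeros((C_inner, C))          # zero output projection: block acts as identity

z = non_local_block_1d(x, w_theta, w_phi, w_g, w_out)
print(z.shape)
```

Because of the residual connection, a zero output projection makes the block an identity mapping, which is why such a block can be dropped into existing convolutional layers without changing their initial behaviour. Whether the paper initialises its modules this way is not stated in the abstract.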
This article is indexed in Wanfang Data and other databases.