Single-channel Speech Enhancement Method Based on Gated Residual Convolution Encoder-and-Decoder Network


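Cite this article:	Zhang Tianqi, Bai Haojun, Ye Shaopeng, Liu Jianxing. Single-channel Speech Enhancement Method Based on Gated Residual Convolution Encoder-and-Decoder Network[J]. Journal of Signal Processing (信号处理), 2021, 37(10): 1986-1995. DOI: 10.16798/j.issn.1003-0530.2021.10.023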

Authors:	Zhang Tianqi (张天骐), Bai Haojun (柏浩钧), Ye Shaopeng (叶绍鹏), Liu Jianxing (刘鉴兴)

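Affiliation:	School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications; Chongqing Key Laboratory of Signal and Information Processing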

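Funding:	National Natural Science Foundation of China (61671095, 61702065, 61701067, 61771085); Construction Project of the Chongqing Municipal Key Laboratory of Signal and Information Processing (CSTC2009CA2003); Chongqing Graduate Research and Innovation Project (CYS19248); Scientific Research Project of the Chongqing Municipal Education Commission (KJ1600427, KJ1600429)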

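Abstract:	Convolution encoder-and-decoder (CED) networks have difficulty capturing the temporal correlations in speech. To address this problem, this paper proposes a speech enhancement method based on a gated residual convolution encoder-and-decoder network. Building on the CED network, the method introduces a gating mechanism, dilated convolution, and residual connections: the gating mechanism handles the dependencies between earlier and later parts of a sequence well; dilated convolution gives the convolution a larger receptive field and extracts richer global information; and residual connections prevent vanishing and exploding gradients and improve network accuracy. In addition, the network is trained with a strategy that jointly optimizes a frequency-domain loss function and a time-domain evaluation metric, which further improves the enhancement performance. Experiments show that, under both matched and mismatched noise, the proposed method achieves higher PESQ, STOI, and SI-SDR than the baseline CED and the other compared methods, recovers both unvoiced and voiced speech well, and generalizes well.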
Keywords:	speech enhancement; gating mechanism; convolution encoder-and-decoder network; residual connection
Received:	2021-04-08
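
The three components named in the abstract (gating, dilated convolution, residual connection) can be illustrated with a minimal single-channel sketch. This is not the network from the paper: the gating form (a linear branch multiplied by a sigmoid gate), the zero padding, and the names `dilated_conv1d`, `gated_residual_block`, and `receptive_field` are all assumptions chosen for illustration, and the kernels are fixed lists rather than trained weights.

```python
import math


def dilated_conv1d(x, kernel, dilation):
    """Same-length 1-D cross-correlation with zero padding (illustrative)."""
    center = (len(kernel) - 1) // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(kernel):
            # Taps are spaced `dilation` samples apart around position t.
            idx = t + (i - center) * dilation
            if 0 <= idx < len(x):
                acc += w * x[idx]
        out.append(acc)
    return out


def sigmoid(v):
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    z = math.exp(v)
    return z / (1.0 + z)


def gated_residual_block(x, w_feature, w_gate, dilation):
    """Assumed form: y = x + conv(x; w_feature) * sigmoid(conv(x; w_gate))."""
    feature = dilated_conv1d(x, w_feature, dilation)
    gate = [sigmoid(g) for g in dilated_conv1d(x, w_gate, dilation)]
    # The residual connection adds the block input back to the gated output.
    return [xi + f * g for xi, f, g in zip(x, feature, gate)]


def receptive_field(kernel_size, dilations):
    """Receptive field of stacked dilated convolutions with stride 1."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


if __name__ == "__main__":
    signal = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(gated_residual_block(signal, [0.5, 0.0, 0.5], [0.1, 0.1, 0.1], 2))
    print(receptive_field(3, [1, 2, 4, 8]))
```

Under these assumptions, stacking four kernel-size-3 layers with dilations 1, 2, 4, and 8 covers 31 input samples, whereas four undilated layers would cover 9, which is the sense in which dilation enlarges the receptive field.
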
Single-channel Speech Enhancement Method Based on Gated Residual Convolution Encoder-and-Decoder Network

Affiliation:	School of Communication and Information Engineering, Chongqing Key Laboratory of Signal and Information Processing (CQKLS&IP), Chongqing University of Posts and Telecommunications (CQUPT)

Abstract:	To address the difficulty that the Convolution Encoder-and-Decoder (CED) network has in capturing the temporal context of speech, a speech enhancement method based on a gated residual convolution encoder-and-decoder network is proposed. Building on CED, the proposed method introduces a gating mechanism, dilated convolution, and residual connections into the network: the gating mechanism handles the context dependencies of a sequence well; dilated convolution gives the convolution a larger receptive field and extracts richer global information; and residual connections prevent vanishing and exploding gradients and improve network accuracy. In addition, a strategy that jointly optimizes a frequency-domain loss function and a time-domain evaluation metric is adopted to train the network, further improving the enhancement performance of the proposed network. Experimental results show that, compared with the baseline CED and the other comparison methods, the proposed method achieves higher PESQ, STOI, and SI-SDR under both matched and mismatched noise, recovers both unvoiced and voiced speech well, and has strong generalization ability.

Keywords:	speech enhancement; gating mechanism; convolution encoder-and-decoder network; residual connection
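
SI-SDR, one of the three metrics reported in the abstract, is a time-domain measure, so it is a natural candidate for the "time-domain evaluation metric" used in training, although the record above does not say which metric was optimized. The sketch below uses the commonly cited scale-invariant definition (project the estimate onto the reference, then compare the energy of the projection with the energy of the remainder). The function name `si_sdr`, the mean removal, and the `eps` guard are assumptions for illustration, not details taken from the paper.

```python
import math


def si_sdr(estimate, reference, eps=1e-12):
    """Scale-invariant signal-to-distortion ratio in dB (illustrative)."""
    if len(estimate) != len(reference):
        raise ValueError("signals must have the same length")
    # Remove the mean so a constant offset does not affect the score.
    mean_e = sum(estimate) / len(estimate)
    mean_r = sum(reference) / len(reference)
    est = [x - mean_e for x in estimate]
    ref = [x - mean_r for x in reference]
    # Optimal scaling of the reference: alpha = <est, ref> / ||ref||^2.
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref)
    alpha = dot / (ref_energy + eps)
    target = [alpha * r for r in ref]
    residual = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(n * n for n in residual)
    return 10.0 * math.log10((target_energy + eps) / (residual_energy + eps))


if __name__ == "__main__":
    clean = [math.sin(0.1 * n) for n in range(200)]
    enhanced = [c + 0.1 * math.cos(0.37 * n) for n, c in enumerate(clean)]
    print(si_sdr(enhanced, clean))
```

Because the reference is rescaled before the comparison, multiplying the estimate by a constant gain leaves the score essentially unchanged, which is what distinguishes SI-SDR from a plain SNR.
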
