Speech separation algorithm based on convolutional encoder decoder and gated recurrent unit

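Cite this article:	CHEN Xiukai, LU Zhihua, ZHOU Yu. Speech separation algorithm based on convolutional encoder decoder and gated recurrent unit[J]. Journal of Computer Applications, 2020, 40(7): 2137-2141. DOI: 10.11772/j.issn.1001-9081.2019111968

Authors:	CHEN Xiukai, LU Zhihua, ZHOU Yu

Affiliation:	College of Information Science and Engineering, Ningbo University, Ningbo, Zhejiang 315211, China

Funding:	Young Scientists Fund of the National Natural Science Foundation of China (61801255).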

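Abstract:	Most deep-learning-based speech separation and speech enhancement algorithms take spectral features obtained from the Fourier transform as the input features of the neural network and do not consider the phase information in the speech signal. However, previous studies have shown that phase information is essential for improving speech quality, especially at low signal-to-noise ratio (SNR). To address this problem, a speech separation algorithm based on a convolutional encoder decoder network and a gated recurrent unit network (CED-GRU) is proposed. First, because the raw waveform contains both amplitude and phase information, the raw waveform of the mixed speech signal is used as the input feature. Second, the convolutional encoder decoder (CED) network is combined with the gated recurrent unit (GRU) network to effectively address the temporal dependency problem in speech signals. Compared with the algorithms based on permutation invariant training (PIT), deep clustering (DC) and the deep attractor network (DAN), the proposed algorithm improves the perceptual evaluation of speech quality (PESQ) and short-time objective intelligibility (STOI) for male-male, male-female and female-female mixtures by 1.16 and 0.29, 1.37 and 0.27, 1.08 and 0.3; 0.87 and 0.21, 1.11 and 0.22, 0.81 and 0.24; and 0.64 and 0.24, 1.01 and 0.34, 0.73 and 0.29 percentage points, respectively. The experimental results show that the CED-GRU-based speech separation system has considerable value in practical applications.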
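The abstract's point about phase can be illustrated with a minimal sketch. It is not taken from the paper: the toy sinusoids, the signal length `n` and the helper names `dft` and `magnitudes` are all assumptions made for illustration. Two waveforms that differ only in phase have the same magnitude spectrum, so a network fed magnitudes alone cannot tell them apart, whereas the raw samples differ.

```python
import cmath
import math


def dft(signal):
    """Naive discrete Fourier transform; returns one complex value per bin."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]


def magnitudes(signal):
    """Magnitude spectrum: the phase of every bin is discarded."""
    return [abs(c) for c in dft(signal)]


n = 16  # hypothetical frame length
# Two toy waveforms: the same 2-cycle sinusoid, shifted in phase by pi/3.
wave_a = [math.sin(2 * math.pi * 2 * t / n) for t in range(n)]
wave_b = [math.sin(2 * math.pi * 2 * t / n + math.pi / 3) for t in range(n)]

mag_a = magnitudes(wave_a)
mag_b = magnitudes(wave_b)

# The magnitude spectra agree bin by bin, but the waveforms do not.
same_magnitude = all(abs(a - b) < 1e-9 for a, b in zip(mag_a, mag_b))
same_waveform = all(abs(a - b) < 1e-9 for a, b in zip(wave_a, wave_b))
print(same_magnitude, same_waveform)
```

Feeding the raw waveform to the network, as the abstract describes, keeps the information that distinguishes `wave_a` from `wave_b`.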
Keywords:	convolutional neural network; convolutional encoder decoder; gated recurrent unit; end-to-end; speech separation
Received:	2019-11-19
Revised:	2020-03-10
Speech separation algorithm based on convolutional encoder decoder and gated recurrent unit

CHEN Xiukai, LU Zhihua, ZHOU Yu. Speech separation algorithm based on convolutional encoder decoder and gated recurrent unit[J]. Journal of Computer Applications, 2020, 40(7): 2137-2141. DOI: 10.11772/j.issn.1001-9081.2019111968

Authors:	CHEN Xiukai, LU Zhihua, ZHOU Yu

Affiliation:	College of Information Science and Engineering, Ningbo University, Ningbo, Zhejiang 315211, China

Abstract:	Most speech separation and speech enhancement algorithms based on deep learning use spectral features obtained from the Fourier transform as the input of the neural network, without considering the phase information in the speech signal. However, previous studies have shown that phase information is essential for improving speech quality, especially at low Signal-to-Noise Ratio (SNR). To solve this problem, a speech separation algorithm based on a Convolutional Encoder Decoder network and a Gated Recurrent Unit network (CED-GRU) was proposed. Firstly, because the raw waveform contains both amplitude and phase information, the raw waveform of the mixed speech signal was used as the input feature. Secondly, the Convolutional Encoder Decoder (CED) network was combined with the Gated Recurrent Unit (GRU) network to effectively address the temporal dependency problem in speech signals. Compared with the Permutation Invariant Training (PIT) algorithm, the Deep Clustering (DC) algorithm and the Deep Attractor Network (DAN) algorithm, the proposed algorithm increased the Perceptual Evaluation of Speech Quality (PESQ) and Short-Time Objective Intelligibility (STOI) for male-male, male-female and female-female mixtures by 1.16 and 0.29, 1.37 and 0.27, 1.08 and 0.3; 0.87 and 0.21, 1.11 and 0.22, 0.81 and 0.24; and 0.64 and 0.24, 1.01 and 0.34, 0.73 and 0.29 percentage points, respectively. The experimental results show that the speech separation system based on CED-GRU has considerable value in practical applications.
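The waveform-in, waveform-out pipeline described in the abstract can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' architecture: the single-channel strided convolution, the scalar GRU state, the one-decoder-per-speaker layout, the kernel size, the stride and the random untrained weights are all hypothetical. The GRU update uses the common convention h' = (1 - z) * h + z * candidate.

```python
import math
import random


def conv1d(signal, kernel, stride):
    """Strided 1-D convolution (no padding): waveform -> one feature per frame."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(0, len(signal) - k + 1, stride)]


def conv_transpose1d(frames, kernel, stride):
    """Transposed 1-D convolution (overlap-add): frame features -> waveform."""
    k = len(kernel)
    out = [0.0] * ((len(frames) - 1) * stride + k)
    for i, value in enumerate(frames):
        for j in range(k):
            out[i * stride + j] += value * kernel[j]
    return out


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gru_step(x, h, p):
    """One GRU step with a scalar input and a scalar hidden state (toy size)."""
    z = sigmoid(p["wz"] * x + p["uz"] * h + p["bz"])  # update gate
    r = sigmoid(p["wr"] * x + p["ur"] * h + p["br"])  # reset gate
    candidate = math.tanh(p["wh"] * x + p["uh"] * (r * h) + p["bh"])
    return (1.0 - z) * h + z * candidate


def separate(mixture, enc_kernel, gru_params, dec_kernels, stride):
    """Encoder -> GRU over frames -> one decoder per (hypothetical) speaker."""
    frames = conv1d(mixture, enc_kernel, stride)
    h, states = 0.0, []
    for x in frames:  # the recurrent state carries context across frames
        h = gru_step(x, h, gru_params)
        states.append(h)
    return [conv_transpose1d(states, k, stride) for k in dec_kernels]


random.seed(0)
n, kernel_size, stride = 16, 4, 2  # hypothetical sizes


def rand_kernel(size):
    return [random.uniform(-1.0, 1.0) for _ in range(size)]


# Toy "mixture": the sum of two sinusoids standing in for two speakers.
mixture = [math.sin(2 * math.pi * 2 * t / n) + 0.5 * math.sin(2 * math.pi * 5 * t / n)
           for t in range(n)]
names = ("wz", "uz", "bz", "wr", "ur", "br", "wh", "uh", "bh")
gru_params = {name: random.uniform(-1.0, 1.0) for name in names}
estimates = separate(mixture, rand_kernel(kernel_size), gru_params,
                     [rand_kernel(kernel_size), rand_kernel(kernel_size)], stride)
print([len(e) for e in estimates])
```

Because the weights are random, the outputs are not separated speech; the sketch only shows how the encoder turns raw samples into frames, how the GRU passes state from frame to frame, and how a transposed convolution maps the frames back to a waveform of the input length.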

Keywords:	Convolutional Neural Network (CNN); Convolutional Encoder Decoder (CED); Gated Recurrent Unit (GRU); end-to-end; speech separation
This article is indexed in Wanfang Data and other databases.