Improving Deep Attractor Network by BGRU and GMM for Speech Separation
Authors: Rawad Melhem, Assef Jafar, Riad Hamadeh
Affiliation: Higher Institute for Applied Sciences and Technology, Damascus, Syria
Abstract: The Deep Attractor Network (DANet) is a state-of-the-art technique in the speech separation field that uses a Bidirectional Long Short-Term Memory (BLSTM) network, but the complexity of the DANet model is very high. In this paper, a simplified and powerful DANet model is proposed that uses a Bidirectional Gated Recurrent Unit (BGRU) network instead of the BLSTM. In addition, the Gaussian Mixture Model (GMM) was applied in DANet as the clustering algorithm instead of k-means, to reduce complexity and increase learning speed and accuracy. The metrics used in this paper are the Signal to Distortion Ratio (SDR), Signal to Interference Ratio (SIR), Signal to Artifact Ratio (SAR), and the Perceptual Evaluation of Speech Quality (PESQ) score. Two-speaker mixture datasets prepared from the TIMIT corpus were used to evaluate the proposed model; the system achieved an SDR of 12.3 dB and a PESQ score of 2.94, both better than the original DANet model, along with reductions of 20.7% in the number of parameters and 17.9% in training time. The model was also applied to mixed Arabic speech signals, and the results were better than those obtained for English.
Keywords: attractor network, speech separation, gated recurrent units
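To make the described architecture change concrete, the following is a minimal sketch (not the authors' code) of a BGRU-based embedding network in the spirit of DANet, with GMM clustering of time-frequency embeddings replacing k-means at inference time. All layer sizes (n_freq, emb_dim, hidden) and helper names are illustrative assumptions.

```python
# Sketch only: BGRU embedding network + GMM clustering for mask estimation.
# Hyperparameters (n_freq=129, emb_dim=20, hidden=300, layers=4) are assumed, not from the paper.
import torch
import torch.nn as nn
from sklearn.mixture import GaussianMixture

class BGRUEmbedder(nn.Module):
    def __init__(self, n_freq=129, emb_dim=20, hidden=300, layers=4):
        super().__init__()
        # Bidirectional GRU stack in place of the BLSTM used by the original DANet.
        self.rnn = nn.GRU(n_freq, hidden, num_layers=layers,
                          batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hidden, n_freq * emb_dim)
        self.emb_dim = emb_dim

    def forward(self, log_mag):                      # log_mag: (batch, time, n_freq)
        h, _ = self.rnn(log_mag)
        v = torch.tanh(self.proj(h))                 # (batch, time, n_freq * emb_dim)
        b, t, _ = v.shape
        # Flatten to one embedding vector per time-frequency bin.
        return v.view(b, t * log_mag.shape[-1], self.emb_dim)

def separate(embeddings, mixture_mag, n_speakers=2):
    """Cluster T-F embeddings with a GMM (instead of k-means) and build soft masks."""
    emb = embeddings.squeeze(0).detach().cpu().numpy()          # (TF, emb_dim)
    gmm = GaussianMixture(n_components=n_speakers, covariance_type='diag')
    posteriors = gmm.fit(emb).predict_proba(emb)                 # soft assignments, (TF, n_speakers)
    masks = torch.from_numpy(posteriors).float()
    tf = mixture_mag.numel()
    # Apply each speaker's mask to the mixture magnitude spectrogram.
    return mixture_mag.reshape(tf, 1) * masks                    # (TF, n_speakers)
```

The GMM posteriors act as soft masks, which is one plausible way the probabilistic clustering could replace the hard assignments of k-means; the exact attractor formation and training loss of DANet are omitted here for brevity.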