
Multi-task Learning for Speech Enhancement and Detection (语音增强与检测的多任务学习方法研究)

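Cite this article:	WANG Shiqi, ZENG Qingning, LONG Chao, XIONG Songling, QI Xiaoxiao. Multi-task Learning for Speech Enhancement and Detection [J]. Computer Engineering and Applications (计算机工程与应用), 2021, 57(20): 197-202.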

Authors:	WANG Shiqi (王师琦), ZENG Qingning (曾庆宁), LONG Chao (龙超), XIONG Songling (熊松龄), QI Xiaoxiao (祁潇潇)

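Affiliation:	School of Information and Communication, Guilin University of Electronic Technology (桂林电子科技大学信息与通信学院), Guilin, Guangxi 541004, China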

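Abstract (translated from Chinese):	Many practical speech signal processing applications require a system that handles several tasks in real time with low latency and remains highly robust to noise. To address this, a multi-task deep learning model for speech enhancement and Voice Activity Detection (VAD) is proposed. By introducing a Long Short-Term Memory (LSTM) network, the model forms a causal system suited to real-time online processing. Exploiting the strong correlation between speech enhancement and VAD, the model connects the output layers of the two tasks through hard parameter sharing, which both reduces computation and improves the generalization of each task through multi-task learning. Experiments show that, compared with a baseline that runs the two tasks serially, the multi-task model is 44.2% faster while giving very similar speech enhancement results and better VAD results, which is of considerable significance for the practical application and deployment of deep learning models.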
Keywords:	multi-task learning; deep learning; speech enhancement; voice activity detection
Multi-task Learning for Speech Enhancement and Detection

WANG Shiqi, ZENG Qingning, LONG Chao, XIONG Songling, QI Xiaoxiao. Multi-task Learning for Speech Enhancement and Detection [J]. Computer Engineering and Applications, 2021, 57(20): 197-202.

Authors:	WANG Shiqi, ZENG Qingning, LONG Chao, XIONG Songling, QI Xiaoxiao

Affiliation:	School of Information and Communication, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China

Abstract:	Many real-world applications of speech signal processing require real-time, low-latency processing of multiple tasks together with strong robustness to noise. To address this, a multi-task deep learning model for speech enhancement and Voice Activity Detection (VAD) is proposed. The model introduces a Long Short-Term Memory (LSTM) network to build a causal system suitable for real-time online processing. Based on the strong correlation between speech enhancement and VAD, the output layers of the two tasks are connected using hard parameter sharing, which reduces the number of parameters and improves the generalization ability of the tasks through multi-task learning. Experimental results show that, compared with baseline models that process the two tasks serially, the multi-task model is 44.2% faster while achieving similar speech enhancement results and better VAD results, which is of great significance for the application and deployment of deep learning models.

Keywords:	multi-task learning; deep learning; speech enhancement; voice activity detection
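
The architecture outlined in the abstract (a causal LSTM whose hidden state is shared by a speech enhancement output and a VAD output) can be illustrated with a small sketch. The abstract gives no layer sizes, input features, training targets, or loss function, so everything concrete here is an assumption made for illustration: the class name `SharedLSTMTwoHeads`, the per-bin mask as the enhancement output, the combined MSE plus cross-entropy loss, and the `vad_weight` parameter. Biases are omitted for brevity, and the weights are random rather than trained. It is not the authors' implementation.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector (list of floats).
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

class SharedLSTMTwoHeads:
    """Hypothetical model: one shared LSTM cell feeding two output layers."""

    def __init__(self, n_bins, n_hidden, seed=0):
        rng = random.Random(seed)
        self.n_hidden = n_hidden
        # Shared parameters: input, forget, candidate, and output gates,
        # each acting on the concatenation [frame; previous hidden state].
        self.gates = [rand_matrix(n_hidden, n_bins + n_hidden, rng) for _ in range(4)]
        # Task-specific output layers that read the same hidden state.
        self.w_mask = rand_matrix(n_bins, n_hidden, rng)  # enhancement head
        self.w_vad = rand_matrix(1, n_hidden, rng)        # VAD head

    def init_state(self):
        return [0.0] * self.n_hidden, [0.0] * self.n_hidden

    def step(self, frame, state):
        # Causal: uses only the current frame and the state from past frames.
        h, c = state
        z = frame + h  # list concatenation [frame; h]
        i = [sigmoid(a) for a in matvec(self.gates[0], z)]
        f = [sigmoid(a) for a in matvec(self.gates[1], z)]
        g = [math.tanh(a) for a in matvec(self.gates[2], z)]
        o = [sigmoid(a) for a in matvec(self.gates[3], z)]
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        mask = [sigmoid(a) for a in matvec(self.w_mask, h)]  # per-bin gain in (0, 1)
        vad = sigmoid(matvec(self.w_vad, h)[0])              # speech probability
        return mask, vad, (h, c)

def joint_loss(mask, vad, target_mask, target_vad, vad_weight=0.5):
    # Assumed multi-task loss: MSE on the mask plus weighted binary cross-entropy.
    mse = sum((m - t) ** 2 for m, t in zip(mask, target_mask)) / len(mask)
    p = min(max(vad, 1e-7), 1.0 - 1e-7)
    bce = -(target_vad * math.log(p) + (1.0 - target_vad) * math.log(1.0 - p))
    return mse + vad_weight * bce

# Stream three random 8-bin "spectral frames" through the model, one at a time.
model = SharedLSTMTwoHeads(n_bins=8, n_hidden=4)
state = model.init_state()
rng = random.Random(1)
for _ in range(3):
    frame = [rng.random() for _ in range(8)]
    mask, vad, state = model.step(frame, state)
    enhanced = [m * x for m, x in zip(mask, frame)]  # masked (enhanced) frame
```

Because both heads read the same hidden state, the recurrent computation is done once per frame instead of once per task, which is the kind of saving the abstract attributes to hard parameter sharing over serial processing.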
This article is indexed in Wanfang Data and other databases.
