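Research on Multi-task Learning Methods for Speech Enhancement and Detection
Cite this article: WANG Shiqi, ZENG Qingning, LONG Chao, XIONG Songling, QI Xiaoxiao. Research on Multi-task Learning Methods for Speech Enhancement and Detection [J]. Computer Engineering and Applications, 2021, 57(20): 197-202.
Authors: WANG Shiqi (王师琦)  ZENG Qingning (曾庆宁)  LONG Chao (龙超)  XIONG Songling (熊松龄)  QI Xiaoxiao (祁潇潇)
Affiliation: School of Information and Communication, Guilin University of Electronic Technology, Guilin, Guangxi 541004
Abstract: Many practical speech signal processing applications require a system to handle several tasks in real time with low latency while remaining highly robust to noise. To address this, a multi-task deep learning model for speech enhancement and Voice Activity Detection (VAD) is proposed. By introducing a Long Short-Term Memory (LSTM) network, the model forms a causal system suited to real-time online processing. Because speech enhancement and VAD are strongly correlated, the model connects the output layers of the two tasks through hard parameter sharing, which reduces the amount of computation and also improves the generalization ability of the tasks through multi-task learning. Experimental results show that, compared with a baseline model that processes the two tasks serially, the multi-task model is 44.2% faster while giving very similar speech enhancement results and better VAD results, which is of considerable significance for the practical application and deployment of deep learning models.

Keywords: multi-task learning  deep learning  speech enhancement  voice activity detection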

Multi-task Learning for Speech Enhancement and Detection
WANG Shiqi, ZENG Qingning, LONG Chao, XIONG Songling, QI Xiaoxiao. Multi-task Learning for Speech Enhancement and Detection [J]. Computer Engineering and Applications, 2021, 57(20): 197-202.
Authors:WANG Shiqi  ZENG Qingning  LONG Chao  XIONG Songling  QI Xiaoxiao
Affiliation:School of Information and Communication, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China
Abstract: In many real-world applications of speech signal processing, systems must handle multiple tasks in real time with low latency while remaining strongly robust to noise. To address this, a multi-task deep learning model for speech enhancement and Voice Activity Detection (VAD) is proposed. By introducing a Long Short-Term Memory (LSTM) network, the model forms a causal system suitable for real-time online processing. Because speech enhancement and VAD are strongly correlated, the output layers of the two tasks are connected through hard parameter sharing, which reduces the amount of computation and improves the generalization ability of the tasks through multi-task learning. Experimental results show that, compared with a baseline that processes the two tasks serially, the multi-task model is 44.2% faster while giving very similar speech enhancement results and better VAD results, which is of great significance for the application and deployment of deep learning models.
Keywords:multi-task learning  deep learning  speech enhancement  voice activity detection  
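
The architecture summarized in the abstract (one causal LSTM shared by two task-specific output layers) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the class names `SharedLSTM` and `MultiTaskModel`, the layer sizes, the random untrained weights, the use of spectral-magnitude frames as input, and the mask-based enhancement output are all assumptions made for illustration. The sketch only shows that the shared recurrent state is computed once per frame and reused by both heads, and that each output depends only on current and past frames.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_matrix(rows, cols, rng):
    # Small random weights; a real model would learn these by training.
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class SharedLSTM:
    """Unidirectional LSTM cell: state at frame t depends only on frames <= t."""

    def __init__(self, n_in, n_hidden, rng):
        self.n_hidden = n_hidden
        # One weight matrix and bias per gate, applied to [input, previous hidden].
        self.w = {g: make_matrix(n_hidden, n_in + n_hidden, rng) for g in "ifog"}
        self.b = {g: [0.0] * n_hidden for g in "ifog"}

    def step(self, x, h, c):
        z = x + h  # list concatenation of input and previous hidden state

        def gate(name, act):
            pre = matvec(self.w[name], z)
            return [act(p + b) for p, b in zip(pre, self.b[name])]

        i, f, o = gate("i", sigmoid), gate("f", sigmoid), gate("o", sigmoid)
        g = gate("g", math.tanh)
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return h, c


class MultiTaskModel:
    """Hard parameter sharing: one trunk, two task-specific output layers."""

    def __init__(self, n_bins, n_hidden, seed=0):
        rng = random.Random(seed)
        self.trunk = SharedLSTM(n_bins, n_hidden, rng)
        # Assumed enhancement head: a per-bin mask in (0, 1).
        self.mask_head = make_matrix(n_bins, n_hidden, rng)
        # Assumed VAD head: a single speech probability per frame.
        self.vad_head = make_matrix(1, n_hidden, rng)

    def forward(self, frames):
        h = [0.0] * self.trunk.n_hidden
        c = [0.0] * self.trunk.n_hidden
        enhanced, vad = [], []
        for x in frames:
            # The shared state is computed once and feeds both heads.
            h, c = self.trunk.step(x, h, c)
            mask = [sigmoid(v) for v in matvec(self.mask_head, h)]
            enhanced.append([m * xi for m, xi in zip(mask, x)])
            vad.append(sigmoid(matvec(self.vad_head, h)[0]))
        return enhanced, vad


# Usage: 5 frames of 4 hypothetical spectral-magnitude bins each.
data_rng = random.Random(1)
frames = [[data_rng.random() for _ in range(4)] for _ in range(5)]
model = MultiTaskModel(n_bins=4, n_hidden=8)
enhanced, vad = model.forward(frames)
print(len(enhanced), len(enhanced[0]), len(vad))
```

Running the two tasks serially would need a separate recurrent network per task, so sharing the trunk roughly halves the recurrent computation in this toy setup. The 44.2% speedup reported in the abstract refers to the authors' own models and is not reproduced by this sketch.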
This article is indexed in Wanfang Data (万方数据) and other databases.