
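Single-Channel Speech Separation Based on CNN-SVM Gender Combination Classification
Cite this article: SUN Linhui, ZHANG Meng, LIANG Wenqing. Single-channel speech separation based on CNN-SVM gender combination classification [J]. 信号处理 (Signal Processing), 2022, 38(12): 2519-2531.
Authors: SUN Linhui (孙林慧), ZHANG Meng (张蒙), LIANG Wenqing (梁文清)
Affiliation: College of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210003
Funding: National Natural Science Foundation of China (61901227); China Scholarship Council (202008320043)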
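Abstract: In practical speech separation, the speaker gender combination of the mixed speech is usually unknown, and separating directly with a universal model gives unsatisfactory results. To improve separation, this paper proposes a gender combination discrimination model based on a convolutional neural network-support vector machine (CNN-SVM) that determines whether the two speakers in the mixture are male-male, male-female, or female-female, so that the separation model for that gender combination can be selected. Because a traditional single feature carries too little gender combination information, the paper also proposes a strategy for mining deep fusion features, so that the classification features carry more information about the gender combination class. The proposed method first uses a CNN to mine deep features from Mel-frequency cepstral coefficients (MFCCs) and filter bank features and fuses the two deep features into the classification feature for the gender combination; it then uses an SVM to recognize the gender combination of the mixed speech; finally, it selects the deep neural network/convolutional neural network (DNN/CNN) model for that gender combination to perform the separation. Experimental results show that, compared with traditional single features, the proposed deep fusion feature effectively improves the recognition rate of the gender combination of mixed speech, and that the proposed separation method outperforms the universal separation model on perceptual evaluation of speech quality (PESQ), short-time objective intelligibility (STOI), and signal-to-distortion ratio (SDR).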
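The classify-then-route flow described in the abstract can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: fusion is assumed to be plain concatenation (the abstract does not say how the two deep features are fused), a nearest-centroid classifier stands in for the SVM, the deep features are toy numbers rather than CNN outputs, and every name (`fuse`, `NearestCentroid`, `separate`, `separators`) is hypothetical.

```python
import math

# The three gender combinations named in the abstract.
COMBINATIONS = ("male-male", "male-female", "female-female")


def fuse(deep_mfcc, deep_fbank):
    """Fuse two deep feature vectors by concatenation (assumed fusion rule)."""
    return list(deep_mfcc) + list(deep_fbank)


class NearestCentroid:
    """Stand-in classifier; the paper uses an SVM at this step."""

    def fit(self, features, labels):
        sums, counts = {}, {}
        for vec, label in zip(features, labels):
            acc = sums.setdefault(label, [0.0] * len(vec))
            for i, value in enumerate(vec):
                acc[i] += value
            counts[label] = counts.get(label, 0) + 1
        # One mean vector per gender-combination class.
        self.centroids = {
            label: [s / counts[label] for s in acc] for label, acc in sums.items()
        }
        return self

    def predict(self, vec):
        return min(
            self.centroids,
            key=lambda label: math.dist(vec, self.centroids[label]),
        )


def separate(mixture, fused_feature, classifier, separators):
    """Classify the gender combination, then dispatch to the matching model."""
    combo = classifier.predict(fused_feature)
    return combo, separators[combo](mixture)


# Toy training data: one fused vector per class (placeholder values).
train = [
    (fuse([0.0, 0.1], [0.0, 0.2]), "male-male"),
    (fuse([1.0, 0.9], [1.0, 1.1]), "male-female"),
    (fuse([2.0, 2.1], [2.0, 1.9]), "female-female"),
]
clf = NearestCentroid().fit([f for f, _ in train], [label for _, label in train])

# Placeholder separators; real ones would be the per-combination DNN/CNN models.
separators = {
    name: (lambda mix, name=name: f"{name} model applied to {mix}")
    for name in COMBINATIONS
}

combo, output = separate(
    "mixture.wav", fuse([0.9, 1.0], [1.1, 1.0]), clf, separators
)
```

The dictionary dispatch keeps the routing step independent of how the separators are built, so a universal model could be added as a fallback entry without changing `separate`.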

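Keywords: gender combination recognition; convolutional neural network-support vector machine; single-channel speech separation; deep features
Received: 2022-05-18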

CNN-SVM Gender Combination Classification Based Single-channel Speech Separation
Affiliation: College of Telecommunications & Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210003, China
Abstract: In practical speech separation, information about the speaker gender combination of the mixed speech is often unknown. If the mixed speech is separated directly with a universal model, separation performance is unsatisfactory. To improve speech separation, this paper proposes a gender combination discrimination model based on a convolutional neural network (CNN) and a support vector machine (SVM), which determines whether the gender combination of the mixed speech is male-male, male-female, or female-female, so that the corresponding gender-specific separation model can be selected for the separation task. To compensate for the limited gender combination information carried by a traditional single feature, a strategy for mining deep fusion features is also proposed, so that the classification features contain more information about the gender combination categories. The proposed single-channel speech separation method based on CNN-SVM gender combination classification first uses a CNN to mine the deep features of Mel-frequency cepstral coefficients and filter bank features, and fuses these two deep features into the gender combination classification feature. An SVM is then used to recognize the gender combination of the mixed speech. Finally, the deep neural network (DNN) or CNN model corresponding to the gender combination is selected for speech separation. The experimental results show that, compared with a traditional single feature, the proposed deep fusion feature effectively improves the recognition rate of the gender combination of mixed speech. In terms of signal-to-distortion ratio (SDR), perceptual evaluation of speech quality (PESQ), and short-time objective intelligibility (STOI), the proposed speech separation method outperforms the universal speech separation model.
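Of the three metrics named above, SDR has the simplest closed form. In its basic version it compares the energy of the reference signal with the energy of the estimation error: SDR = 10 log10(||s||^2 / ||s - s_hat||^2). The sketch below implements only this basic form; the abstract does not state how SDR is computed in the paper, and evaluation toolkits often use a more elaborate decomposition, so the function `sdr_db` is an illustrative assumption rather than the authors' procedure.

```python
import math


def sdr_db(reference, estimate):
    """Basic signal-to-distortion ratio in dB.

    Computes 10 * log10(||s||^2 / ||s - s_hat||^2) for two equal-length
    sequences of samples.
    """
    if len(reference) != len(estimate):
        raise ValueError("signals must have the same length")
    signal_energy = sum(s * s for s in reference)
    error_energy = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    if error_energy == 0.0:
        return math.inf  # perfect reconstruction
    if signal_energy == 0.0:
        return -math.inf  # silent reference, nonzero error
    return 10.0 * math.log10(signal_energy / error_energy)


# A 10% amplitude error on every sample gives an energy ratio of about 100.
reference = [1.0, -1.0, 1.0, -1.0]
estimate = [0.9, -0.9, 0.9, -0.9]
score = sdr_db(reference, estimate)
```

A higher value means the separated signal is closer to the clean reference; PESQ and STOI require perceptual models and are not reproduced here.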