
End to End Voiceprint Recognition Based on Nonlinear Stacked Bidirectional Network

Cite this article:	WANG Zhi-yue, CUI Lin. End to End Voiceprint Recognition Based on Nonlinear Stacked Bidirectional Network[J]. Computer and Modernization, 2022, 0(3): 13-17. DOI: 10.3969/j.issn.1006-2475.2022.03.003

Authors:	WANG Zhi-yue, CUI Lin

Affiliations:	School of Electronics and Information, Xi'an Polytechnic University, Xi'an, Shaanxi 710699; School of Marine Science and Technology, Northwestern Polytechnical University, Xi'an, Shaanxi 710072

Received:	2022-04-29

Abstract:	Traditional voiceprint recognition methods involve cumbersome pipelines and achieve low recognition rates, while the neural networks used in existing deep learning methods are not tailored to speech signals, which limits recognition accuracy. To address these problems, this paper proposes an end-to-end voiceprint recognition method based on a nonlinear stacked bidirectional LSTM. First, Fbank features are extracted from the raw speech files and used as the input to the network model. Then, because speech signals are continuous and strongly correlated across time, a bidirectional long short-term memory (LSTM) network is built to process the speech data and extract deep features; to further strengthen the nonlinear expressive power of the network, multiple bidirectional LSTM layers and multiple nonlinear layers are stacked to extract deeper abstract features from the speech signal. Finally, the SGD optimizer is used to optimize training. Experimental results show that the proposed method makes full use of the sequential characteristics of speech signals, captures temporal context in both directions, and has strong nonlinear expressive power; the resulting model is well integrated and achieves better recognition performance than models such as GRU and LSTM.
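
The abstract names Fbank (log Mel filterbank) features as the network input but gives no extraction settings. The following is a minimal NumPy sketch of a common Fbank recipe, not the authors' configuration: the 16 kHz sample rate, 25 ms frames, 10 ms hop, 512-point FFT, 40 Mel bands, 0.97 pre-emphasis, and the function names `mel_filterbank` and `fbank` are all assumptions chosen for illustration.

```python
import numpy as np


def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters evenly spaced on the Mel scale, shape (n_mels, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    bank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(n_mels):
        left, center, right = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        bank[m] = np.maximum(0.0, np.minimum(rising, falling))
    return bank


def fbank(signal, sample_rate=16000, frame_len=0.025, frame_step=0.010,
          n_fft=512, n_mels=40, preemph=0.97):
    """Return log Mel filterbank energies, shape (n_frames, n_mels).

    All default values are illustrative assumptions, not values from the paper.
    The signal must contain at least one full frame.
    """
    signal = np.asarray(signal, dtype=np.float64)
    # Pre-emphasis boosts high frequencies before framing.
    emphasized = np.append(signal[0], signal[1:] - preemph * signal[:-1])
    flen = int(round(frame_len * sample_rate))
    fstep = int(round(frame_step * sample_rate))
    n_frames = 1 + (len(emphasized) - flen) // fstep
    # Index matrix: one row of sample indices per frame.
    idx = np.arange(flen)[None, :] + fstep * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(flen)
    # Power spectrum of each windowed frame (rfft zero-pads to n_fft).
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    energies = power @ mel_filterbank(n_mels, n_fft, sample_rate).T
    # Floor before the log to avoid log(0) on silent frames.
    return np.log(np.maximum(energies, 1e-10))
```

Calling `fbank(waveform)` on a one-dimensional array of samples yields one feature row per frame, which is the kind of time-by-feature sequence a recurrent network consumes.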

Keywords:	voiceprint recognition; end-to-end; temporal features; long short-term memory; stacked network; nonlinearity
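
To make the "stacked bidirectional" and "nonlinear" ideas concrete, here is a forward-pass-only NumPy sketch of several bidirectional LSTM layers followed by nonlinear dense layers and a softmax over speakers. It is an illustration under assumptions, not the authors' model: the layer counts, hidden sizes, ReLU activation, mean pooling over time, random weights, and every function name (`init_lstm`, `lstm_pass`, `bilstm_layer`, `build_model`, `forward`) are hypothetical. Training with SGD is not shown.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_lstm(rng, input_dim, hidden_dim):
    """Random parameters for one LSTM direction; gates are stacked as i, f, g, o."""
    scale = 1.0 / np.sqrt(hidden_dim)
    return {
        "W": rng.uniform(-scale, scale, (input_dim, 4 * hidden_dim)),
        "U": rng.uniform(-scale, scale, (hidden_dim, 4 * hidden_dim)),
        "b": np.zeros(4 * hidden_dim),
    }


def lstm_pass(x, p, reverse=False):
    """Run one LSTM direction over x of shape (T, input_dim); returns (T, hidden_dim)."""
    hidden_dim = p["U"].shape[0]
    h = np.zeros(hidden_dim)
    c = np.zeros(hidden_dim)
    outputs = np.zeros((x.shape[0], hidden_dim))
    steps = range(x.shape[0] - 1, -1, -1) if reverse else range(x.shape[0])
    for t in steps:
        z = x[t] @ p["W"] + h @ p["U"] + p["b"]
        i, f, g, o = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        outputs[t] = h
    return outputs


def bilstm_layer(x, fwd, bwd):
    """Concatenate a left-to-right pass and a right-to-left pass at every frame."""
    return np.concatenate([lstm_pass(x, fwd), lstm_pass(x, bwd, reverse=True)], axis=1)


def build_model(rng, feat_dim, hidden_dim, n_bilstm, dense_dims, n_speakers):
    """Stack n_bilstm bidirectional layers, then dense layers ending in n_speakers logits."""
    bilstm, in_dim = [], feat_dim
    for _ in range(n_bilstm):
        bilstm.append((init_lstm(rng, in_dim, hidden_dim), init_lstm(rng, in_dim, hidden_dim)))
        in_dim = 2 * hidden_dim  # forward and backward outputs are concatenated
    dense = []
    for out_dim in list(dense_dims) + [n_speakers]:
        dense.append((rng.normal(0.0, np.sqrt(2.0 / in_dim), (in_dim, out_dim)), np.zeros(out_dim)))
        in_dim = out_dim
    return {"bilstm": bilstm, "dense": dense}


def forward(model, feats):
    """Map a (T, feat_dim) feature sequence to speaker probabilities."""
    x = feats
    for fwd, bwd in model["bilstm"]:
        x = bilstm_layer(x, fwd, bwd)
    x = x.mean(axis=0)  # assumption: average over time to get one utterance vector
    for W, b in model["dense"][:-1]:
        x = np.maximum(0.0, x @ W + b)  # assumption: ReLU as the nonlinearity
    W, b = model["dense"][-1]
    logits = x @ W + b
    e = np.exp(logits - logits.max())
    return e / e.sum()
```

For example, `forward(build_model(np.random.default_rng(0), 40, 16, 2, (32,), 5), feats)` maps a sequence of 40-dimensional frames to a probability vector over five hypothetical speakers. In practice one would more likely use an automatic-differentiation framework so the weights can be trained with SGD, as the abstract describes.
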
This article is indexed in Wanfang Data and other databases.