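Voiceprint Recognition Based on a CNN-LSTM Network
Cite this article:Yan He, Dong Yingyan, Wang Peng, Luo Cheng, Li Huan. Voiceprint Recognition Based on a CNN-LSTM Network[J]. Computer Applications and Software, 2019, 36(4): 166-170
Authors:Yan He  Dong Yingyan  Wang Peng  Luo Cheng  Li Huan
Affiliations:College of Computer Science and Engineering, Chongqing University of Technology, Chongqing 400054; Liangjiang College of Artificial Intelligence, Chongqing University of Technology, Chongqing 400020; College of Computer Science and Engineering, Chongqing University of Technology, Chongqing 400054
Funding:National Natural Science Foundation of China; Natural Science Foundation of Chongqing
Abstract:Traditional voiceprint recognition methods involve complex pipelines and yield low recognition accuracy, which is a key obstacle to the development of voiceprint recognition applications. Exploiting the capacity of deep learning for automatic feature extraction and classification, this paper combines a convolutional neural network (CNN) with a long short-term memory network (LSTM) and proposes a combined network model that learns voiceprint features and performs identity authentication. Raw speech is converted into fixed-length spectrograms, which pass through the CNN and then the LSTM; the combined network is trained to learn voiceprint features. A comparison with CNN, LSTM, and DNN networks verifies that the CNN-LSTM network attains high accuracy in voiceprint recognition with fewer iterations. The experimental results indicate that both the spatial and the temporal features of speech are important factors in voiceprint recognition; the CNN-LSTM model in the experiments reaches an accuracy of 95.42% and a minimum loss of 0.0973. The method is conducive to practical voiceprint recognition applications.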

Keywords:Voiceprint recognition  CNN-LSTM network  Spectrogram  Temporal features

VOICEPRINT RECOGNITION BASED ON CNN-LSTM NETWORK
Yan He,Dong Yingyan,Wang Peng,Luo Cheng,Li Huan. VOICEPRINT RECOGNITION BASED ON CNN-LSTM NETWORK[J]. Computer Applications and Software, 2019, 36(4): 166-170
Authors:Yan He  Dong Yingyan  Wang Peng  Luo Cheng  Li Huan
Affiliation:(College of Computer Science and Engineering, Chongqing University of Technology, Chongqing 400054, China;College of Artificial Intelligence, Chongqing University of Technology, Chongqing 400020, China)
Abstract:Traditional voiceprint recognition methods are complex and have low recognition accuracy, which is a key issue in the development of voiceprint recognition applications. In this paper, we use deep learning, with its autonomous feature extraction and classification, and combine a convolutional neural network (CNN) with a long short-term memory network (LSTM). The combined network model is proposed to learn voiceprint features and perform identity authentication. The original speech is converted into fixed-length spectrograms, which are fed sequentially into the CNN and the LSTM of the combined network for training and voiceprint feature learning. By comparing with CNN, LSTM, and DNN, we verify that the CNN-LSTM network achieves high accuracy in voiceprint recognition with fewer iterations. The experimental results show that both the spatial features and the temporal features of speech are important factors in voiceprint recognition. The accuracy of the CNN-LSTM network model in the experiment reaches 95.42%, and the loss value is as low as 0.0973. The method is beneficial to the practical application of voiceprint recognition.
Keywords:Voiceprint recognition  CNN-LSTM network  Spectrogram  Temporal features
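
The abstract states that the original speech is converted into a fixed-length spectrogram before it enters the CNN and LSTM, but it gives no parameters. The following is a minimal sketch of that preprocessing step only, not the authors' implementation. The function name `fixed_length_spectrogram`, the Hann window, the frame length, the hop size, the number of frames, and the pad-or-crop strategy are all assumptions chosen for illustration.

```python
import numpy as np


def fixed_length_spectrogram(signal, frame_len=256, hop=128, num_frames=64):
    """Return a log-magnitude spectrogram with exactly `num_frames` time steps.

    All default values are illustrative assumptions; the abstract does not
    specify the framing parameters used in the paper.
    """
    signal = np.asarray(signal, dtype=np.float64)

    # Zero-pad very short signals so that at least one full frame exists.
    if signal.size < frame_len:
        signal = np.pad(signal, (0, frame_len - signal.size))

    # Slice the waveform into overlapping frames and apply a Hann window.
    n_frames = 1 + (signal.size - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack(
        [signal[i * hop:i * hop + frame_len] * window for i in range(n_frames)]
    )

    # Magnitude of the one-sided FFT, then log compression.
    spec = np.abs(np.fft.rfft(frames, axis=1))  # (n_frames, frame_len // 2 + 1)
    log_spec = np.log(spec + 1e-8)

    # Force a fixed time length: crop long inputs, pad short ones with the
    # lowest observed value (an assumed stand-in for silence).
    if n_frames >= num_frames:
        log_spec = log_spec[:num_frames]
    else:
        pad = np.full((num_frames - n_frames, log_spec.shape[1]), log_spec.min())
        log_spec = np.vstack([log_spec, pad])

    return log_spec.T  # (frequency bins, time frames)
```

With these assumed defaults every utterance maps to a 129 x 64 array regardless of its duration. Such an array could be treated as a single-channel image by a convolutional front end, with the resulting feature maps read along the time axis by a recurrent layer; how the paper actually connects the two networks is not described in the abstract.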
This article is indexed in VIP, Wanfang Data, and other databases.