
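Speaker recognition based on deep bidirectional LSTM network
Cite this article: WANG Hua-peng. Speaker recognition based on deep bidirectional LSTM network[J]. Computer Engineering and Design, 2020, 41(6): 1768-1772.
Author: WANG Hua-peng
Affiliation: Video and Audio Material Examination Department, Criminal Investigation Police University of China, Shenyang, Liaoning 110854
Funding: National Key R&D Program; Key Laboratory Open Fund; National Social Science Fund; Open Fund of the Chongqing Higher Education Key Laboratory of Forensic Science and Technology (Southwest University of Political Science and Law); Soft Science Fund Project; Ministry of Public Security, Public Security Theory; Liaoning Province Key R&D Program Fund Project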
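Abstract: To further improve the accuracy of speaker recognition, a speaker recognition method based on a deep bidirectional long short-term memory (LSTM) network is proposed, achieving text-independent, end-to-end speaker identification. The method uses speech sequence data in both directions and, through memory units, strengthens the connections between upper and lower layers, improving its ability to classify speech sequence data. Experimental results show that, in a laboratory environment, the correct recognition rate reaches 97.92% for short utterances of 5 s duration, with good robustness to noise interference. The method can learn the features of speech sequence signals and exploit sequence variation information, enabling effective speaker recognition.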

Keywords: long short-term memory  end-to-end  speaker recognition  deep learning  recurrent neural network

Speaker recognition based on deep bidirectional LSTM network
WANG Hua-peng. Speaker recognition based on deep bidirectional LSTM network[J]. Computer Engineering and Design, 2020, 41(6): 1768-1772.
Authors: WANG Hua-peng
Affiliation: Video and Audio Material Examination Department, Criminal Investigation Police University of China, Shenyang 110854, China
Abstract: To further improve speaker recognition accuracy, a method based on a deep bidirectional long short-term memory (LSTM) network is proposed for end-to-end, text-independent speaker recognition. The network learns long-term dependencies between time steps of speech sequence data in both the forward and backward directions, and its memory units strengthen the connections between upper and lower layers, improving its ability to discriminate speech data. Experimental results show that the network achieves a correct recognition rate of 97.92% on 5 s audio recordings made in a laboratory environment and is robust to noise interference. The method can learn sequence features of speech and use the information in how the sequence changes to discriminate speakers effectively by voice.
Keywords: LSTM  end-to-end  speaker recognition  deep learning  recurrent neural networks
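
The abstract describes the architecture only at a high level, so the following is a minimal sketch of the forward pass of a stacked bidirectional LSTM classifier, not the author's implementation. Everything concrete in it is an assumption made for illustration: the function names, the 13-dimensional frame features, the hidden size, the two-layer depth, mean pooling over time before the output layer, and the randomly initialized (untrained) weights. It uses only the Python standard library so that it runs anywhere; a real system would use a deep learning framework and trained parameters.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def init_lstm(input_size, hidden_size, rng):
    """Random (untrained) weights for one LSTM direction; gates ordered i, f, g, o."""
    rows, cols = 4 * hidden_size, input_size + hidden_size
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {"W": weights, "b": [0.0] * rows, "n": hidden_size}

def lstm_run(params, frames):
    """Run one direction over a list of feature vectors; return one hidden state per frame."""
    n = params["n"]
    h, c = [0.0] * n, [0.0] * n
    outputs = []
    for x in frames:
        xh = x + h  # concatenate current input with previous hidden state
        z = [sum(w * v for w, v in zip(row, xh)) + b
             for row, b in zip(params["W"], params["b"])]
        i = [sigmoid(v) for v in z[0:n]]            # input gate
        f = [sigmoid(v) for v in z[n:2 * n]]        # forget gate
        g = [math.tanh(v) for v in z[2 * n:3 * n]]  # candidate cell values
        o = [sigmoid(v) for v in z[3 * n:4 * n]]    # output gate
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]  # memory cell update
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        outputs.append(h)
    return outputs

def bilstm_layer(fwd, bwd, frames):
    """Process the sequence in both directions and concatenate the states per frame."""
    out_f = lstm_run(fwd, frames)
    out_b = lstm_run(bwd, frames[::-1])[::-1]  # run on reversed input, then re-align
    return [a + b for a, b in zip(out_f, out_b)]

def build_model(feat_dim, hidden_size, num_layers, num_speakers, seed=0):
    """Stack num_layers bidirectional layers and add a linear output layer."""
    rng = random.Random(seed)
    layers, in_dim = [], feat_dim
    for _ in range(num_layers):
        layers.append((init_lstm(in_dim, hidden_size, rng),
                       init_lstm(in_dim, hidden_size, rng)))
        in_dim = 2 * hidden_size  # next layer sees forward + backward states
    out_w = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
             for _ in range(num_speakers)]
    return {"layers": layers, "out_w": out_w}

def speaker_posteriors(model, frames):
    """Map a sequence of frame features to a probability per enrolled speaker."""
    seq = frames
    for fwd, bwd in model["layers"]:
        seq = bilstm_layer(fwd, bwd, seq)
    # Assumption: average the top-layer states over time to get one utterance vector.
    pooled = [sum(col) / len(seq) for col in zip(*seq)]
    scores = [sum(w * v for w, v in zip(row, pooled)) for row in model["out_w"]]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # numerically stable softmax
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical usage: 20 frames of 13-dimensional features, 4 enrolled speakers.
data_rng = random.Random(1)
utterance = [[data_rng.gauss(0.0, 1.0) for _ in range(13)] for _ in range(20)]
model = build_model(feat_dim=13, hidden_size=8, num_layers=2, num_speakers=4)
probs = speaker_posteriors(model, utterance)
predicted_speaker = probs.index(max(probs))
```

Because the weights are random, `probs` here is close to uniform and carries no speaker information; the sketch only illustrates how forward and backward passes over the same utterance are combined and stacked before classification.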
This article is indexed in VIP (维普), Wanfang Data (万方数据), and other databases.