
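Multilingual recognition based on CRNN hybrid neural network
Cite this article: WANG Yao, LONG Hua, SHAO Yubin, DU Qingzhi, WANG Yankai. Multilingual recognition based on CRNN hybrid neural network[J]. Journal of Optoelectronics·Laser, 2022, 33(6): 620-628.
Authors: WANG Yao (王瑶)  LONG Hua (龙华)  SHAO Yubin (邵玉斌)  DU Qingzhi (杜庆治)  WANG Yankai (王延凯)
Affiliation (all five authors): Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan 650500, China
Funding: Supported by the National Natural Science Foundation of China (61761025)
Abstract: To extract both the spatial and the temporal features of speech signals during language recognition, and thereby improve multilingual recognition accuracy, a multilingual recognition model based on a convolutional recurrent neural network (CRNN) hybrid neural network is proposed. The model first extracts acoustic features from the speech signal. These features are then fed into a convolutional neural network (CNN) to extract low-dimensional spatial features. Next, a spatial pyramid pooling layer (SPP layer) regularizes the spatial features into fixed-length one-dimensional features. Finally, these are fed into a recurrent neural network (RNN) to determine the language. To verify the robustness of the model, experiments were carried out on three datasets. The results show that, compared with conventional CNN and RNN models, the CRNN hybrid neural network improves language recognition accuracy on every dataset. The gain is most pronounced for 5 s utterances in the 8-language dataset, where accuracy rises by 5.3% and 6.1%, respectively.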

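Keywords: language recognition  convolutional recurrent neural network (CRNN) hybrid neural network  convolutional neural network (CNN)  recurrent neural network (RNN)
Received: 2021/9/6
Revised: 2021/9/28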

Multilingual recognition based on CRNN hybrid neural network
WANG Yao, LONG Hua, SHAO Yubin, DU Qingzhi and WANG Yankai. Multilingual recognition based on CRNN hybrid neural network[J]. Journal of Optoelectronics·Laser, 2022, 33(6): 620-628.
Authors: WANG Yao  LONG Hua  SHAO Yubin  DU Qingzhi and WANG Yankai
Affiliation (all five authors): Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan 650500, China
Abstract: In language recognition, a multilingual recognition model using a convolutional recurrent neural network (CRNN) hybrid neural network is proposed to extract both the spatial and the temporal features of speech signals and thereby improve the accuracy of multilingual recognition. The model first extracts the acoustic features of the speech signal. The features are then input into a convolutional neural network (CNN) to extract low-dimensional spatial features. Next, the spatial features are regularized by a spatial pyramid pooling layer (SPP layer) to obtain fixed-length one-dimensional features. Finally, these features are input into a recurrent neural network (RNN) to identify the language. To verify the robustness of the model, experiments were conducted on three datasets. The results show that, compared with a conventional convolutional neural network and recurrent neural network, the CRNN hybrid neural network improves language recognition accuracy on all datasets. The improvement is most pronounced for utterances of 5 s duration in the eight-language dataset, where accuracy increases by 5.3% and 6.1%, respectively.
Keywords: language recognition  convolutional recurrent neural network (CRNN) hybrid neural network  convolutional neural network (CNN)  recurrent neural network (RNN)
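
The abstract states that the SPP layer turns spatial features of varying size into fixed-length one-dimensional features before they reach the RNN. The following is a minimal pure-Python sketch of that idea, not the authors' implementation. The function name `spp_max_pool`, the pyramid levels `(1, 2, 4)`, the use of max pooling, and pooling along the time axis only are all assumptions made for illustration; the abstract does not specify them.

```python
def spp_max_pool(feature_map, levels=(1, 2, 4)):
    """Pool a channels-by-frames feature map into a fixed-length vector.

    feature_map: list of channels, each a non-empty list of numbers whose
                 length (number of frames) may differ between utterances.
    levels:      assumed pyramid levels; level k splits the time axis
                 into k bins and keeps the maximum of each bin.
    The output length is len(feature_map) * sum(levels), whatever the
    number of frames in the input.
    """
    pooled = []
    for channel in feature_map:
        n = len(channel)
        if n == 0:
            raise ValueError("each channel needs at least one frame")
        for bins in levels:
            for b in range(bins):
                # Integer floor/ceil bin edges, so every bin is non-empty
                # even when there are fewer frames than bins.
                start = (b * n) // bins
                end = ((b + 1) * n + bins - 1) // bins
                pooled.append(max(channel[start:end]))
    return pooled


# Two hypothetical CNN outputs with 2 channels but different durations.
short_utt = [[1, 5, 2], [0, 3, 4]]
long_utt = [[1, 5, 2, 7, 3, 6, 0], [2, 2, 9, 1, 4, 8, 5]]

print(len(spp_max_pool(short_utt)), len(spp_max_pool(long_utt)))  # 14 14
```

Both inputs produce a vector of length 2 × (1 + 2 + 4) = 14, which is what lets a later stage with a fixed input size accept utterances of different durations.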