Multilingual recognition based on a CRNN hybrid neural network

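Cite this article:	WANG Yao, LONG Hua, SHAO Yubin, DU Qingzhi, WANG Yankai. Multilingual recognition based on CRNN hybrid neural network[J]. Journal of Optoelectronics·Laser, 2022, 33(6): 620-628.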

Authors:	WANG Yao, LONG Hua, SHAO Yubin, DU Qingzhi, WANG Yankai

Affiliation (all authors):	Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan 650500, China

Funding:	Supported by the National Natural Science Foundation of China (61761025)

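Abstract:	To extract both the spatial and the temporal features of speech signals during language recognition, and thereby improve multilingual recognition accuracy, a multilingual recognition model based on a convolutional recurrent neural network (CRNN) hybrid neural network is proposed. The model first extracts acoustic features from the speech signal. The features are then fed into a convolutional neural network (CNN) to extract low-dimensional spatial features. Next, a spatial pyramid pooling layer (SPP layer) regularizes the spatial features into fixed-length one-dimensional features. Finally, these features are fed into a recurrent neural network (RNN) to determine the language. To verify the robustness of the model, experiments were conducted on three data sets. The results show that, compared with a conventional CNN and a conventional RNN, the CRNN hybrid neural network improves language recognition accuracy on every data set. The gain is largest on 5 s utterances in the eight-language data set, where accuracy rises by 5.3% and 6.1%, respectively.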
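Keywords:	language recognition; convolutional recurrent neural network; hybrid neural network; convolutional neural network; recurrent neural network
Received:	2021/9/6
Revised:	2021/9/28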
Multilingual recognition based on CRNN hybrid neural network

WANG Yao, LONG Hua, SHAO Yubin, DU Qingzhi and WANG Yankai. Multilingual recognition based on CRNN hybrid neural network[J]. Journal of Optoelectronics·Laser, 2022, 33(6): 620-628.

Authors:	WANG Yao, LONG Hua, SHAO Yubin, DU Qingzhi and WANG Yankai

Affiliation (all authors):	Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan 650500, China

Abstract:	For language recognition, a multilingual recognition model using a convolutional recurrent neural network (CRNN) hybrid neural network is proposed to extract the spatial and temporal features of speech signals and improve the accuracy of multilingual recognition. The model first extracts the acoustic features of the speech signal. The features are then input into a convolutional neural network (CNN) to extract low-dimensional spatial features. Next, the spatial features are regularized by a spatial pyramid pooling layer (SPP layer) to obtain fixed-length one-dimensional features. Finally, these features are input into a recurrent neural network (RNN) to identify the language. To verify the robustness of the model, experiments are conducted on three data sets. The results show that, compared with a conventional convolutional neural network and a conventional recurrent neural network, the CRNN hybrid neural network improves language recognition accuracy on all data sets. The improvement is most pronounced for speech of about 5 s in the eight-language data set, where accuracy increases by 5.3% and 6.1%, respectively.

Keywords:	language recognition; convolutional recurrent neural network (CRNN); hybrid neural network; convolutional neural network (CNN); recurrent neural network (RNN)
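
The abstract's spatial pyramid pooling step, which turns feature maps of varying size into a fixed-length one-dimensional feature, can be illustrated with a minimal sketch. This is not the authors' implementation: the pyramid levels (1, 2, 4), the use of max pooling, the plain-list feature map, and the function name `spp_max_pool` are all assumptions made for illustration, since the abstract does not specify them.

```python
def spp_max_pool(feature_map, levels=(1, 2, 4)):
    """Spatial pyramid max pooling over one 2D feature map.

    feature_map: list of equal-length rows (hypothetical stand-in for a CNN output channel).
    levels: assumed pyramid levels; level n splits the map into an n x n grid of bins.
    Returns a flat list whose length is sum(n * n for n in levels), independent of map size.
    """
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for n in levels:
        for i in range(n):
            # Floor for the bin start, ceiling for the bin end: bins cover the
            # whole map and are never empty, even when rows < n.
            r0 = (i * rows) // n
            r1 = ((i + 1) * rows + n - 1) // n
            for j in range(n):
                c0 = (j * cols) // n
                c1 = ((j + 1) * cols + n - 1) // n
                pooled.append(max(feature_map[r][c]
                                  for r in range(r0, r1)
                                  for c in range(c0, c1)))
    return pooled


# Two maps with different widths, standing in for utterances of different duration.
short_map = [[r * 10 + c for c in range(10)] for r in range(6)]
long_map = [[r * 25 + c for c in range(25)] for r in range(6)]
print(len(spp_max_pool(short_map)), len(spp_max_pool(long_map)))
```

Under these assumptions both inputs yield a vector of length 1 + 4 + 16 = 21, which is the property that lets a downstream recurrent network receive inputs of a fixed size regardless of utterance length.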
