
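Language Identification System Based on Convolutional Neural Network
Cite this article: Jin Ma, Song Yan, Dai Lirong. Language identification system based on convolutional neural network[J]. Journal of Data Acquisition & Processing, 2019, 34(2): 322-330.
Authors: Jin Ma  Song Yan  Dai Lirong
Affiliation: National Engineering Laboratory of Speech and Language Information Processing, University of Science and Technology of China, Hefei, 230027
Funding: Supported by the National Natural Science Foundation of China (U1613211).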
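Abstract: Extracting an effective utterance-level representation from a given speech signal is the key to language identification. In recent years, deep learning has brought important advances to language identification: deep neural networks can extract phoneme-related features that effectively improve system performance. End-to-end language identification systems based on deep learning have also shown excellent recognition performance. For the language identification task, this paper proposes an end-to-end system based on a convolutional neural network. It exploits the strong feature extraction and discriminative modelling capabilities of neural networks to extract language-discriminative basic units, obtains an effective utterance representation through a pooling layer, and finally feeds that representation to fully connected layers to produce the identification result. Experiments on the NIST LRE 2009 dataset show that, compared with the current mainstream international language identification system, the proposed system achieves relative error-rate reductions of 1.35%, 12.79% and 29.84% on 30 s, 10 s and 3 s utterances, respectively, and a relative reduction of more than 30% in average detection cost at all three durations.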

Keywords: language identification  convolutional neural network  utterance representation  language-discriminative basic unit  end-to-end scheme
Received: 2017/3/22
Revised: 2017/5/25
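
The pipeline described in the abstract (convolutional layers that produce frame-level units, a pooling layer that yields one utterance representation, and fully connected layers that output the result) can be sketched roughly as follows. This is only an illustrative NumPy forward pass, not the authors' LID-net: the layer sizes, the single convolution layer, the use of mean pooling, the number of languages, and the names `conv1d_relu`, `lid_forward` and `init_params` are all assumptions made for the example.

```python
import numpy as np

def conv1d_relu(x, w, b):
    """Valid 1-D convolution over time followed by ReLU.
    x: (T, D) acoustic frames; w: (K, D, C) kernel; b: (C,). Returns (T-K+1, C)."""
    K, T = w.shape[0], x.shape[0]
    # Stack every length-K window of frames into shape (T-K+1, K, D).
    windows = np.stack([x[t:t + K] for t in range(T - K + 1)])
    out = np.tensordot(windows, w, axes=([1, 2], [0, 1])) + b
    return np.maximum(out, 0.0)

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

def init_params(feat_dim=40, kernel=5, units=64, n_langs=5, seed=0):
    """Random weights; every size here is a placeholder, not a value from the paper."""
    rng = np.random.default_rng(seed)
    return {
        "w_conv": rng.normal(0.0, 0.1, (kernel, feat_dim, units)),
        "b_conv": np.zeros(units),
        "w_fc": rng.normal(0.0, 0.1, (units, n_langs)),
        "b_fc": np.zeros(n_langs),
    }

def lid_forward(x, params):
    """Frames (T, D) -> language posteriors (n_langs,)."""
    h = conv1d_relu(x, params["w_conv"], params["b_conv"])  # frame-level unit activations
    u = h.mean(axis=0)  # assumed mean pooling over time: one fixed-size utterance vector
    return softmax(u @ params["w_fc"] + params["b_fc"])

params = init_params()
rng = np.random.default_rng(1)
long_utt = rng.normal(size=(300, 40))   # stand-in for a longer utterance
short_utt = rng.normal(size=(30, 40))   # stand-in for a shorter utterance
print(lid_forward(long_utt, params).shape, lid_forward(short_utt, params).shape)
```

Because pooling collapses the time axis, utterances of different lengths map to vectors of the same size, which is what lets a single network handle the 30 s, 10 s and 3 s conditions.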

Language Identification Based on Convolutional Neural Network
Jin Ma, Song Yan, Dai Lirong. Language Identification Based on Convolutional Neural Network[J]. Journal of Data Acquisition & Processing, 2019, 34(2): 322-330.
Authors: Jin Ma  Song Yan  Dai Lirong
Affiliation:National Engineering Laboratory of Speech and Language Information Processing, University of Science and Technology of China, Hefei, 230027, China
Abstract: A key problem in language identification (LID) is how to design effective representations that are specific to language information. Recent advances in deep neural networks (DNNs) have led to significant improvements in LID. Acoustic features extracted from a structured DNN that is discriminative with respect to phonemes or tri-phone states can significantly improve performance. End-to-end schemes have also shown strong modelling capability in recent years. This paper proposes a novel end-to-end convolutional neural network (CNN) LID system, called the language identification network (LID-net). It takes advantage of the feature extraction and discriminative modelling capabilities of neural networks (NNs) to extract units that are discriminative with respect to languages, which we call LID-senones, and then obtains an effective utterance representation through a pooling layer. Evaluations on NIST LRE 2009 show improved performance over the current state-of-the-art deep bottleneck feature with total variability (DBF-TV) method: the proposed system achieves 1.35%, 12.79% and 29.84% relative equal error rate (EER) improvement on 30 s, 10 s and 3 s utterances, respectively, and over 30% relative gain in Cavg at all durations.
Keywords:language identification  convolutional neural network  utterance representation  language identification (LID)-senone  end-to-end scheme
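
The improvements quoted in the abstract are relative, that is, measured as a fraction of the baseline system's error rather than in absolute percentage points. A minimal sketch of that calculation follows; the function name `relative_reduction` and the input values are hypothetical, since the abstract does not list the underlying EER or Cavg figures.

```python
def relative_reduction(baseline, proposed):
    """Relative reduction of an error metric (e.g. EER or Cavg), in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - proposed) / baseline

# Hypothetical values for illustration only, not results from the paper.
print(relative_reduction(10.0, 8.0))  # 20.0
```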