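Convolutional neural network acoustic modeling method incorporating multi-stream features for low-resource speech recognition
Cite this article: QIN Chuxiong, ZHANG Lianhai. Convolutional neural network acoustic modeling method incorporating multi-stream features for low-resource speech recognition[J]. Journal of Computer Applications, 2016, 36(9): 2609-2615.
Authors: QIN Chuxiong  ZHANG Lianhai
Affiliation: School of Information System Engineering, Information Engineering University, Zhengzhou 450001
Funding: National Natural Science Foundation of China (61175017, 61403415).
Abstract: Convolutional neural network (CNN) acoustic model parameters are insufficiently trained when the training data for a speech recognition task are low-resource. To address this, a method is proposed that uses multi-stream features to improve the performance of low-resource CNN acoustic models. First, to make fuller use of the acoustic features available in the limited training data, several different types of features are extracted from the training data. Second, a convolutional subnetwork is built for each type of feature, forming a parallel structure that normalizes the probability distributions of the multi-feature data. Then, fully connected layers are added on top of the parallel convolutional subnetworks to fuse the streams, yielding a new CNN acoustic model. Finally, a low-resource speech recognition system is built on this acoustic model. Experimental results show that the parallel convolutional subnetworks make the different feature spaces more similar, and that the method improves the recognition rate by 3.27% and 2.08% over the traditional multi-feature splicing method and the single-feature CNN modeling method, respectively. The method remains applicable when multilingual training is introduced, with relative recognition-rate improvements of 5.73% and 4.57%, respectively.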

Keywords: low-resource speech recognition; convolutional neural network; feature normalization; multi-stream features
Received: 2016-02-02
Revised: 2016-03-29

Acoustic modeling approach of multi-stream feature incorporated convolutional neural network for low-resource speech recognition
QIN Chuxiong, ZHANG Lianhai. Acoustic modeling approach of multi-stream feature incorporated convolutional neural network for low-resource speech recognition[J]. Journal of Computer Applications, 2016, 36(9): 2609-2615.
Authors:QIN Chuxiong  ZHANG Lianhai
Affiliation: School of Information System Engineering, Information Engineering University, Zhengzhou, Henan 450001, China
Abstract: To address the insufficient training of Convolutional Neural Network (CNN) acoustic model parameters when training data are low-resource, a method that uses multi-stream features to improve CNN acoustic modeling for low-resource speech recognition is proposed. First, to exploit more of the acoustic information in the limited data, several types of features are extracted from the training data. Second, a convolutional subnetwork is built for each feature type, forming a parallel structure that normalizes the distributions of the different features. Then, fully connected layers are added on top of the parallel convolutional subnetworks to fuse the multi-stream features, yielding a new CNN acoustic model. Finally, a low-resource speech recognition system is built on this acoustic model. Experimental results show that the parallel convolutional subnetworks make the different feature spaces more similar, and that the method improves recognition accuracy by 3.27% and 2.08% over the traditional multi-feature splicing approach and the baseline CNN system, respectively. The method remains applicable when multilingual training is introduced, where recognition accuracy improves by 5.73% and 4.57%, respectively.
Keywords: low-resource speech recognition; Convolutional Neural Network (CNN); feature normalization; multi-stream feature
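
The parallel structure described in the abstract (one convolutional subnetwork per feature stream, fused by fully connected layers) can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation. Everything specific here is an assumption made for illustration: the class name `MultiStreamCNN`, the two stream dimensions (40 and 13), the filter count, kernel width, pooling size, hidden size, number of output states, and the random untrained weights. It also simplifies the convolution to one dimension over a single frame, whereas a real acoustic model would typically convolve over a window of frames and be trained on labeled data.

```python
import math
import random


def conv1d_relu(x, kernels):
    """Valid 1-D convolution along the feature axis, followed by ReLU."""
    maps = []
    for k in kernels:
        width = len(k)
        maps.append([max(0.0, sum(x[i + j] * k[j] for j in range(width)))
                     for i in range(len(x) - width + 1)])
    return maps


def max_pool(maps, size):
    """Non-overlapping max pooling applied to each feature map."""
    return [[max(m[i:i + size]) for i in range(0, len(m) - size + 1, size)]
            for m in maps]


def dense(x, weights, biases):
    """Fully connected layer: one weight row and one bias per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def softmax(z):
    """Numerically stable softmax."""
    peak = max(z)
    exps = [math.exp(v - peak) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rng, rows, cols, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)]
            for _ in range(rows)]


class MultiStreamCNN:
    """Hypothetical model: one conv subnetwork per stream, fused by dense layers."""

    def __init__(self, stream_dims, n_filters=4, kernel_width=3, pool=2,
                 n_hidden=8, n_states=5, seed=0):
        rng = random.Random(seed)
        self.pool = pool
        # One independent set of convolution kernels per feature stream.
        self.kernels = [rand_matrix(rng, n_filters, kernel_width)
                        for _ in stream_dims]
        # Size of the concatenated (fused) vector after conv + pooling.
        fused_dim = sum(n_filters * ((d - kernel_width + 1) // pool)
                        for d in stream_dims)
        self.w1 = rand_matrix(rng, n_hidden, fused_dim)
        self.b1 = [0.0] * n_hidden
        self.w2 = rand_matrix(rng, n_states, n_hidden)
        self.b2 = [0.0] * n_states

    def forward(self, streams):
        fused = []
        # Each stream passes through its own subnetwork before fusion.
        for x, kernels in zip(streams, self.kernels):
            for fmap in max_pool(conv1d_relu(x, kernels), self.pool):
                fused.extend(fmap)
        hidden = [max(0.0, h) for h in dense(fused, self.w1, self.b1)]
        return softmax(dense(hidden, self.w2, self.b2))


# Two made-up feature streams for a single frame (dimensions are assumptions).
model = MultiStreamCNN(stream_dims=[40, 13])
rng = random.Random(1)
frame = [[rng.gauss(0, 1) for _ in range(d)] for d in (40, 13)]
posteriors = model.forward(frame)
```

Here `posteriors` is a probability distribution over the assumed output states. The point of the sketch is the contrast with feature splicing: instead of concatenating the raw streams at the input, each stream is first mapped by its own convolutional subnetwork, and only the resulting feature maps are concatenated and passed to the shared fully connected layers.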