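Research on noise robustness of speech recognition based on deep auto-encoder neural network
Cite this article: HUANG Lixia, WANG Yanan, ZHANG Xueying, WANG Hongcui. Research on noise robustness of speech recognition based on deep auto-encoder neural network[J]. Computer Engineering and Applications, 2017, 53(13): 49-54.
Authors: HUANG Lixia  WANG Yanan  ZHANG Xueying  WANG Hongcui
Affiliations: 1. College of Information Engineering, Taiyuan University of Technology, Taiyuan 030024; 2. College of Computer Science and Technology, Tianjin University, Tianjin 300072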
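Abstract: In speech recognition tasks, the traditional Radial Basis Function (RBF) neural network initializes the centers and radii of its basis functions randomly. To address this problem, and starting from the layered processing mechanism by which the human brain perceives speech, an unsupervised pre-training approach that uses a large amount of unlabeled data to initialize the network parameters is proposed in place of the traditional random initialization. A deep auto-encoder network is used as the acoustic model for speech recognition, and the noise robustness of speaker-independent, small-vocabulary isolated-word recognition is analyzed with Mel Frequency Cepstrum Coefficient (MFCC) features and with Gammatone Frequency Cepstrum Coefficient (GFCC) features based on the Gammatone auditory filter. The experimental results show that, with MFCC features, the deep auto-encoder network is more robust to noise than the RBF neural network; and that, compared with the classic MFCC features, GFCC features give a relative improvement of 1.87% in average recognition rate with the deep auto-encoder network.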

Keywords: speech recognition  robustness  deep auto-encoder network  GFCC features  MFCC features

Research on noise robustness of speech recognition based on deep auto-encoder neural network
HUANG Lixia, WANG Yanan, ZHANG Xueying, WANG Hongcui. Research on noise robustness of speech recognition based on deep auto-encoder neural network[J]. Computer Engineering and Applications, 2017, 53(13): 49-54.
Authors: HUANG Lixia  WANG Yanan  ZHANG Xueying  WANG Hongcui
Affiliation: 1. College of Information Engineering, Taiyuan University of Technology, Taiyuan 030024, China; 2. College of Computer Science and Technology, Tianjin University, Tianjin 300072, China
Abstract: In speech recognition with the traditional Radial Basis Function (RBF) neural network, the centers and radii of the basis functions are initialized randomly. To address this problem, and motivated by the layered mechanism by which the human brain perceives speech, an unsupervised pre-training method that uses a large amount of unlabeled data to initialize the network parameters is proposed to replace the traditional random initialization. This paper introduces the Deep Auto-Encoder (DAE) neural network as the acoustic model and analyzes the noise robustness of speaker-independent isolated-word recognition on a small-vocabulary database. The experimental results show that the DAE outperforms the RBF network with Mel Frequency Cepstrum Coefficient (MFCC) features. In addition, compared with MFCC, Gammatone Frequency Cepstrum Coefficient (GFCC) features contribute more to noise robustness, giving a relative improvement of 1.87% in average recognition rate with the DAE network.
Keywords:speech recognition  robustness  Deep Auto-Encoder(DAE) neural network  Gammatone Frequency Cepstrum Coefficient(GFCC)  Mel Frequency Cepstrum Coefficient(MFCC)  
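
The central idea of the abstract, initializing network parameters by unsupervised pre-training on unlabeled data rather than at random, can be illustrated with a minimal sketch. The single-hidden-layer autoencoder below is not the authors' network: the function name `pretrain_autoencoder`, the toy data (a hypothetical stand-in for MFCC or GFCC frames), the layer sizes, and the learning rate are all assumptions chosen for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def pretrain_autoencoder(data, n_hidden, epochs=200, lr=0.1, seed=0):
    """Full-batch gradient descent on reconstruction error, using no labels.

    Returns the encoder weights W1, encoder biases b1, and the per-epoch loss.
    """
    rng = random.Random(seed)
    n, n_in = len(data), len(data[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]
    W2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1, b2 = [0.0] * n_hidden, [0.0] * n_in
    losses = []
    for _ in range(epochs):
        gW1 = [[0.0] * n_hidden for _ in range(n_in)]
        gW2 = [[0.0] * n_in for _ in range(n_hidden)]
        gb1, gb2 = [0.0] * n_hidden, [0.0] * n_in
        total = 0.0
        for x in data:
            # Sigmoid encoder followed by a linear decoder.
            h = [sigmoid(sum(x[i] * W1[i][j] for i in range(n_in)) + b1[j])
                 for j in range(n_hidden)]
            err = [sum(h[j] * W2[j][k] for j in range(n_hidden)) + b2[k] - x[k]
                   for k in range(n_in)]
            total += 0.5 * sum(e * e for e in err)
            # Backpropagate the squared reconstruction error.
            dh = [sum(err[k] * W2[j][k] for k in range(n_in)) * h[j] * (1.0 - h[j])
                  for j in range(n_hidden)]
            for j in range(n_hidden):
                gb1[j] += dh[j]
                for k in range(n_in):
                    gW2[j][k] += h[j] * err[k]
                for i in range(n_in):
                    gW1[i][j] += x[i] * dh[j]
            for k in range(n_in):
                gb2[k] += err[k]
        losses.append(total / n)  # loss measured before this epoch's update
        # Apply the gradients averaged over the batch.
        for j in range(n_hidden):
            b1[j] -= lr * gb1[j] / n
            for k in range(n_in):
                W2[j][k] -= lr * gW2[j][k] / n
            for i in range(n_in):
                W1[i][j] -= lr * gW1[i][j] / n
        for k in range(n_in):
            b2[k] -= lr * gb2[k] / n
    return W1, b1, losses


# Toy unlabeled vectors (hypothetical; real input would be acoustic features).
rng = random.Random(1)
data = []
for _ in range(40):
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    data.append([z1, z2, z1 + z2, z1 - z2])

W1, b1, losses = pretrain_autoencoder(data, n_hidden=2)
print(f"reconstruction loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

In a pre-training scheme of this kind, the learned `W1` and `b1` would serve as the starting values for the first layer of the supervised recognizer, which is then fine-tuned on labeled data; a deep network repeats the step layer by layer.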