
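Speech deep feature extraction method for deep neural network
Cite this article: LI Tao, CAO Hui, GUO Le-le. Speech deep feature extraction method for deep neural network[J]. Technical Acoustics (声学技术), 2018, 37(4): 367-371.
Authors: LI Tao, CAO Hui, GUO Le-le
Affiliation: School of Physics and Information Technology, Shaanxi Normal University
Funding: Supported by the National Natural Science Foundation of China (1202020368, 11074159, 11374199).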
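Abstract: To improve the performance of continuous speech recognition systems, a deep auto-encoder neural network is applied to speech signal feature extraction. Sparse auto-encoders are stacked to form a deep auto-encoder (Deep Auto-Encoding, DAE), which extracts the essential features of the speech signal in two steps: pre-training and fine-tuning. A context-dependent triphone model is used, and the phoneme error rate serves as the criterion of system performance. Simulation results show that, compared with traditional Mel-Frequency Cepstral Coefficient (MFCC) features and optimized MFCC features, the deep features extracted by the deep auto-encoder perform better.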

Keywords: speech recognition; deep auto-encoder; Mel-frequency cepstral coefficient
Received: 2017-08-04
Revised: 2017-10-18
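
The abstract names the phoneme error rate as its evaluation criterion but does not spell out how it is computed. A common definition, assumed here rather than taken from the paper, is the Levenshtein (edit) distance between the reference and recognized phoneme sequences divided by the reference length. The function names and the toy phoneme labels below are hypothetical.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences, unit cost per edit."""
    prev = list(range(len(hyp) + 1))  # distance from empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phoneme_error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """(substitutions + deletions + insertions) / number of reference phonemes."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Toy example with made-up labels: one substitution and one insertion.
reference = ["s", "iy", "t"]
recognized = ["s", "ih", "t", "ax"]
per = phoneme_error_rate(reference, recognized)
```

Because insertions count as errors, this rate can exceed 1.0 when the recognizer outputs many more phonemes than the reference contains.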

Speech deep feature extraction method for deep neural network
LI Tao, CAO Hui and GUO Le-le. Speech deep feature extraction method for deep neural network[J]. Technical Acoustics, 2018, 37(4): 367-371.
Authors: LI Tao, CAO Hui and GUO Le-le
Affiliation (all authors): School of Physics and Information Technology, Shaanxi Normal University, Xi'an, 710100, Shaanxi, China
Abstract: To improve the performance of continuous speech recognition systems, this paper applies a deep auto-encoder neural network to speech signal feature extraction. The deep auto-encoder is formed by stacking sparse auto-encoders and is trained by greedy layer-wise pre-training followed by fine-tuning. A context-dependent triphone model is used in the continuous speech recognition system, and the phoneme error rate is taken as the criterion of system performance. The simulation results show that the deep features extracted by the deep auto-encoder outperform both the traditional MFCC features and the optimized MFCC features.
Keywords: speech recognition; Deep Auto-Encoding (DAE); Mel-Frequency Cepstral Coefficient (MFCC)
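
The abstract describes stacking sparse auto-encoders and training them greedily, layer by layer, before fine-tuning. The NumPy sketch below illustrates only the pre-training stage under stated assumptions: sigmoid hidden units, a linear decoder, a KL-divergence sparsity penalty, and plain batch gradient descent. The layer sizes, hyperparameters, and function names are hypothetical and are not taken from the paper, and the random frames merely stand in for real acoustic features such as MFCCs.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_sparse_autoencoder(x, n_hidden, rho=0.05, beta=0.1,
                             lr=0.05, epochs=100, seed=0):
    """Train one sparse auto-encoder on x of shape (n_samples, n_in).

    Loss (assumed): mean over samples of 0.5 * squared reconstruction error
    plus beta * sum_j KL(rho || mean activation of hidden unit j).
    Returns the encoder parameters (W, b).
    """
    rng = np.random.default_rng(seed)
    n, n_in = x.shape
    W1 = rng.normal(0.0, 0.1, (n_in, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, n_in))
    b2 = np.zeros(n_in)
    for _ in range(epochs):
        h = sigmoid(x @ W1 + b1)   # hidden code
        y = h @ W2 + b2            # linear reconstruction
        rho_hat = np.clip(h.mean(axis=0), 1e-6, 1.0 - 1e-6)
        d_y = (y - x) / n          # gradient of the reconstruction term
        sparsity = beta * (-rho / rho_hat + (1.0 - rho) / (1.0 - rho_hat)) / n
        d_h = d_y @ W2.T + sparsity
        d_z = d_h * h * (1.0 - h)  # back through the sigmoid
        W2 -= lr * (h.T @ d_y)
        b2 -= lr * d_y.sum(axis=0)
        W1 -= lr * (x.T @ d_z)
        b1 -= lr * d_z.sum(axis=0)
    return W1, b1


def pretrain_stack(x, layer_sizes, **kwargs):
    """Greedy layer-wise pre-training: each layer learns from the previous codes."""
    layers, inputs = [], x
    for n_hidden in layer_sizes:
        W, b = train_sparse_autoencoder(inputs, n_hidden, **kwargs)
        layers.append((W, b))
        inputs = sigmoid(inputs @ W + b)  # codes become the next layer's input
    return layers


def extract_features(x, layers):
    """Pass frames through the stacked encoders to obtain the deep features."""
    for W, b in layers:
        x = sigmoid(x @ W + b)
    return x


# Placeholder data: 64 random 39-dimensional frames, not real speech features.
frames = np.random.default_rng(1).normal(size=(64, 39))
layers = pretrain_stack(frames, [32, 16], epochs=20)
features = extract_features(frames, layers)
```

The fine-tuning step, which would adjust all layers jointly after pre-training, is omitted here because the abstract does not specify its objective.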
This article is indexed in CNKI and other databases.