
Deep Neural Network Based Feature Extraction for Low-resource Speech Recognition
Citation: QIN Chu-Xiong, ZHANG Lian-Hai. Deep Neural Network Based Feature Extraction for Low-resource Speech Recognition [J]. Acta Automatica Sinica, 2017, 43(7): 1208-1219.
Authors: QIN Chu-Xiong, ZHANG Lian-Hai
Affiliation: 1. Department of Information and System Engineering, Information Engineering University, Zhengzhou 450001, China
Funding: Supported by the National Natural Science Foundation of China (61673395, 61302107, 61403415)

Abstract: To alleviate the sharp performance degradation that deep neural network (DNN) based acoustic features suffer when transcribed training data are insufficient, two DNN-based feature extraction approaches for low-resource speech recognition are proposed. First, higher-resource corpora are used to assist the training of a deep bottleneck neural network through a shared-hidden-layer structure. Because the bottleneck (BN) layer sits among the shared layers, dropout, maxout, and rectified linear units (ReLU) are introduced to mitigate the overfitting caused by the irregular distribution of multi-stream training samples, while also shrinking the network and reducing multilingual training time. Second, to further improve DNN-based feature extraction, a low-dimensional high-level feature extraction technique based on convex non-negative matrix factorization (CNMF) is proposed: the weight matrix of a hidden layer is factorized, the resulting basis matrix serves as the weight matrix of a newly formed feature layer, and a new type of low-dimensional feature is extracted from that layer. Experiments on 1 hour of Vystadial 2013 Czech low-resource training data show that, with the help of 26.7 hours of English training data, the system achieves a 7.0% relative word error rate reduction over the baseline when dropout and ReLU are applied, and a 12.6% relative reduction when dropout and maxout are applied, while using 62.7% fewer network parameters and 25% less training time than the other proposed systems. The matrix factorization based features outperform bottleneck features (BNF) in both monolingual and multilingually assisted training, and in the assisted case they also surpass state-of-the-art DNN hidden Markov model (DNN-HMM) hybrid systems, by margins ranging from 0.8% to 3.4%.

Keywords: low-resource speech recognition, deep neural network (DNN), bottleneck features (BNF), convex non-negative matrix factorization (CNMF)
Received: 2015-10-16
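The shared-hidden-layer multilingual setup described in the abstract — shared hidden and bottleneck layers feeding language-specific softmax output heads, with dropout on the shared activations — can be sketched as below. All layer sizes, senone counts, and the dropout rate are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda x: np.maximum(x, 0.0)

# Illustrative layer sizes (not the paper's actual configuration).
d_in, d_h, d_bn = 440, 1024, 40
W1  = rng.standard_normal((d_in, d_h)) * 0.01   # shared hidden layer
Wbn = rng.standard_normal((d_h, d_bn)) * 0.01   # shared bottleneck (BN) layer
heads = {  # language-specific softmax output layers
    "cs": rng.standard_normal((d_bn, 3000)) * 0.01,  # Czech targets (assumed count)
    "en": rng.standard_normal((d_bn, 4000)) * 0.01,  # English targets (assumed count)
}

def dropout(h, p, rng):
    """Inverted dropout on shared hidden activations (training only)."""
    mask = rng.random(h.shape) >= p
    return h * mask / (1.0 - p)

def bottleneck_features(x):
    # Shared layers only: after multilingual training, these layers are
    # kept as the feature extractor for the low-resource recognizer.
    return relu(x @ W1) @ Wbn

def forward(x, lang, train=False, p=0.2, rng=rng):
    h = relu(x @ W1)
    if train:
        h = dropout(h, p, rng)
    logits = (h @ Wbn) @ heads[lang]
    z = np.exp(logits - logits.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)

x = rng.standard_normal((8, d_in))       # a minibatch of feature frames
probs_cs = forward(x, "cs", train=True)  # Czech stream trains shared layers + cs head
bnf = bottleneck_features(x)             # 40-dim bottleneck features
```

Each language's minibatches update the shared layers plus only that language's head, which is how the 26.7-hour English corpus regularizes the 1-hour Czech model.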
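The CNMF step — factorizing a hidden-layer weight matrix and reusing the basis matrix as the weights of a new feature layer — can be sketched with the multiplicative updates of convex NMF (Ding, Li & Jordan, 2010), where X ≈ (XW)Gᵀ with W, G ≥ 0. Matrix sizes, the target dimension k, and iteration counts here are illustrative assumptions, and a random weight matrix stands in for a trained one.

```python
import numpy as np

def convex_nmf(X, k, n_iter=300, eps=1e-9, seed=0):
    """Convex NMF: X ~ (X W) G^T with W, G >= 0, so each basis column
    (X W)[:, j] is a non-negative combination of the columns of X."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    W = np.abs(rng.standard_normal((n, k)))
    G = np.abs(rng.standard_normal((n, k)))
    Y = X.T @ X
    Yp = (np.abs(Y) + Y) / 2.0  # positive part of X^T X
    Yn = (np.abs(Y) - Y) / 2.0  # negative part of X^T X
    for _ in range(n_iter):
        # Multiplicative updates; each step does not increase ||X - X W G^T||^2.
        G *= np.sqrt((Yp @ W + G @ (W.T @ Yn @ W) + eps)
                     / (Yn @ W + G @ (W.T @ Yp @ W) + eps))
        GtG = G.T @ G
        W *= np.sqrt((Yp @ G + Yn @ W @ GtG + eps)
                     / (Yn @ G + Yp @ W @ GtG + eps))
    F = X @ W  # basis matrix: reused as the new feature layer's weight matrix
    return F, G

# Demo on a stand-in "hidden-layer weight matrix" (sizes are illustrative).
rng = np.random.default_rng(1)
X = rng.standard_normal((64, 40))        # e.g. 64 inputs x 40 hidden units
F, G = convex_nmf(X, k=10)               # 10-dimensional feature layer
rel_err = np.linalg.norm(X - F @ G.T) / np.linalg.norm(X)
```

The low-dimensional feature is then read out from the layer whose weights are F, in place of the original bottleneck layer.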