
A helium speech recognition method using machine learning

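Cite this article:	LI Dongmei, LI Ming, GUO Lili, ZHANG Shibing. A helium speech recognition method using machine learning[J]. Telecommunication Engineering, 2022(9).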

Authors:	LI Dongmei, LI Ming, GUO Lili, ZHANG Shibing

Affiliation:	School of Information Science and Technology, Nantong University, Nantong, Jiangsu 226019 (all four authors)

Funding:	National Natural Science Foundation of China (61871241); Jiangsu Province Research and Practice Innovation Program (KYCX20_2828)

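Abstract:	Traditional helium speech processing techniques are slow, computationally complex, and difficult to operate. To address these problems, a helium speech recognition method using machine learning is proposed. By learning high-dimensional information and extracting multiple features with a deep network, the method not only solves the overfitting problem but also achieves a low word error rate (WER) and fast convergence. First, an isolated-word helium speech database and a continuous helium speech database are built, and the helium speech data are preprocessed; the extracted speech features are mainly formant features, pitch period features, and FBank (Filter Bank) features. The speech features are then fed into an acoustic model composed of a deep convolutional neural network (DCNN) and connectionist temporal classification (CTC), which models the mapping from speech to pinyin. Finally, a Transformer language model is applied to produce the Chinese character output. Compared with a recognition model that extracts only FBank features, the isolated-word helium speech recognition model that extracts formant, pitch period, and FBank features lowers the WER by 7.91%, and the continuous helium speech recognition model lowers the WER by 14.95%. The best WER is 1.53% for the isolated-word model and 36.89% for the continuous model. The results show that the proposed method can recognize helium speech effectively.
Keywords:	helium speech; speech recognition; machine learning; deep convolutional neural network (DCNN); connectionist temporal classification (CTC)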
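The WER figures quoted in the abstract can be made concrete with a small sketch. The code below is an illustration only, assuming the conventional definition of WER as the token-level edit distance (substitutions, insertions, and deletions) divided by the reference length. The abstract does not state how the authors tokenize their output (characters, pinyin syllables, or words), nor whether the reported reductions are absolute or relative, so the function names and the tokenization here are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    # prev[j] holds the distance between the processed prefix of ref
    # and the first j tokens of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(ref, hyp):
    """Error rate: edit distance normalized by the reference length."""
    if len(ref) == 0:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref, hyp) / len(ref)
```

For instance, with hypothetical pinyin tokens, `wer(["ni3", "hao3"], ["ni3", "hao4"])` is 0.5: one substitution against a two-token reference.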
A helium speech recognition method using machine learning

LI Dongmei, LI Ming, GUO Lili, ZHANG Shibing. A helium speech recognition method using machine learning[J]. Telecommunication Engineering, 2022(9).

Authors:	LI Dongmei, LI Ming, GUO Lili, ZHANG Shibing

Abstract:	Traditional helium speech processing techniques are slow, computationally complex, and difficult to operate. To solve these problems, a helium speech recognition method based on machine learning is proposed. By learning high-dimensional information and extracting multiple features with a deep network, the method not only solves the problem of overfitting but also has the advantages of a low word error rate (WER) and fast convergence. In the method, an isolated-word helium speech database and a continuous helium speech database are established, and the helium speech is preprocessed to extract speech features including formant features, pitch period features, and Filter Bank (FBank) features. Then, the speech features are input into the acoustic model, which consists of a deep convolutional neural network (DCNN) and connectionist temporal classification (CTC), for speech-to-pinyin modeling. Finally, a Transformer language model is applied to obtain the Chinese character output. The WER of the isolated-word helium speech recognition model that extracts formant, pitch period, and FBank features is 7.91% lower than that of the recognition model that extracts only FBank features, and the WER of the continuous helium speech recognition model is reduced by 14.95%. The optimal WER of the isolated-word helium speech recognition model is 1.53%, and the optimal WER of the continuous helium speech recognition model is 36.89%. The results show that the proposed method can recognize helium speech effectively.

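Keywords:	helium speech; speech recognition; machine learning; deep convolutional neural network (DCNN); connectionist temporal classification (CTC)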
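To illustrate how a CTC-trained acoustic model's frame-level output can be turned into a pinyin token sequence, the following is a minimal sketch of greedy (best-path) CTC decoding. It is not taken from the paper: the abstract does not say which decoding strategy the authors use, and the function name, the blank index of 0, and the list-of-lists score format are all assumptions made for illustration.

```python
def ctc_greedy_decode(frame_scores, blank=0):
    """Greedy CTC decoding (assumed strategy, not confirmed by the source).

    frame_scores: one list of per-class scores for each time frame.
    blank: index of the CTC blank symbol (assumed to be 0 here).
    Returns the decoded class indices after merging repeats and
    dropping blanks.
    """
    decoded = []
    prev = None
    for scores in frame_scores:
        # Pick the highest-scoring class for this frame.
        best = max(range(len(scores)), key=scores.__getitem__)
        # Emit a symbol only when it differs from the previous frame's
        # choice and is not the blank symbol.
        if best != prev and best != blank:
            decoded.append(best)
        prev = best
    return decoded
```

A blank between two identical frame labels keeps them as separate output symbols, which is how CTC represents a repeated token. The resulting indices would then be mapped to pinyin tokens through a vocabulary table before being passed to the language model.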
