A Language Identification algorithm based on segmental feature and automatic tokenization

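Cite this article:	ZHANG Wen-lin, QU Dan, LI Bi-cheng, WANG Bo, WANG Bing-xi. A Language Identification algorithm based on segmental feature and automatic tokenization[J]. Signal Processing, 2008, 24(4).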

Authors:	ZHANG Wen-lin, QU Dan, LI Bi-cheng, WANG Bo, WANG Bing-xi

Affiliation:	Institute of Information Engineering, Information Engineering University, Zhengzhou, Henan Province, 450002

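Abstract:	This paper studies a new language identification algorithm that combines acoustic information with phonotactic information. In preprocessing, the speech is segmented automatically. At the feature level, a segmental feature parameter that carries long-term information, the segmental shifted delta cepstrum, is introduced. At the model level, a Gaussian mixture model (GMM) automatically tokenizes the speech signal into a symbol sequence, and a multigram language model (MLM) is then introduced to model the phonotactic information. Finally, the GMM scores and MLM scores are fed to a back-end multi-class support vector machine to obtain the final identification result. Experiments show that the new system needs no manually labeled corpus, identifies quickly, and achieves an open-set correct identification rate of 78.84% on five languages from the OGI standard corpus.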
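Keywords:	language identification; shifted delta cepstrum; segmental feature parameters; Gaussian mixture model; multigram language model; support vector machine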
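
The shifted delta cepstrum named in the abstract stacks delta cepstra taken at several future time offsets, which is how a single feature vector comes to carry long-term information. The sketch below shows the conventional frame-level computation only. The function name, the `d`, `P`, `k` parameters and their defaults (a commonly used 7-1-3-7 style setup), and the edge padding are assumptions for illustration; the abstract does not give the paper's segment-level variant or its settings.

```python
import numpy as np


def shifted_delta_cepstrum(cepstra, d=1, P=3, k=7):
    """Frame-level SDC sketch (hypothetical helper, not the paper's code).

    cepstra: array of shape (T, N), one N-dimensional cepstral vector per frame.
    d: delta spread, P: shift between blocks, k: number of stacked blocks.
    Returns an array of shape (T, N * k).
    """
    cepstra = np.asarray(cepstra, dtype=float)
    T, _ = cepstra.shape
    pad_before = d
    pad_after = (k - 1) * P + d
    # Assumption: repeat the first/last frame so every shifted index is valid.
    padded = np.pad(cepstra, ((pad_before, pad_after), (0, 0)), mode="edge")

    blocks = []
    for i in range(k):
        # Block i is c(t + i*P + d) - c(t + i*P - d) for every frame t.
        ahead_start = pad_before + i * P + d
        behind_start = pad_before + i * P - d
        ahead = padded[ahead_start:ahead_start + T]
        behind = padded[behind_start:behind_start + T]
        blocks.append(ahead - behind)
    return np.hstack(blocks)
```

With `k` blocks spaced `P` frames apart, each output vector spans roughly `(k - 1) * P + 2 * d` frames of context, compared with `2 * d` frames for an ordinary delta feature.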
A Language Identification algorithm based on segmental feature and automatic tokenization

ZHANG Wen-lin, QU Dan, LI Bi-cheng, WANG Bo, WANG Bing-xi. A Language Identification algorithm based on segmental feature and automatic tokenization[J]. Signal Processing, 2008, 24(4).

Authors:	ZHANG Wen-lin, QU Dan, LI Bi-cheng, WANG Bo, WANG Bing-xi

Abstract:	We present a new framework for language identification that uses the acoustic and phonotactic information of speech. First, an automatic speech segmentation algorithm is applied in the preprocessing stage. At the feature stage, the segmental shifted delta cepstrum feature, which carries long-term information, is introduced. At the model stage, a multigram language model is built on top of a traditional GMM used for speech tokenization. A multi-class support vector machine performs the back-end classification. Experimental results demonstrate that the new system yields good performance on a five-language identification task from the OGI-TS database.

Keywords:	language identification; shifted delta cepstrum; segmental parameter; Gaussian mixture model; multigram language model; support vector machine
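
The model stage described in the abstract turns speech into symbols with a GMM and then models the symbol sequence with a language model. The sketch below illustrates that idea under loud assumptions: each frame is labeled with the index of its most likely diagonal-covariance Gaussian component, and a plain bigram model with add-one smoothing stands in for the paper's multigram language model, whose exact form the abstract does not give. All function names and parameters are hypothetical.

```python
import numpy as np


def gmm_tokenize(features, weights, means, variances):
    """Label each frame with the index of its most likely GMM component.

    features: (T, D); weights: (M,); means, variances: (M, D), diagonal covariances.
    """
    x = np.asarray(features, dtype=float)[:, None, :]  # (T, 1, D)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    # Per-component log-likelihood of every frame, shape (T, M).
    log_gauss = -0.5 * (
        np.log(2.0 * np.pi * variances) + (x - means) ** 2 / variances
    ).sum(axis=2)
    return np.argmax(np.log(weights) + log_gauss, axis=1)


def train_bigram(tokens, vocab_size):
    """Bigram log-probabilities with add-one smoothing (stand-in for the MLM)."""
    counts = np.ones((vocab_size, vocab_size))
    for prev, cur in zip(tokens[:-1], tokens[1:]):
        counts[prev, cur] += 1
    return np.log(counts / counts.sum(axis=1, keepdims=True))


def bigram_score(tokens, log_probs):
    """Average bigram log-probability of a token sequence."""
    pairs = list(zip(tokens[:-1], tokens[1:]))
    total = sum(log_probs[p, c] for p, c in pairs)
    return float(total / max(len(pairs), 1))
```

In a system of the kind the abstract outlines, one such language-model score per target language, together with the GMM acoustic scores, would form the input vector of the back-end multi-class classifier.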
This article is indexed in CNKI, Wanfang Data, and other databases.