New Speech Features Based on the Mellin Transform and Speaker Adaptation Using Frequency Normalization


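Cite this article:	Chen Jingdong, Xu Bo, Huang Taiyi. New speech features based on the Mellin transform and speaker adaptation using frequency normalization[J]. Acta Automatica Sinica (自动化学报), 2000, 26(4): 478-484.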

Authors:	Chen Jingdong, Xu Bo, Huang Taiyi

Affiliation:	1. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences

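Abstract:	Differences in vocal tract shape among speakers degrade the performance of speaker-independent speech recognition systems. To reduce this degradation, two methods are studied. The first is a frequency-normalization speaker adaptation method based on maximum likelihood estimation. The second is a new speech feature based on the Mellin transform. Preliminary experiments on a speaker-independent isolated-word speech recognition system show that both methods improve the system's robustness to different speakers. Of the two, the Mellin-transform-based feature performs better. It improves recognition performance across speakers and also greatly reduces the spread of error rates across speakers.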
Keywords:	Mellin transform; frequency normalization; adaptation
Received:	1998-8-17
Revised:	August 17, 1998
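The abstract says only that the frequency-normalization method chooses the normalization by maximum likelihood estimation. The following is a minimal sketch of that selection step, not the paper's procedure. It assumes a linear warp f -> alpha * f and a grid of candidate warp factors. It also uses a toy model in which a single reference envelope with i.i.d. Gaussian residuals stands in for the recognizer's acoustic models. All of these are hypothetical, as are the helper names `envelope`, `log_likelihood`, and `ml_warp_factor`.

```python
import numpy as np


def envelope(f, formants=(500.0, 1500.0, 2500.0), rel_bw=0.15):
    """Hypothetical smooth spectral envelope: one Gaussian bump per formant (Hz),
    with bandwidth proportional to the formant frequency."""
    centers = np.asarray(formants, dtype=float)
    f = np.asarray(f, dtype=float)[..., None]
    return np.exp(-0.5 * ((f - centers) / (rel_bw * centers)) ** 2).sum(axis=-1)


def log_likelihood(observed, template, sigma=0.1):
    """Log-likelihood of `observed` under i.i.d. Gaussian residuals around `template`.
    This toy model is an assumed stand-in for the recognizer's acoustic models."""
    r = observed - template
    return (-0.5 * np.sum(r ** 2) / sigma ** 2
            - r.size * np.log(sigma * np.sqrt(2.0 * np.pi)))


def ml_warp_factor(speaker_spectrum, f, alphas, template):
    """Warp the frequency axis f -> alpha * f for each candidate alpha and return
    the alpha whose warped spectrum is most likely under the template model."""
    scores = [log_likelihood(speaker_spectrum(a * f), template) for a in alphas]
    return alphas[int(np.argmax(scores))]


# Usage: a hypothetical speaker whose formants are all 10% higher than the reference.
f = np.linspace(0.0, 4000.0, 4001)   # analysis grid in Hz
reference = envelope(f)              # "average speaker" template
k_true = 1.1


def speaker(freq):
    # Peaks at k_true * formants, i.e. a dilated copy of the reference envelope.
    return envelope(np.asarray(freq) / k_true)


alphas = np.round(np.arange(0.80, 1.2501, 0.01), 2)   # candidate warp factors
best = ml_warp_factor(speaker, f, alphas, reference)
print(best)   # the grid point that undoes the speaker's scaling (k_true)
```

With a matched model, the likelihood peaks at the warp that undoes the speaker's frequency scaling. In a real recognizer the single template would be replaced by the trained models, and the search could then be run separately for each speaker.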
SPEAKER NORMALIZATION AND NOVEL ROBUST SPEECH FEATURE BASED ON MELLIN TRANSFORM

Chen Jingdong,Xu Bo,Huang Taiyi.SPEAKER NORMALIZATION AND NOVEL ROBUST SPEECH FEATURE BASED ON MELLIN TRANSFORM[J].Acta Automatica Sinica,2000,26(4):478-484.

Authors:	Chen Jingdong, Xu Bo, Huang Taiyi

Affiliation:	1. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, P.O. Box 2728, Beijing

Abstract:	One major source of interspeaker variability in speaker-independent (SI) speech recognition is variation in vocal tract shape among speakers, especially vocal tract length (VTL). If the vocal tract is modeled as a uniform tube of length L, the formant frequencies of a given sound are inversely proportional to L. Since VTL ranges from approximately 13 cm in females to over 18 cm in males, formant center frequencies can vary by as much as 25% among speakers. Because of this variability, state-of-the-art SI speech recognizers perform poorly on outlier speakers whose vocal tract shapes differ significantly from those of the speakers in the training set. To reduce the resulting degradation in recognition performance, this paper investigates two methods. The first removes the variability through speaker normalization. The second extracts a new feature based on the Mellin transform (MT). Because the MT is scale invariant, the new feature is insensitive to VTL differences among speakers. Experiments show that both methods improve the performance of an SI recognizer, and that the latter approach is more effective than the former.

Keywords:	Mellin transform; frequency normalization; adaptation
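The abstract's argument has two steps. First, under the uniform-tube model every formant is inversely proportional to L, so a change in VTL acts as a single scaling of the frequency axis. Second, the Mellin transform's magnitude is invariant to such scaling. The NumPy sketch below illustrates both steps under several assumptions. It uses the standard quarter-wavelength resonances F_n = (2n - 1) c / (4L) of a tube closed at the glottis and open at the lips, with an assumed c = 35000 cm/s. The envelope comes from a hypothetical Gaussian-bump model (`envelope`). A hypothetical helper, `mellin_magnitude`, evaluates the transform on the line s = 1/2 - jc by resampling on a log-frequency grid and taking an FFT. This is not the paper's feature-extraction pipeline, which the abstract does not describe.

```python
import numpy as np

C_CM_PER_S = 35000.0  # assumed speed of sound in the vocal tract, cm/s (about 350 m/s)


def tube_formants(length_cm, n_formants=3, c=C_CM_PER_S):
    """Resonances (Hz) of a uniform tube closed at one end (glottis) and open at the
    other (lips): F_n = (2n - 1) c / (4 L), each inversely proportional to L."""
    n = np.arange(1, n_formants + 1)
    return (2 * n - 1) * c / (4.0 * length_cm)


def envelope(f, formants, rel_bw=0.15):
    """Hypothetical spectral envelope: Gaussian bumps at the formants with bandwidth
    proportional to frequency, so a shorter tube gives a dilated copy of the envelope."""
    centers = np.asarray(formants, dtype=float)
    f = np.asarray(f, dtype=float)[..., None]
    return np.exp(-0.5 * ((f - centers) / (rel_bw * centers)) ** 2).sum(axis=-1)


def mellin_magnitude(spectrum, f_min=10.0, f_max=20000.0, n=4096):
    """Energy-normalized |M(1/2 - jc)| of a spectrum X(f), f > 0.

    Substituting f = e^u turns M(s) = int X(f) f^(s-1) df on the line s = 1/2 - jc
    into the Fourier transform of g(u) = X(e^u) e^(u/2). A dilation X(f) -> X(f/k)
    only shifts g by ln k and scales it by sqrt(k), so |FFT(g)| / ||g|| is unchanged."""
    u = np.linspace(np.log(f_min), np.log(f_max), n)   # uniform grid in log-frequency
    g = spectrum(np.exp(u)) * np.exp(u / 2.0)
    return np.abs(np.fft.rfft(g)) / np.linalg.norm(g)


long_F = tube_formants(18.0)    # longer vocal tract
short_F = tube_formants(13.0)   # shorter vocal tract
print(short_F / long_F)         # every ratio equals 18/13 (about 1.385)

f_lin = np.linspace(0.0, 8000.0, 8001)
x_long = envelope(f_lin, long_F)
x_short = envelope(f_lin, short_F)
m_long = mellin_magnitude(lambda f: envelope(f, long_F))
m_short = mellin_magnitude(lambda f: envelope(f, short_F))

spectral_gap = np.max(np.abs(x_long - x_short))                  # large: peaks do not line up
mellin_gap = np.max(np.abs(m_long - m_short)) / np.max(m_long)   # near zero
print(spectral_gap, mellin_gap)
```

On a linear frequency axis the two envelopes differ substantially, while the energy-normalized Mellin magnitudes agree to within numerical error. Measured speech spectra are only approximately dilated copies of one another, so in practice the invariance would hold only approximately.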
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.
