Improved automatic speech recognition through speaker normalization

Affiliation:
1. Aragon Institute for Engineering Research (I3A), University of Zaragoza, Zaragoza, Spain; 2. Faculty of Science and Technology, University of the Basque Country (UPV/EHU), Leioa, Spain
1. University Grenoble Alps, CNRS, Grenoble INP, LIG, Grenoble F-38000, France; 2. LIM Laboratory, Sidi Mohamed Ben Abdellah University, Faculty of Sciences Dhar el Mahraz, Fez, Morocco
1. Neutron Scattering Science Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA; 2. Electrochemical Energy Laboratory & Materials Science and Engineering Program, The University of Texas at Austin, Austin, TX 78712, USA; 3. Chemical Sciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
1. Speech and Audio Research Laboratory, Queensland University of Technology, Brisbane, Australia; 2. Center for Language and Speech Technology, Radboud University Nijmegen, Netherlands

Abstract:	In this paper, speaker adaptive acoustic modeling is investigated using a novel method for speaker normalization and a well-known vocal tract length normalization method. With the novel normalization method, acoustic observations of training and testing speakers are mapped into a normalized acoustic space through speaker-specific transformations, with the aim of reducing inter-speaker acoustic variability. For each speaker, an affine transformation is estimated with the goal of reducing the mismatch between the acoustic data of the speaker and a set of target hidden Markov models. This transformation is estimated through constrained maximum likelihood linear regression and then applied to map the acoustic observations of the speaker into the normalized acoustic space.

Recognition experiments made use of two corpora, the first consisting of adults' speech and the second of children's speech. Performing training and recognition with normalized data resulted in a consistent reduction of the word error rate with respect to the baseline systems trained on unnormalized data. In addition, the novel method always performed better than the reference vocal tract length normalization method adopted in this work.

When unsupervised static speaker adaptation was applied in combination with each of the two speaker normalization methods, different behavior was observed on the two corpora: in one case performance became very similar, while in the other the difference remained significant.
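The per-speaker affine mapping described in the abstract can be illustrated with a deliberately simplified sketch. It assumes the target is a single full-covariance Gaussian rather than a set of hidden Markov models; in that special case the transform that maximizes the constrained-MLLR-style likelihood (including the log-determinant Jacobian term) simply matches the speaker's sample mean and covariance to the target, up to a rotation. The function names `estimate_affine_to_target` and `normalize`, and the synthetic data, are hypothetical and not taken from the paper, whose estimation procedure over HMM Gaussians is iterative and is not reproduced here.

```python
import numpy as np


def estimate_affine_to_target(X, target_mean, target_cov):
    """Estimate (A, b) for one speaker so that A @ x + b matches the target.

    X is a (T, D) array of acoustic feature frames from a single speaker.
    Simplifying assumption: the target is one Gaussian, not a set of HMMs.
    """
    speaker_mean = X.mean(axis=0)
    # Maximum-likelihood (1/T) sample covariance of the speaker's frames.
    speaker_cov = np.cov(X, rowvar=False, bias=True)
    L_speaker = np.linalg.cholesky(speaker_cov)
    L_target = np.linalg.cholesky(target_cov)
    # A maps the speaker covariance onto the target covariance:
    # A @ speaker_cov @ A.T == target_cov.
    A = L_target @ np.linalg.inv(L_speaker)
    b = target_mean - A @ speaker_mean
    return A, b


def normalize(X, A, b):
    """Apply the affine transform x -> A @ x + b to every frame (row) of X."""
    return X @ A.T + b


# Synthetic "speaker" data: 500 frames of 3-dimensional features.
rng = np.random.default_rng(0)
mixing = np.array([[2.0, 0.3, 0.0],
                   [0.0, 1.5, 0.2],
                   [0.0, 0.0, 0.5]])
X = rng.normal(size=(500, 3)) @ mixing + np.array([1.0, -2.0, 0.5])

A, b = estimate_affine_to_target(X, np.zeros(3), np.eye(3))
Y = normalize(X, A, b)  # frames mapped into the normalized space
```

In this toy setting the same `normalize` step would be applied to both training and testing speakers, each with its own `(A, b)`, which mirrors the abstract's point that models are trained and evaluated on normalized data.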

Keywords:
This article is indexed in ScienceDirect and other databases.
