Continuous speech recognition using linear dynamic models
Authors: Tao Ma, Sundararajan Srinivasan, Georgios Lazarou, Joseph Picone
Affiliations:
1. Siri at Apple Inc., 2 Infinite Loop, mailstop 302-4APP, Cupertino, CA 95014, USA
2. Nuance Communications Inc., 1198 East Arques Avenue, Sunnyvale, CA 94085, USA
3. The New York City Transit Authority, 30-74 38th Street, Apt 1A, Astoria, NY 11103, USA
4. Department of Electrical and Computer Engineering, Temple University, 1947 North 12th Street, Philadelphia, PA 19027, USA
Abstract: Hidden Markov models (HMMs) with Gaussian mixture distributions rely on the assumption that speech features are temporally uncorrelated, and often use a diagonal covariance matrix that ignores correlations between feature vectors of adjacent frames. A linear dynamic model (LDM) is a Markovian state-space model that also relies on hidden state modeling, but explicitly models the evolution of these hidden states using an autoregressive process. An LDM is capable of modeling higher-order statistics and can exploit correlations among features in an efficient and parsimonious manner. In this paper, we present a hybrid LDM/HMM decoder architecture that postprocesses segmentations derived from the first pass of an HMM-based recognizer. This smoothed trajectory model is complementary to existing HMM systems. An expectation-maximization (EM) approach for parameter estimation is presented. We demonstrate a 13% relative WER reduction on the Aurora-4 clean evaluation set, and a 13% relative WER reduction on the babble noise condition.
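The autoregressive state evolution that distinguishes an LDM from a standard HMM can be sketched as follows. This is a minimal illustration only: the state dimension, the transition matrix F, and the observation matrix H are invented for the example (not the paper's estimated parameters), and the process and observation noise terms of a full LDM are omitted for brevity.

```python
# Minimal sketch of a linear dynamic model (LDM):
#   hidden state:   x_t = F x_{t-1}   (autoregressive evolution; noise omitted)
#   observation:    y_t = H x_t       (projection to the feature space)
# F, H, and the 2-D state are illustrative assumptions, not the paper's values.

def mat_vec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

F = [[0.9, 0.1],
     [0.0, 0.8]]        # hypothetical state-transition matrix
H = [[1.0, 0.5]]        # hypothetical observation matrix (1-D output)

def simulate(x0, steps):
    """Propagate the hidden state and emit one observation per frame."""
    states, observations = [], []
    x = x0
    for _ in range(steps):
        x = mat_vec(F, x)               # autoregressive state update
        states.append(x)
        observations.append(mat_vec(H, x)[0])
    return states, observations

states, observations = simulate([1.0, 1.0], steps=3)
```

Because successive states are linked through F, the emitted observations are temporally correlated across frames, which is precisely the structure a diagonal-covariance HMM ignores.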
This article is indexed in SpringerLink and other databases.