Data-Driven Temporal Filters and Alternatives to GMM in Speaker Verification
Affiliation: 1. Oregon Graduate Institute of Science and Technology, Portland, Oregon; 2. International Computer Science Institute, Berkeley, California; 3. Indian Institute of Technology Madras, Chennai, India; 1. Dept. of Signal Theory, Telematics and Communications, University of Granada, Granada, Spain; 2. Dept. of Computer Science, University of Sheffield, Sheffield, UK
Citation: Malayath, Narendranath, Hermansky, Hynek, Kajarekar, Sachin, and Yegnanarayana, B., Data-Driven Temporal Filters and Alternatives to GMM in Speaker Verification, Digital Signal Processing 10 (2000), 55–74.
Abstract: This paper discusses the research directions pursued jointly at the Anthropic Signal Processing Group of the Oregon Graduate Institute and at the Speech and Vision Laboratory of the Indian Institute of Technology Madras. Current methods for speaker verification model speaker characteristics using Gaussian mixture models (GMM). The performance of these systems degrades significantly if a target speaker uses a telephone handset different from the one used during training. Conventional methods for channel normalization include utterance-based mean subtraction (MS) and RelAtive SpecTrAl (RASTA) filtering. In this paper we introduce a novel method for designing filters that normalize the variability introduced by different telephone handsets. The filter design is based on the estimated second-order statistics of handset variability, and the filter is applied to the logarithmic energy outputs of Mel-spaced filter banks. We also demonstrate the effectiveness of the proposed channel-normalizing filter in improving speaker verification performance in mismatched conditions. GMM-based systems often use thousands of mixture components and hence require a large number of parameters to characterize each target speaker. To address this issue, we propose an alternative to GMM for modeling speaker characteristics. The alternative is based on speaker-specific mapping and relies on a speaker-independent representation of speech.
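As a rough illustration of the conventional baseline named in the abstract, utterance-based mean subtraction (MS), the sketch below removes the per-band time average from log filter-bank energies. It assumes the handset acts as a fixed linear channel, which appears as a constant additive offset per band in the log-energy domain. The function name `mean_subtract`, the synthetic data, and the array sizes are illustrative assumptions, not taken from the paper, and this is not the data-driven filter the paper proposes.

```python
import random


def mean_subtract(frames):
    """Subtract the utterance-level mean of each band from every frame.

    `frames` is a list of frames, each a list of log filter-bank energies
    (one value per band). Hypothetical helper for illustration only.
    """
    n_frames = len(frames)
    n_bands = len(frames[0])
    # Per-band average over the whole utterance.
    means = [sum(frame[b] for frame in frames) / n_frames for b in range(n_bands)]
    return [[frame[b] - means[b] for b in range(n_bands)] for frame in frames]


random.seed(0)
n_frames, n_bands = 200, 5  # assumed sizes, chosen only for the demo

# Synthetic "clean" log energies (stand-in for real Mel filter-bank outputs).
clean = [[random.gauss(0.0, 1.0) for _ in range(n_bands)] for _ in range(n_frames)]

# Assumed handset effect: a constant offset per band in the log domain.
offset = [random.gauss(0.0, 1.0) for _ in range(n_bands)]
observed = [[frame[b] + offset[b] for b in range(n_bands)] for frame in clean]

norm_clean = mean_subtract(clean)
norm_observed = mean_subtract(observed)

# Largest difference between the two normalized utterances.
max_diff = max(
    abs(a - b)
    for fc, fo in zip(norm_clean, norm_observed)
    for a, b in zip(fc, fo)
)
```

Under the constant-offset assumption, the normalized features of the clean and channel-distorted utterances coincide up to floating-point error, so `max_diff` is effectively zero. Real handsets also introduce time-varying and nonlinear effects, which this simple subtraction does not remove.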
Keywords:
This article is indexed in ScienceDirect and other databases.