Environment adaptation for robust speaker verification by cascading maximum likelihood linear regression and reinforced learning
Affiliation:1. Department of Electronic and Information Engineering, Center for Multimedia Signal Processing, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong;2. Department of Electrical Engineering, Princeton University, United States
Abstract:In speaker verification over public telephone networks, utterances can be obtained from different types of handsets, and different handsets may introduce different degrees of distortion to the speech signals. This paper combines a handset selector with (1) handset-specific transformations, (2) reinforced learning, and (3) stochastic feature transformation to reduce the effect of this acoustic distortion. Specifically, during training, the clean speaker models and background models are first transformed by maximum likelihood linear regression (MLLR)-based handset-specific transformations using a small amount of distorted speech data. Reinforced learning is then applied to adapt the transformed models into handset-dependent speaker models and handset-dependent background models using stochastically transformed speaker patterns. During a verification session, a GMM-based handset classifier identifies the most likely handset used by the claimant, and the corresponding handset-dependent speaker and background model pair is used for verification. Experimental results on 150 speakers of the HTIMIT corpus show that environment adaptation based on the combination of MLLR, reinforced learning, and feature transformation outperforms cepstral mean subtraction (CMS), Hnorm, Tnorm, and speaker model synthesis.
Keywords:
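The verification stage described in the abstract can be summarized in a short sketch. The Python code below is a minimal illustration, not the authors' implementation: it assumes the handset-dependent speaker and background GMMs have already been produced offline by the MLLR transformation and reinforced-learning adaptation described above, and the dictionary names, type hints, and decision threshold are illustrative assumptions.

import numpy as np
from sklearn.mixture import GaussianMixture

def verify(features: np.ndarray,
           handset_gmms: dict[str, GaussianMixture],     # handset id -> handset GMM (handset selector)
           speaker_gmms: dict[str, GaussianMixture],     # handset id -> handset-dependent speaker model
           background_gmms: dict[str, GaussianMixture],  # handset id -> handset-dependent background model
           threshold: float = 0.0):
    """features: (T, D) array of cepstral frames from the claimant's utterance."""
    # 1. Handset selector: pick the handset whose GMM best explains the utterance.
    handset = max(handset_gmms, key=lambda h: handset_gmms[h].score(features))
    # 2. GaussianMixture.score returns the mean per-frame log-likelihood, so the
    #    difference between speaker and background scores is an average
    #    log-likelihood ratio under the selected handset's model pair.
    llr = (speaker_gmms[handset].score(features)
           - background_gmms[handset].score(features))
    # 3. Accept the identity claim if the ratio exceeds the decision threshold.
    return llr > threshold, handset, llr

In this sketch each dictionary entry would simply be a fitted GaussianMixture; in the paper the speaker and background models are additionally adapted per handset by MLLR and reinforced learning, and the decision threshold would be tuned on development data.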
This article is indexed in databases including ScienceDirect.