首页 | 本学科首页   官方微博 | 高级检索  
     


A bilevel framework for joint optimization of session compensation and classification for speaker identification
Affiliation:1. School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China;2. School of Computer Science and Technology, Harbin Institute of Technology, Weihai, Shandong 264209, China;3. School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, Heilongjiang 150080, China
Abstract:The i-vector framework based system is one of the most popular systems in speaker identification (SID). In this system, session compensation is usually employed first and then the classifier. For any session-compensated representation of i-vector, there is a corresponding identification result, so that both the stages are related. However, in current SID systems, session compensation and classifier are usually optimized independently. An incomplete knowledge about the session compensation to the identification task may lead to involving uncertainties. In this paper, we propose a bilevel framework to jointly optimize session compensation and classifier to enhance the relationship between the two stages. In this framework, we use the sparse coding (SC) to obtain the session-compensated feature by learning an overcomplete dictionary, and employ the softmax classifier and support vector machine (SVM) in classifying respectively. Moreover, we present a joint optimization of the dictionary and classifier parameters under a discriminative criterion for classifier with conditions for SC. In addition, the proposed methods are evaluated on the King-ASR-010, VoxCeleb and RSR2015 databases. Compared with typical session compensation techniques, such as linear discriminant analysis (LDA) and nonparametric discriminant analysis (NDA), our methods can be more robust to complex session variability. Moreover, compared with the typical classifiers in i-vector framework, i.e. the cosine distance scoring (CDS) and probabilistic linear discriminant analysis (PLDA), our methods can be more suitable for SID (multiclass task).
Keywords:Speaker identification  Session compensation  Joint optimization  Bilevel framework
本文献已被 ScienceDirect 等数据库收录!
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号