A bilevel framework for joint optimization of session compensation and classification for speaker identification期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

A bilevel framework for joint optimization of session compensation and classification for speaker identification

Affiliation:	1. School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China;2. School of Computer Science and Technology, Harbin Institute of Technology, Weihai, Shandong 264209, China;3. School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, Heilongjiang 150080, China

Abstract:	The i-vector framework based system is one of the most popular systems in speaker identification (SID). In this system, session compensation is usually employed first and then the classifier. For any session-compensated representation of i-vector, there is a corresponding identification result, so that both the stages are related. However, in current SID systems, session compensation and classifier are usually optimized independently. An incomplete knowledge about the session compensation to the identification task may lead to involving uncertainties. In this paper, we propose a bilevel framework to jointly optimize session compensation and classifier to enhance the relationship between the two stages. In this framework, we use the sparse coding (SC) to obtain the session-compensated feature by learning an overcomplete dictionary, and employ the softmax classifier and support vector machine (SVM) in classifying respectively. Moreover, we present a joint optimization of the dictionary and classifier parameters under a discriminative criterion for classifier with conditions for SC. In addition, the proposed methods are evaluated on the King-ASR-010, VoxCeleb and RSR2015 databases. Compared with typical session compensation techniques, such as linear discriminant analysis (LDA) and nonparametric discriminant analysis (NDA), our methods can be more robust to complex session variability. Moreover, compared with the typical classifiers in i-vector framework, i.e. the cosine distance scoring (CDS) and probabilistic linear discriminant analysis (PLDA), our methods can be more suitable for SID (multiclass task).

Keywords:	Speaker identification Session compensation Joint optimization Bilevel framework
本文献已被 ScienceDirect 等数据库收录！

设为首页 | 免责声明 | 关于勤云 | 加入收藏